Best hardware and software for deepfakes

GPU and algorithm benchmarks for deep-learning based faceswaps

There’s a lot of hearsay about what deepfaking software is faster than another, how long it takes to train deepfakes, or how well different algorithms work, but there aren’t too many benchmarks or measurements. I tried to benchmark major pieces of the deepfaking workflow, including hardware performance, software performance, faceswapping quality, and face recognition accuracy. I also tried to detail my approach so others can reproduce or contribute their own numbers if they want.

So what is the best hardware and software for deepfakes? That’s a complicated question depending on your budget, interests, and so forth, but these benchmarks will help address that question.

GPU benchmarks

The availability of GPU benchmark data for deep learning is somewhat sparse. Furthermore, the specific details of the software and hardware implementation are important when discussing GPU performance, so comparing between different benchmarks is not easy either. For this reason, I decided to compare various hardware setups using deepfaking software directly.

Before I dive into the benchmarks, here’s a quick survey of some of the other deep learning GPU benchmarks and selection guides.

  • Tim Dettmers’ guide to choosing a deep learning GPU. This is a pretty definitive article on the theory behind choosing a deep learning GPU. However, there is no actual data, and the performance numbers are all estimates based on cryptocurrency mining data or other sources.
  • Slav Ivanov’s guide to picking a deep learning GPU. This is less detailed than Dettmers’ guide but constantly shows up in Google searches. The numbers in this article are pure theorycrafting based on the manufacturer’s hardware specifications.
  • Max Woolf’s GPU vs CPU cost efficiency benchmarking. Finally, here is some actual benchmarking data. I have two issues. First, Woolf focuses on cost here and in another followup. Cost is not our focus, but I will point out that Woolf’s cost analysis is outdated, as it ignores preemptible GPUs. Second, as a commenter on the article pointed out, the benchmarks may be limited by other processes like data access. This just highlights the need for careful benchmarks. Still, the articles are worth a read.
  • Tensorflow’s official benchmarks. For cloud server GPUs, Tensorflow provides some benchmarks. This is assuming you can max out your GPU performance, of course, which, as we will see, is not always the case.

Method

I used two training data sets of about 1300 and 1100 images drawn from public domain government videos of President Trump and former Vice President Biden. The training data also included a small number of images with permissions for reuse. I used a customized version of the face pre-extraction tool by Tyrannosaurus1234 to crop frames featuring the subject of choice. I curated the data sets mildly for quality; I left in some photos with microphone obstructions to see how they would be handled. I also did not try to match poses and colors, as I ordinarily would when faceswapping a single target video. I used each software or workflow’s own processes to extract, train, and convert. Some implementations, such as FakeApp, do not provide timing data, and others include the long initialization time in their timing calculations. To get around this, I measured iteration timings directly with a stopwatch, using any data from the console screen only as secondary confirmation. All training began from a null initial model.

I used the following hardware setups:

  • GTX 1050-2GB, i7, 8 GB RAM, Windows 10
  • GTX 1060-6GB, i5, 8 GB RAM, Windows 10
  • Google Cloud instance with 4 vCPUs, 32 GB RAM, 1/2 K80 board, SSD persistent drive, Windows 2016 Server
  • Google Cloud instance with 4 vCPUs, 32 GB RAM, P100, SSD persistent drive, Windows 2016 Server (when indicated, I tested 8 vCPUs as well)

Note that Google Cloud is not clear with their terminology for the K80. A K80 consists of two processors. One quota unit from Google is one processor, which is half of a K80 board.

I built faceswap and dfaker using Tensorflow 1.5 with CUDA 9 and cuDNN 7 from mid-February 2018 copies of the respective repositories using Python 3.6.
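Before benchmarking, it is worth confirming that the Tensorflow build actually sees the GPU. Here is a minimal sanity check I would run first; note that device_lib lives in Tensorflow’s non-public python.client module, so treat this as a convenience sketch rather than a stable API:

```python
# Confirm the Tensorflow version and that CUDA sees the GPU.
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)  # expect 1.5.0 for the CUDA 9 / cuDNN 7 build

# Lists CPU and GPU devices; half a K80 shows up as a single GPU device.
for device in device_lib.list_local_devices():
    print(device.name, device.physical_device_desc)
```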

Faceswap training benchmarks

In order to include the GTX 1050 in the benchmarks, I used a modified LowMem setting provided by qzmenko in this GitHub thread with a batch size of 16. I used the original model (1024 nodes, 4 layers) where indicated with a batch size of 64 for the other GPUs. For the P100, I also tested a batch size of 256. I normalized the images processed per second by the batch size per iteration. If one batch of 16 is processed every second, that corresponds to 16 images processed per second. (I’ve left out the factor of two coming from face A and face B for simplicity, so this is really images processed per second per face.)

[Figure: Faceswap GPU benchmarks, images processed per second]

The benchmarks roughly match what you might expect. The K80 is disappointingly slow, but once you realize that you are actually getting half a K80, it makes more sense. Some in this Hacker News thread have expressed displeasure at Google for not making the half board situation more clear. Although the K80 is slower than the GTX 1060, speed is not everything. The K80 does come with 11 GB of memory, which means you can run larger models that would crash the GTX 1060.

The P100 has 16 GB of memory and can accommodate even larger models or batch sizes. However, batching to 256 images did not increase the effective image processing rate with faceswap.

To test if CPU usage was bottlenecking the performance, I also tested an instance with 8 cores instead of 4, but there was no significant change in the processing time. (The CPU usage was less than 100%, so this was obvious. This test was done mainly to serve as a comparison with the dfaker results.)

Faceswap GPU usage notes

Between the lowest end GTX 1050 and the highest end P100, there is only a factor of 3 difference, which is a bit less than I expected. For reference, the GTX 1050 has 640 CUDA cores and 112 GB/sec of memory bandwidth, while the P100 has 3584 CUDA cores and 720 GB/sec. What I did notice is that the P100 GPU is not fully used during training with the faceswap scripts.

Here is the GPU usage for the K80 during a sample run:

[Figure: K80 GPU usage during faceswap training]

Here is the GPU usage for the P100:

[Figure: P100 GPU usage during faceswap training]

Unfortunately, I could not unlock the full potential of the P100 using the faceswap scripts. Faceswap does implement threading to keep the GPU busy with data, but that only seems to work with the slower graphics cards. Perhaps the CPU load cannot keep up with the GPU in extreme cases. Both the GTX 1050 and GTX 1060 show GPU profiles similar to the K80 at near 100% usage.
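For anyone who wants to capture usage traces like the ones shown here, a simple approach is to poll nvidia-smi from Python using its standard query flags. This is just how I would sketch it; the logging format is my own choice and not anything the faceswap scripts provide:

```python
# Poll nvidia-smi once per second and print GPU utilization and memory use.
import subprocess
import time

def log_gpu_usage(interval=1.0, samples=60):
    for _ in range(samples):
        out = subprocess.check_output([
            "nvidia-smi",
            "--query-gpu=utilization.gpu,memory.used",
            "--format=csv,noheader,nounits",
        ]).decode().strip()
        print(time.strftime("%H:%M:%S"), out)
        time.sleep(interval)

if __name__ == "__main__":
    log_gpu_usage()
```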

Dfaker training benchmarks

The dfaker model is much more memory intensive than the faceswap models, as it works with 128×128 images during its pipeline. That means at some point, there are tensors taking up four times as much memory. The GTX 1050 was not capable of running the dfaker training script even at a batch size of 2. The GTX 1060 could only run at a batch size of 16 or smaller. The K80 is stable up to a batch size of 32, and unstable at 64. The P100 is stable up to a batch size of 64, and unstable at 128. At the unstable settings, the script runs but keeps throwing out cuDNN error messages once in a while.
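The factor of four is just the pixel-count ratio, as the quick arithmetic below shows; actual memory use also depends on the model architecture, so take it as a rough guide:

```python
# A 128x128 image plane holds four times as many values as a 64x64 one,
# so tensors at that stage of the pipeline take roughly 4x the memory.
print((128 * 128) / (64 * 64))  # 4.0
```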

[Figure: Dfaker GPU benchmarks, images processed per second]

The dfaker benchmarks follow the same trends as before, but most of the GPUs are barely limping along. The training rate is almost 10-fold lower for the GTX 1060, while the K80 is about 8-fold lower. The P100 manages to survive with only a 3-5 fold lower training rate. (This might make sense seeing as the P100 GPU usage was well below 100% for the faceswap case, so there was more to gain by going up to 100% usage.)

Still, the absolute fastest training obtained with the dfaker model matches the absolute slowest training obtained with the faceswap model. This really highlights the computational cost of more complex models. I avoided testing the GAN models in part because I expect them to be even slower than the dfaker model. Also, I didn’t realize until I measured the numbers here that dfaker was that much slower.

Dfaker GPU usage notes

As noted before for the faceswap models, the P100 also fails to maximize GPU performance with the dfaker code. Shown here is the GPU usage for batch sizes of 16 and 64:

[Figure: P100 GPU usage with dfaker at batch sizes 16 and 64]

However, I noticed that the CPU was at 100% usage across all four virtual cores. When increasing the number of virtual cores to eight, GPU usage reached its maximum along with a 50% performance boost in the training rate. Here is the GPU usage with eight cores and a batch size of 64:

[Figure: P100 GPU usage with dfaker, eight vCPUs, batch size 64]

Now the P100 is over 6 times faster than the K80. A preemptible P100 with 8 cores and Windows currently costs $1.141 per hour on Google Cloud versus $0.427 for a preemptible K80 with 4 cores. Paying 2.5 times more for 6 times the training speed makes perfect sense. If we are talking about pricing, though, a GTX 1080/1080Ti likely pays for itself in the long run.
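To make the cost comparison concrete, here is a back-of-the-envelope sketch. The hourly prices are the ones quoted above, but the training rates are hypothetical placeholders standing in for the roughly 6x speed difference, so substitute the rates you actually measure:

```python
K80_PRICE, P100_PRICE = 0.427, 1.141  # USD/hour, preemptible, from the text

# HYPOTHETICAL dfaker training rates in images/sec; use your measured values.
k80_rate, p100_rate = 10.0, 62.0

def usd_per_million_images(price_per_hour, images_per_sec):
    # Cost to process one million training images at a steady rate.
    hours = 1e6 / (images_per_sec * 3600)
    return price_per_hour * hours

print(usd_per_million_images(K80_PRICE, k80_rate))    # ~11.9 USD
print(usd_per_million_images(P100_PRICE, p100_rate))  # ~5.1 USD
```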

GPU conclusions

For faceswap scripts, a GTX 1060 (with 6GB) provides decent performance with enough memory to hold decent models. Unfortunately, I did not have access to GTX 1080s for comparison at this time, but based on other benchmarks, they probably do not exceed the P100 by much. Of course, it would be interesting to try multi-GPU options, but if faceswap cannot use all of a P100, I am skeptical that it can make use of more GPU horsepower.

The point that stood out to me is how much GPU power you need to run the “2nd generation” deepfakes scripts. Dfaker already requires 10x more computational power. GANs must be even more expensive. But is it worth it? I’ll try to answer that question later.

Faceswap software benchmarks

The original deepfakes code has evolved into more complicated versions, but the original “1st generation” algorithm still remains the most popular. The faceswap GitHub repository is the main effort of the open source community based on the original algorithm, although it now includes GAN plugins (which we ignore for purposes of this post). In addition, a closed source version of the older code by /u/deepfakes was implemented in FakeApp 1.1 and 2.2. MyFakeApp by Radek is a more recent GUI-based implementation of the faceswap code.

Method

I ran all tests on a Google Cloud instance with half a K80 GPU board, 4 cores, and 32 GB RAM. I built faceswap with Tensorflow 1.4 or 1.5 as indicated, along with the appropriate CUDA and cuDNN libraries. I installed FakeApp 1.1 (Tensorflow 1.4), FakeApp 2.2 (Tensorflow 1.5), and MyFakeApp (Tensorflow 1.5) according to the developer’s instructions. I used models with 1024 nodes and 4 layers. The timing method is the same as before.

Training speed

FakeApp 1.1 and FakeApp 2.2 presumably share the same code except for an update from Tensorflow 1.4 to 1.5. To test the effect of Tensorflow upgrades, I also measured the training rate for faceswap with Tensorflow 1.4.0 and 1.5.0. Finally, the open source GUI MyFakeApp was included as another comparison.


[Figure: Deepfakes software benchmarks, images processed per second]

Some observations:

  • Training shows a significant batch size effect, with faster effective training rates with a higher batch size. (Note that in the previous GPU benchmarks, I used a simplified model for the batch 16 case, so you can’t really compare the batch 16 and batch 64 results directly in that case.)
  • Tensorflow 1.5 with CUDA 9 and cuDNN 7 provides a 5-8% boost over Tensorflow 1.4 with CUDA 8 and cuDNN 6. Every little bit helps, but it’s not exactly earth-shattering or particularly noticeable.
  • Faceswap is 50-75% faster than FakeApp. I tested faceswap running under python and as a frozen exe package. Radek’s implementation of faceswap shows the same result as well.

GPU usage notes

Both versions of FakeApp severely underuse the GPU, as is clear from their GPU usage plots. Here is the GPU usage for FakeApp 1.1:

[Figure: FakeApp 1.1 GPU usage]

Here is the GPU usage for FakeApp 2.2:

[Figure: FakeApp 2.2 GPU usage]

Contrast this with faceswap’s GPU usage on the same server during the same hour:

[Figure: Faceswap GPU usage]

Note that the K80 GPU usage above is qualitatively different from the one shown in the GPU benchmarks section. I only compared measurements from the same instance and the same time here, as there was some minor variation between instances, possibly due to someone else using the other “half” of the K80 board. The image processing speeds and main conclusions remained the same in all cases, although the precise shape of the GPU usage plots sometimes changed slightly. I also confirmed the low GPU use of FakeApp 1.1 on a desktop PC with a GTX 1060 and an i5.

Faceswap software conclusions

If you care at all about your training speed, use the faceswap scripts. Tensorflow 1.5 and CUDA 9 are not as important as simply using the optimized open source code. Of course, if you need a friendly GUI, you might prefer FakeApp, although Radek’s MyFakeApp is a faster alternative (it is currently missing GPU support for face extraction, though).

Submit your own training rate

If interested, please submit your own training data benchmarks in the comments below or in the forums. It would be great to put together a community resource of hardware and software combinations, and the resulting training speed.

To report the training rate, you do not need to download the data set provided here; you can just use your own projects.

Instead of reporting how long you trained your model, which is highly variable depending on your project and system, please report the number of images processed per second per face.

You can obtain this number by timing how long it takes for each step or iteration when the loss value updates. All of the open source implementations automatically give you this number. Make sure you check whether you are reading iterations/sec or seconds/iter, as the python package tqdm may report either one depending on speed.

Simply take the iterations per second and multiply by the batch size to get the images processed per second.
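For example, the conversion takes only a few lines of Python; the console readings below are placeholders, so use whatever tqdm reports for your run:

```python
batch_size = 64

iterations_per_sec = 1.25  # placeholder: tqdm shows "1.25it/s"
print(batch_size * iterations_per_sec)  # 80.0 images/sec per face

seconds_per_iter = 2.0  # placeholder: tqdm shows "2.00s/it" instead
print(batch_size / seconds_per_iter)  # 32.0 images/sec per face
```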

It would be helpful to also describe your setup like so:

  • Operating system:
  • CPU:
  • System memory (RAM):
  • Graphics card (exact model and VRAM):
  • Software and model/trainer type:
  • Batch size:

Algorithm quality benchmarks

While faceswap’s model with 1024 nodes and 4 layers is the most common standard, several other models are available, including the 128×128 output model in the dfaker repository and the GAN model in the shaoanlu repository (partially merged into faceswap). The GAN model is too expensive to test, so instead I performed a head-to-head comparison of the faceswap and dfaker models. This is really a test of the first and second generation deepfaking algorithms. I used the main faceswap implementation, as the other derivatives are expected to be similar or worse in quality.

Method

As the two models have different loss functions, training speeds, and other implementation considerations, it would not be fair to compare a poorly trained model with a well-trained one. Instead, I decided to compare models trained on a comparable number of images. I used the same data sets as before for a Trump to Biden swap, running the full pipeline of each implementation on identical starting data. Trump to Biden is not an ideal swap, as the face shapes are somewhat different, but I didn’t want to use a test sample that was too easy.

I trained a faceswap model over three days on the standard K80 setup, which amounts to 11 million images processed. I trained a dfaker model on the K80 for 2 days and then moved to the 8-core P100 for 2.5 days after I realized training was too slow, which amounts to 8 million images processed. At this point, dfaker’s quality definitively exceeded faceswap’s even with fewer images processed, so I did not train further.

Since there is no ground truth to an image with a swapped face, I instead used the face_recognition library to compare a reference photo of Biden to Trump-Biden swaps and determine which model was closest to Biden. To do this, I used face_recognition’s built-in face distance function. One hundred images of Trump that had been randomly excluded from the training data were converted into Biden swaps. Another 30 images of Biden that had been randomly excluded from the training data served, one by one, as the reference photo to evaluate the 100 swapped images. This generated 100 x 30 = 3000 comparisons evaluating the closeness of the faceswap and dfaker results to a likeness of Biden.

Faceswap conversion settings were set to a 0 blur and 0 erosion kernel, as this provided the most accurate results according to the face_recognition library. For example, 0 blur and 0 erosion had an average distance to the reference photos of 0.50, while 10 blur and 10 erosion had an average distance of 0.52. The distance metric goes from 0 to 1, with 0 indicating perfect resemblance. The 0 blur and 0 erosion setting had the lower distance to the reference photos in 78% of head-to-head comparisons with the blurred and eroded conversions. Dfaker has fixed conversion settings that make use of its own optimization.
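For anyone who wants to reproduce the scoring, here is a minimal sketch of a single head-to-head comparison using face_recognition; the file paths are hypothetical, and I assume one detected face per converted image:

```python
import face_recognition

def distance_to_reference(swap_path, reference_encoding, no_face_penalty=1.0):
    # Distance from the first detected face to the reference encoding.
    image = face_recognition.load_image_file(swap_path)
    encodings = face_recognition.face_encodings(image)
    if not encodings:
        # No detectable face in the converted image: assign the maximum
        # distance, as done for the faceswap "catastrophes" discussed below.
        return no_face_penalty
    return face_recognition.face_distance([reference_encoding], encodings[0])[0]

# Hypothetical paths standing in for the reference and converted test images.
reference = face_recognition.load_image_file("biden_references/ref_01.jpg")
reference_encoding = face_recognition.face_encodings(reference)[0]

d_faceswap = distance_to_reference("faceswap_out/trump_001.png", reference_encoding)
d_dfaker = distance_to_reference("dfaker_out/trump_001.png", reference_encoding)

# Positive difference: faceswap looked more like Biden; negative: dfaker did.
print(d_dfaker - d_faceswap)
```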

Faceswap versus dfaker competition

The head-to-head comparisons of the converted faceswap and dfaker faces are summarized in the plot below. For each comparison, I subtracted faceswap’s distance to the reference from dfaker’s distance to the same reference. A positive value means that faceswap looked more like Biden, while a negative value means that dfaker looked more like Biden.

[Figure: Head-to-head comparison, (dfaker distance to reference) − (faceswap distance to reference)]

Dfaker wins 76% of the head-to-head comparisons. Dfaker also beats faceswap by a large margin in some cases where face_recognition could not detect a face at all in the converted Biden image from faceswap. These were assigned a distance of 1 (the maximum allowed by face_recognition) from the reference photo, since the face was not even recognizable as a face. Example images from such a case are further below. All of dfaker’s faces were detected by face_recognition. Even if these faceswap “catastrophes” are ignored, dfaker still wins 75% of the head-to-head comparisons.

Training preview comparison

Dfaker not only provides a 128×128 output, but it also uses a larger portion of the face, which prevents “double eyebrows”, mouth truncations, and other issues. It also incorporates more facial shapes beyond the central eye-nose-mouth region. This disparity with faceswap is obvious when comparing the training previews, scaled to the correct relative proportions. Dfaker may have other optimizations in its use of face masking at the extraction, training, and conversion steps.

[Figure: Faceswap and dfaker training previews, correct relative scale]

You can open the image in a new tab to view it at full resolution. I saved the file as a jpg to reduce its size, but the quality is mostly preserved.

Converted examples

Note that I did not optimize the blur or erosion. I also did not use any third party software to smooth the face merging. I am only showing the raw output of each implementation, which may leave boundary boxes visible on the faces. However, you can still judge the quality of the faceswap.

[Figure: Faceswap and dfaker converted examples]

The “double eyebrow” problem is noticeable in the faceswap examples and particularly grievous in the fifth image, but absent from the dfaker examples. Dfaker’s larger face crop seems to translate into larger features, such as the wider mouth in the third, fifth, and sixth images. Although this is subjective, overall I feel that the dfaker results look more realistic and Biden-like, even disregarding the unoptimized blending visible in the last three photos.

Failed faceswap example

[Figure: Failed faceswap example]

This image is one example where face_recognition could not identify a face for faceswap in the head-to-head comparison. The smaller face crop may have made the steep angle difficult to train in the model. Dfaker definitely looks more robust in this case.

Algorithm quality conclusions

It’s quite clear that the second generation algorithm dfaker is much better than faceswap. The larger face crop and higher output resolution are significant improvements. Of course, this comes at the cost of requiring about 10x more computational power, but if you have the necessary resources, it is the obvious way to go. GAN models are still under development, so in my opinion, dfaker is the best option for pure quality right now.

Face recognition benchmarks

(Note: iperov has ported the face-alignment library to faceswap. I performed all tests on the mid-February 2018 commit as described above for the figures, but I also tested the latest commit, as noted in the updates below.)

There are two main libraries used for face recognition in deepfakes software. One is the face_recognition library by Adam Geitgey used in the main faceswap implementation. The other is the face-alignment library by Adrian Bulat used in the dfaker implementation and putatively in the FakeApp programs. These libraries, in turn, rely on other machine learning toolkits. Face_recognition uses dlib by Davis King, which is fairly easy to set up. Face-alignment uses pytorch, which has been ported to Windows with the help of Jiachen Pu. Faceswap also offers faster but less accurate face detection using the HOG model for those constrained to use only a CPU.
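As a quick illustration of faceswap’s two modes, the face_recognition API exposes both detectors through a single function; model="hog" is the fast CPU detector and model="cnn" is the GPU-accelerated dlib CNN detector (the file name below is hypothetical):

```python
import face_recognition

image = face_recognition.load_image_file("frame_0001.png")  # hypothetical frame

hog_boxes = face_recognition.face_locations(image, model="hog")  # fast, CPU
cnn_boxes = face_recognition.face_locations(image, model="cnn")  # dlib CNN, GPU

# Each box is a (top, right, bottom, left) tuple in pixel coordinates.
print(len(hog_boxes), len(cnn_boxes))
```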

Clorr, one of the main contributors to the faceswap repository, first raised the issue of discrepancies between the different face recognition implementations. The focus was on an uncontrolled zooming effect, which I also demonstrate later. When put to the test, the two libraries actually turn out to perform better in different scenarios.

Method

I prepared five data sets corresponding to challenging edge-case scenarios using frames extracted from public domain government videos. The five scenarios are sharp face angles, dynamic scenes, bright backgrounds, night-time recording, and a stability test. I measured the speed of frame extraction, the total number of faces detected, and the number of incorrect extractions. For the stability test, I combined all views from the same camera angle into an animated GIF for qualitative assessment. I used a Google Cloud instance with half a K80 GPU board, 4 cores, and 32 GB RAM.

Face extraction speed

FakeApp CPU refers to the CPU setting for face extraction, while FakeApp GPU refers to the GPU setting. Faceswap CPU refers to its HOG face detector, while faceswap GPU refers to the full face_recognition face detector. Dfaker GPU refers to its use of face-alignment, which requires commenting out a single line of code in the GitHub repository.

[Figure: Face extraction speed, images processed per second]

Face_recognition, used in faceswap, is the fastest implementation. Dfaker and FakeApp likely use the same face-alignment library, based on their common speed. Both versions of FakeApp have unusably poor performance in CPU mode. Note that the FakeApp CPU setting failed to detect most faces at sharp angles or at night, skipping over those frames without full processing, which is why it appears faster for those two scenarios. The CPU setting for faceswap has reasonable speed.

More important than raw speed, though, is the accuracy of the face extraction, which is what I measure next for the different edge case scenarios.

Update: The latest Keras port of the face-alignment module implemented in faceswap runs at about one-third the speed of Faceswap GPU above.

Sharp angle accuracy

This scene involved the speaker having his face tilted at an angle to the camera, reaching a full side profile in some frames.

[Figure: Sharp angle accuracy, correct faces (blue) and incorrect extractions (red)]

The CPU-based face detectors are a complete disaster at detecting faces at angles. The other face detectors have nearly identical performance with a 1% error rate. The dlib library is said to have difficulty with sharp face angles, but I did not see significantly worse performance compared to the pytorch implementation.

Below is an example of an incorrectly aligned upside-down extracted face next to successful alignments.

[Figure: Sharp angle face extraction examples]

Winners: Any GPU-based method. Losers: Any CPU-based method.

Update: The latest Keras port of the face-alignment module implemented in faceswap generated 223 correct faces with 0 errors, which would make it the winner.

Dynamic scene accuracy

This video clip involved camera motion, subject motion, and a variety of visually complex scenes. There were blurry moving faces, occluded faces, bright objects, multiple faces, frames with no faces, and other challenging elements.

[Figure: Dynamic scene accuracy, correct faces (blue) and incorrect extractions (red)]

The faceswap implementation of face_recognition is by far the most sensitive face detector and also the most accurate, with the fewest mistakes in complex scenes. Strangely, face-alignment performance differs between dfaker and FakeApp (presumed), with FakeApp having a higher error rate. Faceswap’s CPU version also outperforms FakeApp’s CPU version in sensitivity and accuracy. There must be slightly different implementations of face extraction built on the same base library.

In the example below, three errors from FakeApp are shown. The second picture on the top, which has a very small face misaligned to the center, is counted as a mistake.

[Figure: Dynamic scene face extraction examples]

The face_recognition library in faceswap pulled out occluded faces that were missed by the other detectors, as shown in the examples below. While training on occluded faces is not recommended for standard models, the newer GAN-based methods capable of handling facial obstructions may require this level of sensitivity. (There is another MTCNN face detector that I did not use.)

[Figure: Occluded face extraction examples]

Winner: Faceswap GPU (face_recognition). Losers: All FakeApp methods.

Update: The latest Keras port of the face-alignment module implemented in faceswap generated 463 correct faces but with 33 errors, which would make it the most sensitive method, but also the most error-prone.

Bright background

This video had the speaker standing in front of a whiteboard coupled with some minor motion.

[Figure: Bright background accuracy, correct faces (blue) and incorrect extractions (red)]

Faceswap and face_recognition easily win this round with the highest sensitivity and zero errors. The dfaker and FakeApp GPU methods, which are supposed to use the same face-alignment face detector, again inexplicably diverge, with FakeApp showing much worse performance due to its higher error rate.

FakeApp frequently fails on the uncropped images with black borders, misidentifying neckties and other inanimate objects and producing upside-down alignments, as shown below. Dfaker’s errors are primarily upside-down alignments.

[Figure: Bright background face extraction examples]

Winner: Faceswap GPU (face_recognition). Losers: FakeApp GPU, CPU methods.

Update: The latest Keras port of the face-alignment module implemented in faceswap generated 579 correct faces but with 5 errors, which makes it the second best method.

Night scene

This video shows an interview in a pitch-black night time scene. The scene starts with a harsh glare on the subject’s face and later progresses to a very dark and dim image.

[Figure: Night scene accuracy, correct faces (blue) and incorrect extractions (red)]

The FakeApp GPU face detector wins this round, with the highest sensitivity and only one error. The CPU-based face detectors miss the vast majority of the faces and again demonstrate their lack of sensitivity to difficult conditions.

The images below show the sole error from the dfaker face detector.

[Figure: Night scene face extraction examples]

Winner: FakeApp GPU. Losers: All CPU methods.

Update: The latest Keras port of the face-alignment module implemented in faceswap generated 394 correct faces but with 0 errors, tying for first place.

Stability test

The video clip consisted of a sitting speaker shot from two fixed camera angles. The frames from a single camera angle were combined. All face detectors extracted 233 faces with zero errors, so I am not showing that plot. Instead, I show animated GIFs that highlight the stability of the different algorithms.

To avoid having too many GIF files slow this page’s loading times, I am only showing several representative examples. The faceswap GPU method produces zooming effects with noticeable flashes at different zoom factors, especially in the latter half of the GIF. The other methods are all relatively similar, with very weak zooming effects in the worst case. FakeApp 2.2 is perhaps just slightly worse than the others.
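If you want to build the same kind of stability GIF from your own extracted frames, a minimal sketch with the imageio library (frame paths hypothetical) looks like this:

```python
import glob
import imageio

# Collect the aligned crops from one camera angle in frame order.
frames = [imageio.imread(path) for path in sorted(glob.glob("angle1/*.png"))]

# Write an animated GIF at roughly 10 frames per second.
imageio.mimsave("stability_angle1.gif", frames, duration=0.1)
```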

Stability test results

[Animated GIFs: Faceswap GPU, Faceswap CPU, FakeApp 2.2 GPU, Dfaker GPU]

Winners: All but faceswap GPU. Loser: Faceswap GPU (face_recognition).

Face recognition conclusions

The different implementations perform differently in different situations, so having the option to use both may be ideal. Face_recognition seems to perform better under difficult conditions, but in simple scenes where face stability is paramount, face-alignment is likely better.

Update: The face-alignment update to the main faceswap repo is probably the best general face detector, although its error rate is slightly high. However, it is quite slow. As this fork is constantly updated, this may change in the near future.

Data sets

If you would like to compare directly or reproduce the data for yourself, you can download all of the test images here. The images are derived from public domain government sources, as well as a small fraction of images from Google Image Search with settings to show images that may be reused.

Download link: https://mega.nz/#!jdBETbxR!elnft7bMsP7uTWAV4BKKz18o6wWaOCkTNRZNzTPjOwU

  • biden_train: training data set for Biden
  • trump_train: training data set for Trump
  • biden_references: reference images for the face_recognition classification/competition test
  • convert_targets: 100 Trump images to be converted to Biden for quality testing
  • sharp_angle, dynamic_scene, bright_background, night_scene, stability_shot: frames used for face extraction tests

18 thoughts on “Best hardware and software for deepfakes”

    • Are all these implementations using the same resolution to train and output results?
      Or are there differences there too?

      • Only the dfaker repo uses a different resolution for the output. The input is still 64×64, but the network may have more information than a 64×64 output can hold (some technical discussion about this was lost to reddit).

  1. “You can obtain this number by timing how long it takes for each step or iteration when the loss value updates. All of the open source implementations automatically give you this number. ”

    Using FakeApp 1.1 I don’t see a “time” to process training. Or are you talking about repos with command lines that give you the info?

    Sorry, I may be confused. I don’t have enough will/free time to use the regularly updated code. I’m a simple guy…. give me the GUI and I go from there kind of person…

    • The repos with command line give you the timing information, like iterations per second, time elapsed, number of iterations passed, and so on (slightly different for each repo).

      Yeah, I know some people need a GUI. The open source GUIs should get better and better… just give a bit more time.

    • MSI Afterburner is a free tool to monitor your GPU. You can also control your power, temperature, and clock settings.

  2. In terms of output quality, it seems like dfaker crushed all the competition, performance aside.
    Do you have any reason to go for others like faceswap?

    • Well, dfaker is still a bit less polished. You have to dive into the code to adjust things. There could be some bugs for difficult cases we haven’t seen yet. The face detection has been continually refined for faceswap, for example, and people are only now starting to try out dfaker. Dfaker may have a bug for very extreme angles like 90 degrees that still needs to be fixed. You have to play around with the code for custom merging, or use video editing software like After Effects.

  3. Really great to see a robust comparison being performed. I also like the head-to-head comparison method; it could be useful in quantifying model changes in the future.

    • Glad you liked it! I hoped this would also bring more exposure to the dfaker model.

      Yeah, we need more good metrics. Have a few more things I would like to measure later related to training performance.

  4. I really like the benchmark. Can I repost it on my website and translate it into Chinese?
    I will note the original link.
