Spotting the deepfakes in this year of elections: how AI detection tools work and where they fail

The authors tested publicly accessible AI detectors in February 2024. Here they discuss these tools’ limitations and how to decide when to use them

An AI detection tool analyses a famous deepfake clip of President Obama. When the resolution is lowered and the end of the clip is edited out, the tool judges it as “not a deepfake”.

With the progress of generative AI technologies, synthetic media is getting more realistic. Some of these outputs can still be recognised as AI-altered or AI-generated, but the quality we see today represents the lowest level of verisimilitude we can expect from these technologies moving forward. 

We work for WITNESS, an organisation addressing how transparency in AI production can help mitigate the increasing confusion and lack of trust in the information environment. However, disclosure techniques such as visible and invisible watermarking, digital fingerprinting, labelling, and embedded metadata still need further refinement to address, at the very least, issues of resilience, interoperability, and adoption. WITNESS has also done extensive work on the socio-technical aspects of provenance and authenticity approaches that can help people identify real content. See, for instance, this report or these articles here and here.

A complementary approach to marking what is real is to spot what is fake. Simple visual cues, such as looking for anomalous hand features or unnatural blinking patterns in deepfake videos, are quickly outdated by ever-evolving techniques. This has led to a growing demand for AI detection tools that can determine whether a piece of audio or visual content has been generated or edited using AI, without relying on external corroboration or context.

While not a perfect fit as a term, in this article we use “real” to refer to content that has not been generated or edited by AI. Yet it is crucial to note that the distinction between real and synthetic is increasingly blurring. This is due in part to the fact that many modern cameras already integrate AI functionalities to direct light and frame objects. For instance, iPhone features such as Portrait Mode, Smart HDR, Deep Fusion, and Night mode use AI to enhance photo quality. Android incorporates similar features and further options that allow for in-camera AI-editing.

This piece aims to offer preliminary insights on how to evaluate and understand the results provided by publicly accessible detectors that are available for free or at low cost. Based on testing conducted in February 2024 of a sample of online detectors that included Optic, Hive Moderation, V7, InVID, Deepware Scanner, Illuminarty, DeepID and an open-source AI image detector, we discuss where these tools may currently experience limitations, as well as what factors need to be considered when deciding to use them. It is important to acknowledge that, as with generative technologies, the development of detection models is ongoing. A tool's performance may therefore vary over time: it might struggle to accurately identify certain manipulations at one point but excel at detecting others as it evolves. This dynamic nature underscores the challenges of synthetic media detection and the necessity of staying informed about the latest advancements.

How understandable are the results?

AI detection tools provide results that require informed interpretation; without it, users can easily be misled. Computational detection tools can be a good starting point as part of a verification process, alongside other open-source techniques, often referred to as OSINT methods. These may include reverse image search, geolocation, or shadow analysis, among many others.

However, it is essential to note that detection tools should not be considered a one-stop solution and must be used with caution. We have seen how the use of publicly available software has led to confusion, especially when used without the right expertise to help interpret the results. Moreover, even when an AI detection tool does not identify any signs of AI, this does not necessarily mean the content is not synthetic. And even when a piece of media is not synthetic, what is in the frame is always a curation of reality, or the content may have been staged.

Most AI detection tools give either a confidence score or a probabilistic determination (e.g. 85% human), whereas others only give a binary “yes/no” result. It can be challenging to interpret these results without knowing more about the detection model, such as what it was trained to detect, the dataset used for training, and when it was last updated. Unfortunately, most online detection tools do not provide sufficient information about their development, making it difficult to evaluate and trust the detector results and their significance.
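
As an illustration, here is a minimal sketch in Python of how a probabilistic score might be read cautiously rather than treated as a verdict. The thresholds are illustrative assumptions, not values recommended by any specific tool.

```python
# A minimal, illustrative sketch of interpreting a probabilistic detector score.
# The thresholds are assumptions for illustration only.

def interpret_score(p_ai_generated: float) -> str:
    """Map a detector's probability that content is AI-generated to a hedged verdict."""
    if p_ai_generated >= 0.90:
        return "Strong signal of AI generation - corroborate with other OSINT checks"
    if p_ai_generated >= 0.60:
        return "Possible AI generation - treat as inconclusive and seek more evidence"
    if p_ai_generated >= 0.40:
        return "Inconclusive - the detector cannot tell either way"
    # A low score is NOT proof the content is real: the media may simply fall
    # outside what the model was trained to recognise.
    return "No AI signal found - this does not confirm the content is authentic"


for score in (0.95, 0.74, 0.15):
    print(f"{score:.0%} AI-generated -> {interpret_score(score)}")
```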

How accurate are these tools?

AI detection tools are computational and data-driven processes. These tools are trained on specific datasets, including pairs of verified and synthetic content, to categorise media with varying degrees of certainty as either real or AI-generated. The accuracy of a tool depends on the quality, quantity, and type of training data used, as well as the algorithmic functions it was designed to perform. For instance, a detection model may be able to spot AI-generated images, but may not be able to identify that a video is a deepfake created by swapping people’s faces.
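
As a toy sketch of this data-driven process, the snippet below trains a simple binary classifier on labelled examples and scores a new item with a probability. Real detectors use far larger datasets and deep networks; the file paths and pixel features here are placeholders.

```python
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression

def image_features(path: str) -> np.ndarray:
    """Crude placeholder features: a downscaled greyscale pixel vector."""
    img = Image.open(path).convert("L").resize((64, 64))
    return np.asarray(img, dtype=np.float32).ravel() / 255.0

# Hypothetical training pairs: 0 = verified/real, 1 = AI-generated.
train_paths = ["real_001.jpg", "real_002.jpg", "generated_001.png", "generated_002.png"]
train_labels = [0, 0, 1, 1]

X = np.stack([image_features(p) for p in train_paths])
model = LogisticRegression(max_iter=1000).fit(X, train_labels)

# The output is a probability, and it is only meaningful for the kinds of
# generation represented in the training data.
p_ai = model.predict_proba(image_features("suspect.jpg").reshape(1, -1))[0, 1]
print(f"Probability AI-generated: {p_ai:.0%}")
```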

Similarly, a model trained on a dataset of public figures and politicians may be able to identify a deepfake of Ukraine's President, Volodymyr Zelenskyy, but may falter with less prominent figures, such as a journalist who lacks a substantial online footprint.

Additionally, detection accuracy may diminish in scenarios involving audio content marred by background noise or overlapping conversations, particularly if the tool was originally trained on clear, unobstructed audio samples.

A screenshot of Putin’s deepfake was detected as likely to be real, using a detection tool trained for detecting AI images but not for spotting deepfake videos created by swapping people’s faces.

We tested a detection plugin designed to identify fake profile images made by Generative Adversarial Networks (GANs), such as the ones seen in the This Person Does Not Exist project. GANs are particularly adept at producing high-quality, domain-specific outputs, such as lifelike faces, in contrast to diffusion models, which excel in generating intricate textures and landscapes. These diffusion models power some of the most talked-about tools of late, including DALL-E, Midjourney, and Stable Diffusion.

Detection tools calibrated to spot synthetic media crafted with GAN technology might not perform as well when faced with content generated or altered by diffusion models. In our testing, the plugin seemed to perform well in identifying GAN outputs, likely due to predictable facial features such as eyes consistently located at the centre of the image.

An AI image made with Midjourney was detected to “likely be a real person”.

An alternative approach to determining whether a piece of media has been generated by AI is to run it through the classifiers that some companies have made publicly available, such as ElevenLabs. Classifiers developed by companies determine whether a particular piece of content was produced using their own tool. This means classifiers are company-specific, and are only useful for signalling whether that company's tool was used to generate the content. This is important because a negative result only denotes that the specific tool was not employed; the content may still have been generated or edited by another AI tool.

These classifiers face the same detection challenges. For instance, adding music to an audio clip might confuse the classifier and lower the likelihood of the classifier identifying the content as originating from that AI tool. At the moment, no company publicly offers classifiers for images. 
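
For illustration, here is a small sketch of the kind of perturbation described above: mixing background music into a voice clip before running it through a company’s audio classifier again. The file names are placeholders, and pydub requires ffmpeg to be installed.

```python
from pydub import AudioSegment

voice = AudioSegment.from_file("suspected_ai_voice.wav")
music = AudioSegment.from_file("background_music.mp3")

# Lower the music by 12 dB so the speech stays intelligible, then mix it in.
mixed = voice.overlay(music - 12)
mixed.export("voice_with_music.wav", format="wav")
```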

How can you trick a detection tool?

It’s important to keep in mind that tools built to detect whether content is AI-generated or edited may not detect non-AI manipulation. 

Editing content: In May 2023, an image showing an explosion in the Pentagon went viral. Though quickly debunked, the image managed to cause a brief panic and even a dip in the stock market. News channels also picked up the story, reporting it as real news. 

AI image detection tools were able to identify the image as AI-generated, but after cropping and scaling the right-hand side of the image, the detector suggested the content was real. A similar test was made with one of President Obama's famous deepfakes. After decreasing the resolution and editing out part of the clip, the result came as "No Deepfake Detected".

At the top: the image that was shared on social media. At the bottom: a cut-out of the same image, with the bottom right of the original image cropped and scaled.
When we lowered the resolution and edited out the end of the video clip of President Obama’s deepfake, it was detected as “not a deepfake”.

Creating versions: Online detection tools might yield inaccurate results with a stripped version of a file (i.e. when information about the file has been removed). This action may not be necessarily malicious or even intentional. For instance, social media platforms may compress a file and eliminate certain metadata during upload. 

Even when training sets include cropped, blurred or compressed materials, the act of compressing, cropping or resizing a file can affect its quality and resolution and throw off the detectors, partly because the training materials may not cover all the ways in which such stripping and degradation can occur.
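
As an illustrative sketch (using Pillow, with placeholder file names), here is how cropped, downscaled and recompressed copies of an image can be produced to check whether a detector’s verdict changes between versions.

```python
from PIL import Image

original = Image.open("suspect_image.png").convert("RGB")
w, h = original.size

# Crop away part of the frame (here, the right-hand quarter).
original.crop((0, 0, int(w * 0.75), h)).save("variant_cropped.jpg", quality=85)

# Downscale to reduce resolution.
original.resize((max(w // 4, 1), max(h // 4, 1))).save("variant_lowres.jpg", quality=85)

# Recompress aggressively, as a social media upload pipeline might.
original.save("variant_recompressed.jpg", quality=25)
```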

Therefore, detection tools may give false results when analysing such copies. Likewise, when using a recording of an AI-generated audio clip, the quality of the audio decreases, and the original encoded information is lost. For instance, we recorded President Biden’s AI robocall, ran the recorded copy through an audio detection tool, and it was detected as highly likely to be real.

At the top, a recording of President Biden’s robocall detected as “highly likely real”. By contrast, the screenshot at the bottom shows the results for a downloaded version of the file, detected as 74% “likely to be fake”.

Similarly, a screenshot of an AI-generated image does not contain the same visible and invisible information as the original. We took screenshots from a known case in which AI avatars were used to back up a military coup in West Africa. More than half of these screenshots were mistakenly classified as not generated by AI.
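
One small, non-conclusive check illustrates what gets lost along the way: a screenshot typically carries none of the original file’s embedded EXIF metadata. The sketch below (Pillow, placeholder file names) simply reports whether any EXIF tags survive; their absence proves nothing on its own.

```python
from PIL import Image

for path in ("original_photo.jpg", "screenshot_of_photo.png"):
    exif = Image.open(path).getexif()
    if exif:
        print(f"{path}: {len(exif)} EXIF tags present")
    else:
        print(f"{path}: no EXIF metadata found")
```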

A screenshot with an AI avatar from one of the clips was detected as 70% human.
A screenshot with an AI avatar from one of the clips was marked with an 8.5% probability of being AI-generated.

Can these tools detect all types of AI generation and manipulation?

Openly available AI detection software can be fooled by the very AI techniques it is meant to detect. Here is how:

Style prompting: Detection tools are often trained on datasets that were created in a controlled lab environment, where the content is clear and organised. This allows the detection model to learn and identify the necessary features accurately during the training process. In real-world scenarios, though, images may be blurry and out of focus, videos may be shaky and tilted, and audio may be noisy. This can make it challenging for detection tools to accurately identify and classify real-life content.

While detection tools may have been trained with content that imitates what we may find in the ‘wild’, there are easy ways to confuse a detector. 

For instance, we built blur and compression into the generation through the prompt to test whether we could bypass a detector. We used OpenAI's DALL-E 2 to generate realistic images of a number of violent settings, similar to what a bystander might capture on a phone. We gave DALL-E 2 specific instructions to decrease the resolution of the output and to add blur and motion effects. These effects seemed to confuse the detection tools, leading them to judge the photos as less likely to be AI-generated.
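
Here is a hedged sketch of that prompting approach, using the OpenAI Python SDK’s image-generation endpoint. The base prompt and degradation descriptors are illustrative rather than the exact prompts used in our tests, and an OPENAI_API_KEY is assumed to be set in the environment.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

base_prompt = "a crowded city street at night, seen from a bystander's phone"
degradations = "low resolution, heavy motion blur, grainy, out of focus"

response = client.images.generate(
    model="dall-e-2",
    prompt=f"{base_prompt}, {degradations}",
    n=1,
    size="512x512",
)
print(response.data[0].url)
```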

AI-generated image of a fictitious attack in a subway station, generated by DALL-E 2, and detected as not likely to be AI-generated.
AI-generated image of a fake explosion at the White House, generated by DALL-E 2, and detected with low confidence to be AI-generated.

Advanced editing: Text-to-image tools have opened the door to easy editing techniques that allow users to edit elements within the frame of an image, or to add new details outside the frame, regardless of whether the image was originally real or AI-generated. These are known as ‘in-painting’ and ‘out-painting’ techniques, respectively. When they are applied to real images, online tools seem to fail at detecting these manipulations.

In this in-painting example, we replaced a car with a tank in a real image of Ukrainian refugees, using DALL-E 2 text-to-image editing. The new image with the tank came back as “not likely to be AI-generated”.
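
For reference, a minimal sketch of the in-painting workflow using the OpenAI SDK’s image-edit endpoint for DALL-E 2. The image, mask and prompt are placeholders; the transparent region of the mask marks where the new element is painted in.

```python
from openai import OpenAI

client = OpenAI()

response = client.images.edit(
    model="dall-e-2",
    image=open("original_photo.png", "rb"),
    mask=open("mask_over_car.png", "rb"),
    prompt="a military tank parked on a street",
    n=1,
    size="512x512",
)
print(response.data[0].url)
```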

In-painting example using a real image showing Ukrainian refugees, adding a tank with DALL-E 2. Original photograph by Peter Lazar/AFP/Getty Images.
The image with the in-painted tank was detected as “likely not to be AI-generated”.

As an example of out-painting, we took a real image from the Gaza-Israel war and used DALL-E 2 to add the extra context of “smoke”. DALL-E 2 also extended the buildings in the image. The final output was detected as real. 

Out-painting example using a real image showing soldiers in the Gaza-Israel war. Original photograph by Reuters.
The Optic tool detected the out-painted output as “likely human”.

So what can journalists, fact-checkers and researchers do?

Although this piece identifies some of the limitations of online detection tools, they can still be a valuable resource as part of the verification process or an investigative methodology, as long as they are used thoughtfully. 

It is essential to approach them with a critical eye, recognising that their efficacy is contingent upon the data and algorithms they were built upon. Additionally, it is important to keep in mind that, with the exception of audio content, most of the footage we see online tends to be mis-contextualised material (circulated with the wrong date, time or location) or images edited with software that has already been available for a few years.

The biggest threat brought by audiovisual generative AI is that it has opened up the possibility of plausible deniability, by which anything can be claimed to be a deepfake.

Detection tools should be used with caution and scepticism. It is always important to research and understand how a tool was developed, although this information may be difficult to obtain.

As technology advances, previously effective algorithms begin to lose their edge, necessitating continuous innovation and adaptation to stay ahead. As soon as one method becomes obsolete, new, more sophisticated techniques must be developed to counteract the latest advancements in synthetic media creation. While more holistic responses to the threats of synthetic media are developed across the information pipeline, it is essential for those working on verification to stay abreast of both generation and detection techniques.

It is also key to consider that AI technology is increasingly incorporated across all kinds of software, from phone cameras to social media apps; and it is also involved in multiple stages of production, from stabilising cameras and applying filters, to erasing unwanted objects and subjects from the frame. 

AI may not necessarily generate new content, but it can be applied to a specific region of the content, or to a specific keyframe and moment in time. This complex and wide range of manipulations compounds the challenges of detection. New tools, versions and features are constantly being developed, leading to questions about how well, and how frequently, detectors are being updated and maintained.

Security and ethical considerations matter too. Given the uncertainty surrounding the storage and use of analysed content by these platforms, it is vital to weigh the privacy and security risks, especially when dealing with material that could impact the privacy and safety of real individuals depicted in an image, audio or video.

In light of these considerations, when disseminating findings, it is essential to clearly articulate the verification process, including the tools used, their known limitations, and the interpretation of their confidence levels. This openness not only bolsters the credibility of the verification but also educates the audience on the complexities of detecting synthetic media. For content bearing a visible watermark of the tool that was used to generate it, consulting the tool's proprietary classifier can offer additional insights. However, remember that a classifier's confirmation only verifies the use of its respective tool, not the absence of manipulation by other AI technologies. 
