Over 3.5 million documents, 180,000 images and 2,000 videos. The Jeffrey Epstein files, released by the US Department of Justice (DOJ) in several tranches, constituted a disclosure of rare magnitude. This trove of documents opened a window into the ecosystem surrounding a powerful, well-connected convicted child sex offender. 

The release offered journalists an opportunity to interrogate a sprawling evidentiary record and trace networks of access and influence stretching across politics, academia, finance and royalty.

So far, journalists have broken stories on Epstein’s connections to powerful figures such as Peter Mandelson, Noam Chomsky, Steve Bannon, Sultan Ahmed bin Sulayem and many others. Yet those revelations account for only a small fraction of what the files contain. As reporting teams continue to excavate the archive, more disclosures are almost certainly to come.

But how are journalists identifying patterns of power and proximity in such a huge trove? What, precisely, are they looking for? And how do they search for it? 

To answer those questions, I spoke with five editors and newsroom leaders from the BBC, the New York Times, the Guardian, the Miami Herald and Bellingcat, who are coordinating coverage of the Epstein files in their newsrooms across multiple beats.

1. How to dig through three million files

One of the transparency requirements governing the DOJ’s release was that the files be publicly available and searchable at a basic level. But while the portal meets that threshold, it is not built for the needs of investigative journalists. 

The editors I spoke with agreed that the search function on the DOJ site is far from user-friendly, often difficult to navigate and sometimes nearly incomprehensible.

To solve this issue, newsrooms turned to their own proprietary technology, sometimes with the help of AI, to interrogate the files more easily. At the BBC, Ravin Sampat, Executive News Editor for UK Content, explained that they extracted the released documents into their own database and built a custom search system because the official portal was too difficult to navigate.

“This really comes down to searching terms and being clever. You can’t rely on Boolean search here – it doesn’t exist – so you have to be strategic about keywords. We’ve found things by being a bit unconventional,” said Sampat. “For example, instead of searching for generic words like ‘investment’ between business leaders, we look for terms they would use less frequently to help narrow the search. That’s essentially how we’ve been approaching it.”
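The idea Sampat describes, preferring terms that appear in fewer documents, can be sketched in a few lines of Python. This is only an illustration and not the BBC's system: the sample documents, the candidate keywords and the function names `tokenize` and `rank_by_rarity` are all hypothetical, and the scoring uses plain inverse document frequency as an assumed stand-in for "terms they would use less frequently".

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lowercase a document and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def rank_by_rarity(candidates: list[str], documents: list[str]) -> list[tuple[str, float]]:
    """Rank single-word candidate keywords from rarest to most common.

    The score is the inverse document frequency, log(N / df), where df is
    the number of documents that contain the term at least once.
    """
    doc_freq: Counter = Counter()
    for doc in documents:
        doc_freq.update(tokenize(doc))

    total = len(documents)
    scored = []
    for term in candidates:
        df = doc_freq[term.lower()]
        if df == 0:
            continue  # term never appears, so searching for it returns nothing
        scored.append((term, math.log(total / df)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample corpus, invented purely for illustration.
documents = [
    "We discussed the investment over lunch.",
    "The investment committee meets Tuesday.",
    "Please wire the tranche before the investment closes.",
    "Lunch on Tuesday works for me.",
]

ranked = rank_by_rarity(["investment", "tranche", "lunch"], documents)
for term, score in ranked:
    print(f"{term}: {score:.2f}")
```

In this toy corpus the generic word "investment" lands at the bottom of the ranking because it appears in most documents, while the less common "tranche" rises to the top and would narrow a search far more.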

2. Searching in “a vast dark cave”

Shirsho Dasgupta, a data reporter with the Miami Herald, told me that they are now downloading each document and uploading it into two main tools: Google’s Pinpoint, which offers strong optical character recognition, transcription and searchable AI features, and Everlaw, a legal discovery platform which is more stable for document review. 

It was a reporter from the Miami Herald, Julie K. Brown, who broke the Jeffrey Epstein story back in 2018, and she is still guiding the newsroom towards what to look for in the files.

“She has such deep subject-matter expertise, so she flags the angles she’s particularly interested in,” Dasgupta said. “For the rest of us, who may not know all the nuts and bolts of the case, we are approaching it more as a massive record dump and looking for unique stories within it.”

George Zornick, who works as US political enterprise editor for the Guardian, told me that they employed an internal system called Giant, which allows reporters to upload documents and use enhanced search tools, such as keyword searches and file-type filtering, to better organise and navigate the material.

It is up to each reporter how they choose to search through the files, Zornick said. Some use Giant, while others use AI tools or even work through the DOJ website directly. Zornick treats AI with caution: he thinks that in such a legally and reputationally sensitive investigation the technology carries significant risks, since it can misinterpret context or overstate connections to Epstein, creating legal and editorial exposure.

“There’s all kinds of risks with using AI, particularly in a reporting area that is so fraught,” he said. “Given some of the limitations of AI, you want to use it just as a starting point to point you in the right direction of something that you then have to look at, verify, and report out.”

New York Times Senior Investigations Editor Kirsten Danis told me they’ve built a proprietary, highly sophisticated search tool with advanced features such as semantic search and labelling. They use AI as a tool to help them identify clusters, patterns and themes, but the brunt of the work is being done by a team of editors and reporters across beats and bureaus who have been preparing to dig into the files long before they were published. 

In a recent interview, the team at the Times explained how they used AI: they had engineers build AI-assisted tools that scraped DOJ search results into spreadsheets for rapid analysis and verification, and created a system to scan all three million pages. They also implemented semantic search and AI-powered tagging to surface concepts, categorize documents and extract text from images, audio and video. All in all, however, it is reporters, not the technology, who made final editorial judgments.

AI is integral to easing the process of going through the files, but the team at the Times has stressed its limits. While useful for processing and structuring large datasets, AI cannot determine newsworthiness and is prone to error and hallucination, particularly around sensitive issues like redactions. Reporters treat AI outputs as tips, relying on human judgment and verification to avoid misinformation and confirmation bias.

“There are dozens of terms we know to search for because this information has surfaced in other ways over the course of this long saga,” said Danis. “But the real challenge is finding the unknowns. You can’t find something in this vast dark cave without knowing that you’re looking for it.”

3. How to look for the unknowns 

To know what to look for, Danis told me that her team at the Times began by compiling a list of about 30 names and key questions they wanted to investigate, drawing on years of prior reporting on Epstein. 

Since they knew the data dump was coming, reporters collaborated with each other to generate and prioritise search terms in advance. They started with high-profile figures, looking for new information or details that might deepen, complicate or change their understanding of the story and the people involved.

The Guardian went through a similar process. Ahead of the release, Zornick said, they had held multiple planning meetings and created a detailed strategy document outlining what they expected to find. 

Reporters were assigned specific targets, such as searching for mentions of Trump, including variations like property names and addresses. Others focused on figures already known to be in Epstein’s orbit. Each reporter had clear marching orders and defined search tasks before the files were released, allowing the team to move quickly once the documents became available.

“Segmenting the reporters was a challenge,” said Zornick. “We are UK-based, so there was a lot of interest in which British figures might appear in the files, alongside a whole other set of US political implications. So people each had their own lanes.” 

The BBC is such a large broadcaster that, Sampat said, it had to organise its reporting on the files by beat, with specific teams focusing only on their own areas.

For example, the Royals team focused on figures like former Prince Andrew; the Money and Work team investigated business leaders and transactions; and the Politics team looked into figures like Peter Mandelson. 

“That’s what you concentrate on. You don’t need to touch anything else. They only focus on those things,” said Sampat. “Now, if they find something that’s relatable to another team, they’ll pass that information.”

4. How online sleuths approached the story 

One of the purposes of the Epstein Files Transparency Act, the law passed by Congress mandating the release of records related to Jeffrey Epstein, was to make those files accessible to the public. That mandate has allowed not only journalists but also ordinary citizens to comb through the archive themselves, searching for connections and drawing their own conclusions.

In response to this interest, technologists have built open web projects that make the material unusually navigable. Jmail presents Epstein’s emails in a searchable interface that mimics a Gmail inbox; Jikipedia converts email data into searchable dossiers on named associates; and EpsteIn scans a user’s LinkedIn connections against names appearing in the released files. 

The creation of these tools and the scale of public engagement with them (Jmail has reported 25 million unique visitors since the most recent release) underscore the breadth of public interest in the case. Across social media platforms, users are also publishing their own findings.

One of those platforms is Discord, where the investigative journalism outlet Bellingcat hosts a server with a dedicated channel in which its community discusses the files, exchanges findings, flags documents and debates interpretations drawn from the archive.

Charley Maher, Bellingcat’s social media editor, describes their Discord server as a collaborative hub for open-source investigations across a range of global topics. This community of roughly 40,000 members operates with a high degree of autonomy: members themselves created the channel devoted to the Epstein files in order to organise and pursue their own document-based inquiries.

Bellingcat’s dedicated Epstein channel operates under strict guidelines. Members are prohibited from engaging in speculation or political debate, as the space is reserved exclusively for open-source research and factual information. Every piece of information shared must be backed up with a direct source link. Furthermore, the community enforces a strict no-doxing rule to protect individuals’ privacy. 

“People have been quite respectful. For example, a lot of the files that were released had child sexual abuse material in them so it’s really important that we avoid sharing that within the Discord and people have been really good about that,” said Maher. “Most of the discussion within the channel has mostly focused on collecting names that are repeatedly featured and looking at locations in the photos.”

While Bellingcat doesn’t necessarily dip into the Discord for tips and pitches, Maher told me that sometimes investigations arise from the server, not only from Bellingcat itself but from other journalists who are also members of the community. However, the point of the channel is not to draw journalism from it, but to provide an avenue for individuals to hone their OSINT skills.

“People come in with their own ideas and contribute in their own way. Sometimes we give them a little bit of guidance if they feel like they’re branching into troublesome territory in terms of what exists within the rules of the server. But we try to let people explore their own avenues of investigation. That’s the point of the server: that people have a space that has been eroded since Twitter kind of fell away and hasn't really been replaced,” she said. 

5. How a sleuth got Prince Andrew arrested

With more than three million files released, online sleuths and ordinary citizens have been able to scrutinise the archive for details journalists might have overlooked. 

BBC’s Sampat told me that one significant story – the one reporting that Andrew Mountbatten-Windsor, the former Prince Andrew, appeared to share confidential documents with Jeffrey Epstein while serving as a trade envoy – actually originated from a listener who had examined the files herself and sent a tip to the newsroom.

Earlier today Andrew was arrested on his 66th birthday on suspicion of misconduct in public office, precisely for allegedly sharing confidential documents with Epstein. The arrest took place at Wood Farm on the Sandringham estate. A few hours after the operation became public, King Charles III, Andrew’s older brother, published a brief statement. “The law must take its course,” he said. “As this process continues, it would not be right for me to comment further on this matter.”

In an interview conducted with BBC Radio 4 on 9 February, the listener said she had been searching for initials, shortened names and place references when she came across the email in question. At the same time, Sampat cautioned that social media is rife with misinformation about what the files do and do not contain, underscoring the need for careful verification.
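The listener's approach of searching for initials and shortened names rather than full names can be illustrated with a small Python sketch. It is an assumption-laden example and not a description of how she or any newsroom actually searched: the name "Jane Doe", the nickname, the sample text and the helper functions `name_variants` and `find_mentions` are all hypothetical.

```python
import re


def name_variants(full_name: str, nicknames: tuple[str, ...] = ()) -> set[str]:
    """Build simple search variants for a name: initials, first and last
    name on their own, and an initial followed by the surname."""
    parts = full_name.split()
    if not parts:
        return set()
    first, last = parts[0], parts[-1]
    initials = "".join(part[0] for part in parts)
    variants = {
        full_name,
        first,
        last,
        initials,
        f"{first[0]}. {last}",
        f"{first[0]} {last}",
    }
    variants.update(nicknames)
    return variants


def find_mentions(text: str, variants: set[str]) -> set[str]:
    """Return the variants that occur in the text as whole words.

    Matching is case-sensitive so that initials such as 'JD' do not
    match ordinary lowercase letter sequences.
    """
    hits = set()
    for variant in variants:
        pattern = rf"(?<!\w){re.escape(variant)}(?!\w)"
        if re.search(pattern, text):
            hits.add(variant)
    return hits


# Hypothetical name and text, invented purely for illustration.
variants = name_variants("Jane Doe", nicknames=("Janie",))
sample = "Spoke to JD about the London trip; Janie will confirm."
mentions = find_mentions(sample, variants)
print(sorted(mentions))
```

Short variants such as initials will inevitably match unrelated text in a large archive, so any hit from a search like this is only a lead to be checked against the original document.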

6. How to debunk Epstein’s deluge of falsehoods

The release of these files presents the perfect storm for misinformation: emotionally charged documents, an unprecedented amount of public interest, and an overwhelming volume of material where the truth can drown long before claims are properly checked.

Alongside real screenshots and emails, people are sharing fake, misleading, heavily manipulated or AI-generated content on digital platforms. Danis from the New York Times said the scale of the archive makes crowdsourced scrutiny almost inevitable, but that openness has fuelled widespread disinformation.

“There's a lot of disinformation out there, ranging from people just misinterpreting lines that are taken out of context to absolute AI-generated fakes,” she said. “It's frustrating for those of us who have spent a lot of time with this story. This was tailor-made for people to look around and see what they find, but the very unfortunate upshot is that a lot of people are using it for their own agendas.”

Zornick from the Guardian said he and his team take screenshots of emails circulating on social media, use them as a starting point to find the original document, build out the necessary context, and potentially report on it. He also stressed that their job is not just to publish what is found in the emails at face value, but to conduct the necessary checks and balances to make a story reportable.

“It’s one of the craziest reporting challenges I have ever faced,” he said. “There’s such intense interest and it’s so politically contested. The legal and editorial risks are very high. If you are going to report that someone was associated with Jeffrey Epstein in any way, you must be very precise about the nature of that relationship and what's known and not known.”

7. What’s next for the Epstein files? 

At the time of this writing, less than a month has passed since the files were released and new revelations continue to emerge. But these stories represent only a small fraction of what the archive contains.

To make a story reportable, journalists must verify the material through external research. That might mean, for example, confirming whether a meeting referenced in an email can be corroborated by flight logs or other publicly available records. The process is time-consuming, meaning even a single email or potential Epstein connection can require substantial reporting before publication.

“So many news organisations are covering this that being first isn’t always what matters. It’s not really about winning. It’s about what you can find and how careful and precise you can be, so it leads to another story or helps you chase one properly,” said Sampat from the BBC. “The way we’ve approached it, even though this is breaking news, is that slow and steady is actually important.”

For the Miami Herald, the newsroom that actually broke the Epstein story back in 2018, being first on this one is not necessarily the priority. Because the Herald is a local paper, Dasgupta noted, coverage has been highly competitive, with the newsroom up against larger outlets which have more reporters and greater resources.

“Like with every story we do, we are trying to sort of move the needle a little bit,” said Dasgupta. “We are not looking to just report out what everyone else is reporting.”

The stories coming out of the Epstein files are far from over. Zornick from the Guardian notes that, because the files are so vast, newsrooms are still looking through data and finding new jigsaw pieces that connect to previous reporting or yield new revelations about the Epstein universe.

“I’d love to be first on every story coming out of three and a half million files, but that’s not realistic,” he said. “I’m confident there are still stories to be found – real pieces of news we can report out – but this is going to have a long tail.”

Meet the authors

Gretel Kahn

I am a digital journalist with the Reuters Institute's editorial team, mainly focusing on reporting and writing pieces on the state of journalism today. Additionally, I help manage the Institute’s digital channels, including our daily...