How Police Exploited the Capitol Riot's Digital Records

The group of well-dressed young men who gathered on the outskirts of Baltimore on the night of 5 January 2021 hardly looked like extremists. But the next day, prosecutors allege, they would all breach the United States Capitol during the deadly insurrection. Several would loot and destroy media equipment, and one would assault a policeman.

No strangers to protest, the men, members of the America First movement, diligently donned masks to obscure their faces. None boasted of their exploits on social media, and none of their friends or family would come forward to denounce them. But on 5 January, they made one piping hot, family-size mistake: They shared a pizza.

According to charging documents, at 10:57 that evening, a PayPal account registered to a Gmail address paid US $84.72 to Domino’s Pizza in Arbutus, Md. Minutes later, that email account received Venmo payments from users called Thomas Carey, Gabe Chase, and Jon Lizak. A separate Venmo email showed a payment from “Broseph Broseph,” a nickname of another friend, Joseph Brody.

After the horrific events of the next day, the Federal Bureau of Investigation swung into action. It served cellular carriers and tech companies with geofence warrants, which demand details on every device and app active within a specified geographic area during a specified time window. One of these warrants, served on Google and covering the interior of the Capitol, showed that a device associated with the Gmail account in question entered the Senate Wing door at 2:18 p.m. on 6 January.

Connecting that Gmail account to a phone number and then to its owner, Paul Lovley of Halethorpe, Md., was just a matter of a few keystrokes on law-enforcement databases. All that remained was for an FBI agent on stakeout to observe Lovley taking out the trash one night and match his photo to one of a figure captured by Senate surveillance cameras during the riot. Lovley and his four compatriots were charged with a range of federal crimes in September 2022.

The riot was an unprecedented attack on American democracy, with thousands of citizens, most of them previously unknown to federal investigators, violently storming the seat of government. The resulting investigations were the largest in U.S. history, offering a snapshot of the rapidly evolving nature of law enforcement and how heavily it now relies on data provided, wittingly or not, by suspects themselves.

While it might seem as though the Capitol-riot investigations represent state-of-the-art digital forensics, “those surveillance technologies are being used in even minor low-level criminal cases across the country every single day,” says Jennifer Lynch, surveillance litigation director at the Electronic Frontier Foundation (EFF). “The FBI did not use anything new. They just used it at a much larger scale.”

IEEE Spectrum analyzed hundreds of criminal complaints and other legal filings from the Capitol attacks to understand that reach and scale, and to consider the legal and social consequences of the government’s power to delve into its citizens’ digital lives. That power might seem reassuring when applied to a mob intent on overturning a presidential election, but perhaps less so when brought to bear on people protesting, say, human-rights violations.

Social media provides clues for digital forensics

Police work has always involved the connecting of dots, whether photos, phone calls, testimony, or physical evidence. The 6 January investigation showed the power of seeking the digital connections between those dots.

Over the past two years, the U.S. Department of Justice and the Program on Extremism at George Washington University have made available thousands of legal documents about those charged in connection with the 6 January riot. Spectrum analyzed all those containing details of how alleged perpetrators were identified and investigated: 884 individuals by mid-December. Many were identified using time-honored techniques: Wanted posters remain a powerful tool, these days reaching a global audience via news organizations, the FBI’s website, and social media. Nearly two-thirds of all those people were first identified via tips from witnesses, friends, family, and other human sources. The FBI ultimately received more than 300,000 such tips.

But the ways in which those sources spotted the alleged perpetrators have changed enormously. Only a tiny fraction of sources were on the ground in Washington, D.C., on 6 January. And although some suspects were recognized in TV reports or news stories, most were spotted on social media.

In almost two-thirds of the cases, evidence was cited from one or more social-media platforms. Facebook appeared in almost half of all cases, cited 388 times, followed by Instagram and Twitter with a combined total of 188 mentions. But almost every major social-media app was mentioned in at least one case: LinkedIn, MeWe, Parler, Signal, Snapchat, Telegram, TikTok, even dating app Bumble and shopping-focused Pinterest.

Investigators immediately exploited the rioters’ use of Facebook. On the day of the attack, the FBI requested that Facebook identify “any users that broadcasted live videos which may have been streamed and/or uploaded to Facebook from physically within the building of the United States Capitol during the time on January 6, 2021, in which the mob had stormed and occupied the Capitol building.” Complying with this request was possible because Facebook records the latitude and longitude of every uploaded photo and video by default.

Facebook responded the very same day, and again over the next few weeks, with an unknown number of user IDs—unique identifiers assigned to accounts on Facebook and Instagram (which Facebook’s parent company, Meta, also owns). The legal documents suggest that about 35 rioters were identified this way, without first being named by witnesses. In many cases, the FBI then requested that Facebook send it the relevant images and videos and other account data.

Investigators gleaned further clues from many hours of professional news footage, as well as 14,000 hours of high-resolution video from dozens of fixed security cameras and 2,000 hours of video from body-worn cameras operated by police responding to the riot. Surveillance cameras were referenced in 63 percent of DOJ cases, open-source videos and social-media images in 41 percent, and body-camera and news footage each in about 20 percent of cases.

Processing these files involved a huge amount of human effort. The body-camera footage alone required a team of 60, who laboriously completed a 752-page spreadsheet detailing relevant clips.

Shortly after the 6 January riot, Spectrum reported on how automated image-recognition systems could be brought to bear on this flood of audiovisual information. The FBI assigned its FACE Services Unit to compare suspects' faces with images in state and federal face-recognition systems. However, according to the legal documents, only 25 rioters appear to have been first identified through such automated image searches, mostly after comparisons with state driver's license photos and passport applications.

Hoan Ton-That, CEO of Clearview AI, a face-recognition search engine that indexes 30 billion images from the open Internet, told Spectrum that the court filings do not necessarily reflect how often such technology was used. “Law enforcement don’t always have to disclose that they found a certain person’s information through facial recognition,” he says.

Ton-That notes that Clearview’s algorithm is not yet admissible in court, and that any identification it makes from open-source imagery requires further vetting and confirmation. Without providing specifics, he suggested that Clearview’s system was used by the FBI. “As a company, it was gratifying for us to play a small role in helping apprehend people who caused damage and stormed the Capitol,” he told Spectrum. The Capitol riot wouldn’t have been the first time that such technology was applied in this way. Facial recognition was reportedly used to identify protestors at a Black Lives Matter event in New York City in 2020 and at similar protests across the United States.

Computers are generally much better at recognizing letters and numbers than faces, and automatic license plate reader (ALPR) technology was cited in 20 of the DOJ cases. There are likely tens of thousands of fixed and mobile ALPR systems in the United States alone, at toll plazas, bridge crossings, and elsewhere, capturing hundreds of millions of car journeys each month.

How digital data makes it easier to connect the dots

A single stream of data may help a little, but the integration of many such streams can do wonders. Take the case of William Vogel. He was first named by a tipster who sent the FBI a Snapchat video filmed inside the Capitol building by someone who did not appear in it. Sure enough, a Facebook account associated with the Snapchat account listed Vogel as its owner and included a cellphone number.

But maybe someone stole Vogel’s cellphone and his Snapchat login to shoot and upload the video. Vogel’s phone number led to an address in Pawling, N.Y., and to a car registered to Vogel. The FBI then logged on to ALPR systems across several states, revealing that Vogel’s vehicle had taken the Henry Hudson Bridge from the Bronx into Manhattan at 6:06 a.m. on 6 January, entered New Jersey at 7:54, and proceeded southbound through Baltimore at 9:15. The car made its return journey late that afternoon, eventually crossing back into New York a minute before midnight.
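Reconstructing Vogel's drive amounts to pooling every plate read for one vehicle from several states' ALPR networks and sorting them by time. The minimal Python sketch below assumes a hypothetical `PlateRead` record (plate, location, timestamp) and a placeholder plate, "ABC1234"; it is not how any real ALPR system stores data. The times are the ones described in the charging documents.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class PlateRead:
    """One hypothetical ALPR capture: which plate, where, and when."""
    plate: str
    location: str
    seen_at: datetime

def itinerary(reads: list[PlateRead], plate: str) -> list[tuple[str, str]]:
    """Return (HH:MM, location) pairs for one plate in chronological order."""
    hits = sorted((r for r in reads if r.plate == plate), key=lambda r: r.seen_at)
    return [(r.seen_at.strftime("%H:%M"), r.location) for r in hits]

# Reads pooled from several (hypothetical) state networks, in arbitrary order.
reads = [
    PlateRead("ABC1234", "Baltimore (southbound)", datetime(2021, 1, 6, 9, 15)),
    PlateRead("XYZ9876", "Baltimore (southbound)", datetime(2021, 1, 6, 9, 16)),
    PlateRead("ABC1234", "Henry Hudson Bridge (Bronx to Manhattan)", datetime(2021, 1, 6, 6, 6)),
    PlateRead("ABC1234", "Entering New Jersey", datetime(2021, 1, 6, 7, 54)),
    PlateRead("ABC1234", "Re-entering New York", datetime(2021, 1, 6, 23, 59)),
]

for time, place in itinerary(reads, "ABC1234"):
    print(time, place)
```

The filtering step discards reads of other vehicles, such as the placeholder "XYZ9876". A real investigation would also have to reconcile clock differences between networks.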

But, again, perhaps someone had borrowed Vogel’s car? Not according to an ALPR photo snapped in rural Maryland at 8:44 a.m. It shows a distinctive large red “Make America Great Again” hat on the car’s dashboard, just like one that Vogel was wearing when he was filmed on a news broadcast outside the Capitol later that day, and in a Facebook selfie.

“They’re trying to report me to the FBI/DOJ and put me away for 10 years for domestic terrorism, because of my Snapchat story,” Vogel complained later via Facebook Messenger, after admitting to a friend that he had in fact shot the Capitol video, charging documents allege. Vogel’s case goes to trial in February 2023, when he will face charges of violently entering the Capitol and disorderly conduct.

Investigators also homed in on people by looking at data from their cellphones. At least 2,000 digital devices were searched by the FBI for images, data, and messages. The FBI's Cellular Analysis Survey Team is dedicated to locating cellphones based on which cell towers they access. Although the FBI got rough locations for about one-fifth of the Capitol-riot defendants this way, the method is too imprecise to reliably show whether someone actually breached the Capitol itself or remained outside the building.

Far more accurate are the geolocation data gathered by Google Maps and other apps, on both Android and Apple devices. By bolstering cell-tower data with information from nearby Wi-Fi routers and Bluetooth beacons, these apps can locate a target to within about 10 meters (better in urban areas, worse in the countryside). They can even work on phones that have been put in airplane mode.
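Google has not published exactly how it fuses these signals. A common textbook simplification is the weighted centroid: average the known positions of the routers and beacons a phone can hear, giving more weight to stronger signals. The Python sketch below assumes that the beacon positions are already known and that weights are signal strength converted from dBm to linear power. Both the `Beacon` type and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Beacon:
    """A Wi-Fi router or Bluetooth beacon whose position is already known."""
    lat: float
    lon: float
    rssi_dbm: float  # received signal strength; less negative usually means closer

def weighted_centroid(beacons: list[Beacon]) -> tuple[float, float]:
    """Estimate a phone's position as a signal-weighted average of beacon positions.

    Weights are linear power derived from dBm, so a beacon heard 10 dB more
    strongly counts 10 times as much. Averaging raw latitude and longitude is
    only reasonable over small areas such as a single building or block.
    """
    if not beacons:
        raise ValueError("need at least one beacon")
    weights = [10 ** (b.rssi_dbm / 10) for b in beacons]
    total = sum(weights)
    lat = sum(w * b.lat for w, b in zip(weights, beacons)) / total
    lon = sum(w * b.lon for w, b in zip(weights, beacons)) / total
    return lat, lon

# Hypothetical readings from three nearby transmitters.
beacons = [
    Beacon(38.88990, -77.00910, -45.0),  # strongest signal, probably nearest
    Beacon(38.88995, -77.00900, -60.0),
    Beacon(38.88980, -77.00925, -70.0),
]
print(weighted_centroid(beacons))
```

Production systems use far more elaborate models, for example calibrated signal-propagation estimates combined with cell-tower and GPS data. The sketch only shows why hearing several known transmitters can narrow a position to a few meters.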

Until the 6 January attacks, geofence search warrants served on Google—for example, by agents investigating a bank robbery—might produce just a dozen suspect devices. The Capitol breach resulted in 5,723, by far the largest such production. It took until early May 2021 for Google to hand over the data to the FBI; when it did so, the results were comprehensive. That data included the latitude and longitude of each device to seven decimal places, and how long it was inside the Capitol. After narrowing the results to only those most likely to have breached the Capitol, Google eventually delivered the names, phone numbers, and emails associated with the accounts—everything investigators needed to identify and track someone inside the Capitol that day.
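Conceptually, a geofence return is a filter over stored location points: keep every device that reported at least one fix inside a boundary during a time window. The Python sketch below uses the standard ray-casting point-in-polygon test. The rectangle, device IDs, and time window are hypothetical and do not reproduce the warrant's actual boundary or period.

```python
from datetime import datetime

# (longitude, latitude) in decimal degrees
Point = tuple[float, float]
# (device_id, longitude, latitude, timestamp): one stored location fix
Record = tuple[str, float, float, datetime]

def inside(pt: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: a point is inside if a ray cast eastward from it
    crosses the polygon's boundary an odd number of times."""
    x, y = pt
    crossings = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge spans the point's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:  # the crossing lies east of the point
                crossings += 1
    return crossings % 2 == 1

def geofence(records: list[Record], polygon: list[Point],
             start: datetime, end: datetime) -> set[str]:
    """IDs of devices with at least one fix inside polygon between start and end."""
    return {
        device_id
        for device_id, lon, lat, ts in records
        if start <= ts <= end and inside((lon, lat), polygon)
    }

# Hypothetical rectangle roughly around the Capitol, not the warrant's real boundary.
capitol = [(-77.0110, 38.8893), (-77.0080, 38.8893),
           (-77.0080, 38.8905), (-77.0110, 38.8905)]
records = [
    ("device-A", -77.0091000, 38.8899000, datetime(2021, 1, 6, 14, 18)),  # inside, in window
    ("device-B", -77.0200000, 38.8899000, datetime(2021, 1, 6, 14, 30)),  # outside the shape
    ("device-C", -77.0091000, 38.8899000, datetime(2021, 1, 5, 14, 18)),  # right place, wrong day
]
print(geofence(records, capitol, datetime(2021, 1, 6, 12, 0), datetime(2021, 1, 6, 20, 0)))
# {'device-A'}
```

On the seven decimal places: one degree of latitude spans roughly 111 km, so the seventh decimal place corresponds to about 1.1 cm. That is far finer than the roughly 10-meter accuracy of the underlying estimates, so the extra digits describe the stored number rather than the phone's true position.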

And track they did. The legal documents indicate that the Google geofence warrants yielded more initial identifications—50 individuals—than did any other technology, and they were cited in a total of 128 cases. Investigators were able to match interior surveillance footage of one suspect, Raul Jarrin, with a photo he was taking on his Samsung cellphone at the exact same moment. They later acquired the photo from Google under a separate warrant. Jarrin was arrested in March 2022.

On top of the Google data, the FBI served geofence search warrants for anonymized location data from 10 data-aggregation companies. But none of these companies were cited in a criminal complaint, and there are no further details.

The EFF sees the tremendous scope and power of geofence warrants as a bug, not a feature. "We believe that geofence warrants are unconstitutional because they don't start with a suspect," says Lynch. "They don't rely on individualized suspicion, which is what's required under the Fourth Amendment [to the U.S. Constitution]. In the January 6th context, it's likely that there were many journalists whose data was provided to the police."

Lynch points out that geofence warrants were also used to investigate possible arsons that occurred during protests over police brutality in Seattle, in 2020. Even though the fires were set at a known location at a known time, the warrants sought location data for all devices on an entire city block over a 75-minute period, during a Black Lives Matter protest. “I think that we would all agree that [the protest] was constitutionally protected First Amendment activity,” she says. “That information should never be in the hands of law enforcement, because it chills people from feeling comfortable speaking out against the government.”

Google told Spectrum that it examines all geofence warrants closely for legal validity and constitutional concerns. It says it routinely pushes back on overbroad demands, and in some cases refuses to produce any information at all.

Geofences target places, not people—and that’s a problem

Of course, the idea of staking out a particular area for scrutiny is old hat. “Look at every car parked on Elm Street,” says the detective, in just about any procedural, ever. What’s new is the ability to survey any area immediately, easily, and over a wide range of databases—every phone call placed, car parked, person employed, credit-card transaction made, and pizza sold.

And indeed, the high-tech investigations around the Capitol breach went far beyond suspects’ phones to include Uber rides, users’ search history, Apple iCloud, and Amazon. The FBI noted that one suspect, Hatchet Speed, a U.S. Navy reserve officer assigned to the U.S. National Reconnaissance Office, had purchased a black face mask and black “Samurai Tactical Wakizashi Tactical” backpack on Amazon, both of which he was seen wearing in Capitol CCTV footage on 6 January. Speed was arrested in June 2022.

Unsurprisingly, after the deadly riot, some of those present deleted their social-media posts, pictures, and accounts. One suspect threw his phone into the Atlantic Ocean. Annie Howell of Swoyersville, Pa., allegedly posted videos of her clashes with law enforcement inside the Capitol. According to her charging document, on 26 January 2021, Howell conducted a factory reset of her Apple iPhone without backing up data from her online iCloud account. In a Facebook conversation she conducted from her computer, her father told her, "Stay off the clouds! They are how they are screwing with us." In March 2022, Howell was sentenced to 60 days in jail.

The legal documents allege that around 150 others also attempted to delete data and accounts. For many, it was far too late. "The FBI's really good at finding information that's deleted, because, as you might know, if you delete a text or an app on a cellphone, it's not really deleted," an FBI agent told a January 6 suspect during an interrogation, as reported in one court filing. Investigators were indeed able to recover chats, social-media posts, call records, photos, videos, and location data from many devices and accounts that suspects thought they had permanently consigned to the digital trash can. The FBI even used such deletion attempts to identify suspects: It asked Google to single out those devices in the geofence warrant whose users had attempted to delete their location history in the days following the siege. That process netted an additional 37 people.
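In database terms, the step that netted 37 more people is a join: intersect the list of geofenced devices with a log of location-history deletions made shortly after the riot. The Python sketch below is hypothetical in every detail. The log format, the device IDs, and the 14-day window are assumptions, since the filings as described say only "the days following."

```python
from datetime import date, timedelta

def deleted_after_event(
    geofence_ids: set[str],
    deletion_log: list[tuple[str, date]],
    event_day: date,
    window_days: int,
) -> set[str]:
    """Return geofenced devices whose owners deleted location history
    within window_days after event_day (the event day itself excluded)."""
    cutoff = event_day + timedelta(days=window_days)
    return {
        device_id
        for device_id, deleted_on in deletion_log
        if device_id in geofence_ids and event_day < deleted_on <= cutoff
    }

geofence_ids = {"device-A", "device-B", "device-C"}
deletion_log = [
    ("device-A", date(2021, 1, 8)),  # in geofence, deleted soon after: flagged
    ("device-B", date(2021, 3, 1)),  # in geofence, but deleted much later
    ("device-Z", date(2021, 1, 7)),  # deleted soon after, but never in geofence
]
print(deleted_after_event(geofence_ids, deletion_log, date(2021, 1, 6), 14))
# {'device-A'}
```

Choosing the window is a judgment call. A longer window catches more people, but it also sweeps in users who clear their history routinely for unrelated reasons.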

Raising a hue and cry—digitally

Perhaps the biggest innovation in the 6 January investigations was nothing that law enforcement itself did, but rather the general public's response. Using tools and processes pioneered by open-source investigation organizations like Bellingcat, websites such as Jan6attack.com and Sedition Hunters provided a forum for ordinary people in the United States and around the world to analyze and speculate (sometimes correctly, sometimes wrongly) on the identity of rioters. The FBI cited such efforts in 63 legal documents.

Nonprofit investigative newsroom ProPublica became involved when a source provided 30 terabytes of video—over a million video clips—that had been scraped from the social-media network Parler. “One thing that was really helpful was that Parler wasn’t built very well,” says Al Shaw, deputy editor on ProPublica’s News Application Team. “There was all this metadata still attached to the files when they were leaked. We had geo information, what cellphone they were using, time stamps, and a bunch of other data.”

ProPublica filtered the videos by geolocation and other metadata, but soon realized that not all the data was accurate. So journalists went through videos manually to check that those that appeared to have been shot inside the Capitol actually were. ProPublica ended up with 2,500 videos that it could definitively place in the Senate complex on 6 January.

It quickly published 500 of these videos online. Scrolling through the videos is like fast-forwarding through that chaotic day all over again. "One of the design ideas was, can we build a 'sad TikTok'?" says Shaw. "It's got a similar interface to TikTok or Instagram, where you're seeing what's going on generally in chronological order." ProPublica's videos were cited by the DOJ in at least 24 cases.

The remaining 2,000 Parler videos shot on 6 January are now languishing on ProPublica's servers and could almost certainly help identify more rioters. And the hundreds of thousands of videos discarded in the filtering process could very well contain evidence of further crimes and misdemeanors, as could the thousands of unsearched smartphones and unscraped social-media accounts of other people who went to Washington that day.

But at some point, says EFF’s Lynch, we should ask what we’re really fighting for. “We could, of course, solve more crime if we let police into everybody’s house,” she says. “But that’s not the way our country is set up, and if we want to maintain a democracy, there have to be limits on surveillance technologies. The technology has advanced faster than the law can keep up.”

In practice, that means that some federal courts have found geofence warrants unconstitutional, while others continue to permit their use. Similarly, some jurisdictions are limiting the retention of ALPR data by law-enforcement agencies and the use of facial-recognition technologies by police. Meanwhile, though, private companies are mining ever more open-source images and location information for profit.

In the eternal struggle between security and privacy, the best that digital-rights activists can hope for is to watch the investigators as closely as they are watching us.

Source: IEEE Spectrum Telecom Channel
