Tesla's Autopilot Depends on a Deluge of Data

In 2019, Elon Musk stood up at Tesla’s Autonomy Day, an event devoted to automated driving, and said, “Essentially everyone’s training the network all the time, is what it amounts to. Whether Autopilot’s on or off, the network is being trained.”

Tesla’s suite of assistive and semi-autonomous technologies, collectively known as Autopilot, is among the most widely deployed, and undeniably the most controversial, driver-assistance systems on the road today. Many drivers love it and have used it for a combined total of more than 5 billion kilometers. Yet the technology has been involved in hundreds of crashes, some of them fatal, and is currently the subject of a comprehensive investigation by the National Highway Traffic Safety Administration (NHTSA).

This second story in IEEE Spectrum’s three-part series on Tesla’s empire of data focuses on how Autopilot rests on a foundation of data harvested from the company’s own customers. Although the company’s approach has unparalleled scope and includes impressive technological innovations, it also faces particular challenges, not least Musk’s decision to widely deploy the misleadingly named “Full Self-Driving” feature as a largely untested beta.


Most companies working on automated driving rely on a small fleet of highly instrumented test vehicles, festooned with high-resolution cameras, radars, and laser-ranging lidar devices. Some of these have been estimated to generate 750 megabytes of sensor data every second, providing a rich seam of training data for neural networks and other machine learning systems to improve their driving skills.

Such systems have now effectively solved the task of everyday driving across a multitude of road users, weather conditions, and road types, says Henry Liu, director of Mcity, a public-private mobility research partnership at the University of Michigan.

“But right now automated vehicles are one to two magnitudes below human drivers in terms of safety performance,” says Liu. “And that’s because current automated vehicles can’t handle the curse of rarity: low frequency, long tail, safety critical events that they just don’t see enough to know how to handle.” Think of a deer suddenly springing into the road, or a slick of spilled fuel.

Tesla’s bold bet is that its own customers can provide the long tail of data needed to boost self-driving cars to superhuman levels of safety. Above and beyond their contractual obligations, many are happy to do so, seeing themselves as willing participants in the development of a technology that they have been told will one day soon let them simply sit back and be driven by the car itself.

For a start, the routing information for every trip taken in a recent-model, Autopilot-equipped Tesla is shared with the company (see the previous installment in this series). But Tesla’s data effort goes far beyond navigation.

In autonomy presentations over the last few years, Musk and Tesla’s then-head of AI, Andrej Karpathy, detailed the company’s approach, including its so-called Shadow Mode.


In Shadow Mode, operating on Tesla vehicles since 2016, if the car’s Autopilot computer is not controlling the car, it is simulating the driving process in parallel with the human driver. When its own predictions do not match the driver’s behavior, this might trigger the recording of a short “snapshot” of the car’s cameras, speed, acceleration and other parameters for later uploading to Tesla. Snapshots are also triggered when a Tesla crashes.
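The trigger logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Tesla’s code: the `Frame` fields, the single steering-angle comparison, the tolerance, and the buffer length are all hypothetical choices. The sketch keeps a rolling buffer of recent telemetry and saves a copy of it whenever the shadow model’s prediction diverges from what the driver actually did, or when a crash is flagged.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    """One time step of hypothetical vehicle telemetry."""
    t: float             # timestamp, seconds
    speed: float         # m/s
    accel: float         # m/s^2
    steer_driver: float  # driver's actual steering angle, degrees
    steer_model: float   # shadow model's predicted steering angle, degrees
    crashed: bool = False


class ShadowTrigger:
    """Hypothetical shadow-mode trigger: buffer recent frames and save a
    snapshot when the model disagrees with the driver or a crash occurs."""

    def __init__(self, buffer_len: int = 50, steer_tol_deg: float = 15.0):
        self.buffer = deque(maxlen=buffer_len)  # oldest frames fall off automatically
        self.steer_tol_deg = steer_tol_deg      # assumed disagreement threshold
        self.snapshots = []                     # stand-in for an upload queue

    def observe(self, frame: Frame) -> bool:
        self.buffer.append(frame)
        disagreement = abs(frame.steer_model - frame.steer_driver)
        if frame.crashed or disagreement > self.steer_tol_deg:
            # Copy the buffer so later frames don't alter the saved snapshot.
            self.snapshots.append(list(self.buffer))
            return True
        return False


# Usage: the driver swerves sharply while the model predicted a gentle correction.
trigger = ShadowTrigger(buffer_len=3, steer_tol_deg=10.0)
frames = [
    Frame(0.0, 20.0, 0.0, steer_driver=0.0, steer_model=1.0),
    Frame(0.1, 20.0, 0.0, steer_driver=0.5, steer_model=0.0),
    Frame(0.2, 19.5, -0.5, steer_driver=25.0, steer_model=2.0),
]
fired = [trigger.observe(f) for f in frames]
print(fired)  # [False, False, True]
```

A real system would presumably compare many more outputs than steering and would likely also record some time after the trigger; this sketch keeps only the frames leading up to it.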

After the snapshots are uploaded, a team may review them to identify human actions that the system should try to imitate, and input them as training data for its neural networks. Or they may notice that the system is failing, for instance, to properly identify road signs obscured by trees.

In that case, engineers can train a detector designed specifically for this scenario and download it to some or all Tesla vehicles. “We can beam it down to the fleet, and we can ask the fleet to please apply this detector on top of everything else you’re doing,” said Karpathy in 2020. If that detector thinks it spots such a road sign, it will capture images from the car’s cameras for later uploading.
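Karpathy’s description suggests a plug-in pattern: extra detectors installed alongside the normal driving software, each saving the frames it flags. The sketch below assumes that pattern and is not Tesla’s implementation; `Campaign`, `CampaignRunner`, the confidence threshold, and the per-vehicle upload cap are all hypothetical, and the toy detector simply reads a precomputed score instead of running a neural network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Campaign:
    """A hypothetical data-collection campaign pushed to the fleet."""
    name: str
    detector: Callable[[dict], float]  # returns a confidence score in [0, 1]
    threshold: float = 0.8             # assumed trigger threshold
    max_uploads: int = 100             # assumed per-vehicle cap on captures
    captured: List[dict] = field(default_factory=list)

    def process(self, frame: dict) -> bool:
        if len(self.captured) >= self.max_uploads:
            return False  # quota reached; stop collecting
        if self.detector(frame) >= self.threshold:
            self.captured.append(frame)  # queue the frame for later upload
            return True
        return False


class CampaignRunner:
    """Runs every installed campaign on each frame, alongside normal driving."""

    def __init__(self) -> None:
        self.campaigns: Dict[str, Campaign] = {}

    def install(self, campaign: Campaign) -> None:
        # Analogous to "beaming" a new detector down to the car.
        self.campaigns[campaign.name] = campaign

    def on_frame(self, frame: dict) -> List[str]:
        # Names of the campaigns that captured this frame.
        return [c.name for c in self.campaigns.values() if c.process(frame)]


# Usage with a toy detector that reads a precomputed score.
runner = CampaignRunner()
runner.install(Campaign("occluded_sign", lambda f: f["sign_score"],
                        threshold=0.8, max_uploads=1))
frames = [{"id": 1, "sign_score": 0.3},
          {"id": 2, "sign_score": 0.95},
          {"id": 3, "sign_score": 0.9}]
hits = [runner.on_frame(f) for f in frames]
print(hits)  # [[], ['occluded_sign'], []]
```

The third frame also scores above the threshold but is skipped because the cap of one capture has been reached, a stand-in for the bandwidth and storage limits any real campaign would presumably have to respect.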

His team would quickly receive thousands of images, which they would use to iterate on the detector before eventually rolling it out to all production vehicles. “I’m not exactly sure how you build out a data set like this without the fleet,” said Karpathy. Amateur Tesla hacker Green told Spectrum that he identified over 900 Autopilot test campaigns before the company stopped numbering them in 2019.


Liu is bullish on Tesla’s approach to leveraging its ever-growing consumer base. “I don’t think a small… fleet will ever be able to handle these [rare] situations,” he says. “But even with these shadow drivers, and if you deploy millions of these fleet vehicles, that’s a very, very large data collection. I don’t know whether Tesla is fully utilizing them because there’s no public information really available.”

One obstacle is the sheer cost. Karpathy admitted that having a large team to assess and label images and video was expensive, and said that Tesla was working on detectors that can train themselves on video clips captured in Autopilot snapshots. In June, the company duly laid off 195 people working on data annotation at a Bay Area office.

While Autopilot does seem to have improved over the years, with Tesla allowing its operation on more roads and in more situations, serious and fatal accidents are still occurring. These may or may not have purely technical causes. Certainly, some drivers seem to be overestimating the system’s capabilities, or are accidentally or deliberately failing to supervise it sufficiently.

Other experts are worried that Tesla’s approach has more fundamental flaws. “The vast majority of the world generally believes that you’re never going to get the same level of safety with a camera-only system that you will based on a system that includes lidar,” says Dr. Matthew Weed, senior director of product management at Luminar, a company that manufactures advanced lidar systems.

He points out that Tesla’s Shadow Mode only captures a small fraction of each car’s driving time. “When it comes to safety, the whole thing is about… your unknown unknowns,” he says. “What are the things that I don’t even know about that will cause my system to fail? Those are really difficult to ascertain in a bulk fleet” that is down-selecting data, uploading only a small, pre-filtered subset of what its cars see.

For all the promise of Tesla’s fleet learning and the enthusiastic support of many of its customers, Autopilot has yet to prove that it can drive as safely as a human, let alone be trusted to operate a vehicle without supervision. And there are other difficulties looming. Andrej Karpathy left Tesla in mid-July, while the company continues to face the damaging possibility of NHTSA issuing a recall for Autopilot in the US. This would be a terrible PR (and possibly economic) blow for the company, but would likely not halt its harvesting of customer data to improve the system, nor prevent its continued deployment overseas.

Tesla’s use of fleet vehicle data to develop Autopilot echoes the user-fueled rise of Internet giants like Google, YouTube, and Facebook. The more its customers drive, so Musk’s story goes, the better the system performs.

But just as tech companies have had to come to terms with their complicated relationships with data, so Tesla is beginning to see a backlash. Why does the company charge $12,000 for a so-called “Full Self-Driving” capability that is utterly reliant on its customers’ data? How much control do drivers have over data extracted from their daily journeys? And what happens when other entities, from companies to the government, seek access to it? These are the themes for our third story.

Source: IEEE Spectrum Computing

