To anyone living in a city where autonomous vehicles operate, it would seem they need a lot of practice. Robotaxis travel millions of miles a year on public roads in an effort to gather data from sensors—including cameras, radar, and lidar—to train the neural networks that operate them.
In recent years, thanks to striking improvements in the fidelity and realism of computer graphics, simulation is increasingly being used to accelerate the development of these algorithms. Waymo, for example, says its autonomous vehicles have already driven some 20 billion miles in simulation. In fact, all kinds of machines, from industrial robots to drones, are gathering a growing amount of their training data and practice hours inside virtual worlds.
According to Gautham Sholingar, a senior manager at Nvidia focused on autonomous vehicle simulation, one key benefit is accounting for obscure scenarios for which it would be nearly impossible to gather training data in the real world.
“Without simulation, there are some scenarios that are just hard to account for. There will always be edge cases which are difficult to collect data for, either because they are dangerous and involve pedestrians or things that are challenging to measure accurately like the velocity of faraway objects. That’s where simulation really shines,” he told me in an interview for Singularity Hub.
While it isn’t ethical to have someone run unexpectedly into a street to train AI to handle such a situation, it’s significantly less problematic for an animated character inside a virtual world.
As Sholingar pointed out, industrial use of simulation has been around for decades, but a convergence of improvements in computing power, the ability to model complex physics, and the development of the GPUs powering today's graphics suggests we may be witnessing a turning point in the use of simulated worlds for AI training.
Graphics quality matters because of the way AI “sees” the world.
When a neural network processes image data, it converts each pixel's color into a corresponding number. For black and white images, the number ranges from 0, indicating a fully black pixel, to 255, indicating a fully white one, with the values in between representing shades of gray. For color images, the widely used RGB (red, green, blue) model assigns each pixel three such values, one per channel, which together can represent over 16 million possible colors (256³). So as graphics rendering technology becomes ever more photorealistic, the distinction between pixels captured by real-world cameras and pixels rendered in a game engine is falling away.
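To see that conversion concretely, here is a minimal sketch in Python using NumPy and Pillow (illustrative library choices on my part, not tools named in the article; the filename is hypothetical):

```python
# Minimal sketch: how an image becomes the numbers a neural network "sees".
# Assumes Pillow and NumPy are installed; "street_scene.png" is a hypothetical file.
import numpy as np
from PIL import Image

img = Image.open("street_scene.png")

# Grayscale: one 0-255 value per pixel (0 = black, 255 = white).
gray = np.asarray(img.convert("L"))
print(gray.shape, gray.min(), gray.max())  # e.g. (720, 1280) 0 255

# RGB: three 0-255 channels per pixel, giving 256**3 (~16.7 million) possible colors.
rgb = np.asarray(img.convert("RGB"))
print(rgb.shape)  # e.g. (720, 1280, 3)

# Networks typically see these values scaled to [0, 1] floats. A frame rendered
# by a game engine passes through exactly the same conversion, which is why
# photorealism matters.
pixels = rgb.astype(np.float32) / 255.0
```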
Simulation is also a powerful tool because it's increasingly able to generate synthetic data for sensors beyond cameras. While high-quality graphics are appealing and familiar to human eyes, which is useful when training camera-based perception, rendering engines can also generate radar and lidar data. Combining these synthetic datasets inside a simulation allows an algorithm to train on all the sensor types commonly used by AVs.
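As a rough illustration of what a fused synthetic training sample might look like, here is a hypothetical data structure; the field names and array shapes are assumptions made for this sketch, not the output format of any real simulator:

```python
# Hypothetical sketch of a fused training sample combining synthetic sensor
# outputs; field names and shapes are illustrative only.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SyntheticFrame:
    camera: np.ndarray          # (H, W, 3) rendered RGB image
    lidar: np.ndarray           # (N, 4) points: x, y, z, intensity
    radar: np.ndarray           # (M, 4) returns: range, azimuth, velocity, cross-section
    labels: list = field(default_factory=list)  # ground-truth boxes a simulator provides "for free"

def make_empty_frame() -> SyntheticFrame:
    # Placeholder frame showing the shapes a multi-sensor training loop might consume.
    return SyntheticFrame(
        camera=np.zeros((720, 1280, 3), dtype=np.uint8),
        lidar=np.zeros((50_000, 4), dtype=np.float32),
        radar=np.zeros((256, 4), dtype=np.float32),
    )
```

One practical appeal of simulation here is that the ground-truth labels come directly from the renderer rather than from human annotators.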
Due to its expertise in producing the GPUs needed to generate high-quality graphics, Nvidia has positioned itself as a leader in the space. In 2021, the company launched Omniverse, a simulation platform capable of rendering high-quality synthetic sensor data and modeling real-world physics relevant to a variety of industries. Now, developers are using Omniverse to generate sensor data to train autonomous vehicles and other robotic systems.
In our discussion, Sholingar described some specific ways these types of simulations may be useful in accelerating development. The first involves the fact that with a bit of retraining, perception algorithms developed for one type of vehicle can be re-used for other types as well. However, because the new vehicle has a different sensor configuration, the algorithm will be seeing the world from a new point of view, which can reduce its performance.
“Let’s say you developed your AV on a sedan, and you need to go to an SUV. Well, to train it then someone must change all the sensors and remount them on an SUV. That process takes time, and it can be expensive. Synthetic data can help accelerate that kind of development,” Sholingar said.
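A hedged sketch of what that acceleration might look like in practice, written in PyTorch: a detector pretrained on sedan data is fine-tuned on synthetic frames rendered from the SUV's sensor mounts. How the model and dataloader are built, and the `compute_loss` method assumed on the model, are hypothetical details, not Nvidia's or anyone's actual pipeline.

```python
# Illustrative sketch (PyTorch): adapt a sedan-trained perception model to an
# SUV by fine-tuning on synthetic frames rendered from the SUV's sensor mounts.
import torch
from torch.utils.data import DataLoader

def finetune_on_synthetic(model: torch.nn.Module, synthetic_loader: DataLoader,
                          epochs: int = 2, lr: float = 1e-4) -> torch.nn.Module:
    # Small learning rate: we are adapting an already-trained detector to a new
    # sensor viewpoint, not training from scratch.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, targets in synthetic_loader:
            optimizer.zero_grad()
            loss = model.compute_loss(images, targets)  # assumed detection-loss API
            loss.backward()
            optimizer.step()
    return model
```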
Another area involves training algorithms to accurately detect faraway objects, especially in highway scenarios at high speeds. Since objects over 200 meters away often appear as just a few pixels and can be difficult for humans to label, there isn’t typically enough training data for them.
“For the far ranges, where it’s hard to annotate the data accurately, our goal was to augment those parts of the dataset,” Sholingar said. “In our experiment, using our simulation tools, we added more synthetic data and bounding boxes for cars at 300 meters and ran experiments to evaluate whether this improves our algorithm’s performance.”
According to Sholingar, these efforts allowed their algorithm to detect objects more accurately beyond 200 meters, something only made possible by their use of synthetic data.
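To make the augmentation idea concrete, here is a small hedged sketch in Python of mixing real frames with synthetic frames that contain labeled cars beyond 200 meters; the frame and label objects are hypothetical stand-ins, not Nvidia's tooling.

```python
# Hedged sketch of far-range data augmentation: keep only synthetic frames that
# add hard-to-annotate distant boxes, then mix them into the real training set.
# The frame/label attributes (labels, distance_m) are hypothetical placeholders.
import random

def build_augmented_set(real_frames, synthetic_frames, far_threshold_m=200.0, seed=0):
    far_range = [
        frame for frame in synthetic_frames
        if any(box.distance_m > far_threshold_m for box in frame.labels)
    ]
    combined = list(real_frames) + far_range
    random.Random(seed).shuffle(combined)
    return combined

# Train one model on real_frames alone and one on build_augmented_set(...), then
# evaluate both on a held-out real-world split to see whether detection beyond
# 200 meters actually improves.
```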
While many of these developments are due to better visual fidelity and photorealism, Sholingar also stressed that appearance is only one aspect of what makes a simulation a capable stand-in for the real world.
“There is a tendency to get caught up in how beautiful the simulation looks since we see these visuals, and it’s very pleasing. What really matters is how the AI algorithms perceive these pixels. But beyond the appearance, there are at least two other major aspects which are crucial to mimicking reality in a simulation.”
First, engineers need to ensure there is enough representative content in the simulation. This is important because an AI must be able to detect a diversity of objects in the real world, including pedestrians with different colored clothes or cars with unusual shapes, like roof racks with bicycles or surfboards.
Second, simulations have to depict a wide range of pedestrian and vehicle behavior. Machine learning algorithms need to know how to handle scenarios where a pedestrian stops to look at their phone or pauses unexpectedly when crossing a street. Other vehicles can behave in unexpected ways too, like cutting in close or pausing to wave an oncoming vehicle forward.
“When we say realism in the context of simulation, it often ends up being associated only with the visual appearance part of it, but I usually try to look at all three of these aspects. If you can accurately represent the content, behavior, and appearance, then you can start moving in the direction of being realistic,” he said.
It also became clear in our conversation that while simulation will be an increasingly valuable tool for generating synthetic data, it isn’t going to replace real-world data collection and testing.
“We should think of simulation as an accelerator to what we do in the real world. It can save time and money and help us with a diversity of edge-case scenarios, but ultimately it is a tool to augment datasets collected from real-world data collection,” he said.
Beyond Omniverse, the wider industry of helping “things that move” develop autonomy is shifting toward simulation. Tesla announced it is using similar technology in Unreal Engine to develop its automation, while the Canadian startup Waabi is taking a simulation-first approach to training its self-driving software. Microsoft, meanwhile, has experimented with a similar tool to train autonomous drones, although that project was recently discontinued.
While training and testing in the real world will remain a crucial part of developing autonomous systems, the continued improvement of physics and graphics engine technology means that virtual worlds may offer a low-stakes sandbox for machine learning algorithms to mature into functional tools that can power our autonomous future.
Image Credit: Nvidia