Waymo Develops a Machine Learning Model to Predict the Behavior of Other Road Users for its Self-Driving Vehicles

Author: Eric Walz   

Machine learning has important uses in a variety of fields, such as finance, healthcare and business. For self-driving cars, it is an important tool for predicting the behavior of other road users, such as human drivers, pedestrians and bicyclists.

For Alphabet subsidiary Waymo, machine learning is one of the most important tools in its effort to build the world's best autonomous driving system, which it calls the "Waymo Driver" and which is intended to perform better than a human driver.

However, unlike a computer, human drivers can anticipate what others on the road might do and can learn from past experience, something that is difficult to train a computer to do and that requires a lot of processing power. So Waymo developed its own machine learning model that can do the same job with less compute.

For example, when approaching an intersection, a human driver might anticipate that another driver traveling in the opposite direction will make a left turn across their path or that a pedestrian may enter the roadway. By anticipating this behavior, the driver can mentally prepare to brake if needed. Getting a computer to predict these types of behavior is a challenge for engineers working on self-driving vehicles.

That's where machine learning comes into play, allowing Waymo's autonomous vehicles to make better decisions. Machine learning is commonly used to model and reduce some of this complexity, thereby enabling the self-driving system to learn new types of behavior.

Waymo collects data from millions of miles of real-world driving and from billions more miles driven in computer simulations built from its fleet's data. To navigate, Waymo's autonomous vehicles rely on highly complex, high-definition maps and vehicle sensor data. However, this data alone is not enough to make predictions, according to Waymo.

Simplifying a Complex Scene

The behavior of other road users is often complicated and difficult to capture with just map-derived traffic rules because driving patterns vary and human drivers often break the rules they're supposed to follow. 

The most popular way to incorporate highly detailed, centimeter-level maps into behavior prediction models is by rendering the map into pixels and encoding all of the scene information, such as traffic signs, crosswalks, road lanes, and road boundaries, with a convolutional neural network (CNN). 

However, this method requires a tremendous amount of processor power and takes time (latency), which is not ideal for a self-driving vehicle that needs to make decisions in a fraction of a second.

To address these issues and make better predictions, Waymo developed a new model it calls "VectorNet," which the company says provides more accurate behavior predictions while using less compute than CNNs. VectorNet essentially takes a highly complex scene and simplifies it using vectors so it can be processed with less computing power.

Map features can be simplified into vectors, which are easier for machine learning models to process.

This complex scene can be broken down into vectors to make it easier to process.

For example, an intersection crosswalk can be represented as a polygon defined by several points, and a stop sign can be represented by a single point. Road curves can be approximated as polylines by "connecting the dots." These polylines are then further split into vector fragments.
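The idea can be illustrated with a short Python sketch. This is not Waymo's code: the `MapVector` class, the `to_vectors` helper, the `kind` labels and the coordinates are all hypothetical, and treating a single point as a zero-length vector is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class MapVector:
    """One vector fragment of a map feature (hypothetical structure)."""
    start: Point
    end: Point
    kind: str          # illustrative label such as "crosswalk" or "lane"
    polyline_id: int   # which polyline this fragment belongs to


def to_vectors(points: Sequence[Point], kind: str, polyline_id: int,
               closed: bool = False) -> List[MapVector]:
    """Split a sequence of points into consecutive vector fragments."""
    pts = list(points)
    if len(pts) == 1:
        # Assumption: a point feature becomes one zero-length vector.
        return [MapVector(pts[0], pts[0], kind, polyline_id)]
    if closed and pts:
        # A polygon is a polyline that returns to its first point.
        pts.append(pts[0])
    return [MapVector(a, b, kind, polyline_id) for a, b in zip(pts, pts[1:])]


# Made-up coordinates in meters, for illustration only.
crosswalk = to_vectors([(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)],
                       "crosswalk", polyline_id=0, closed=True)
lane_curve = to_vectors([(0.0, 5.0), (2.0, 5.5), (4.0, 6.5), (6.0, 8.0)],
                        "lane", polyline_id=1)
stop_sign = to_vectors([(5.0, 1.0)], "stop_sign", polyline_id=2)
```

Here the four-cornered crosswalk becomes four vectors, the four-point road curve becomes three, and the stop sign becomes one, so the whole scene is a short list of small records rather than an image full of pixels.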

In this way, Waymo's engineers are able to represent all the road features and the trajectories of the other objects as a set of simplified vectors instead of a highly complex scene, which is much more difficult to work with. With this simplified view, Waymo designed VectorNet to effectively process its vehicle sensor data and map inputs.

The neural network is implemented to capture the relationships between various vectors. These relationships occur when, for example, a car enters an intersection or a pedestrian approaches a crosswalk. Through learning such interactions between road features and object trajectories, VectorNet's data-driven, machine learning-based approach allows Waymo to better predict other agents' behavior by learning from different behavior patterns. 

To do this, Waymo proposed a novel hierarchical graph neural network. The first level is composed of polyline subgraphs, in which VectorNet gathers information within each polyline. In the second level, called the "global interaction graph," VectorNet exchanges information among polylines.
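The two levels can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not VectorNet itself: the function names and feature values are hypothetical, the first level is reduced to an element-wise maximum over a polyline's vectors, and the second level uses plain dot-product attention with no learned weights, whereas a real network would learn its parameters from data.

```python
import math
from typing import Dict, List

Feature = List[float]


def pool_polyline(vectors: List[Feature]) -> Feature:
    """Level 1: summarize one polyline by an element-wise max over its vectors."""
    return [max(column) for column in zip(*vectors)]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into weights that sum to 1 (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_interaction(nodes: List[Feature]) -> List[Feature]:
    """Level 2: every polyline node gathers information from all nodes."""
    updated = []
    for query in nodes:
        # Assumption: similarity is an unlearned dot product between nodes.
        scores = [sum(q * k for q, k in zip(query, key)) for key in nodes]
        weights = softmax(scores)
        updated.append([sum(w * node[i] for w, node in zip(weights, nodes))
                        for i in range(len(query))])
    return updated


# Each vector is [x_start, y_start, x_end, y_end]; values are made up.
polylines: Dict[str, List[Feature]] = {
    "lane": [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 2.0, 0.5]],
    "crosswalk": [[0.5, 0.2, 0.5, 0.8], [0.5, 0.8, 0.9, 0.8]],
    "car_trajectory": [[0.0, 0.1, 0.4, 0.1], [0.4, 0.1, 0.8, 0.2]],
}

polyline_nodes = [pool_polyline(vectors) for vectors in polylines.values()]
scene = global_interaction(polyline_nodes)
```

After the second step, each entry in `scene` blends information from every polyline, which is the kind of exchange that lets a model relate a car's trajectory to a nearby crosswalk or lane.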

Here is the simplified intersection with the input vectors that are converted to polyline subgraphs.

To further boost VectorNet's capabilities and understanding of the real world, Waymo trained the system to learn from context clues to make inferences about what could happen next around the vehicle to make improved behavior predictions. 

For example, important scene information can often be obscured while driving, such as a tree branch blocking a stop sign. When this happens to a human driver, they can draw upon past experiences about the possibility of the stop sign being there, although they cannot see it. Machine learning makes these types of predictions using inference. 

To further improve the accuracy of VectorNet, Waymo randomly masks out map features during training, such as a stop sign at a four-way intersection, and requires the model to complete them.
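A rough sketch of this kind of masking, again in plain Python, is shown below. The details are assumptions made for illustration: Waymo has not been quoted here on how features are hidden or scored, so replacing a masked feature with zeros and measuring the error with a mean squared difference are stand-ins, and the function names and numbers are hypothetical.

```python
import random
from typing import List, Tuple

Feature = List[float]


def mask_polyline_nodes(nodes: List[Feature], mask_ratio: float,
                        rng: random.Random) -> Tuple[List[Feature], List[int]]:
    """Hide a random subset of features; return the masked copy and the hidden indices."""
    count = min(len(nodes), max(1, round(mask_ratio * len(nodes))))
    hidden = sorted(rng.sample(range(len(nodes)), count))
    # Assumption: a hidden feature is replaced by zeros.
    masked = [[0.0] * len(node) if i in hidden else list(node)
              for i, node in enumerate(nodes)]
    return masked, hidden


def reconstruction_loss(predicted: List[Feature], original: List[Feature],
                        hidden: List[int]) -> float:
    """Mean squared error, measured only on the features that were hidden."""
    errors = [(p - o) ** 2
              for i in hidden
              for p, o in zip(predicted[i], original[i])]
    return sum(errors) / len(errors)


# Made-up features for four map elements, e.g. lanes, a crosswalk and a stop sign.
polyline_nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

masked_nodes, hidden = mask_polyline_nodes(polyline_nodes, 0.25, random.Random(0))

# A model would be trained to fill in the hidden features; echoing the masked
# input back stands in for an untrained model and yields a non-zero loss.
loss_before_training = reconstruction_loss(masked_nodes, polyline_nodes, hidden)
```

During training, a model that learns to drive this loss toward zero has to infer the hidden feature from the rest of the scene, much as a driver infers a stop sign hidden behind a branch.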

In this way, VectorNet can further improve the Waymo Driver's understanding of the world around it and be better prepared for any unexpected situations.

The intersection is broken down to create a global interaction graph.

Waymo validated the performance of VectorNet with the task of trajectory prediction, an important task for a self-driving vehicle that interacts with human drivers on the road. Compared with ResNet-18, one of the most advanced and widely used CNNs, VectorNet achieves up to 18% better performance while using only 29% of the parameters and consuming just 20% of the computation when there are 50 agents (other vehicles, pedestrians) per scene, Waymo reported.

Also this week, Waymo announced its latest funding round of $750 million. With the new funding, Waymo has raised $3 billion since March. The company is working on self-driving cars, commercial robotaxis and self-driving trucks that will all be powered by its advanced AI software.

Eric Walz
Originally hailing from New Jersey, Eric is an automotive & technology reporter covering the high-tech industry here in Silicon Valley. He has over 15 years of automotive experience and a bachelor's degree in computer science. These skills, combined with technical writing and news reporting, allow him to fully understand and identify new and innovative technologies in the auto industry and beyond. He has worked at Uber on self-driving cars and as a technical writer, helping people to understand and work with technology.