
Sunday, 20 September 2015

A look at the rocket science technologies that power Google Self-Driving Cars


This article was written by Aaron Yip, a graduate TA for a course on self-driving cars.

The Google Self-Driving Car is one of the most popular self-driving vehicles, often treated as synonymous with the term itself, and a great example of some of the technologies required for autonomous navigation. To understand the technologies, we need to first understand the problems.

Problem #1: What is the world? This is called mapping and refers to understanding the structure of the world around you. Imagine waking up in a strange, perfectly dark room where you can only gradually turn up the lights to see what's inside.

Problem #2: Where am I? This is called localization and refers to understanding where you are relative to the world. Imagine being blindfolded while inside your own house. You know where everything is, but you can only build a rough approximation of where you are by slowly feeling your way around.

Robotic navigation is like being in a strange, perfectly dark room with a blindfold on. That's a tough problem to have. And to make it even harder...

Problem #3: Did I just run over someone? This is an absolute no-no for self-driving cars, for hopefully obvious reasons. Strangely enough, human drivers seem to ignore this problem all the time.

As we look at the technologies, think about them in terms of how they solve the above problems. Okay, got it? Let's drive right in!

1. The car itself
Self-driving cars are still cars, and Google's fleet has historically included the Toyota Prius, Audi TT, and Lexus RX450h. Its current fleet is about two dozen Lexus SUVs outfitted with an array of hardware and software.

Raw output from the HDL-64E LIDAR
2. Lasers!
Velodyne's HDL-64E is the car's famous LIDAR system, a.k.a. the whirling laser range finder mounted on top of the car. It works as you might expect, measuring the distance to its surroundings by bouncing its 64 lasers off objects as it spins 360 degrees. The HDL-64E reads up to 1.3 million points per second, accurate to 2 cm within a radius of roughly 100-120 meters. That ability comes with a hefty price tag ($70,000), though cheaper consumer models are expected within the next handful of years.
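To make the geometry a bit more concrete, here is a minimal sketch (not Velodyne's actual processing) of how a single laser return, given its measured range plus the head's rotation angle and the beam's fixed elevation angle, becomes a 3D point. The angles and ranges below are invented for illustration.

```python
import math

def lidar_return_to_xyz(range_m, azimuth_deg, elevation_deg):
    """Convert one laser return into Cartesian coordinates relative to the sensor.

    range_m       -- measured distance to the reflecting surface (meters)
    azimuth_deg   -- rotation angle of the spinning head (0-360 degrees)
    elevation_deg -- fixed vertical angle of this particular laser beam

    Illustrative geometry only; the real sensor applies per-beam
    calibration offsets on top of this.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = range_m * math.cos(el)   # distance projected onto the ground plane
    x = horizontal * math.cos(az)         # forward
    y = horizontal * math.sin(az)         # left/right
    z = range_m * math.sin(el)            # up/down relative to the sensor
    return (x, y, z)

# Example: a return 20 m away, 45 degrees into the spin, on a beam angled 2 degrees down
print(lidar_return_to_xyz(20.0, 45.0, -2.0))
```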

3. Position tracking: good ol' GPS and others
Just by receiving radio signals from GPS satellites and trilaterating from their timing, we can find out where we are in the world. One example found in Google's cars is the Applanix POS LV GPS navigation system. Add in standard vehicle instruments like tachometers and gyroscopes (rotational measurements), altimeters (altitude), and odometers (distance traveled), and we have a solid package. Many modern consumer vehicles are outfitted with these gadgets already -- and that's great news. Google's self-driving car also includes an extra nifty gadget, a wheel encoder on the left rear wheel, to measure the car's lateral movement. Unfortunately, GPS and the other components are only accurate to within a few meters. A few meters is a big difference. We are going to have to do much, much better.
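To see why those extra instruments earn their keep, here is a toy dead-reckoning sketch that fuses wheel-encoder ticks with gyroscope heading to track position between GPS fixes. The encoder resolution and wheel size are made-up values, not the specs of the Applanix system.

```python
import math

WHEEL_CIRCUMFERENCE_M = 2.0   # hypothetical circumference of the encoded wheel
TICKS_PER_REVOLUTION = 1024   # hypothetical encoder resolution

def dead_reckon(pose, encoder_ticks, yaw_rate_rad_s, dt):
    """Advance (x, y, heading) using wheel odometry and a gyroscope.

    pose           -- (x, y, heading) in meters / radians
    encoder_ticks  -- ticks counted on the wheel encoder during dt
    yaw_rate_rad_s -- rotation rate reported by the gyroscope
    dt             -- time step in seconds
    """
    x, y, heading = pose
    distance = (encoder_ticks / TICKS_PER_REVOLUTION) * WHEEL_CIRCUMFERENCE_M
    heading += yaw_rate_rad_s * dt        # integrate rotation
    x += distance * math.cos(heading)     # integrate translation
    y += distance * math.sin(heading)
    return (x, y, heading)

# Drive straight for one step, then keep going while turning slightly left
pose = (0.0, 0.0, 0.0)
pose = dead_reckon(pose, encoder_ticks=512, yaw_rate_rad_s=0.0, dt=0.1)
pose = dead_reckon(pose, encoder_ticks=512, yaw_rate_rad_s=0.2, dt=0.1)
print(pose)
```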

4. Google Maps
The laser mapping, the on-board GPS, and the lateral movement sensors all feed into a data bucket that gets poured into the high-resolution maps Google has built through its extensive mapping initiative. By high resolution, I mean detailed down to the height of the curbs and the dimensions of the lane the car is currently traveling in. Google Maps works pretty well on your phone. Google Maps for self-driving cars works even better. We're adding another layer of mapping and map alignment, so imagine the entire map rotating to fit the raw data of your surroundings. If you can't solve mapping with lasers alone, throw one of the best map systems in the world at it.
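To make "the entire map rotating to fit the raw data" concrete, here is a toy alignment sketch: it brute-forces a small range of rotation angles and keeps the one that best lines up a raw scan with landmark points from the prior map. Real scan matching is far more sophisticated, and every number below is invented.

```python
import math

def rotate(points, angle_rad):
    """Rotate 2D points about the origin."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def alignment_error(scan, map_points):
    """Sum of squared distances from each scan point to its nearest map point."""
    total = 0.0
    for sx, sy in scan:
        total += min((sx - mx) ** 2 + (sy - my) ** 2 for mx, my in map_points)
    return total

def best_rotation(scan, map_points, search_deg=10, step_deg=0.5):
    """Brute-force the rotation (within a small window) that best fits the prior map."""
    best_angle, best_err = 0.0, float("inf")
    angle = -search_deg
    while angle <= search_deg:
        err = alignment_error(rotate(scan, math.radians(angle)), map_points)
        if err < best_err:
            best_angle, best_err = angle, err
        angle += step_deg
    return best_angle

# Invented example: a "scan" that is really the map rotated by about 3 degrees
map_points = [(5.0, 0.0), (0.0, 5.0), (4.0, 4.0), (7.0, 1.0)]
scan = rotate(map_points, math.radians(3.0))
print(best_rotation(scan, map_points))   # should recover roughly -3 degrees
```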
The Retrofitted Lexus RX450h SUVs

5. Video cameras 
Pairs of digital video cameras are mounted around the exterior of the car, slightly separated by a known distance. These high-performance stereo cameras typically have about a 50-degree field of view and are accurate out to roughly 30 meters.

The neat part here is the algorithms rather than the specific camera technologies -- which can vary, though Google does favor Point Grey (http://www.edmundoptics.com/came...). A single image can only reliably offer 2D information, but two images of the same scene offset by a known distance allow depth information (distance away) to be derived. This concept is called stereo vision, and it aids in the mapping process.
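As a hedged sketch of the underlying math: for an idealized, rectified stereo pair, depth follows directly from the disparity between where the same point appears in the left and right images. The focal length and baseline below are placeholders, not the specs of Google's cameras.

```python
def depth_from_disparity(focal_length_px, baseline_m, x_left_px, x_right_px):
    """Estimate the distance to a point seen by both cameras of a rectified stereo pair.

    depth = focal_length * baseline / disparity
    """
    disparity = x_left_px - x_right_px   # horizontal shift between the two images (pixels)
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_length_px * baseline_m / disparity

# Placeholder numbers: 700 px focal length, cameras 30 cm apart,
# and a feature that appears shifted by 10 px between the two images
print(depth_from_disparity(700.0, 0.30, 405.0, 395.0))   # -> 21.0 meters
```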

Notably, one additional camera is mounted near the rear-view mirror specifically to watch for traffic lights, signs, and pedestrians in front of the car.


6. Echo technology: radar and sonar 
While LIDAR and the stereo cameras help solve mapping, a set of four radar sensors attached to the front and back bumpers is used to accurately estimate the distance and speed of obstacles in real time. Radar is a well-established technology that bounces high-frequency radio waves off objects to track them, and this particular system is good for about 200 meters. Google favors Bosch's driver-assistance radar, the kind used for adaptive cruise control (ACC). Some car prototypes have also tested a smaller sonar setup that echoes sound waves out to about 6 meters. These systems can automatically apply the brakes, pre-tension the seat belts for impact, or swerve to avoid obstacles.
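As a back-of-the-envelope sketch of what "echo technology" measures: distance comes from the round-trip time of the radio pulse, and relative speed from the Doppler shift of the return. The 77 GHz carrier below is a typical automotive-radar value used as an assumption, not a quoted spec of the Bosch unit.

```python
SPEED_OF_LIGHT_M_S = 299792458.0

def range_from_round_trip(round_trip_s):
    """Distance to the target from the echo's round-trip time (there and back)."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

def relative_speed_from_doppler(doppler_shift_hz, carrier_hz=77e9):
    """Closing speed of the target from the Doppler shift of the returned wave.

    For speeds far below the speed of light: shift ~= 2 * v * f / c.
    77 GHz is a common automotive radar band, assumed here for illustration.
    """
    return doppler_shift_hz * SPEED_OF_LIGHT_M_S / (2.0 * carrier_hz)

print(range_from_round_trip(1.0e-6))        # echo after 1 microsecond -> ~150 m away
print(relative_speed_from_doppler(5133.0))  # ~10 m/s closing speed
```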

At this point the minimalist in you may be thinking: Wow, this is a lot of stuff. Some of these systems sound redundant -- radar, sonar, GPS, laser range finders, and cameras? Do we really need everything? The answer is definitely yes. Redundancy addresses unusual driving conditions like fog or snow, when some sensors, such as the LIDAR or the GPS/network setup, are essentially blind and other feeds, like the video cameras, return confusing input.

7. Humans

To the likely chagrin of the vehicle itself, Google Self-Driving Cars still need human babysitters. Humans are overwhelmingly the biggest source of accidents for Google's self-driving cars (100% of them as of June 2015, over more than 1.8 million miles driven). Humans run on some highly advanced technology: eyes that capture about 180 degrees of horizontal FOV with wide dynamic range and motion detection, biological neural networks that are among the most complex systems in the known universe, and even handheld smartphones. Yet despite their decades of individual training data and millions of years of evolution, humans still make silly mistakes.

Thankfully, autonomous mode is active for most of the time that humans are in the car. A pair of humans is tasked with monitoring the road and taking over normal car operations when necessary. One of the earliest modifications: grabbing the wheel hands control straight back to the human, serving as an "OH SHI- oh whew, we're okay" override.

8. Software algorithms
What the car sees via probabilistic maps

Finally, the car needs to make sense of all the data it has gathered. This is where the fun algorithms come in! We throw these algorithms at the onboard processors; an example setup from the car would be two Xeon computers running Linux: a 12-core server for the vision and laser algorithms, and a 6-core server that takes care of planning, control, and low-level communication. We're not going to go too deeply into any of the specific algorithms here, but we can mention some of the software challenges.

Unsupervised multi-beam laser calibration - we need to estimate the optimal extrinsic parameters of each sensor (its position and orientation on the vehicle) so that measurements like distance and pose line up across sensors. Think of it as standardizing each input against all of the other input sources. It turns out that, for dozens of input sources, this is pretty hard.
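To give a feel for what an extrinsic parameter is, the sketch below applies one: a mounting position and rotation that map a point measured in a sensor's own frame into the shared vehicle frame. The mounting pose is invented; the hard "unsupervised" part is estimating these parameters automatically from driving data, which this sketch does not attempt.

```python
import math

def sensor_to_vehicle(point_xy, mount_x, mount_y, mount_yaw_rad):
    """Transform a 2D point from a sensor's frame into the vehicle frame.

    (mount_x, mount_y, mount_yaw_rad) are the sensor's extrinsic parameters:
    where it sits on the car and how it is rotated.
    """
    px, py = point_xy
    c, s = math.cos(mount_yaw_rad), math.sin(mount_yaw_rad)
    return (mount_x + c * px - s * py,
            mount_y + s * px + c * py)

# Invented mounting: a radar 3.5 m ahead of the rear axle, yawed 2 degrees to the left.
# After calibration, the same obstacle should land on (nearly) the same vehicle-frame
# coordinates no matter which sensor reported it -- that is what calibration buys us.
print(sensor_to_vehicle((20.0, 0.0), 3.5, 0.0, math.radians(2.0)))
```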

Mapping and localization - Remember the fundamental problems we considered back at the beginning? The data from LIDAR and the other sensors go into algorithms like particle filters and Kalman filters -- both flavors of recursive Bayesian estimation, essentially iterative positioning passes weighted according to Bayesian inference -- to generate real-time probabilistic maps of the car's surroundings. This may sound complicated. All you need to know is that the car definitely knows what it's doing -- probably.
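Here is a deliberately tiny particle-filter sketch for localization along a single road-like axis: particles guess the car's position, get weighted by how well an invented, noisy range measurement to a mapped landmark fits each guess, and are then resampled. The real pipeline is far richer, but the predict/weight/resample loop is the same idea.

```python
import math
import random

def particle_filter_step(particles, moved_m, measured_range_m, landmark_m,
                         motion_noise=0.5, sensor_noise=1.0):
    """One predict/weight/resample cycle of a toy 1D particle filter.

    particles        -- hypothesized positions along the road (meters)
    moved_m          -- how far odometry says we drove since the last step
    measured_range_m -- noisy measured distance to a known landmark ahead
    landmark_m       -- the landmark's position on the prior map
    """
    # Predict: move every particle by the odometry estimate, plus motion noise
    predicted = [p + moved_m + random.gauss(0.0, motion_noise) for p in particles]

    # Weight: particles whose expected range matches the measurement score higher
    def weight(p):
        error = (landmark_m - p) - measured_range_m
        return math.exp(-(error ** 2) / (2.0 * sensor_noise ** 2))

    weights = [weight(p) for p in predicted]

    # Resample: draw a fresh particle set in proportion to the weights
    return random.choices(predicted, weights=weights, k=len(particles))

# Invented scenario: true position ~20 m, landmark at 60 m, so the range reads ~40 m
particles = [random.uniform(0.0, 100.0) for _ in range(500)]
for _ in range(5):
    particles = particle_filter_step(particles, moved_m=0.0,
                                     measured_range_m=40.0, landmark_m=60.0)
print(sum(particles) / len(particles))   # the estimate should settle near 20
```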

The functional driverless prototype
Object recognition - to know how to behave correctly depending on whether the object is another vehicle, a pedestrian, a bicyclist, a sign, a traffic light, etc. Most objects are labeled by pre-trained, deep-learning, boosted classification models (see the literature on boosting methods for object categorization) -- one for the shape of the object and one for its motion -- and classified objects can then be tracked using Kalman filtering.
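Once an object has been classified, tracking it with a Kalman filter boils down to a repeated predict-then-correct cycle. Below is a minimal one-dimensional constant-velocity version (position and speed along a single axis); all noise values and measurements are invented for illustration.

```python
def kalman_track_step(state, covariance, measured_pos,
                      dt=0.1, process_var=0.5, measurement_var=1.0):
    """One predict/update cycle of a 1D constant-velocity Kalman filter.

    state        -- [position, velocity] of the tracked object
    covariance   -- 2x2 covariance as [[pxx, pxv], [pvx, pvv]]
    measured_pos -- new (noisy) position measurement of the object
    """
    x, v = state
    (pxx, pxv), (pvx, pvv) = covariance

    # Predict: assume the object keeps its current velocity
    x_pred = x + v * dt
    v_pred = v
    pxx_p = pxx + dt * (pvx + pxv) + dt * dt * pvv + process_var
    pxv_p = pxv + dt * pvv
    pvx_p = pvx + dt * pvv
    pvv_p = pvv + process_var

    # Update: blend the prediction with the measurement (only position is measured)
    innovation = measured_pos - x_pred
    s = pxx_p + measurement_var            # innovation covariance
    kx = pxx_p / s                         # Kalman gain for position
    kv = pvx_p / s                         # Kalman gain for velocity

    x_new = x_pred + kx * innovation
    v_new = v_pred + kv * innovation
    cov_new = [[(1 - kx) * pxx_p, (1 - kx) * pxv_p],
               [pvx_p - kv * pxx_p, pvv_p - kv * pxv_p]]
    return [x_new, v_new], cov_new

# Track a pedestrian walking at roughly 1.5 m/s from noisy position reports
state, cov = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for z in [0.16, 0.31, 0.43, 0.62, 0.74, 0.91]:
    state, cov = kalman_track_step(state, cov, z)
print(state)   # estimated position and speed
```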


Trajectory planning - to deal with crazy human drivers and driving physics: merging into traffic flow, passing oncoming traffic, changing lanes, avoiding other vehicles, etc. On the control/optimization-theory side, we start getting into the Bellman equation for decision-making and modeling. For the moving-frame analysis, we can rely on the Frenet–Serret formulas and create lateral and longitudinal cost functionals for different tasks, as well as to mimic human-like driving behavior.
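To make "lateral and longitudinal cost functionals" slightly more concrete, here is a toy cost for ranking candidate lateral trajectories in the Frenet frame: it penalizes jerk (for passenger comfort), maneuver duration, and how far off the lane center the path ends. The weights and candidate numbers are invented, not Google's.

```python
def lateral_trajectory_cost(jerk_profile, duration_s, end_offset_m,
                            w_jerk=1.0, w_time=0.1, w_offset=2.0):
    """Score a candidate lateral (side-to-side) trajectory in the Frenet frame.

    jerk_profile -- sampled lateral jerk values along the candidate path (m/s^3)
    duration_s   -- how long the maneuver takes
    end_offset_m -- lateral distance from the lane center at the end of the path
    Lower cost is better; smooth, quick paths that end centered in the lane win.
    """
    jerk_term = sum(j * j for j in jerk_profile)   # comfort: punish harsh motion
    return w_jerk * jerk_term + w_time * duration_s + w_offset * end_offset_m ** 2

# Two invented candidates for a lane change: gentle-but-slow vs. fast-but-jerky
gentle = lateral_trajectory_cost([0.1, 0.2, 0.2, 0.1], duration_s=4.0, end_offset_m=0.0)
jerky = lateral_trajectory_cost([0.8, 1.0, 0.9, 0.7], duration_s=2.0, end_offset_m=0.1)
print(gentle, jerky)   # the planner would pick the candidate with the lower cost
```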

Dynamic modeling and control - takes the trajectory output described above and controls the car itself. The Google car blends multiple strategies, from model predictive control and well-established physical car models to PID controllers for low-level actions like applying torque at the wheels. The planner and the control algorithms cycle together to understand the motion of the world and decide what ought to be done next.
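Since PID control gets name-checked for the low-level actions, here is a minimal PID sketch of the kind that could hold a target speed by nudging the throttle; the gains and the crude vehicle-response line are made up, and the real stack layers this under the model-predictive pieces described above.

```python
class PID:
    """Minimal PID controller: output = Kp*error + Ki*integral(error) + Kd*d(error)/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Made-up gains: nudge the throttle toward a 25 m/s (about 55 mph) target speed
speed_controller = PID(kp=0.5, ki=0.05, kd=0.1)
current_speed = 20.0
for _ in range(3):
    throttle = speed_controller.step(setpoint=25.0, measurement=current_speed, dt=0.1)
    current_speed += 0.2 * throttle   # crude stand-in for how the vehicle responds
print(current_speed)
```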

All that, plus several other major algorithmic challenges like interpreting traffic signals and signs. Pretty neat stuff to explore further, right? Here's a course on Udacity by Sebastian Thrun, one of the inventors of Google's Self-Driving Car along with his Stanford team: the Artificial Intelligence for Robotics course.

9. Information
Lastly, the difference between Google's self-driving car project and all other similar projects is the scale of information: Google's computing, both the onboard processors and remote servers, handles up to 1 GB/sec of data from these sensors. Remember all of the databases and architecture that folks like Jeff Dean and thousands of talented Googlers have developed over the years to handle petabytes of training data and maps, the remote server farms dedicated to the heavy processing the car needs, the optimized algorithms for data crunching, and innumerable other ideas? Good computer science is the true heart of the self-driving car.


Thanks for reading! And feel free to point out corrections. :)