KineticFlow: the universal vision neural network

KineticFlow™ is Ghost's neural network that processes raw camera inputs to develop scene understanding. Grounded in physics, the network detects surfaces and objects in a scene and calculates distance, velocity, and motion vector for every pixel.
Unlike other algorithms that rely upon image-based object recognition, KineticFlow detects obstacles and road users based upon the physical properties of objects and light in motion, enabling object detection without requiring object recognition.
Just as humans use multiple cues to establish depth, KineticFlow fuses multiple mono and stereo computer vision algorithms into a single neural network. The network analyzes video sequences from multiple cameras over time to improve object detection and manage occlusions.

Why existing perception techniques won’t deliver L4 autonomy

Autonomous driving maneuvers require an accurate understanding of the driving scene – the arrangement of roads and objects and how they move relative to the car. This understanding begins with detection and measurement – discovering the objects and road elements, and estimating the distance, velocity, and motion direction of each object.

To produce this, most systems chain several computer vision algorithms into a perception pipeline, where each stage builds upon the previous stage's outputs.
Traditional computer vision pipeline
This legacy pipeline has been widely adopted for L2 advanced driver-assistance systems (ADAS). ADAS features are built with the fundamental assumption that a backup driver will remain attentive at all times, which significantly relaxes the requirements for object recognition reliability, distance estimation accuracy, and overall resiliency to failures. Many systems operate on just a single camera, using a single processing chip. This technology was never designed for L4 autonomous driving without a human safety backup.

To address some of its shortcomings, developers have tried adding sensors such as LiDAR to verify distance and velocity measurements. However, this simply ends up trading unresolved computer vision challenges for other issues, adding expense, reliability concerns, and sensor fusion complexity.
  • Limitations of image recognition
    Image-based object recognition cannot be trained for every possible object, and struggles to recognize objects that are uncommon, rotated, flipped, or partially occluded.
  • Accuracy
Distance estimates are only as good as the underlying object recognition: mistaking a 1.7-meter-wide car for a 2.1-meter-wide car produces a distance error of more than 20%.
  • Resiliency
    Susceptible to failure on component loss if implemented with a single camera or processing chip.
  • Low light & occlusion performance
    Degrades with distance, poor visibility, and low-lighting conditions because image-based object recognition requires high-quality images.
  • Power efficiency
    Requires high-performance compute and/or specialized ASICs to run in real-time in the car, adding cost and reducing flexibility.
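The accuracy limitation above follows directly from the pinhole camera model: a distance inferred from an object's assumed real-world width scales linearly with that assumption. A minimal sketch (the focal length and pixel width below are hypothetical values, not from the original text):

```python
# Illustrative pinhole-camera model: distance inferred from an object's
# assumed real-world width and its measured width in pixels (Z = f * W / w_px).

def distance_from_width(focal_px: float, assumed_width_m: float, width_px: float) -> float:
    """Estimate distance (m) to an object of assumed width from its pixel width."""
    return focal_px * assumed_width_m / width_px

focal_px = 1500.0   # hypothetical focal length in pixels
width_px = 51.0     # observed width of the car in the image

true_dist = distance_from_width(focal_px, 1.7, width_px)  # actually a 1.7 m car
est_dist = distance_from_width(focal_px, 2.1, width_px)   # misrecognized as 2.1 m

error = est_dist / true_dist - 1.0  # relative distance error
print(f"{error:.1%}")               # the error equals the width ratio, about 23.5%
```

Because the estimate is proportional to the assumed width, the relative distance error is exactly the ratio of the two widths, regardless of how far away the car actually is.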
A new approach

KineticFlow: next-generation AI for autonomy

Based on the universal laws of physics, Ghost is developing a new type of artificial intelligence designed to overcome the limitations of traditional computer vision algorithms. KineticFlow analyzes pixels from multi-camera video to detect surfaces and objects and calculate their distance, velocity, and motion with high speed, accuracy, and reliability.
KineticFlow Infographic Demo from inside a car
Universal detection, no recognition required
KineticFlow is trained on the universal physical properties that dictate how all objects behave and move, and how light reflects off them, removing the dependency on traditional image-based object recognition.
Reduces risk of misrecognized or unrecognized objects.
Reduces measurement errors from estimates based on object type.
Low light and occlusion performance
By eliminating the need for explicit recognition, obstacles can be perceived in low light or when only partially visible.
Spotlight: Physics-based detection
Real-time performance on standard SoCs
Complex computer vision algorithms that require minutes per frame on high-powered CPUs are trained in the data center and converted into the KineticFlow neural network, enabling real-time execution on the road, in microseconds per frame, on low-power system-on-a-chip (SoC) processors.
Intensive algorithms like high-definition stereo vision can now run in real-time in the car.
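Stereo vision, one of the intensive algorithms named above, recovers depth from the disparity between the left and right camera images via Z = f·B/d. A minimal sketch of that relation, with hypothetical focal length and baseline values:

```python
# Illustrative depth-from-disparity relation used by stereo vision:
# Z = f * B / d, where f is the focal length (px), B the camera
# baseline (m), and d the disparity between left and right images (px).

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters for a pixel with the given stereo disparity."""
    return focal_px * baseline_m / disparity_px

focal_px, baseline_m = 1500.0, 0.3  # hypothetical camera parameters
print(depth_from_disparity(focal_px, baseline_m, 9.0))  # ~50 m
```

The relation also shows why high-definition stereo is expensive: depth must be solved per pixel by matching disparities across two full-resolution images every frame.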
Low power
Mobile SoCs require a fraction of the power of CPUs/GPUs typically used for autonomy, increasing vehicle range and fuel efficiency.
Because Ghost forgoes custom ASICs, algorithm updates can be regularly delivered over-the-air for evergreen improvement.
Scalable training with auto-labeled data
The universal nature of KineticFlow’s physics-based algorithms makes automated training in the data center possible, with training data sets that can be verified for completeness and are orders of magnitude smaller than those of other approaches.
The physics-based approach enables smaller training sets that can be assembled and verified for completeness across all hyperspace dimensions.
No human labeling
All training data is automatically labeled in the data center and validated against ground truth.
Rapid cycle times
New sensor generations, vehicle configurations, and features can be rapidly re-trained and validated.
Spotlight: Automated data labeling
KineticFlow Infographic Demo
Combining complementary algorithms
By integrating multiple computer vision algorithms, including physics-based detection, stereo disparity, and a 3D variant of optical flow, into a single neural network, KineticFlow detects the objects in a scene along with the distance, velocity, and motion path of every pixel.
Multiple algorithms naturally reinforce one another to collectively provide higher-confidence outputs than each can generate alone.
As the strengths of each algorithm vary across distances, speeds, and lighting conditions, every output includes a confidence range, enabling the driving program to take accuracy into account.
While mono and stereo vision fail outright with the loss or occlusion of a critical camera, KineticFlow can fall back to single-camera mode, producing a subset of information that still enables safe driving in most operational design domains when coupled with radar.
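The idea of complementary algorithms reinforcing one another, each with its own confidence range, can be sketched as inverse-variance weighting. This is an illustrative model of estimate fusion, not Ghost's actual network internals, and all values are hypothetical:

```python
import math

# Illustrative sketch: fusing distance estimates from complementary
# algorithms (e.g. stereo disparity and a mono cue) by inverse-variance
# weighting. Each estimate carries its own uncertainty (sigma), standing
# in for the per-output confidence range described in the text.

def fuse(estimates: list[tuple[float, float]]) -> tuple[float, float]:
    """Fuse (value, sigma) pairs; returns the fused value and fused sigma."""
    weights = [1.0 / (s * s) for _, s in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, math.sqrt(1.0 / total)

# Hypothetical distance estimates (meters) for the same object:
stereo = (42.0, 1.0)  # stereo disparity: precise at this range
mono = (45.0, 4.0)    # mono cue: coarser

fused_value, fused_sigma = fuse([stereo, mono])
# The fused estimate stays close to the more confident input, and its
# uncertainty is smaller than that of either input alone.
```

When one input disappears (a lost or occluded camera), the same formula degrades gracefully to whatever estimates remain, mirroring the fallback behavior described above.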
High-definition video
KineticFlow leverages the resolution, dynamic range, and speed of modern 48-megapixel camera sensors, enabling long-distance perception and obviating the need for multiple cameras at different fields of view.
KineticFlow automatically corrects for occluded objects and unusual or missing frames by combining data over time, including managing temporary occlusions by analyzing stereo video streams.
48-megapixel sensors equip computer vision algorithms to operate at distances required for high-speed highway driving.
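A back-of-the-envelope calculation shows why sensor resolution matters at highway distances: it determines how many pixels a vehicle subtends at long range. The sketch below assumes an 8000-pixel-wide sensor (roughly 48 megapixels at a 4:3 aspect ratio) and a hypothetical 30-degree horizontal field of view; neither figure comes from the original text:

```python
import math

# Back-of-the-envelope sketch: how many pixels a 1.8 m-wide car subtends
# at highway distances on an assumed 8000-pixel-wide sensor behind a
# hypothetical 30-degree horizontal field of view.

def pixels_subtended(width_m: float, dist_m: float,
                     sensor_px: int = 8000, hfov_deg: float = 30.0) -> float:
    """Approximate pixel width of an object at a given distance."""
    focal_px = (sensor_px / 2) / math.tan(math.radians(hfov_deg) / 2)
    return focal_px * width_m / dist_m  # small-angle approximation

for dist in (50, 100, 200, 400):
    print(f"{dist:4d} m: {pixels_subtended(1.8, dist):6.1f} px")
# Even at several hundred meters, the car spans dozens of pixels,
# leaving the vision algorithms usable signal at highway speeds.
```

Halving the resolution halves the pixel count at every range, which is why lower-resolution sensors typically need multiple cameras at different fields of view to cover both near and far.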
Tech Spotlight

Universal, physics-based object detection

For traditional approaches to computer vision, it all starts with the object. AI algorithms are trained to recognize objects using millions of images of what might be on the road – cars, trucks, motorcycles, cones, and various pieces of debris – in every color, shape, rotation, and lighting condition. Unfortunately, the infinite list of things that may be on the road makes this an intractable problem – straightforward to get to 99.99…% functional, but impossible to get to 100%.

This can lead to several problems on the road, including the risk of colliding with a misrecognized or unrecognized object and the risk of misestimating an object’s size and therefore its distance. More nuanced challenges include the difficulty of object recognition in low lighting conditions or when objects are partially occluded, as half of a car doesn’t always look like a car.
KineticFlow takes an entirely different approach: universal detection without requiring object recognition.
KineticFlow detection demo with heatmap
KineticFlow relies upon the physics properties of objects:
  • Detects the planes of objects, i.e., flat surfaces that point in a given direction and reflect light consistently. A nearby object may be deconstructed into five to ten planes, whereas a faraway object may appear as a single plane.
  • Multiple planes that move together can be grouped together and detected as an object by observing them over time.
  • The road, i.e., the ground plane, can be detected and disambiguated from the scene.
  • Objects on the road surface such as painted features can then be disambiguated from objects above the road, i.e., vehicles, obstacles, and bridges.
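The grouping step above, in which planes that move together are merged into a single object, can be sketched as a simple clustering over velocity vectors. The plane representation and the agreement threshold below are hypothetical, chosen only to illustrate the idea:

```python
from dataclasses import dataclass

# Illustrative sketch of the grouping step described above: planes whose
# velocity vectors agree over time are clustered into a single object.

@dataclass
class Plane:
    normal: tuple[float, float, float]    # surface orientation
    velocity: tuple[float, float, float]  # motion vector (m/s)

def group_by_motion(planes: list[Plane], tol: float = 0.5) -> list[list[Plane]]:
    """Greedily group planes whose velocities agree within tol (m/s)."""
    groups: list[list[Plane]] = []
    for p in planes:
        for g in groups:
            ref = g[0].velocity
            if all(abs(a - b) <= tol for a, b in zip(p.velocity, ref)):
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

# Two planes moving together (one vehicle) plus the static ground plane:
planes = [
    Plane((0, 0, 1), (13.0, 0.0, 0.0)),
    Plane((0, 1, 0), (13.1, 0.0, 0.0)),
    Plane((0, 0, 1), (0.0, 0.0, 0.0)),
]
objects = group_by_motion(planes)  # two groups: the vehicle and the ground
```

The zero-velocity group is the static scene, which is how the ground plane and fixed structures fall out of the same clustering that finds moving vehicles.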

Automated data labeling and training

To reduce the risk of the “long tail” of image recognition – unusual and infrequently seen objects – autonomous software developers are exponentially increasing the size of their training data. This leads not only to data growth, but also to growth in the GPU farms required to train the networks and in the armies of human labelers who annotate and verify ground truth images, with no end in sight.

Because of its physics-based approach to AI, Ghost can not only train KineticFlow on a substantially smaller data corpus, but it can also use mathematical processes to reason about the completeness of the training data set.
Automated data labeling and training infographic
Ghost’s fleet of vehicles collects ground truth video for training with the same stereo cameras used by the Ghost Autonomy Engine for perception and driving. This data is automatically uploaded to Ghost Train, the centralized AI training infrastructure, where it is automatically labeled using cloud compute and high-fidelity computer vision algorithms and validated for labeling accuracy. This process continuously builds Ghost’s training data repository.

To train a neural network, a training data set is assembled by pulling data from the Ghost repository to cover every aspect of each dimension of the network’s hyperspace, instead of just amassing training data randomly and hoping to achieve full coverage. This ensures the completeness of the network’s training data set, safeguarding against gaps or the risk of overfitting. During the training process, the neural network is trained with a primary training data set, validated against an independent validation data set, and finally validated again against a third holdout data set to protect against overfitting. This entire process is automated – no human data labeling is required.
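The assembly process above, covering every cell of the network's condition space rather than sampling at random, and then splitting into training, validation, and holdout sets, can be sketched as follows. The dimensions and the 80/10/10 split ratio are invented for the sketch; the original text does not specify them:

```python
import random
from itertools import product

# Illustrative sketch of coverage-driven data set assembly and a
# three-way train / validation / holdout split.

DIMENSIONS = {
    "lighting": ["day", "dusk", "night"],
    "weather": ["clear", "rain", "fog"],
    "distance_band": ["near", "mid", "far"],
}

def coverage_gaps(dataset, dims=DIMENSIONS):
    """Return condition combinations with no example in the dataset."""
    keys = list(dims)
    seen = {tuple(ex[k] for k in keys) for ex in dataset}
    return [combo for combo in product(*(dims[k] for k in keys))
            if combo not in seen]

def three_way_split(dataset, seed=0):
    """Hypothetical 80/10/10 split into train, validation, and holdout sets."""
    rng = random.Random(seed)
    data = list(dataset)
    rng.shuffle(data)
    n = len(data)
    return data[: int(0.8 * n)], data[int(0.8 * n): int(0.9 * n)], data[int(0.9 * n):]
```

An empty result from `coverage_gaps` is a concrete, checkable statement of completeness, which is what makes verifying the training set tractable when the dimensions are enumerable.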