Generating Safety-Critical Digital Twins

Overview of Safety-critical Scenario Generation

Wenhao Ding, Chejian Xu, Haohong Lin, Bo Li, Ding Zhao, ''A Survey on Safety-critical Scenario Generation from Methodological Perspective,'' Preprint arXiv:2202.02215.

Abstract — Autonomous driving systems have witnessed a significant development during the past years thanks to the advance in machine learning-enabled sensing and decision-making algorithms. One critical challenge for their massive deployment in the real world is their safety evaluation. Most existing driving systems are still trained and evaluated on naturalistic scenarios collected from daily life or heuristically-generated adversarial ones. However, the large population of cars, in general, leads to an extremely low collision rate, indicating that the safety-critical scenarios are rare in the collected real-world data. Thus, methods to artificially generate scenarios become crucial to measure the risk and reduce the cost. In this survey, we focus on the algorithms of safety-critical scenario generation in autonomous driving. We first provide a comprehensive taxonomy of existing algorithms by dividing them into three categories: data-driven generation, adversarial generation, and knowledge-based generation. Then, we discuss useful tools for scenario generation, including simulation platforms and packages. Finally, we extend our discussion to five main challenges of current works – fidelity, efficiency, diversity, transferability, controllability – and research opportunities lighted up by these challenges.

Causal Discovery in Safety-critical Scenarios

Wenhao Ding, Haohong Lin, Bo Li, Ding Zhao, ''CausalAF: Causal Autoregressive Flow for Goal-Directed Safety-Critical Scenes Generation,'' Preprint arXiv:2110.13939.

Abstract — Goal-directed generation, which aims to solve downstream tasks by generating diverse data, has a potentially wide range of applications in the real world. Previous works tend to formulate goal-directed generation as a purely data-driven problem, which directly searches or approximates the distribution of samples satisfying the goal. However, the generation ability of preexisting work is heavily restricted by inefficient sampling, especially for sparse goals that rarely show up in off-the-shelf datasets. For instance, generating safety-critical traffic scenes with the goal of increasing the risk of collision is critical to evaluate autonomous vehicles, but the rareness of such scenes is the biggest obstacle. In this paper, we integrate causality as a prior into the safety-critical scene generation process and propose a flow-based generative framework - Causal Autoregressive Flow (CausalAF). CausalAF encourages the generative model to uncover and follow the causal relationship among generated objects via novel causal masking operations instead of searching the sample only from observational data. By learning the cause-and-effect mechanism of how the generated scene achieves the goal rather than just learning correlations from data, CausalAF significantly improves the learning efficiency. Extensive experiments on three heterogeneous traffic scenes illustrate that CausalAF requires far fewer optimization resources to effectively generate goal-directed scenes for safety evaluation tasks.

Generative Models with Explicit Knowledge

Wenhao Ding, Bo Li, Kim Ji Eun, Ding Zhao, ''Semantically Controllable Scene Generation with Guidance of Explicit Knowledge,'' Preprint arXiv:2106.04066.

Abstract — Deep Generative Models (DGMs) are known for their superior capability in generating realistic data. Extending purely data-driven approaches, recent specialized DGMs may satisfy additional controllable requirements such as embedding a traffic sign in a driving scene, by manipulating patterns implicitly in the neuron or feature level. In this paper, we introduce a novel method to incorporate domain knowledge explicitly in the generation process to achieve semantically controllable scene generation. We categorize our knowledge into two types to be consistent with the composition of natural scenes, where the first type represents the property of objects and the second type represents the relationship among objects. We then propose a tree-structured generative model to learn complex scene representation, whose nodes and edges are naturally corresponding to the two types of knowledge respectively. Knowledge can be explicitly integrated to enable semantically controllable scene generation by imposing semantic rules on properties of nodes and edges in the tree structure. We construct a synthetic example to illustrate the controllability and explainability of our method in a clean setting. We further extend the synthetic example to realistic autonomous vehicle driving environments and conduct extensive experiments to show that our method efficiently identifies adversarial traffic scenes against different state-of-the-art 3D point cloud segmentation models satisfying the traffic rules specified as the explicit knowledge.

Generate Safety-critical Scenarios for Vehicle-to-Vehicle Encounters using Deep Generative Models

Wenhao Ding, Baiming Chen, Bo Li, JiEun Kim, Ding Zhao, ''Multimodal Safety-Critical Scenarios Generation for Decision-Making Algorithms Evaluation,'' IEEE Robotics and Automation Letters (RA-L), 2021.

Wenhao Ding, Baiming Chen, Minjun Xu, Ding Zhao, ''Learning to Collide: An Adaptive Safety-Critical Scenarios Generating Method,'' IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.

Wenhao Ding, Mengdi Xu, Ding Zhao, ''CMTS: Conditional Multiple Trajectory Synthesizer for Generating Safety-critical Driving Scenarios,'' IEEE International Conference on Robotics and Automation (ICRA), 2020.

Wenhao Ding, Wenshuo Wang, Ding Zhao, ''Multi-Vehicle Trajectories Generation for Vehicle-to-Vehicle Encounters,'' IEEE International Conference on Robotics and Automation (ICRA), 2019.

Abstract — Generating multi-vehicle trajectories analogous to those in the real world can provide reliable and versatile testing scenarios for autonomous vehicles. This paper presents an unsupervised learning framework to achieve this. First, we implement a variational autoencoder (VAE) to extract interpretable and controllable representations of vehicle encounter trajectories. By sampling from the distribution of these representations, we are able to generate new meaningful driving encounters with a developed Multi-Vehicle Trajectory Generator (MTG). A new disentanglement metric is also proposed to comprehensively analyze and compare models. It can reveal the robustness of models and the dependency among latent codes, thus providing guidance for practical application to improve system performance. Experimental results demonstrate that our proposed MTG outperforms baselines in terms of traffic rationality and disentanglement. These generated samples can provide abundant and controllable driving scenarios, thus providing testbeds and algorithm design insights for autonomous vehicle development.

Understanding Vehicle-to-Vehicle Encounter Scenarios through Machine Learning Approaches

Wenshuo Wang, Weiyang Zhang, Ding Zhao, ''Understanding V2V Driving Scenarios through Traffic Primitives,'' Transportation Research Part C: Emerging Technologies, 2018.

Wenshuo Wang, Aditya Ramesh, Ding Zhao, ''Clustering of Driving Scenarios Using Connected Vehicle Datasets,'' IEEE Transactions on Intelligent Transportation Systems, 2018.

Abstract — Semantically understanding complex driver encountering behavior, wherein two or more vehicles are spatially close to each other, can benefit the decision-making design of autonomous cars. This paper presents a framework for analyzing various encountering behaviors by decomposing driving encounter data into small building blocks, called driving primitives, using nonparametric Bayesian learning (NPBL) approaches, which offer a flexible way to gain insight into complex driving encounters without any prerequisite knowledge. The effectiveness of our proposed primitive-based framework is validated on 976 naturalistic driving encounters, from which more than 4000 driving primitives are learned using NPBL – a sticky HDP-HMM, which combines a hidden Markov model (HMM) with a hierarchical Dirichlet process (HDP). A dynamic time warping method integrated with k-means clustering is then developed to cluster all the extracted driving primitives into groups. Experimental results show that 20 kinds of driving primitives are capable of representing the basic components of driving encounters in our database. This primitive-based analysis methodology potentially reveals underlying information of vehicle-vehicle encounters for self-driving applications.
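The clustering step mentioned in the abstract relies on dynamic time warping (DTW) to compare primitives of different lengths. The following is a minimal Python sketch of a DTW distance and of the assignment step of a k-means-style loop. It is not the authors' implementation: the function names `dtw_distance` and `assign_to_clusters`, the Euclidean local cost, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance between two sequences.

    Each sequence is an array of shape (T,) or (T, D). The local cost is the
    Euclidean distance between frames (an assumption of this sketch).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # cost[i, j] = best accumulated cost aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # skip a frame of b
                                     cost[i, j - 1],      # skip a frame of a
                                     cost[i - 1, j - 1])  # match both frames
    return float(cost[n, m])


def assign_to_clusters(primitives, centers):
    """Assignment step of a k-means-style loop using DTW as the distance."""
    return [int(np.argmin([dtw_distance(p, c) for c in centers]))
            for p in primitives]


# Hypothetical toy primitives (1-D signals of different lengths).
primitives = [[0.0, 0.0, 0.0], [5.0, 5.0]]
centers = [[0.0, 0.0], [5.0, 5.0, 5.0]]
labels = assign_to_clusters(primitives, centers)
```

A full clustering loop would also need a rule for updating each center from its assigned primitives, which this sketch leaves out.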

Unifying Heterogeneous Traffic Data using Traffic Primitives

Jiacheng Zhu, Wenshuo Wang, Ding Zhao, ''Attempt to Unify Heterogeneous Driving Databases using Traffic Primitives,'' Proceedings of the IEEE 21st International Intelligent Transportation Systems Conference (ITSC), Maui, Hawaii, US, November 4-8, 2018.

Abstract — A multitude of publicly available driving datasets and data platforms have been created for autonomous vehicles (AV). However, the heterogeneity of these databases in size, structure, and driving context makes existing datasets practically ineffective due to a lack of uniform frameworks and searchable indexes. To overcome these limitations of existing public datasets, this paper proposes a data unification framework based on traffic primitives that can automatically unify and label heterogeneous traffic data. This is achieved in two steps: 1) carefully arrange raw multidimensional time-series driving data into a relational database, and then 2) automatically extract labeled and indexed traffic primitives from the traffic data through a Bayesian nonparametric learning method. Finally, we evaluate the effectiveness of our developed framework using collected real vehicle data.
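To make the two steps in the abstract more concrete, here is a rough sketch of how raw time-series samples and extracted primitives might sit side by side in a relational database, using Python's built-in `sqlite3` module. The schema is hypothetical: the table names (`trip`, `sample`, `primitive`), the columns, and the helper functions are assumptions for illustration and are not taken from the paper.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical unification schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE trip (
            trip_id INTEGER PRIMARY KEY,
            source  TEXT NOT NULL          -- which original dataset it came from
        );
        CREATE TABLE sample (              -- step 1: raw time-series rows
            trip_id  INTEGER NOT NULL REFERENCES trip(trip_id),
            t        REAL NOT NULL,        -- timestamp in seconds
            speed    REAL,
            yaw_rate REAL,
            PRIMARY KEY (trip_id, t)
        );
        CREATE TABLE primitive (           -- step 2: labeled, indexed segments
            primitive_id INTEGER PRIMARY KEY,
            trip_id      INTEGER NOT NULL REFERENCES trip(trip_id),
            t_start      REAL NOT NULL,
            t_end        REAL NOT NULL,    -- exclusive end of the segment
            label        INTEGER NOT NULL  -- primitive type found by learning
        );
        CREATE INDEX idx_primitive_label ON primitive(label);
    """)
    return conn


def samples_for_label(conn, label):
    """Return (trip_id, t, speed) rows covered by primitives with this label."""
    query = """
        SELECT s.trip_id, s.t, s.speed
        FROM primitive AS p
        JOIN sample AS s
          ON s.trip_id = p.trip_id AND s.t >= p.t_start AND s.t < p.t_end
        WHERE p.label = ?
        ORDER BY s.trip_id, s.t
    """
    return conn.execute(query, (label,)).fetchall()
```

Under this layout, the index on `label` is what would make primitives searchable across datasets, since a single query can pull matching segments regardless of which source a trip came from.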

Trip, Scenario, Primitive: To represent autonomous driving data explicitly, three terms (Trip, Scenario, and Primitive) are used to describe driving behavior at different scales. Trip: the portion of data during which the sensors collect data continuously. A dataset usually contains at least one trip. Because of this continuity, attributes such as driving environment, weather, and road condition do not change rapidly within a trip. Scenario: common scenarios can be found across different datasets. Scenarios do not vary much with driving environment, road conditions, or vehicle and sensor configuration. A trip usually contains a set of scenarios. Primitive: the smallest element of driving data. It is the result of HDP-HSMM. Scenarios are composed of primitives.
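The containment relationship described above (a trip contains scenarios, and a scenario is composed of primitives) can be sketched with plain Python dataclasses. All field names here (`label`, `start`, `end`, `name`, `trip_id`, `weather`) are hypothetical and chosen only to illustrate the hierarchy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Primitive:
    """Smallest element of driving data: one labeled segment."""
    label: int   # primitive type assigned by the learning method
    start: int   # first sample index of the segment
    end: int     # one past the last sample index


@dataclass
class Scenario:
    """A named driving situation composed of primitives."""
    name: str
    primitives: List[Primitive] = field(default_factory=list)


@dataclass
class Trip:
    """A stretch of continuously recorded data containing scenarios."""
    trip_id: str
    weather: str  # slowly varying attribute shared by the whole trip
    scenarios: List[Scenario] = field(default_factory=list)

    def primitive_labels(self) -> List[int]:
        """Flatten the hierarchy into the sequence of primitive labels."""
        return [p.label for s in self.scenarios for p in s.primitives]


# Hypothetical example: one trip with two scenarios.
trip = Trip(
    trip_id="trip-001",
    weather="clear",
    scenarios=[
        Scenario("car-following", [Primitive(3, 0, 40), Primitive(5, 40, 90)]),
        Scenario("lane-change", [Primitive(1, 90, 120)]),
    ],
)
```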

Fuel Economy and Emission Testing for Connected and Automated Vehicles

Yan Chang, Weiqing Yang, Ding Zhao, ''Fuel Economy and Emission Testing for Connected and Automated Vehicles Using Real-world Driving Datasets,'' Proceedings of the IEEE 21st International Intelligent Transportation Systems Conference (ITSC), Maui, Hawaii, US, November 4-8, 2018.

Abstract — By using the onboard sensing and external connectivity technology, connected and automated vehicles (CAV) could lead to improved energy efficiency, better routing, and lower traffic congestion. With the rapid development of the technology and adaptation of CAV, it is more critical to develop the universal evaluation method and the testing standard which could evaluate the impacts on energy consumption and environmental pollution of CAV fairly, especially under the various traffic conditions. In this paper, we proposed a new method and framework to evaluate the energy efficiency and emission of the vehicle based on the unsupervised learning methods. Both the real-world driving data of the evaluated vehicle and the large naturalistic driving dataset are used to perform the driving primitive analysis and coupling. Then the linear weighted estimation method could be used to calculate the testing result of the evaluated vehicle. The results show that this method can successfully identify the typical driving primitives. The couples of the driving primitives from the evaluated vehicle and the typical driving primitives from the large real-world driving dataset coincide with each other very well. This new method could enhance the standard development of the energy efficiency and emission testing of CAV and other off-cycle credits.
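The abstract does not spell out the linear weighted estimation, so the following Python sketch shows only one plausible reading: each typical driving primitive contributes the evaluated vehicle's measured value on its coupled primitive, weighted by how often that typical primitive occurs in the large naturalistic dataset. The function name `weighted_estimate`, the choice of weights, and the numbers are assumptions, not the paper's method.

```python
def weighted_estimate(values, weights):
    """Weighted average of per-primitive measurements.

    values  : measurement of the evaluated vehicle on each coupled primitive
              (for example fuel use in L/100 km), one entry per primitive.
    weights : non-negative importance of each typical primitive, for example
              its occurrence count in the large dataset (an assumption).
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical numbers: two primitives, the first three times as common.
estimate = weighted_estimate([6.0, 8.0], [3, 1])
```

With these toy inputs the more frequent primitive dominates, so the estimate lands closer to 6.0 than to 8.0.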

Traffic Primitives: Extract Fundamental Traffic Elements using Bayesian Unsupervised Learning

Wenshuo Wang, Ding Zhao, ''Extracting Traffic Primitives Directly from Naturalistically Logged Data for Self-Driving Applications,'' IEEE Robotics and Automation Letters, 2017.

Wenshuo Wang, Junqiang Xi, Ding Zhao, ''Driving Style Analysis Using Primitive Driving Patterns With Bayesian Nonparametric Approaches,'' IEEE Transactions on Intelligent Transportation Systems, 2017.

Abstract — Developing an automated vehicle that can handle complicated driving scenarios and appropriately interact with other road users requires the ability to semantically learn and understand the driving environment, oftentimes based on the analysis of massive amounts of naturalistic driving data. An important paradigm that allows automated vehicles to both learn from human drivers and develop deeper insights is understanding traffic primitives, which represent the principal compositions of the entire traffic. However, the explosive growth of driving data presents a great challenge in extracting primitives from long-term multidimensional time-series traffic scenario data involving multiscale varieties of road users. Therefore, automatic primitive extraction is becoming one of the cost-efficient ways to help autonomous vehicles understand and predict complex traffic scenarios. In addition, the primitives extracted from raw data should 1) be appropriate for automated driving applications and 2) be easily combined to generate new driving scenarios. Existing literature does not provide a method to automatically learn these primitives from large-scale traffic data. The contribution of this paper is twofold. First, we propose a new framework to generate new traffic scenarios from a handful of limited traffic data. Second, we introduce a nonparametric Bayesian learning method - a sticky hierarchical Dirichlet process hidden Markov model - that can automatically extract primitives from multidimensional driving data without prior knowledge of the primitive settings. The developed method is validated using one day of naturalistic driving data. Experimental results show that the nonparametric Bayesian learning method extracts primitives from traffic scenarios where both binary and continuous events coexist.

TrafficNet: An Open Naturalistic Driving Scenario Library

Ding Zhao, Yaohui Guo, Yunhan Jack Jia, ''TrafficNet: An Open Naturalistic Driving Scenario Library,'' Proceedings of the IEEE 20th International Intelligent Transportation Systems Conference (ITSC), Yokohama, Japan, October 16-19, 2017.

Abstract — The enormous efforts spent on collecting naturalistic driving data in recent years have resulted in an expansion of publicly available traffic datasets, which have the potential to assist the development of self-driving vehicles. However, we found that many of the attempts to utilize these datasets have failed in practice because the organizations that host the collected data paid little attention to usability. For example, extracting data associated with certain critical conditions from naturalistic driving data organized in chronological order may not be convenient for a vehicle engineer who does not have big data analytics experience. To address the general usability challenges of these publicly available traffic datasets, we propose TrafficNet, a large-scale and extensible library of naturalistic driving scenarios, aiming at bridging the gap between research datasets and practically usable information for vehicle engineers and researchers. The proposed web-based driving scenario database preprocesses massive raw traffic data collected in chronological order into an organized scenario-based dataset by applying a set of categorization algorithms to label the naturalistic driving data with six different critical driving scenarios. TrafficNet opens not only the scenario library but also the source code of these categorization methods to the public, which will foster more sophisticated and accurate scenario-based categorization algorithms to advance intelligent transportation research. The source code and the scenario database can be accessed at

Evaluation of Connected Vehicle Communication with Three-year Real World Data

Xianan Huang, Ding Zhao, Huei Peng, ''Empirical Study of DSRC Performance Based on Safety Pilot Model Deployment Data,'' IEEE Transactions on Intelligent Transportation Systems, 2017.

Abstract — Dedicated Short Range Communication (DSRC) was designed to provide reliable wireless communication for intelligent transportation system applications. Sharing information among cars and between cars and the infrastructure, pedestrians, or "the cloud" has great potential to improve safety, mobility and fuel economy. DSRC is being considered by the US Department of Transportation to be required for ground vehicles. In the past, its performance has been assessed thoroughly in labs and in limited field testing, but not on a large fleet. In this paper, we present an analysis of DSRC performance using data from the world's largest connected vehicle test program - Safety Pilot Model Deployment, led by the University of Michigan. We first investigate its maximum and effective range, and then study the effect of environmental factors, such as trees/foliage, weather, buildings, vehicle travel direction, and road elevation. The results can be used to guide future DSRC equipment placement and installation, and to develop DSRC communication models for numerical simulations.