An Integrated Approach for Traffic Scene Understanding from Monocular Cameras

Type: Fortschritt-Berichte VDI
Publication date: 28.09.2021
Series: 12
Volume number: 815
Author: M. Sc. Malte Oeljeklaus
Place: Essen
ISBN: 978-3-18-381512-8
ISSN: 0178-9449
Year of publication: 2021
Number of pages: 154
Number of figures: 77
Number of tables: 24
Product type: Book (paperback, DIN A5)

Product description

This thesis investigates methods for traffic scene perception with monocular cameras, with the aim of building a basic environment model for automated vehicles. The approach is designed with particular attention to the computational limitations of practical systems. To this end, three scene representations are investigated: the prevalent road topology as the global scene context, the drivable road area, and the detection and spatial reconstruction of other road users. An approach is developed that perceives all three representations simultaneously using a multi-task convolutional neural network. The results demonstrate the efficiency of the multi-task approach. In particular, sharing image features across the individual scene representations was found to improve computational performance.

Contents
Nomenclature
1 Introduction
1.1 Motivation
1.2 Outline and contributions
2 Related Work and Fundamental Background
2.1 Advances in CNN architectures for image processing
2.2 Traffic scene representations from monocular cameras
2.3 Fundamental principles and general framework
3 Experimental Setup and Data Acquisition
3.1 Outline of the camera system and test platforms
3.2 Inferring scene points from image space measurements
4 Network Architecture for Multi-task Feature Sharing
4.1 General design considerations
4.2 Multi-task learning and architectural implications
4.3 Comparison and choice of the feature encoder architecture
5 Global Road Topology from Scene Context Recognition
5.1 Use and taxonomies of the traffic scene context
5.2 Recognition decoder and architecture integration
5.3 Road-topology recognition experiments
6 Drivable Road Area from Semantic Image Segmentation
6.1 Traffic scene segmentation as dense classification
6.2 Segmentation decoder architecture and spatial priors
6.3 Experiments on drivable road area segmentation
7 Road Users from Bounding Box Detection
7.1 Classification and localization of 2D bounding boxes
7.2 Auxiliary regressands and decoder architecture for spatial reconstruction
7.3 Object detection and reconstruction experiments
8 Multi-task Integration and Conclusive Experimental Analysis
8.1 Multi-task decoder and architecture integration
8.2 Practical strategy for the joint training of all perceptual tasks
8.3 Experimental results and comparison
9 Summary, Conclusion, and Outlook
A Appendix
A.1 Road topology dataset statistics
A.2 Technical specifications of the camera system
A.3 Single-task pre-rec curves for all road topologies
A.4 Overview of the segmentation decoder with Hadamard layer
A.5 Detailed breakdown of the single-task KITTI road segmentation results
A.6 Overview of the SSD decoder with auxiliary regressands
A.7 Dual-task Rec+Seg pre-rec curves for road topology recognition
A.8 Dual-task Rec+Det pre-rec curves for road topology recognition
A.9 Multi-task pre-rec curves for road topology recognition
A.10 Dual-task road topology confusion matrices
A.11 Detailed breakdown of the multi-task KITTI road segmentation results
A.12 Full runtime measurement data
Bibliography

Keywords: Scene Understanding – Environment Representation – 3D Reconstruction – Deep Neural Networks – Convolutional Neural Networks – Multi-task Learning – Feature Sharing – Embedded Computer Vision – Advanced Driver Assistance Systems – Automated Driving

57.00 € incl. VAT
VDI member price:*
51.30 € incl. VAT

* The VDI member discount applies to private individuals only