Human Pose Estimation from Monocular Images

Type: Fortschritt-Berichte VDI
Publication date: 29.09.2020
Series: 10
Volume number: 869
Author: Bastian Wandt M. Sc.
Place: Hannover
ISBN: 978-3-18-386910-7
ISSN: 0178-9627
Year of publication: 2020
Number of pages: 130
Number of figures: 47
Number of tables: 8
Product type: Book (paperback, DIN A5)

Product description

Abstract

This dissertation deals with the problem of capturing human motions and poses using a single camera. The first part of the thesis proposes two closely related approaches for the 3D reconstruction of human motions from image sequences. To resolve the inherent ambiguities in monocular 3D reconstruction, the main idea of this part is to exploit temporal properties of human motions in combination with a human body model learned from training data. The second part of the thesis tackles the problem of reconstructing a human pose from a single image. A human body model is learned by training a deep neural network that covers nonlinearities and anthropometric constraints.

Contents
1 Introduction
1.1 Applications and Commercial Systems
1.2 Image-based Motion Capture
1.3 Contributions
1.3.1 Time Consistent Human Motion Reconstruction
1.3.2 RepNet
1.4 Structure of the Thesis
1.5 List of Publications
1.5.1 Human Motion Capture
1.5.2 Other Publications
2 Related work
2.1 Non-rigid Structure-from-Motion
2.2 Single Image Approaches
2.2.1 Reprojection Error Optimization
2.2.2 Direct Inference using Neural Networks
2.3 Time Consistent Human Motion Capture
3 Fundamentals
3.1 Camera Models
3.1.1 Projective Transformations
3.1.2 Intrinsic Parameters
3.1.3 Extrinsic Parameters
3.1.4 Simplified Camera Models
3.2 Human Pose Representations
3.2.1 Coordinate-based Representations
3.2.2 Surface Mesh-based Representations
3.2.3 Subspaces of Human Poses
3.3 Non-Rigid Structure from Motion
3.4 Error Metrics
3.5 Datasets
4 Exploiting temporal properties
4.1 Periodic and Non-periodic Constraints
4.1.1 Factorization model
4.1.2 Camera Parameter Estimation
4.1.3 Periodic Motion
4.1.4 Non-Periodic Motion
4.1.5 Algorithm
4.1.6 Experimental Results
4.1.7 Conclusion
4.2 A Novel Kinematic Chain Space
4.2.1 Estimating Camera and Shape
4.2.2 Kinematic Chain Space
4.2.3 Trace Norm Constraint
4.2.4 Camera
4.2.5 Algorithm
4.2.6 Experiments
4.2.7 Conclusion
5 Single image reconstruction using adversarial training
5.1 Method
5.2 Pose and Camera Estimation
5.3 Reprojection Layer
5.4 Critic Network
5.5 Camera
5.6 Data Preprocessing
5.7 Training
5.8 Results
5.8.1 Quantitative Evaluation on Human3.6M
5.8.2 Quantitative Evaluation on MPI-INF-3DHP
5.8.3 Plausibility of the Reconstructions
5.8.4 Noisy observations
5.8.5 Qualitative Evaluation
5.8.6 Conclusion
6 Conclusions
Bibliography

Keywords: 386910, Bastian Wandt, Fortschritt-Berichte VDI, Reihe 10, Band 869, Uni Hannover, tnt, Human Pose Estimation, 3D Reconstruction, Monocular Cameras, Structure From Motion

52,00 € incl. VAT
VDI member price:*
46,80 € incl. VAT

* The VDI member discount applies only to private individuals.
