PoseNet-Mobile-Robot.github.io/index.html at master · PoseNet-Mobile-Robot/PoseNet-Mobile-Robot.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
<!DOCTYPE html>
<html lang="en" class="wf-firasans-n4-active wf-active">
	<head>

    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <!-- Enable responsiveness on mobile devices -->
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1">
    <meta name="description" content="DeepLearning in SLAM">
    <meta name="generator" content="Hugo 0.37-DEV" />
    <title>PoseNet++</title>
    <!-- CSS -->
    <link rel="stylesheet" href="css/print.css" media="print">
    <link rel="stylesheet" href="css/poole.css">
    <link rel="stylesheet" href="css/hyde.css">
    <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Fira+Sans:300,300i,400,400i,500">

    <script defer src="https://use.fontawesome.com/releases/v5.0.6/js/all.js"></script>
    <!-- highlight.js-->
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/ocean.min.css">
    <!-- Customised CSS -->
    <link rel="stylesheet" href="css/custom.css">

    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
    <!-- Icons -->

    <link rel="apple-touch-icon-precomposed" sizes="144x144" href="etc/apple-touch-icon-144-precomposed.png">
    <link rel="shortcut icon" href="etc/favicon.png">

	</head>
    <body>
        <div class="sidebar">
	<div class="container text-center sidebar-sticky">
		<div class="sidebar-about text-center">
			<a href="https://posenet-mobile-robot.github.io/"><h1 class="brand">PoseNet++</h1></a>
			<p class="lead">
				 Deep Learning, SLAM, GTSAM, VSfM
			</p>
		</div>

<div>
	<ul class="sidebar-nav" style="text-align: left; margin-left: 80px">


				<li><a href="#intro"> <span>Introduction</span></a></li>
                <li><a href="#pn"><span>PoseNet</span></a></li>
                <li><a href="#vsfm"><span>Visual SfM</span></a></li>
                <li><a href="#gts"><span>GTSAM</span></a></li>
				<li><a href="#team"> <span>Team</span></a></li>
                <li><a href="etc/Team 16 - ROB 530.pdf"> <span>Report</span></a></li>
                <li><a href="https://github.com/PoseNet-Mobile-Robot/Mobile-Robotics"> <span>Code</span></a></li>
		</li>
	</ul>
</div>

        <p>
		<section class="row text-center">

<!-- 	<a href="#"><i class="fab fa-twitter fa-lg" aria-hidden="true" ></i></a>
	&nbsp;<a href="#"><i class="fab fa-github fa-lg" aria-hidden="true"></i></a>
	&nbsp;<a href="#"><i class="fab fa-bitbucket fa-lg" aria-hidden="true"></i></a>
    &nbsp;<a href="#"><i class="fab fa-linkedin fa-lg" aria-hidden="true" title="Sahib Dhanjal"></i></a>
    &nbsp;<a href="#"><i class="fab fa-stack-overflow fa-lg" aria-hidden="true"></i></a> -->


	<!-- &nbsp;<a href="#"><i class="fas fa-at fa-lg" aria-hidden="true"></i></a> -->

</section>

        </p>
		<p class="copyright">&copy; 2018 UMich.
        <a href="https://creativecommons.org/licenses/by/4.0">All Rights Reserved</a>.<br/>Built by <a href="https://sahibdhanjal.github.io">Sahib Dhanjal</a> | Template - <a href="https://gohugo.io/">Hugo</a>.
        </p>
	</div>
	<div>
	</div>

    </div>

        <div class="content container">
            <div class="posts">

        <div class="post">
        <h1 class="post-title"><a href="#" id="intro">Introduction</a></h1>
        <span class="post-date">
            <i class="fas fa-calendar-alt"></i> Apr 14, 2018
        </span>
        <p>This project outlines our implementation of <strong>PoseNet++</strong>, a deep learning framework for passive SLAM. It is a full robot re-localization pipeline which uses <strong>PoseNet</strong> as the <strong>sensor model</strong>, <strong>GPS/Odometry Data</strong> as the <strong>action model</strong> and <strong>GTSAM</strong> as the <strong>backend</strong> to generate the trajectory of the robot <em>(and subsequently the map of the environment)</em>. The framework is able to re-localize the robot within 2m of it's ground truth in medium scale environments. Our aim in this project is to demonstrate that using <strong>Convolutional Neural Networks (CNNs)</strong> in combination with a mono-camera system, provides a viable solution for the kidnapped robot problem and an accurate solution to the SLAM problem in general.</p>
        <p>To run the code from the repository, the following dependencies are required:
            <ul>
                <li><a href="https://www.python.org/download/releases/2.7/">Python 2.7 (with TensorFlow/ Open CV 3/ MatPlotlib/ NumPy)</a></li>
                <li><a href="https://www.cc.gatech.edu/grads/j/jdong37/files/gtsam-tutorial.pdf">GTSAM 4.0</a></li>
                <li><a href="http://ccwu.me/vsfm/">Visual SFM Tool</a></li>
                <li><a href="https://play.google.com/store/apps/details?id=com.pas.webcam&hl=en_US">IP WebCam</a></li>
            </ul>
            Please follow respective documentations for correct installation of the above. The code is tested on the following datasets:
            <ul>
                <li><a href="http://robots.engin.umich.edu/nclt/">North Campus Long Term <strong>(NCLT Dataset)</strong></a></li>
                <li><a href="http://mi.eng.cam.ac.uk/projects/relocalisation/">University of Cambridge Dataset <strong>(Shop Facade/ King's College)</strong></a></li>
                <li>Fetch Robot Dataset <strong>(provided by <a href="http://robots.engin.umich.edu/">PeRL Lab</a>)</strong></li>
            </ul>
        </p>
        </div>

        <div class="post">
        <h1 class="post-title"><a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Kendall_PoseNet_A_Convolutional_ICCV_2015_paper.pdf"  id="pn">PoseNet</a></h1>
        <span class="post-date">
            <i class="fas fa-calendar-alt"></i> Apr 14, 2018
        </span>
        <p>PoseNet is a Deep Learning Framework which can regress the 6 DOF pose of a monocular camera from a single RGB Image in an end-to-end manner with no additional engineering or graph optimization. It can operate both indoor and outdoor in real time, and takes about 5ms to compute a frame. The original implementation of PoseNet uses <a href="https://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf">GoogLeNet</a> from the ImageNet Challenge and obtains approximately 2m and 3&#176 accuracy for large scale outdoor scenes and 0.5m and 5&#176 accuracy indoors. It leverages transfer learning from large scale classification data and is robust in several conditions such as difficult lighting, motion blur, different camera intrinsics and even the same scene captured in different seasons(eg. Winter, Summer, Monsoon). Due to this robustness in different conditions, it has an advantage over SIFT based feature registration. </p>
        <p>Our implementation of PoseNet is based on the <a href="https://arxiv.org/pdf/1409.1556.pdf">VGG-16</a> Network, mainly due to the ease of implementation. The accuracy obtained using the VGG-16 is almost the same as GoogLeNet, with a Top-5 error rate of 7.3% compared to 6.67% (on the GoogLeNet). The only major drawback is that training and testing on VGG-16 is slower due to the number of parameters involved. The only change made to the network is that we replace the last SoftMax layer to a Fully Connected Regression Layer. The loss function is given as follows:
            <img src="etc/lossfunc.jpg" style="height: 100px">

            The <strong>x</strong> denotes the position in the cartesian frame, whereas, <strong>q</strong> denotes the orientation. The orientation is taken in <strong>quaternion</strong> for the <strong><em>University of Cambridge</em></strong> and <strong><em>Fetch Dataset</em></strong>, and <strong>euler angles</strong> for the <strong><em>NCLT dataset</em></strong>. The procedure followed for training is:
            <ul>
                <li>Scale Down each of the <strong>1920 x 1080</strong> images till the smallest dimension is <strong>256px</strong></li>
                <li>Make <strong>128</strong> random crops of size <strong>224 x 224</strong> using the scaled down image</li>
                <li>Associate corresponding <strong>ground-truth labels</strong> (pose/ quaternion) to each of the images</li>
                <li>The labels are in the following format - <strong>pose</strong>: (x, y, z) , <strong>orientation</strong>: (w, x, y, z)</li>
            </ul>
        once all the images are generated, with it's corresponding ground-truth labels, they are fed to PoseNet for training.
        </p>
        <p>For testing purposes, no pre-processing needs to be done on the images. The original frequency of the images from the NCLT dataset is <strong>5Hz</strong>. To obtain real-time performance, we down-sample the images to a frequency of <strong>2Hz</strong>. Once downsampled, the corresponding <strong>odometry labels</strong> are obtained from the dataset. The ground-truth labels are generated using PoseNet and are added to GTSAM as a <em>"measurement factor"</em>. The odometry labels from the dataset (or VSFM when used) are added as an <em>"odometry factor"</em> to GTSAM.
        </p>
        </div>

        <div class="post">
        <h1 class="post-title"><a href="http://ccwu.me/vsfm/vsfm.pdf" id="vsfm">Visual Structure from Motion (VSfM)</a></h1>
        <span class="post-date">
            <i class="fas fa-calendar-alt"></i> Apr 14, 2018
        </span>
        <p>
        Visual SfM is a technique used in Photogrammetry used to generate the pose of a camera from an image of the scene. The basic approach of Visual SfM is as follows:
        <ul>
            <li>Take images of a scene from multiple angles/ positions</li>
            <li>Generate all the SIFT feature points for each of the taken images</li>
            <li>Match all the images using the generated feature points</li>
            <li>Generate the <strong><em>Essential Matrix</em></strong> using which further <strong>6-DOF poses</strong> can be regressed.</li>
        </ul>
        For our use case, we directly use one of the <a href="http://ccwu.me/vsfm/vsfm.pdf">available tools</a> to generate the labels. We use the Fetch Robot DataSet for generation of the ground-truth labels to train PoseNet. The odometry is obtained from the robot itself.
        </p>
        </div>

        <div class="post">
        <h1 class="post-title"><a href="https://borg.cc.gatech.edu/"  id="gts">GTSAM 4.0</a></h1>
        <span class="post-date">
            <i class="fas fa-calendar-alt"></i> Apr 14, 2018
        </span>
        <p>The <strong>Georgia Tech Smoothing and Mapping (GTSAM)</strong> is an opensource <srong>SAM toolbox</srong> written in C++, with wrappers for Pyhton/ MATLAB. The classes implement smoothing and mapping (SAM) in robotics and vision, using factor graphs and Bayes networks as the underlying computing paradigm rather than sparse matrices. Factor graphs are graphical models that are well suited to modeling complex estimation problems, such as <strong>Simultaneous Localization and Mapping (SLAM)</strong> or <strong>Structure from Motion (SfM)</strong>.  They are bipartite graphs consisting of factors connected to variables. The variables represent the unknown random variables in the estimation problem, whereas the factors represent probabilistic information on those variables, derived from measurements or prior knowledge. Bayes networks are another type of graphical models which are directed acyclic graphs.  </p>
        <p>GTSAM provides state of the art solutions to the SLAM and SfM problems, but can also be used to model and solve both simpler and more complex estimation problems. It exploits sparsity to be computationally efficient. Typically measurements only provide information on the relationship between a handful of variables, and hence the resulting factor graph will be sparsely connected. This is exploited by the algorithms implemented in GTSAM to reduce computational complexity. Even when graphs are too dense to be handled efficiently by direct methods, GTSAM provides iterative methods that are quite efficient regardless. In our project we use incremental smoothing and mapping using the Bayes tree (iSAM2) as our back-end optimization tool. Based on Bayes Tree structure, the first order least square matrix can be updated efficiently according to the factor dependencies.</p>
        <p>In our project, as discussed previously, we use the odometry factor from the dataset<em>(or robot)</em> and the PoseNet factor in GTSAM for mapping the result obtained in real-time.</p>
        </div>

        <div class="post">
            <h1 class="post-title" >Team</h1>
            <span class="post-date" id="team">
                <i class="fas fa-calendar-alt"></i> Apr 14, 2018
            </span>
            The team working on this project comprises of:
            <ul>
                <li>
                    Christopher Schmotzer
                </li>

                <li>
                    Dante Luo
                    &nbsp;<a href="https://github.com/DanteLuo"><i class="fab fa-github fa-lg" aria-hidden="true" title="GitHub"></i></a>
                </li>

                <li>
                    Ray Zhang
                    &nbsp;<a href="https://github.com/zeroAska"><i class="fab fa-github fa-lg" aria-hidden="true" title="GitHub"></i></a>
                </li>

                <li>
                    Sahib Dhanjal
                    <a href="https://www.linkedin.com/in/sahibdhanjal/"><i class="fab fa-linkedin fa-lg" aria-hidden="true" title="LinkedIn"></i></a>
                    &nbsp;<a href="https://github.com/sahibdhanjal"><i class="fab fa-github fa-lg" aria-hidden="true" title="GitHub"></i></a>
                </li>

                <li>
                    Snigdhaa Hasija
                    <a href="https://www.linkedin.com/in/snigdhaahasija/"><i class="fab fa-linkedin fa-lg" aria-hidden="true" title="LinkedIn"></i></a>
                    &nbsp;<a href="https://github.com/snigdhaahasija"><i class="fab fa-github fa-lg" aria-hidden="true" title="GitHub"></i></a>
                </li>
            </ul>
        </div>

        </div>
        <div class="footer">
        </div>

    </div>


<script>
    </body>
</html>