# Exercises

- Submission
- đź‘¤ Individual
- đź“¨ Deliverable 1 - Spy Game [20 pts]
- đź“¨ Deliverable 2 - Well-begun is Half Done [10 pts]
- đź“¨ Deliverable 3 - Feature-based methods for SLAM [10 pts]
- đź“¨ Deliverable 4 [Optional] - Direct methods for SLAM [+5 pts]
- đź“¨ Deliverable 5 [Optional] - From landmark-based SLAM to rotation estimation [+15 pts]

- đź‘Ą Team

# Submission

In this lab, there are 6 deliverables. Deliverables 1 to 5 are individual exercises. Deliverable 6 is a team task which requires programming. To submit your solutions, create a folder called `lab9`

containing required deliverables and push to your repository.

### Individual

Please push the deliverables to your personal repository. For math-related questions, PDFs typeset with LaTeX or other word processing software are required.

### Team

Please push a PDF to the `lab9`

folder of your team repository with the relevant plots and discussion.

### Deadline

**Deadline:** the VNAV staff will clone your repository on **November 8th** at 1pm EST.

# đź‘¤ Individual

## đź“¨ Deliverable 1 - Spy Game [20 pts]

Consider the following spy-style plot of an information matrix (i.e., coefficient matrix in Gauss-Newtonâ€™s normal equations) for a landmark-based SLAM problem where dark cells correspond to non-zero blocks:

Assuming robot poses are stored sequentially, answer the following questions:

- How many robot poses exist in this problem?
- How many landmarks exist in the map?
- How many landmark have been observed by the current (last) pose?
- Which pose has observed the most number of landmark?
- What poses have observed the 2nd landmark?
- Predict the sparsity pattern of the information matrix after marginalizing out the 2nd feature.
- Predict the sparsity pattern of the information matrix after marginalizing out past poses (i.e., only retaining the last pose).
- Marginalizing out which variable (chosen among both poses or landmarks) would preserve the sparsity pattern of the information matrix?
- The following figures illustrate the robot (poses-poses) block of the information matrix obtained after marginalizing out (eliminating) all landmarks in bundle adjustment in two different datasets. What can you say about these datasets (e.g., was robot exploring a large building? Or perhaps it was surveying a small room? etc) given the spy images below?

## đź“¨ Deliverable 2 - Well-begun is Half Done [10 pts]

Pose graph optimization is a non-convex problem. Therefore, iterative solvers require a (good) initial guess to converge to the right solution. Typically, one initializes nonlinear solvers (e.g., Gauss-Newton) from the odometric estimate obtained by setting the first pose to the identity and chaining the odometric measurements in the pose graph.

Considering that chaining more relative pose measurements (either odometry or loop closures) accumulates more noise (and provides worse initialization), propose a more accurate initialization method that also sets the first pose to the identity but chains measurements in the pose graph in a more effective way. A 1-sentence description and rationale for the proposed approach suffices.

Hint: consider a graph with an arbitrary topology. Also, think about the problem as a graph, where you fix the â€śrootâ€ť node and initialize each node by chaining the edges from the root to that node..

## đź“¨ Deliverable 3 - Feature-based methods for SLAM [10 pts]

Read the ORB-SLAM paper (available here) and answer the following questions:

- Provide a 1 sentence description of each module used by ORB-SLAM (Fig. 1 in the paper can be a good starting point).
- Consider the case in which the place recognition module provides an incorrect loop closure. How does ORB-SLAM check that each loop closure is correct? What happens if an incorrect loop closure is included in the pose-graph optimization module?

## đź“¨ Deliverable 4 [Optional] - Direct methods for SLAM [+5 pts]

Read the LSD-SLAM paper (available here, see also the introduction below before reading the paper) and answer the following questions:

- Provide a 1 sentence description of each module used by LSD-SLAM and outline similarities and differences with respect to ORB-SLAM.
- Which approach (between feature-based or direct) is expected to be more robust to changes in illumination or occlusions? Motivate your answer.

LSD-SLAM is a *direct method* for SLAM, as opposed to *feature-based* methods we have worked with previously. As you know, feature-based methods detect and match features (keypoints) in each image and then use 2-view geometry (and possibly bundle adjustment) to estimate the motion of the robot. Direct methods are different in the fact that they do not extract features, but can be easily understood using the material presented in class.

In particular, the main difference is the way the 2-view geometry is solved. In feature-based approaches one uses RANSAC and a minimal solver (e.g., the 5-point method) to infer the motion from feature correspondences. In direct methods, instead, one tries to estimate the relative pose between consecutive frames by minimizing directly the mismatch of the pixel intensities between two images:

\[E(\xi) = \sum_i (I\_{ref}(p_i) - I(\omega (p_i, D\_{ref}(p_i), \xi)))^2 = r_i(\xi)^2\]Where the objective looks for a pose $\xi$ (between the last frame $I_{ref}$ and the current frame $I$ that minimizes the mismatch between the intensity $I_{ref}(p_i)$ at pixel $p_i$ for each pixel $p_i$ in the image, and intensity of the corresponding pixel in the current image $I$. How do we retrieve the pixel corresponding to $p_i$ in the current image $I$? In other words, what is this term?:

\[I(\omega (p_i, D\_{ref}(p_i), \xi))\]It seems mysterious, but itâ€™s nothing new: this simply represents a perspective projection. More specifically, given a pixel $p_i$, if we know the corresponding depth $D_{ref}(p_i)$ we can get a 3D point that we can then project to the current camera as a function of the relative pose. The $\omega$ is typically called a **warp function** since it â€śwarpsâ€ť a pixel in the previous frame $I_{ref}$ into a pixel at the current frame $I$. What is the catch? Well.. the depth $D_{ref}$ is unknown in practice, so you can only optimize $E(\xi)$ if at a previous step you have some triangulation of the points in the image. In direct methods, therefore these is typically an â€śinitialization stepâ€ť: you use feature-based methods to estimate the poses of the first few frames and triangulate the corresponding points, and then you can use the optimization of $E(\xi)$ to estimate later poses. The objective function $E(\xi)$ is called the *photometric error*, which quantifies the difference in the pixel appearance in consecutive frames.

**NOTE**: LDSO (Direct Sparse Odometry with Loop Closures) which you are going to use in Part 2 of this handout is simply an evolution of LSD-SLAM (by the same authors). We are suggesting you to read the LSD-SLAM paper since it provides a simpler introduction to direct methods, while we decided to use LDSO for the experimental part since it comes with a more recent implementation.

## đź“¨ Deliverable 5 [Optional] - From landmark-based SLAM to rotation estimation [+15 pts]

Consider the following landmark-based SLAM problem:

\[\min_{t_i \in \mathbb{R}^3,\ R_i \in \SO{3},\ p_i \in \mathbb{R}^3} \sum_{(i,k) \in \mathcal{E}_l} \| R_i^T(p_k - t_i) - \bar{p}_{ik} \|_2^2 + \sum_{(i,j) \in \mathcal{E}_o} \|R_i^T(t_j - t_i) - \bar{t}_{ij}\|_2^2 + \|R_j - R_i\bar{R}_{ij}\|_F^2\]Where the goal is to compute the poses of the robot $(t_i, R_i),\ i=1,\ldots,N$ and the positions of point-landmarks $p_k, k= 1, \ldots, M$ given odometric measurements $(\bar{t}_{ij}, \bar{R}_{ij})$ for each odometric edge $(i,j) \in \mathcal{E}_o$ (here $\mathcal{E}_o$ denotes the set of odometric edges), and landmark observations $\bar{p}_{ik}$ of landmark $k$ from pose $i$ for each observation edge $(i,k) \in \mathcal{E}_l$ (here $\mathcal{E}_l$ denotes the set of pose-landmark edges).

- Prove the following claim: â€śThe optimization problem (1) can be rewritten as a nonlinear optimization over the rotations $R_i,\ i=1,\ldots,N$ only.â€ť Provide an expression of the resulting rotation-only problem to support the proof.
- Hint: i) Euclidean norm is invariant under rotations, and (ii) translations/positions variables appear â€¦! Consider also using a compact matrix notation to rewrite the sums in the cost function otherwise it will be tough to get an expression of the rotation-only problem.

- The elimination of variables discussed at the previous point largely reduces the size of the optimization problem (from 6N+3L variables to 3N variables).However, the rotation problem is not necessarily faster to solve. Discuss what can make the rotation-only problem more computationally-demanding to solve.
- Hint: What property makes optimization-based SLAM algorithms fast to solve?

# đź‘Ą Team

## Install Docker Engine

Follow the installation instructions listed here. Specifically, you should follow the instructions listed in the â€śInstall Using the Repositoryâ€ť section.

## Building Docker Images

We are going to build two open source SLAM approaches. The first is ORB-SLAM3, a more modern version of the system described in the ORB-SLAM paper. The second is Kimera-VIO.

## Building Containers

The next steps will take a while (each docker image will take at least 10+ minutes to build)! Do NOT try and build both containers at once!

First, weâ€™ll build ORB_SLAM3:

```
cd /path/to/lab9
sudo docker build -f- -t orbslam:latest . < Dockerfile_ORBSLAM3
```

Next, weâ€™ll build Kimera:

```
sudo docker build -f- -t kimera:latest . < Dockerfile_KIMERA
```

## Prepare the Dataset

You should have the `MH_01_easy`

sequence of the EuRoC dataset downloaded already from the last lab, but if you havenâ€™t, download it from here and place it under the path `/home/$USER/datasets/vnav/`

(i.e., such that `/home/$USER/datasets/vnav/MH_01_easy`

exists).

## Running the comparison

First, you can run ORB-SLAM3 via

```
./run_docker.sh orbslam:latest
```

You should see something like this:

Wait for ORB-SLAM3 to finish running the entire sequence (it will exit automatically when finished). You should now see a file named `CameraTrajectory.txt`

under `output/orbslam`

.

Next, you can run Kimera via

```
./run_docker.sh kimera:latest
```

You should see something like this:

Wait for Kimera to finish running the entire sequence (it will exit automatically when finished). You should now see several files under `output/kimera`

. We specifically care about `output/kimera/traj_vio.csv`

.

## Running the MH_01_easy Sequence

You will not receive full credit unless you run Kimera and ORB-SLAM3 against the full MH_01_easy sequence.

## đź“¨ Deliverable 6 - Performance Comparison [60 pts]

Install evo, a python toolbox for evaluating the quality of trajectory estimates:

```
python3 -m pip install --user evo
```

You may find readme and wiki of evo useful.

First, we need to fix the timestamps on the ORB-SLAM3 trajectory file. Run the `fix_timestamps.py`

script to convert the ORB-SLAM3 timestamps to nanoseconds (and to convert the Kimera-VIO trajectory to the right format).

Now you can use the `evo_traj`

tool to plot both trajectories for `MH_01_easy`

. Add this plot to your report and comment on the differences that you see between trajectories:

```
evo_traj tum output/kimera/kimera.txt output/orbslam/orb_slam3.txt --plot
```

In general, different SLAM systems may use different conventions for the world frame, so we have to align the trajectories in order to compare them. This amounts to solving for the optimal (in the sense of having the *best fit* to the ground truth) $\SE{3}$ transformation of the estimated trajectory. Fortunately, tools like `evo`

provide implementations that do this for us, so weâ€™re not going to ask you to implement the alignment process in this lab. Letâ€™s compare to the ground-truth pose data from the EuRoC dataset next, taking into consideration this pose alignment.

First, convert the ground-truth trajectory for the `MH_01_easy`

sequence to the right format:

```
evo_traj euroc ~/datasets/vnav/MH_01_easy/mav0/state_groundtruth_estimate0/data.csv --save_as_tum
```

Next, run:

```
evo_traj tum output/kimera/kimera.txt output/orbslam/orb_slam3.txt --ref data.tum --plot --align
```

Add this plot to your report and comment on the differences that you see between trajectories.

## đź“¨ [Optional] Deliverable 7 - LDSO [+10 pts]

Based on LSD-SLAM, LDSO is a direct (and sparse) formulation for SLAM. It can also handle loop closures (a limitation of the original LSD-SLAM).

Install the code for LDSO by following the `Readme.md`

in the GitHub repository.

In the same `Readme.md`

there is an example on how to run this pipeline for the EuRoC dataset:

```
cd ~/path/to/LDSO
./bin/run_dso_euroc preset=0 files=$HOME/datasets/vnav/MH_01_easy/mav0/cam0
```

At this point, you should be able to see the following window:

LDSO will by default output the results in `results.txt`

in the same folder where you ran the command. You can also specify the output folder by setting the `output`

flag when running DSO. The results file will only be generated once the sequence has been processed.

Note that the output from LDSO might be formatted incorrectly (e.g., spurious backspaces).

Plot the trajectory produced by LDSO with the trajectories from Kimera and ORB-SLAM3. Briefly comment on any differences in the trajectories.

Note that `LDSO`

is a monocular method and there is no way of recovering the true scale of the scene. You will want to use the `--correct_scale`

flag for `evo`

to estimate not just the $\SE{3}$ alignment of the predicted trajectory to the ground-truth trajectory, but also the global scale factor.

Hereâ€™s an example of the aligned trajectories plotted in the x-y plane (you can add the flag `--plot_mode xy`

to compare with this plot):

## Summary of Team Deliverables

- Comparison plots of Kimera and ORB-SLAM3 on
`MH_01_easy`

with comments on differences in trajectories - Plot of Kimera and ORB-SLAM3 trajectories on
`MH_01_easy`

aligned with ground-truth with comments on any differences in the trajectories - [Optional] Plot of Kimera + ORB-SLAM3 + LDSO (aligned and scale corrected with ground-truth) with comments on any differences in trajectories