Exercises
- Submission
- 👤 Individual
- 👥 Team
Submission
In this lab, there are 5 deliverables throughout the handout. Deliverables 1 and 2 require pen and paper and are individual tasks, while Deliverables 3-5 are team tasks that require coding in the `lab6` directory that we will provide.
Individual
Please submit the individual deliverables to Gradescope. For math-related questions, LaTeX (or other typesetting software) is required.
Team
Please push the source code of the entire `lab6` package to the folder `lab6` of the team repository. Also include in your `lab6` folder a PDF containing the non-code deliverables (plots, comments).
Deadline
Deadline: the VNAV staff will clone your repository on October 16 at 1PM EDT.
👤 Individual
📨 Deliverable 1 - Nister’s 5-point Algorithm [20 pts]
Read the paper and answer the questions below
Read the following paper.
[1] Nistér, David. “An efficient solution to the five-point relative pose problem.” 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, 2003.
Questions:
- Outline the main computational steps required to get the relative pose estimate (up to scale) in Nister’s 5-point algorithm.
- Does the 5-point algorithm exhibit any degeneracy? (degeneracy = special arrangements of the 3D points or the camera poses under which the algorithm fails)
- When used within RANSAC, what is the expected number of iterations the 5-point algorithm requires to find an outlier-free set?
- Hint: use the same assumptions as in the lecture notes.
📨 Deliverable 2 - Designing a Minimal Solver [15 pts]
Can you do better than Nister? Nister’s method is a minimal solver since it uses 5 point correspondences to compute the 5 degrees of freedom that define the relative pose (up to scale) between the two cameras (recall: each point induces a scalar equation). In the presence of external information (e.g., data from other sensors), we may be able to use fewer point correspondences to compute the relative pose.
Consider a drone flying in an unknown environment, and equipped with a camera and an Inertial Measurement Unit (IMU). We want to use the feature correspondences extracted in the images captured at two consecutive time instants $t_1$ and $t_2$ to estimate the relative pose (up to scale) between the pose at time $t_1$ and the pose at time $t_2$. Besides the camera, we can use the IMU (and in particular the gyroscopes in the IMU) to estimate the relative rotation between the pose of the camera at time $t_1$ and $t_2$.
You are required to solve the following problems:
- Assume the relative camera rotation between time $t_1$ and $t_2$ is known from the IMU. Design a minimal solver that computes the remaining degrees of freedom of the relative pose.
- Hint: we only want to compute the pose up to scale
- OPTIONAL (5 bonus pts): Write pseudo-code for a RANSAC algorithm that uses the minimal solver developed in the first problem above to compute the relative pose in the presence of outliers (wrong correspondences).
👥 Team
In this section, we will estimate the motion of a (simulated) flying drone in real time and compare the performance of different algorithms.
For the algorithms, we will be using the implementations provided in the OpenGV library.
For the datasets, we will use pre-recorded rosbag files of our simulated drone flying in an indoor environment.
Additionally, for motion estimation:
- We will only focus on two-view (vs multi-camera) pose estimation. In OpenGV, we refer to two-view problems as “Central” (vs “Non-Central”) relative pose problems.
- We will focus only on the calibrated case, where the intrinsics matrix K is given, and we assume that the images are rectified (distortion removed) using the parameters that you estimated previously.
Getting started: code base and datasets
Prerequisites: Lab 6 will use the feature matching algorithms developed in Lab 5 (in particular, we use SIFT matching), so make sure you have a working version of Lab 5 already in the VNAV workspace.
Prepare the code base: Use `git pull` to update the git repo used to distribute the lab code, https://github.com/MIT-SPARK/VNAV-labs, and you should see a new folder named `lab6`. This lab also requires code from Lab 5, so you need a colcon workspace with both your Lab 5 code and this Lab 6 code. It is easiest to copy the `lab6` folder that you just pulled into the same colcon workspace that you used for Lab 5.
You also need a small update to the `CMakeLists.txt` file in Lab 5. Replace the Lab 5 `CMakeLists.txt` in your team repository with the one from the updated Lab 5 stencil code.
We also need to install OpenGV. In your colcon workspace, clone https://github.com/MIT-SPARK/opengv.git. Run `colcon build` and make sure that OpenGV and the stencil code build successfully.
Download the datasets: We will use the following dataset for this lab: `vnav-lab6-office`, and you can download it here. After downloading the dataset, you will need to unzip it.
The rosbag files include the following topics of the drone:
- Ground-truth pose estimate of the drone’s body frame: `/tesse/odom`
- RGB image from the left-front camera of the drone: `/tesse/left_cam/rgb/image_raw`
- Depth image: `/tesse/depth_cam/mono/image_raw`
You can play these datasets by running:
ros2 bag play <path to bag>
and, in parallel, opening Rviz with:
rviz2 -d <path_to_lab_6>/config/default.rviz
You should see the RGB image and the depth image on the left.
Let’s perform motion estimation!
We will use two methods to estimate the motion of the drone:
- Motion estimation from 2D-2D correspondences (Deliverable 4)
- Motion estimation from 3D-3D correspondences (Deliverable 5)
In Deliverable 4, we will perform motion estimation using only 2D RGB images taken from the drone’s camera, while in Deliverable 5, we will additionally use the depth measurements to recover 3D information.
NOTE:
- All your main implementations of the motion estimation algorithms should be in the `pose_estimation.cpp` file. We have also provided many comments in that file to help your implementation, so please go through them in detail.
- For this lab, we provide a number of useful utility functions in `lab6_utils.h`. You do not need to use these functions to complete the assignment, but they might help save you some time and frustration.
📨 Deliverable 3 - Initial Setup [5 pts]
Before we go to motion estimation, an important task is to calibrate the camera of the drone, i.e., to obtain the camera intrinsics and distortion coefficients. Normally you would need to calibrate the camera yourself offline to obtain the parameters.
However, in this lab the camera that the drone is equipped with has been calibrated already, and calibration information is provided to you! (If you are curious about how to calibrate a camera, feel free to check this OpenCV tutorial)
As part of the starter code, we provide a function `calibrateKeypoints` to calibrate and undistort the keypoints. Make sure you use this function to calibrate the keypoints before passing them to RANSAC.
📨 Deliverable 4 - 2D-2D Correspondences [45 pts]
Given a set of keypoint correspondences in a pair of images (2D - 2D image correspondences), as computed in the previous lab 5, we can use 2-view (geometric verification) algorithms to estimate the relative pose (up to scale) from one viewpoint to another.
To do so, we will be using three different algorithms and comparing their performance.
We will start with the 5-point algorithm of Nister. Then we will test the 8-point method we have seen in class. Finally, we will test the 2-point method you developed in Deliverable 2. For all techniques, we use the feature matching code we developed in Lab 5. In particular, we use SIFT for feature matching in the remainder of this lab.
We provide you with skeleton code in the `lab6` folder, where we have set up ROS callbacks to receive the necessary information.
We ask you to complete the code inside the following functions:
1. `cameraCallback`: this is the main function for this lab.
Inside, you will have to use three different algorithms to estimate the relative pose from frame to frame:
- OpenGV’s 5-point algorithm with RANSAC (see OpenGV API)
- OpenGV’s 8-point algorithm by Longuet-Higgins with RANSAC
- OpenGV’s 2-point algorithm with RANSAC. This algorithm requires you to provide the relative rotation between pairs of frames. This is usually done by integrating the IMU’s gyroscope measurements. Nevertheless, for this lab, we will ask you to compute the relative rotation using the ground-truth pose of the drone between both frames.
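For the 2-point method, the relative rotation between two frames can be obtained from the two ground-truth orientations. The sketch below is written in Python for brevity (the lab code itself is C++) and assumes the orientations arrive as unit quaternions in (x, y, z, w) order, as in ROS pose messages. The helper names `quat_to_rot` and `relative_rotation` are hypothetical and not part of the stencil code. Note also that the ground truth describes the drone’s body frame, so any fixed body-to-camera rotation would still have to be accounted for.

```python
import math


def quat_to_rot(x, y, z, w):
    """Convert a quaternion (x, y, z, w) to a 3x3 rotation matrix (nested lists)."""
    # Normalize first so that slightly non-unit inputs still give a rotation.
    n = math.sqrt(x * x + y * y + z * z + w * w)
    x, y, z, w = x / n, y / n, z / n, w / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def transpose(R):
    """Transpose of a 3x3 matrix; for a rotation this is also its inverse."""
    return [[R[j][i] for j in range(3)] for i in range(3)]


def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def relative_rotation(R_w_prev, R_w_curr):
    """Rotation of the frame at time t expressed in the frame at time t-1.

    Mirrors the rotation part of (T^W_{t-1})^{-1} T^W_t.
    """
    return matmul(transpose(R_w_prev), R_w_curr)
```

For example, `relative_rotation(quat_to_rot(*q_prev), quat_to_rot(*q_curr))` returns the rotation that would be passed to the 2-point solver, where `q_prev` and `q_curr` are the (hypothetical) ground-truth quaternions of the two frames.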
For each part, follow the comments written in the source code for further details.
We strongly recommend that you take a look at how to use the OpenGV functions here.
OPTIONAL (5 bonus pts): if you are curious about how important it is to reject outliers via RANSAC, try using the 5-point method without RANSAC (see OpenGV API), and add the results to the performance evaluation below.
2. `evaluateRPE`: evaluating the relative pose estimates
After implementing the relative pose estimation methods, you are required to evaluate their accuracy and plot their errors over time. Since you also have the ground-truth pose of the drone, it is possible to compute the Relative Pose Error (RPE) between your estimated relative pose from frame to frame and the actual ground-truth movement. Follow the equations below and compute the translation and rotation relative errors on the rosbag we provided.
The relative pose error is a metric for investigating the local consistency of a trajectory.
RPE compares the relative poses along the estimated and the reference trajectory. Given the ground truth pose $T^W_{ref,t}$ at time $t$ (with respect to the world frame $W$), we can compute the ground truth relative pose between time $t-1$ and $t$ as:
\[ T_{ref,t}^{ref,t-1} = \left(T^W_{ref,t-1}\right)^{-1} T^W_{ref,t} \in \mathrm{SE}(3) \]
Similarly, the 2-view geometry algorithms we test in this lab will provide an estimate for the relative pose between the frame at time $t-1$ and $t$:
\[ T^{est,t-1}_{est,t} \in \mathrm{SE}(3) \]
Therefore, we can compute the mismatch between the ground truth and the estimated relative poses using one of the distances we discussed during lecture.
When using 2D-2D correspondences, the translation is only computed up to scale (and is conventionally returned as a vector with unit norm), so we recommend scaling the corresponding ground truth translation to have unit norm before computing the errors we describe below.
Relative translation error: This is simply the Euclidean distance between the ground truth and the estimated relative translation:
\[ RPE_{t-1,t}^{tran} = \left\Vert \mathrm{trans}\left(T_{ref,t}^{ref,t-1}\right) - \mathrm{trans}\left(T^{est,t-1}_{est,t}\right) \right\Vert_2 \]
where $\mathrm{trans}(\cdot)$ denotes the translation part of a pose.
Relative rotation error: This is the chordal distance between the ground truth and the estimated relative rotation:
\[ RPE_{t-1,t}^{rot} = \left\Vert \mathrm{rot}\left(T_{ref,t}^{ref,t-1}\right) - \mathrm{rot}\left(T_{est,t}^{est,t-1}\right) \right\Vert_{F} \]
where $\mathrm{rot}(\cdot)$ denotes the rotation part of a pose.
You will need to implement these error metrics, compute them for consecutive frames in the rosbag, and plot them as discussed above.
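As a reference for the two metrics, here is a minimal sketch in Python (the lab implementation itself goes in the C++ `evaluateRPE` function). It assumes translations are given as 3-element lists and rotations as 3x3 nested lists. The function names and the `unit_norm` flag are hypothetical, introduced only for illustration.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit norm (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v] if n > 0.0 else list(v)


def translation_error(t_ref, t_est, unit_norm=True):
    """Euclidean distance between reference and estimated relative translations.

    With unit_norm=True both vectors are normalized first, which is what the
    up-to-scale 2D-2D methods need; set it to False when the scale is known.
    """
    if unit_norm:
        t_ref, t_est = normalize(t_ref), normalize(t_est)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_ref, t_est)))


def rotation_error(R_ref, R_est):
    """Chordal distance: Frobenius norm of the difference of two rotations."""
    return math.sqrt(sum((R_ref[i][j] - R_est[i][j]) ** 2
                         for i in range(3) for j in range(3)))
```

With unit-norm translations the translation error lies between 0 (same direction) and 2 (opposite directions), and the chordal distance lies between 0 and $2\sqrt{2}$.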
As a deliverable, provide 2 plots showing the rotation error and the translation error over time for each of the tested techniques (2 plots with 3 lines each, one line per algorithm using RANSAC). You can write the data to a file and do the plotting with Python if you prefer (in that case, upload the Python script as well).
3. Publish your relative pose estimate
In order to visualize your relative pose estimate between time $t-1$ and $t$, we postmultiply the ground truth pose at time $t-1$ by your estimated relative pose between time $t-1$ and $t$. This gives a pose estimate at time $t$ that you can visualize in Rviz. In other words, we take the ground-truth pose of the previous frame (obtained from ROS messages), compose it with the relative pose between the previous frame and the current frame (obtained from your algorithm, with the translation scaled using the ground truth) to compute the estimated (absolute) pose of the current frame, and then publish it.
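Written in the notation used for the RPE, and consistent with the definition of the ground truth relative pose, this composition can be sketched as follows. The symbols $R_{est}$, $\hat{t}_{est}$ (the estimated relative rotation and unit-norm translation) and $s$ (the scale taken from the ground truth) are introduced here for illustration only.

\[ T^W_{est,t} = T^W_{ref,t-1} \begin{bmatrix} R_{est} & s \, \hat{t}_{est} \\ 0 & 1 \end{bmatrix}, \qquad s = \left\Vert \mathrm{trans}\left(T_{ref,t}^{ref,t-1}\right) \right\Vert_2 \]

For the 3D-3D method the scale is already observable, so under the same notation one would simply take $s = 1$ and use the estimated translation directly.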
To run your code, use:
ros2 launch lab_6 video_tracking.launch.yaml
but be sure to modify the dataset path and parameters to run the correct method! For example, the `pose_estimator` parameter determines which algorithm is used for the motion estimation.
Note that we are cheating in this visualization since we use the ground truth from the previous time stamp. In practice, we cannot concatenate multiple estimates from 2-view geometry since they are up to scale (so for visualization, we use the ground truth to recover the scale).
In the next deliverable we will see that 3D-3D correspondences allow us to reconstruct the correct scale for the translation.
📨 Deliverable 5 - 3D-3D Correspondences [20 pts]
The rosbag we provide also contains depth values registered with the RGB camera; this means that each pixel location in the RGB image has an associated depth value in the depth image.
In this part, we have provided code to scale the bearing vectors into 3D point clouds, and what you need to do is use Arun’s algorithm (with RANSAC) to compute the drone’s relative pose from frame to frame.
1. `cameraCallback`: Implement Arun’s algorithm
Implement Arun’s algorithm with RANSAC in this function, using OpenGV. Use the `evaluateRPE` function you implemented previously to plot the rotation error and the translation error over time as well. Keep in mind that, in this case, there is no scale ambiguity, so the translation error of this approach is not directly comparable with that of the previous ones.
To run your code, use:
ros2 launch lab_6 video_tracking.launch.yaml pose_estimator:=3
with the `pose_estimator` parameter set to 3 so that Arun’s method is used.
Note that while we can now reconstruct the trajectory by concatenating the relative poses, such a trajectory estimate will quickly diverge due to error accumulation. In future lectures, we will study Visual-Odometry and Loop closure detection as two ways to mitigate the error accumulation.
Performance Expectations
What levels of rotation and translation error should one expect from these different algorithms? To set the right expectations, we consider the following errors satisfactory:
- Using 5-point or 8-point with RANSAC, for most of the frames, you can get rotation error below 1 degree and translation error below 0.5 (note that the translation error is between 0 and 2 since both the ground-truth translation and the estimated translation have unit norm), with the 5-point algorithm slightly outperforming the 8-point algorithm.
- Using 2-point with RANSAC, for most of the frames, you can get the translation error below 0.1 (note that the translation error is between 0 and 2).
- Using 3-point with RANSAC (3D-3D), for most of the frames, you can get rotation error below 0.1 degree, and translation error below 0.1 (if you normalize the translations), and even smaller if you don’t normalize the translations since the frame rate is very high.
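The rotation thresholds above are stated in degrees, while the rotation RPE defined earlier is a chordal distance. For two rotations that differ by an angle $\theta$, the chordal distance equals $2\sqrt{2}\,\lvert\sin(\theta/2)\rvert$, so the two can be converted into each other. A small Python sketch of that conversion follows; the function names are hypothetical.

```python
import math


def chordal_to_degrees(d):
    """Rotation angle in degrees corresponding to a chordal distance d.

    Inverts d = 2*sqrt(2)*sin(theta/2) for theta in [0, 180] degrees.
    """
    # Clamp to [0, 1] to guard against round-off pushing asin out of range.
    s = min(1.0, max(0.0, d / (2.0 * math.sqrt(2.0))))
    return math.degrees(2.0 * math.asin(s))


def degrees_to_chordal(theta_deg):
    """Chordal distance between two rotations that differ by theta_deg degrees."""
    return 2.0 * math.sqrt(2.0) * abs(math.sin(math.radians(theta_deg) / 2.0))
```

Under this relation, a 1 degree rotation error corresponds to a chordal distance of roughly 0.025.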
Summary of Team Deliverables
For the given dataset, we require you to run all algorithms and compare their performance. Therefore, the Team Deliverables should include plots of the translation and rotation errors for each of the methods (5pt, 8pt, 2pt, Arun 3pt) using the given rosbag (results with RANSAC are required, while results without RANSAC are optional).