Exercises
Deadline
This lab will be cloned from your team repo on November 6th at 1pm EST. It should be submitted in a folder called lab8.
Don’t forget to do lab9’s individual portion, which will be due at the same time.
👥 Team
Introduction
Purpose: Use YOLO as a front-end object detector to solve a back-end estimation problem with GTSAM. Practice developing with ROS2 with fewer “training wheels”.
Recall that YOLO is a Convolutional Neural Network object detector introduced in the paper “You Only Look Once: Unified, Real-Time Object Detection”.
In the previous part we set up and used YOLO to detect objects in various images. This time we will use YOLO as a front end to detect an object of interest across multiple frames of a video, and solve for the object’s position with GTSAM by building a factor graph of camera poses and object detections.
The RGB-D TUM dataset is a popular video benchmarking dataset (with ground truth) for various robotic vision applications. We will specifically be using the freiburg3_teddy sequence. This dataset is great, but it was collected in the early 2010s, so a few things are outdated (e.g., the video previews use Flash and the data is in ROS1). Luckily, we updated the data for you! Check out a video preview of freiburg3_teddy here. Our goal will be to determine the position of the Teddy Bear!
Although we still provide starter code for this deliverable, you will be given a lot more freedom in implementing it. This is to better simulate how one might use ROS2 in their research (i.e., with fewer “training wheels”).
Installation & Setup
This lab assumes you already have your VNAV workspace set up, with ROS2, OpenCV, and GTSAM installed from the earlier labs.
Downloading the dataset as a ROS2 bag
- If you don’t already have one, create a data folder within your VNAV workspace:
mkdir -p ~/vnav/data
- Download the freiburg3_teddy sequence as a ROS2 bag here. Place it in ~/vnav/data.
- Uncompress the data with the following command:
tar xzvf lab8_data.tar.gz
You should now have a folder called rgbd_dataset_freiburg3_teddy with the following files:
~/vnav/data/rgbd_dataset_freiburg3_teddy
├── metadata.yaml
└── rgbd_dataset_freiburg3_teddy.db3
- Try running the bag with:
ros2 bag play -l ~/vnav/data/rgbd_dataset_freiburg3_teddy
and viewing the image data in either rviz or rqt. Note that the -l argument makes the bag replay loop indefinitely.
Setup code
- Update the starter code repo:
cd ~/vnav/starter-code
git pull
- Copy relevant starter code to your team-submissions repo:
cp -r ~/vnav/starter-code/lab8 ~/vnav/team-submissions
- Link the submission code into your workspace:
mkdir -p ~/vnav/ws/lab8/src
cd ~/vnav/ws/lab8/src
ln -s ../../../team-submissions/lab8 lab_8
- Clone and set up the Ultralytics ROS2 package:
cd ~/vnav/ws/lab8/src
GIT_LFS_SKIP_SMUDGE=1 git clone -b humble-devel https://github.com/Alpaca-zip/ultralytics_ros.git
rosdep install -r -y -i --from-paths .
python3 -m pip install -r ultralytics_ros/requirements.txt
- Build the starter code:
cd ~/vnav/ws/lab8
colcon build
- Confirm your ~/.bashrc now contains the following line to source lab8:
source ~/vnav/ws/lab8/install/setup.bash
- Re-source your ~/.bashrc to ensure lab8 is sourced in your current session:
source ~/.bashrc
Running the starter code
- To test our code, we want to play the ROS2 bag and launch YOLO to process frames:
# terminal 1
ros2 bag play -l ~/vnav/data/rgbd_dataset_freiburg3_teddy
# terminal 2
ros2 launch ultralytics_ros tracker.launch.xml debug:=true input_topic:=/camera/rgb/image_color
Note that input_topic:=/camera/rgb/image_color tells YOLO to process data on that bag topic. By default, YOLO will send processed frames to the /yolo_image topic.
- Open up rviz or rqt to confirm YOLO is running. Confirm you can see something like this:
Note: Depending on your computer hardware, YOLO may not run at the same framerate as the original video.
- Run the starter C++ code with:
# terminal 3
ros2 run lab_8 deliverable_1
The starter code should show you how to get camera pose information from /tf and pixel information about YOLO’s “Teddy Bear” detections. This will be needed for your implementation!
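If you are curious how such a camera pose lookup typically looks in ROS2 C++, here is a minimal, hedged sketch using tf2. This is not the starter code; the node name and the frame names ("world", "camera_rgb_optical_frame") are assumptions, so check which frames the bag actually broadcasts before reusing them.

```cpp
#include <chrono>
#include <memory>

#include <geometry_msgs/msg/transform_stamped.hpp>
#include <rclcpp/rclcpp.hpp>
#include <tf2/exceptions.h>
#include <tf2/time.h>
#include <tf2_ros/buffer.h>
#include <tf2_ros/transform_listener.h>

class PoseLookupExample : public rclcpp::Node {
 public:
  PoseLookupExample() : Node("pose_lookup_example") {
    // Buffer + listener must stay alive for the node's lifetime.
    tf_buffer_ = std::make_unique<tf2_ros::Buffer>(get_clock());
    tf_listener_ = std::make_shared<tf2_ros::TransformListener>(*tf_buffer_);
    timer_ = create_wall_timer(std::chrono::milliseconds(100), [this]() { Lookup(); });
  }

 private:
  void Lookup() {
    try {
      // Latest camera pose in the world frame (frame names are assumptions).
      geometry_msgs::msg::TransformStamped world_T_cam = tf_buffer_->lookupTransform(
          "world", "camera_rgb_optical_frame", tf2::TimePointZero);
      RCLCPP_INFO(get_logger(), "camera at (%.2f, %.2f, %.2f)",
                  world_T_cam.transform.translation.x,
                  world_T_cam.transform.translation.y,
                  world_T_cam.transform.translation.z);
    } catch (const tf2::TransformException& ex) {
      RCLCPP_WARN(get_logger(), "TF lookup failed: %s", ex.what());
    }
  }

  std::unique_ptr<tf2_ros::Buffer> tf_buffer_;
  std::shared_ptr<tf2_ros::TransformListener> tf_listener_;
  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char** argv) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<PoseLookupExample>());
  rclcpp::shutdown();
  return 0;
}
```

In your own node you would query the buffer at the timestamp of each YOLO detection rather than the latest transform, so that the pose and the pixel measurement correspond to the same frame.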
📨 Deliverable 1 - Object Localization [45 pts]
Our goal for this exercise is to localize the teddy bear that is at the center of the scene in the freiburg3_teddy dataset. To do so, we will use YOLO detections to know where the teddy bear is. With the bounding box of the teddy bear, we can calculate a crude approximation of the bear’s 3D position by using the center pixel of the bounding box. If we accumulate enough 2D measurements, we can formulate a least-squares problem in GTSAM to triangulate the 3D position of the teddy bear.
For that, we will need to perform the following steps:
- Examine the C++ starter code to understand how we are (i) getting the ground-truth transformation of the camera with respect to the world from the tf topic and (ii) getting the center of the Teddy Bear bounding box from YOLO.
- Formulate a GTSAM problem where we are trying to estimate the 3D position of the center pixel of the bounding box. You will need multiple GenericProjectionFactors in order to fully constrain and estimate the 3D position of the teddy bear. (Recall the GTSAM exercise where you solved a toy Bundle Adjustment problem, and use the same factors to build this one.) Note that the camera poses are now given to you as ground-truth information, so you might want to add them as priors instead of as odometry. Tip: If you need it, the bag file has camera calibration info in the camera_info topics. A minimal sketch of building such a graph appears after this list.
- Solve the problem in GTSAM. (Consider repurposing code from lab_7!)
- In rviz, plot (i) the estimated 3D position of the teddy bear, (ii) the trajectory of the camera, and (iii) which frames got a good detection of the teddy bear. For instance, your rviz should look something like this:
where the purple sphere is (i), the green line is (ii), and the red arrows are (iii).
- (Optional) Extra Credit (10 pts): Calculate the covariance of the teddy bear estimate, place it in a geometry_msgs::PoseWithCovariance message, and plot it in rviz using the size of the sphere to represent the covariance. (The sketch after this list also shows one way to pull the marginal covariance out of GTSAM.)
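To make the factor-graph step above concrete, here is a minimal sketch (not the provided solution) of how one might assemble and solve the problem with GTSAM. It assumes you have already collected, for each frame with a good detection, the ground-truth camera pose from /tf and the bounding-box center pixel from YOLO; the function name, the initial guess, the noise sigmas, and the placeholder intrinsics are all assumptions (read the real calibration from the camera_info topic). The last few lines show one way to get the marginal covariance for the extra credit.

```cpp
#include <utility>
#include <vector>

#include <gtsam/geometry/Cal3_S2.h>
#include <gtsam/geometry/Point2.h>
#include <gtsam/geometry/Point3.h>
#include <gtsam/geometry/Pose3.h>
#include <gtsam/inference/Symbol.h>
#include <gtsam/nonlinear/LevenbergMarquardtOptimizer.h>
#include <gtsam/nonlinear/Marginals.h>
#include <gtsam/nonlinear/NonlinearFactorGraph.h>
#include <gtsam/nonlinear/Values.h>
#include <gtsam/slam/PriorFactor.h>
#include <gtsam/slam/ProjectionFactor.h>

using gtsam::symbol_shorthand::L;  // L(0): the teddy bear landmark
using gtsam::symbol_shorthand::X;  // X(i): camera pose for detection i

// detections[i] = {camera pose in the world frame (must be the RGB optical
// frame, from /tf), bounding-box center pixel from YOLO}.
gtsam::Point3 EstimateTeddyBear(
    const std::vector<std::pair<gtsam::Pose3, gtsam::Point2>>& detections) {
  // Placeholder intrinsics -- replace with the values from camera_info.
  gtsam::Cal3_S2::shared_ptr K(new gtsam::Cal3_S2(535.4, 539.2, 0.0, 320.1, 247.6));

  // Pixel measurements are noisy (a couple of pixels); the camera poses are
  // ground truth, so they get a very tight prior.
  auto pixel_noise = gtsam::noiseModel::Isotropic::Sigma(2, 2.0);
  auto pose_noise = gtsam::noiseModel::Isotropic::Sigma(6, 1e-6);

  using ProjectionFactor =
      gtsam::GenericProjectionFactor<gtsam::Pose3, gtsam::Point3, gtsam::Cal3_S2>;

  gtsam::NonlinearFactorGraph graph;
  gtsam::Values initial;
  for (size_t i = 0; i < detections.size(); ++i) {
    const gtsam::Pose3& world_T_cam = detections[i].first;
    const gtsam::Point2& pixel = detections[i].second;
    // Ground-truth pose as a (tight) prior instead of odometry.
    graph.emplace_shared<gtsam::PriorFactor<gtsam::Pose3>>(X(i), world_T_cam, pose_noise);
    // One reprojection constraint per detection, all sharing landmark L(0).
    graph.emplace_shared<ProjectionFactor>(pixel, pixel_noise, X(i), L(0), K);
    initial.insert(X(i), world_T_cam);
  }
  // Crude initial guess: a point one meter in front of the first camera.
  initial.insert(L(0), detections.front().first.transformFrom(gtsam::Point3(0.0, 0.0, 1.0)));

  gtsam::Values result = gtsam::LevenbergMarquardtOptimizer(graph, initial).optimize();

  // (Extra credit) 3x3 marginal covariance of the landmark -- what you would
  // copy into a geometry_msgs::PoseWithCovariance and use to scale the sphere.
  gtsam::Marginals marginals(graph, result);
  gtsam::Matrix landmark_cov = marginals.marginalCovariance(L(0));
  (void)landmark_cov;

  return result.at<gtsam::Point3>(L(0));
}
```

The important design point is that every detection contributes one GenericProjectionFactor against the same landmark key, so the bear’s 3D position only becomes well constrained once you accumulate detections from sufficiently different viewpoints.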
What to submit
To evaluate this deliverable, we will examine your implementation, but we will not focus on the end result (although it will count). Instead, we ask that you add a PDF to your team submission repo called lab8-team-writeup.pdf with the following information:
- A short summary of your design choices and the considerations taken to solve this problem.
- Your “best” estimate of the position of the teddy bear as a geometry_msgs::PointStamped message.
- A screenshot of rviz showing the plotted information (as explained above).
- (Optional) If you did the extra credit covariance calculation and plotting above, include an additional screenshot and your “best” estimate of the covariance as a geometry_msgs::PoseWithCovariance message.
Altogether, this writeup should be at least 250 words.