Overview
This work presents a ROS 2 framework for real-time spherical object detection, monocular 3D localization, and reactive tracking. Departing from the usual blob-detection approach, it uses the Hough Circle Transform for robust 2D detection, then applies pure camera geometry to recover 3D coordinates from a single RGB image — no depth sensor, no stereo rig, no learned depth network.
The architecture is deliberately decoupled: detection, 3D estimation, and motion control each run as independent ROS 2 nodes. The tracker reacts directly to the 2D detections, while a parallel consumer can use the 3D output for downstream tasks (logging, AR overlays, manipulator targeting).
📄 Published as a preprint on TechRxiv — Vision-Based 3D Object Coordinate Estimation and Tracking using Hough Transform and Modular 3D Estimation in ROS (2025).
Contributions
- Hough Circle Transform as a robust, parameter-driven alternative to color-only blob detection.
- Decoupled architecture — independent 3D estimation and 2D tracking processes for parallel use.
- Comprehensive Gazebo validation — illumination sweeps, range tests, multi-ball scenarios.
System Architecture
The pipeline is split into three ROS 2 nodes that communicate purely through topics — every node can be replaced, restarted, or relocated to another machine without touching the others.
2D Detection — Hough Circle Transform
detect_ball consumes /camera/image_raw and runs a four-stage pipeline:
- Pre-processing — Gaussian blur for noise reduction, then Adaptive Histogram Equalization to lift local contrast.
- HSV color masking — convert to HSV, threshold on the target hue range. HSV separates hue from brightness, so the threshold is less sensitive to lighting changes than one applied in RGB.
- Hough Circle Transform — gradient-based Hough vote in (a, b, r) accumulator space, picking peaks where edge pixels agree on a circle:
(x − a)² + (y − b)² = r²
Filters: minimum / maximum radius (drop irrelevant detections) and accumulator threshold (only confident votes).
- Output — publish normalized pixel center (u, v) and pixel radius r to /detected_ball.
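The voting step can be illustrated with a small pure-Python sketch. It is not the node's implementation (the Tech Stack lists OpenCV for the Hough Circle Transform); it only shows how edge pixels accumulate votes in (a, b, r) space and how the radius range and accumulator threshold act as filters. The function names, the synthetic test circle, and the assumption that gradients point away from the circle center are all hypothetical.

```python
import math
from collections import Counter


def hough_circle_peak(edge_points, radii, min_votes=1):
    """Return the strongest (a, b, r, votes) cell, or None if below min_votes.

    edge_points: iterable of (x, y, gx, gy); (gx, gy) is the image gradient at
                 the edge pixel, assumed here to point away from the centre.
    radii:       candidate integer radii (acts as the min/max radius filter).
    min_votes:   accumulator threshold (keeps only confident peaks).
    """
    accumulator = Counter()
    for x, y, gx, gy in edge_points:
        norm = math.hypot(gx, gy)
        if norm == 0.0:
            continue  # no usable gradient direction at this pixel
        ux, uy = gx / norm, gy / norm
        for r in radii:
            # Step back a distance r along the gradient; before rounding,
            # the candidate (a, b) satisfies (x - a)^2 + (y - b)^2 = r^2.
            a = round(x - r * ux)
            b = round(y - r * uy)
            accumulator[(a, b, r)] += 1
    if not accumulator:
        return None
    (a, b, r), votes = accumulator.most_common(1)[0]
    return (a, b, r, votes) if votes >= min_votes else None


def synthetic_circle(cx, cy, r, n=72):
    """Hypothetical test data: n edge samples with ideal outward gradients."""
    points = []
    for k in range(n):
        phi = 2.0 * math.pi * k / n
        points.append((cx + r * math.cos(phi), cy + r * math.sin(phi),
                       math.cos(phi), math.sin(phi)))
    return points


peak = hough_circle_peak(synthetic_circle(20, 15, 8), range(5, 12), min_votes=30)
```

On this noise-free input every edge sample votes for the same cell at the true radius, so the peak is the circle that generated the data. Real images spread the votes across neighbouring cells, which is why the accumulator threshold matters.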
3D Position Estimation — Pure Camera Geometry
detect_ball_3d subscribes to /detected_ball and recovers 3D coordinates (x₃ᴅ, y₃ᴅ, z₃ᴅ) from the camera intrinsics — no depth sensor, no learned model.
Depth from apparent size
If the ball’s true diameter is 2·r_real and its apparent angular size at distance d is θ_ball:
θ_ball = z2d · h_fov (image-plane angular extent)
d = r_real / tan(θ_ball / 2)
Because tan(θ_ball / 2) is nearly linear for small angles, this relation reduces in practice to a stable, roughly inverse-proportional mapping from pixel radius r to metric distance d once the camera’s horizontal field of view (h_fov) is calibrated.
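As a minimal sketch of the two depth equations above, the function below assumes that z2d is the ball's apparent size normalized by the image width (the text does not define it explicitly) and that h_fov is given in radians. The function name and the example values are hypothetical.

```python
import math


def depth_from_apparent_size(z2d, h_fov, r_real):
    """Estimate distance d (metres) from the ball's normalized apparent size.

    z2d:    apparent size as a fraction of image width (assumed definition).
    h_fov:  horizontal field of view in radians.
    r_real: true ball radius in metres.
    """
    if z2d <= 0.0:
        raise ValueError("apparent size must be positive")
    theta_ball = z2d * h_fov                    # image-plane angular extent
    return r_real / math.tan(theta_ball / 2.0)  # d = r_real / tan(theta_ball / 2)


# Hypothetical values: a 5 cm radius ball spanning 10% of a 1 rad field of view.
d = depth_from_apparent_size(z2d=0.1, h_fov=1.0, r_real=0.05)
```

With these numbers d comes out just under 1 m, and doubling the apparent size roughly halves the estimate, which is the near-inverse behaviour expected at small angles.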
Vertical decomposition
θ_y = y2d · v_fov / 2
y3d = d · sin(θ_y)
d' = d · cos(θ_y) // distance projected onto the horizontal plane
Horizontal decomposition
θ_x = x2d · h_fov / 2
x3d = d' · sin(θ_x)
z3d = d' · cos(θ_x)
The result is a clean (x₃ᴅ, y₃ᴅ, z₃ᴅ) in the camera frame, ready for downstream consumers — manipulators, AR overlays, dataset logging.
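The vertical and horizontal decompositions can be written as one short function. This is a sketch under the assumption that x2d and y2d are normalized offsets in [−1, 1] and that both fields of view are in radians; the function name is hypothetical.

```python
import math


def camera_frame_position(x2d, y2d, d, h_fov, v_fov):
    """Split distance d into (x3d, y3d, z3d) using the two angular offsets."""
    theta_y = y2d * v_fov / 2.0
    y3d = d * math.sin(theta_y)
    d_h = d * math.cos(theta_y)      # d': distance projected onto the horizontal plane

    theta_x = x2d * h_fov / 2.0
    x3d = d_h * math.sin(theta_x)
    z3d = d_h * math.cos(theta_x)
    return x3d, y3d, z3d


# A centred detection lies straight ahead on the optical axis.
centred = camera_frame_position(0.0, 0.0, 2.0, h_fov=1.0, v_fov=0.8)

# An off-centre detection keeps the same overall distance from the camera.
x3d, y3d, z3d = camera_frame_position(0.4, -0.3, 2.0, h_fov=1.0, v_fov=0.8)
distance = math.sqrt(x3d**2 + y3d**2 + z3d**2)
```

A useful sanity check on the decomposition is that the norm of the returned vector always equals d, since the two steps are successive sine/cosine splits of the same length.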
Reactive Tracking Control
follow_ball closes the loop on the 2D detection. It uses two simple controllers and a search behavior:
Angular control — proportional controller on the normalized horizontal offset x ∈ [−1, 1]:
ω = −Kp · x
Linear control — forward velocity gated by apparent size (the bigger the ball, the closer it is):
v = vf while r < r_max
v = 0 otherwise
Search behavior — if no detection arrives within t_max, the robot enters search mode at fixed angular velocity Ωs until reacquisition.
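The three behaviors above fit in a single decision function. The sketch below is illustrative only: the parameter values are placeholders rather than the ones used in the paper, and the choice to stop forward motion while searching is an assumption, since the text specifies only the angular velocity in search mode.

```python
from dataclasses import dataclass


@dataclass
class FollowParams:
    # All defaults are hypothetical placeholders.
    kp: float = 0.7             # Kp, proportional gain on horizontal offset
    v_forward: float = 0.1      # vf, forward speed while the ball is far
    r_max: float = 0.3          # apparent radius at which the robot stops
    t_max: float = 1.0          # seconds without detection before searching
    omega_search: float = 0.5   # Ωs, fixed angular velocity in search mode


def follow_command(x, r, seconds_since_detection, params):
    """Return (linear_velocity, angular_velocity) for one control step."""
    if seconds_since_detection > params.t_max:
        # Search mode: rotate in place until the ball is reacquired
        # (zero forward speed is an assumption).
        return 0.0, params.omega_search
    omega = -params.kp * x                               # ω = −Kp · x
    v = params.v_forward if r < params.r_max else 0.0    # gate on apparent size
    return v, omega


params = FollowParams()
tracking = follow_command(x=0.5, r=0.1, seconds_since_detection=0.0, params=params)
arrived = follow_command(x=0.0, r=0.4, seconds_since_detection=0.0, params=params)
searching = follow_command(x=0.0, r=0.0, seconds_since_detection=2.0, params=params)
```

Keeping the logic as a pure function of the latest filtered measurement makes it easy to unit-test outside ROS 2; a node would call it from a timer and publish the result as a velocity command.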
Exponential filtering smooths offset and radius before they enter the controllers:
x̂ₜ = α · x̂ₜ₋₁ + (1 − α) · xₜ
This keeps the robot from chattering on noisy detections while still reacting quickly to genuine motion.
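A minimal sketch of the filter follows. Seeding the state with the first sample is an assumption; the text does not say how x̂ is initialized.

```python
class ExponentialFilter:
    """First-order smoothing: x_hat = alpha * x_hat_prev + (1 - alpha) * x."""

    def __init__(self, alpha):
        if not 0.0 <= alpha < 1.0:
            raise ValueError("alpha must be in [0, 1)")
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = sample  # assumption: start from the first measurement
        else:
            self.value = self.alpha * self.value + (1.0 - self.alpha) * sample
        return self.value


offset_filter = ExponentialFilter(alpha=0.5)
smoothed = [offset_filter.update(s) for s in (0.0, 1.0, 1.0)]
```

With α = 0.5 and samples 0, 1, 1 the filter returns 0, 0.5, 0.75: a step in the input is followed gradually. Larger α gives smoother but slower output, so α trades noise rejection against responsiveness.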
Validation in Gazebo
Detection robustness vs illumination
The simulated scene’s main light source was swept across diffuse RGB values. Approximate lux levels were computed as:
Lux ≈ ((R + G + B) / 3) · L_ref (L_ref = 10,000 lx for white light)
| Diffuse RGB | Approx. Lux | Detected |
|---|---|---|
| (0.8, 0.8, 0.8) | 8000 lx | ✅ |
| (0.5, 0.5, 0.5) | 5000 lx | ✅ |
| (0.1, 0.1, 0.1) | 1000 lx | ✅ |
| (0.05, 0.05, 0.05) | 500 lx | ✅ |
Detection held at every tested level, down to ~500 lx, which is roughly the level of typical office lighting.
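The lux approximation is simple arithmetic, and the short sketch below recomputes the table's lux column from the diffuse RGB values. The function name is hypothetical; L_ref defaults to the 10,000 lx reference stated above.

```python
def approx_lux(diffuse_rgb, l_ref=10_000.0):
    """Lux ≈ mean of the diffuse RGB components, scaled by the reference level."""
    r, g, b = diffuse_rgb
    return (r + g + b) / 3.0 * l_ref


levels = [(0.8, 0.8, 0.8), (0.5, 0.5, 0.5), (0.1, 0.1, 0.1), (0.05, 0.05, 0.05)]
lux_column = [round(approx_lux(rgb)) for rgb in levels]
```

After rounding, lux_column matches the values listed in the table. The formula is a rough linear proxy for scene brightness in simulation, not a photometric measurement.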
Operational range
| Bound | Distance | Reason |
|---|---|---|
| Minimum | 10 cm | Camera FOV / lens distortion |
| Maximum | 3.5 m | Camera resolution + Hough radius threshold |
This range covers the bulk of indoor robot interaction tasks: following, pick-and-place, reactive games.
Sample 3D output
For the configuration shown in the paper:
x = 0.15 m, y = −0.34 m, z = 0.045 m
Velocity profiles during a chase show initial rotational alignment, then a steady forward approach, with smooth distance reduction and minimal lateral drift in the trajectory plot.
Tech Stack
- ROS 2 — node graph, topics, lifecycle
- OpenCV — Hough Circle Transform, HSV thresholding, CLAHE
- Gazebo — physics + camera simulation
- Python / C++ — node implementations
- Differential-drive base — same platform as the SLAM project
Applications
- Automated ball collection — sports stadiums, training centers
- Industrial sorting — 3D coordinates feed a manipulator that classifies parts by size and position
- Augmented reality — overlay graphics on real-world ball games
- Agricultural robotics — size-filtered Hough detection for fruit-picking
Full Paper
The complete preprint, including all equations, figures, and references, is available on TechRxiv.
Future Work
- Real-world validation beyond the Gazebo benchmarks — outdoor lighting, motion blur, partial occlusions
- Multi-object tracking with persistent identities (Hungarian assignment + Kalman filtering)
- Polymorphic detection — extend Hough to ellipses / generalized shapes for non-spherical targets
- Sensor fusion — combine the monocular depth estimate with sparse LiDAR returns for confidence-weighted localization
Contributors
- Imad-Eddine NACIRI
- Oussama Errouji
- Jade Bousliman
Takeaway
You don’t always need a depth sensor. With clean camera intrinsics, a robust 2D detector, and a decoupled architecture, monocular geometry is enough to put an object in 3D space — and ROS 2 turns that into a reusable building block for everything from reactive control to manipulation.