22.5847° N
89.5485° E
// SUNDARBANS DELTA
DEPTH
200M ↓
// TARGET
AUV CONTROL STACK — duburi_ws
MONGLA
— duburi — autonomous
A ROS 2 Humble control, mission, and vision stack for ArduSub vehicles.
One action surface, axis-isolated control, YOLO perception, DVL dead-reckoning,
and full RoboSub task autonomy — field-tested on Duburi 4.2.
ROS 2 Humble · ArduSub 4.x · Pixhawk 2.4.8 · YOLO v26 · Nucleus1000 DVL · BNO085 IMU · ByteTrack
01 — The Platform

Duburi 4.2 — hardware at a glance

Octagonal Marine 5083 aluminum hull, 8× Blue Robotics T200 thrusters in vectored_6dof configuration — same ArduSub frame as BlueROV2 Heavy.

[Diagram · top view: 8× T200 thrusters (T1–T8) in vectored_6dof, channel map Ch5 FWD / Ch6 LAT / Ch4 YAW, forward camera FOV, payloads: torpedo · grabber · dropper]
Component           Spec
Hull                Octagonal Marine 5083 aluminum, in-house
Frame               vectored_6dof (8× T200) — same as BlueROV2 Heavy
Flight controller   Pixhawk 2.4.8 · ArduSub 4.x · EKF3
Companion SBC       Raspberry Pi 4B · BlueOS · MAVLink router
Mission SBC         Nvidia Jetson Orin Nano · all ROS2 nodes
Depth sensor        Bar30 (ArduSub AHRS2 altitude)
IMU                 BNO085 on ESP32-C3 · USB CDC · gyro+accel
DVL                 Nortek Nucleus1000 · 192.168.2.201 · TCP 9000
Cameras             Blue Robotics Low-Light HD USB (fwd + down)
Network             5-port onboard switch · FathomX PoE tether
Payload             Slingshot torpedo · aluminum grabber · solenoid dropper
02 — Architecture

All subsystems in one view

Five ROS2 packages, one action surface (/duburi/move), one MAVLink owner. Every command flows through the same registry — CLI, mission runner, and Python API alike.

[Diagram · architecture. Operator layer: duburi CLI · mission runner · DuburiClient · YASMIN SM (roadmap) → /duburi/move. duburi_manager: ActionServer · auv_manager_node · Duburi facade · command dispatch · VisionState pool · heading_lock owner. duburi_control: motion_yaw (yaw_snap / yaw_glide) · motion_forward (drive_* + arc, Ch5+Ch4) · motion_lateral (drive_lateral_*, Ch6) · motion_depth (hold_depth setpoint) · motion_vision (P-loop, Ch4/5/6 + depth) · heading_lock (20 Hz Ch4 yaw-rate). MAVLink → AUV hardware: Pixhawk 2.4.8 · ArduSub EKF3 · RPi BlueOS UDP 14550 · 8× T200 ESCs · BNO085 · DVL Nucleus1000 · cameras. duburi_vision: camera_node → detector_node → tracker_node (/detections · /tracks). duburi_sensors: YawSource ABC · factory · telemetry.]
03 — Motion Control

Open-loop vs closed-loop movement

Mongla uses three control modes depending on the axis. ArduSub owns the 400 Hz inner loops; Python shapes setpoints at 5–20 Hz.

[Diagram · three control modes.
OPEN LOOP — translation (forward · lateral · arc): RC override on Ch5 (fwd) / Ch6 (lat), PWM 1100–1900 µs at 20 Hz, timed thrust envelope, constant (bang-bang) vs trapezoid (smooth=true).
CLOSED LOOP — yaw & depth: Python streams SET_ATTITUDE_TARGET (yaw) and SET_POSITION_TARGET (depth); ArduSub's 400 Hz PIDs close the loop through the ESCs and water.
HEADING LOCK — hybrid: Python P-loop at 20 Hz reads yaw_source, error = shortest(target − current), Kp·err → Ch4 RC_OVERRIDE at 0.6 %/deg, clamped ±18 %, layered on ArduSub's yaw PID.]
Forward / Lateral
Open-loop, timed

Python sends RC_CHANNELS_OVERRIDE with a thrust percentage for a set duration. With smooth_translate:=true, a trapezoid_ramp shapes the envelope — easing in, cruise, easing out. The ease-out IS the brake. After: 200 ms reverse kick (constant mode) + 1.2 s settle.
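The envelope shaping can be sketched in a few lines. The trapezoid_ramp name comes from the stack; the 25 % ramp fraction and the argument names here are assumptions:

```python
def trapezoid_ramp(t: float, duration: float, peak_pct: float,
                   ramp_frac: float = 0.25) -> float:
    """Thrust percentage at time t for a trapezoidal envelope:
    linear ease-in, constant cruise, linear ease-out (the brake)."""
    if t <= 0 or t >= duration:
        return 0.0
    ramp = duration * ramp_frac
    if t < ramp:                         # easing in
        return peak_pct * t / ramp
    if t > duration - ramp:              # easing out
        return peak_pct * (duration - t) / ramp
    return peak_pct                      # cruise

print(trapezoid_ramp(2.5, 5.0, 60.0))   # → 60.0 (cruise)
print(trapezoid_ramp(0.5, 5.0, 60.0))   # → 24.0 (easing in)
```

Sampling this at 20 Hz and converting each percentage to a PWM value yields the RC_CHANNELS_OVERRIDE stream described above.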

Yaw / Depth
Closed-loop via ArduSub

Python streams setpoints: SET_ATTITUDE_TARGET for yaw at 10–20 Hz, SET_POSITION_TARGET_GLOBAL_INT for depth at 5 Hz. ArduSub's 400 Hz attitude and position PIDs close the actual loops. Python never fights the firmware.
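A minimal illustration of the yaw path: SET_ATTITUDE_TARGET carries the attitude as a quaternion, so the streaming loop reduces to a yaw-to-quaternion conversion plus a periodic send. The pymavlink call in the comment and the type_mask handling are indicative only:

```python
import math

def yaw_to_quaternion(yaw_deg: float):
    """Quaternion [w, x, y, z] for a pure-yaw attitude setpoint,
    in the q[0]=w order MAVLink SET_ATTITUDE_TARGET expects."""
    half = math.radians(yaw_deg) / 2.0
    return [math.cos(half), 0.0, 0.0, math.sin(half)]

# Sketch of the 10-20 Hz streaming loop (pymavlink shown for
# illustration; the type_mask value is an assumption and should
# ignore body rates and thrust, leaving attitude active):
#
# master.mav.set_attitude_target_send(
#     0, master.target_system, master.target_component,
#     type_mask, yaw_to_quaternion(target_yaw_deg),
#     0.0, 0.0, 0.0, 0.0)

print(yaw_to_quaternion(0.0))   # → [1.0, 0.0, 0.0, 0.0]
```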

Heading Lock
Hybrid P-loop @ 20 Hz

A Python daemon thread reads yaw_source, computes proportional error, and streams Ch4 yaw-rate overrides at 20 Hz. Translation commands run on top — they only write Ch5/Ch6, leaving Ch4 to the lock. Source-agnostic: AHRS, BNO085, or DVL heading.
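The lock's arithmetic fits in a few lines. The 0.6 %/deg gain and ±18 % clamp match the numbers quoted in this section; the function names are illustrative:

```python
def shortest_angle_error(target_deg: float, current_deg: float) -> float:
    """Signed heading error in [-180, 180), taking the short way around."""
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0

def heading_lock_output(target_deg: float, current_deg: float,
                        kp: float = 0.6, limit: float = 18.0) -> float:
    """Ch4 yaw-rate percentage: Kp (%/deg) times error, clamped."""
    err = shortest_angle_error(target_deg, current_deg)
    return max(-limit, min(limit, kp * err))

print(shortest_angle_error(10.0, 350.0))   # → 20.0 (wraps, not -340)
print(heading_lock_output(10.0, 350.0))    # → 12.0
```

At 20 Hz the daemon feeds this output into the Ch4 slot of RC_CHANNELS_OVERRIDE while translation verbs keep writing Ch5/Ch6.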

04 — Perception

Vision pipeline: camera → detection → tracking

YOLO v26 runs on the Jetson GPU at ~30 Hz. ByteTrack assigns stable IDs across frames. VisionState bridges detections to control commands.

[Diagram · vision pipeline: camera_node (USB capture, 1920×1080, 30 Hz, forward + downward) → /image_raw (sensor_msgs/Image) → detector_node (Ultralytics YOLO v26, Jetson GPU, CUDA accelerated; class · bbox · conf) → /detections (vision_msgs/Detection2DArray) → tracker_node (supervision.ByteTrack + per-track Kalman smoother; stable track IDs bridge occlusion, Kalman-only prediction on miss; opt-in via --tracking true) → /tracks (Detection2DArray + tracking_id) → VisionState in auv_manager_node (per-camera subscriber; bbox_error(): cx_err = (cx−0.5)/0.5, cy_err = (cy−0.5)/0.5, bbox_h_frac = h/frame_h; detection age → stale check; errors normalised to [−1, +1]) → motion_vision P-loop → RC override (yaw_pct = kp_yaw × cx_err · lat_pct = kp_lat × cx_err · fwd_pct = kp_fwd × dist_err · depth_Δ = kp_dep × cy_err) at 20 Hz with live-tunable ROS params.]
ByteTrack: Zhang et al., "ByteTrack: Multi-Object Tracking by Associating Every Detection Box" ECCV 2022 · arxiv.org/abs/2110.06864
YOLO: Ultralytics YOLO v26 architecture — real-time object detection · docs.ultralytics.com
Kalman Filter: R.E. Kalman, "A New Approach to Linear Filtering and Prediction Problems" ASME 1960 — used here for per-track bbox smoothing
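The per-track smoother can be illustrated with a one-coordinate constant-velocity Kalman filter. This numpy sketch shows the shape of the idea; the noise values are illustrative, not the stack's tuned parameters:

```python
import numpy as np

class CVKalman1D:
    """Constant-velocity Kalman filter for one bbox coordinate.
    Per track, it smooths jitter and predicts through missed frames."""
    def __init__(self, x0: float, dt: float = 1 / 30):
        self.x = np.array([x0, 0.0])                # [position, velocity]
        self.P = np.eye(2)                          # state covariance
        self.F = np.array([[1.0, dt], [0.0, 1.0]])  # motion model
        self.H = np.array([[1.0, 0.0]])             # observe position only
        self.Q = np.eye(2) * 1e-4                   # process noise
        self.R = np.array([[1e-2]])                 # measurement noise

    def predict(self) -> float:
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return float(self.x[0])

    def update(self, z: float) -> float:
        y = z - self.H @ self.x                     # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return float(self.x[0])

kf = CVKalman1D(0.50)
for z in (0.52, 0.49, 0.51):   # noisy cx measurements
    kf.predict()
    cx = kf.update(z)
# On a missed frame, calling predict() alone bridges the gap
# (the "Kalman-only pred on miss" behaviour above).
```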
05 — Vision → Control

How the AUV sees, aligns & follows

Bounding box geometry drives three independent control channels simultaneously. The camera frame is a normalized coordinate system — errors feed directly into P-gains.

[Diagram · normalised image frame [0,1]×[0,1]: example target bbox at confidence 0.92 with cx_err = +0.42 (yaw RIGHT), cy_err = +0.03 (≈ on target), bbox_h_frac = 0.46 (too small → approach). cx = 0.5 means the target is centred horizontally; bbox_h_frac = target_frac means correct distance.]

Bounding box → thrust channels

cx_err   = (bbox_cx - 0.5) / 0.5          # [-1, +1]
cy_err   = (bbox_cy - 0.5) / 0.5
dist_err = target_bbox_h_frac - bbox_h / frame_h

yaw_pct  = kp_yaw * cx_err                # → Ch4
lat_pct  = kp_lat * cx_err                # → Ch6 (strafe)
fwd_pct  = kp_fwd * dist_err              # → Ch5
depth_Δ  = kp_dep * cy_err                # → depth setpoint

Vision verb modes

Verb                  Active axes      Settling condition
vision_align_yaw      Ch4 only         |cx_err| < deadband for N frames
vision_align_lat      Ch6 only         |cx_err| < deadband
vision_align_depth    depth setpoint   |cy_err| < deadband
vision_hold_distance  Ch5              bbox_h_frac ≈ target ± tol
vision_align_3d       yaw+fwd+depth    all axes settled simultaneously
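The "for N frames" settling rule is a tiny state machine. A hedged sketch, with class name and parameters assumed:

```python
class SettleMonitor:
    """Declares an axis settled once |error| stays under the deadband
    for n_frames consecutive frames."""
    def __init__(self, deadband: float, n_frames: int):
        self.deadband = deadband
        self.n_frames = n_frames
        self.streak = 0        # consecutive in-deadband frames so far

    def update(self, error: float) -> bool:
        self.streak = self.streak + 1 if abs(error) < self.deadband else 0
        return self.streak >= self.n_frames

mon = SettleMonitor(deadband=0.06, n_frames=3)
for err in (0.20, 0.05, 0.04, 0.03):
    settled = mon.update(err)
print(settled)   # → True (three consecutive frames inside the deadband)
```

A multi-axis verb like vision_align_3d would hold one monitor per axis and finish only when every monitor reports settled on the same frame.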
Live tuning during a run: all vision gains are ROS params — change them without restarting:
ros2 param set /duburi_manager vision.kp_yaw 80.0
ros2 param set /duburi_manager vision.deadband 0.06
Tracking mode: With --tracking true, VisionState reads from /tracks instead of /detections. ByteTrack+Kalman bbox is smoother and has stable IDs — better for slow-moving targets, occlusion, and low-confidence frames. Raw detections have lower latency (no tracking buffer).
06 — Sensor Fusion

Yaw, position & state estimation

Three sensor paths, one clean interface. YawSource is an ABC — swap the backend with one ROS param. EKF3 on Pixhawk handles IMU + mag + baro fusion at 400 Hz.

[Diagram · sensor paths: Pixhawk AHRS2 (50 Hz via MAVLink) · BNO085 on ESP32-C3 (USB, no mag) · Nucleus1000 DVL (TCP, AHRS + BottomTrack) · Bar30 depth (ArduSub AHRS2.altitude). YawSource ABC with factory.py dispatch on the yaw_source param: mavlink_ahrs · bno085 · dvl / nucleus_dvl · bno085_dvl (pool default ★). heading_lock consumes yaw at 20 Hz for Ch4 yaw-rate overrides. DVL position: get_position() → (x, y) m, reset_position() before each move. Duburi facade routing: heading → heading_lock · position → DVL closed loop · depth → ArduSub ALT_HOLD · yaw → SET_ATTITUDE_TARGET · fwd/lat → RC_OVERRIDE, all via the Pixhawk wrapper → MAVLink. robot_localization EKF/UKF fusion: Phase 4 roadmap.]
EKF3 (ArduPilot): Mahony et al., "Nonlinear Complementary Filters on the Special Orthogonal Group" IEEE Trans. Automatic Control 2008 · ArduPilot EKF3 docs: ardupilot.org/dev/docs/ekf3.html
Kalman Filter background: Thrun, Burgard, Fox — "Probabilistic Robotics" MIT Press 2005 · BNO085: Bosch Sensortec SH-2 reference: ceva-dsp.com/BNO085-Datasheet
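The swap-the-backend pattern described above reduces to a small ABC plus a factory. Only the YawSource name comes from the stack; the registry contents and backends below are placeholders:

```python
from abc import ABC, abstractmethod

class YawSource(ABC):
    """Backend-agnostic yaw reader; heading_lock only sees this."""
    @abstractmethod
    def read_yaw_deg(self) -> float: ...

class StaticYaw(YawSource):
    """Trivial backend, handy for bench tests."""
    def __init__(self, yaw: float):
        self.yaw = yaw
    def read_yaw_deg(self) -> float:
        return self.yaw

# The real factory would also register mavlink_ahrs, bno085,
# dvl / nucleus_dvl, and bno085_dvl implementations here.
_REGISTRY = {"static": StaticYaw}

def make_yaw_source(name: str, **kwargs) -> YawSource:
    """factory.py-style dispatch keyed by the yaw_source ROS param."""
    return _REGISTRY[name](**kwargs)

src = make_yaw_source("static", yaw=42.0)
print(src.read_yaw_deg())   # → 42.0
```

Because heading_lock depends only on the ABC, switching from AHRS to BNO085 to DVL heading is a one-parameter change, exactly as the prose claims.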
07 — Full Integration

Complete system operation

All subsystems working together: perception → state → control → actuation → sensing, in one closed loop at multiple timescales.

[Diagram · one closed loop across three timescales. Vision layer (20–30 Hz): camera → YOLO26 → ByteTrack → VisionState. Control layer (10–20 Hz): mission/CLI → auv_manager → bbox errors → motion modules → Pixhawk.py. Hardware layer (400 Hz): ArduSub EKF3, attitude and position PIDs, motor mixer, ESCs, PWM 1100–1900 µs, 8× T200 vectored_6dof. Feedback: BNO085 / DVL Nucleus yaw → heading_lock; telemetry AHRS2 50 Hz · RC_CHANNELS 5 Hz · BAT 1 Hz.]
08 — RoboSub 2026

Competition tasks & Mongla's approach

RoboSub is an international student competition for fully autonomous underwater vehicles. Tasks test perception, navigation, manipulation, and mission management — the exact capabilities Mongla is built for.

Competition format (RoboSub 2026): 15-minute autonomous run. No human interaction after the start signal. Points scored by completing tasks in sequence. Gate must be passed first. Source: robonation.gitbook.io/robosub-resources
01
Collecting Data — Gate
Pass through gate · choose side (reef shark / sawfish)
MANDATORY
[Diagram · gate: RED reef-shark side, BLACK sawfish side; ① depth lock ② vision_align_yaw ③ lateral offset ✓ pass]
  • Submerge to gate depth (set_depth -0.8), engage heading lock toward gate bearing
  • Drive forward, camera detects gate frame (YOLO class: gate) — vision_align_yaw centres it horizontally
  • YOLO detects divider colour (RED=reef shark / BLACK=sawfish) — select chosen side
  • vision_align_lat or lateral offset moves AUV to correct side of centre
  • move_forward through gate with heading lock — DVL confirms passage distance
02
Navigate the Channel — Slalom
Weave between red/white PVC buoy pairs
[Diagram · slalom: red (left) and white (right) buoy pairs; heading lock active, lateral offset per buoy pair, exit heading]
  • Path markers (orange 4ft × 6in) on pool floor — downward camera detects and aligns heading
  • Detect red buoy (left) and white buoy (right) of each pair via forward camera
  • Pass between each pair: move_lateral to correct offset, heading lock maintains forward direction
  • DVL measures forward progress between pairs — prevents overshoot
03
Drop a BRUVS — Bin
Drop markers into correct half of bin
[Diagram · bin: SHARK / SAWFISH halves; ① downward cam detects bin ② vision_align_3d centres over target half ③ solenoid dropper fires]
  • Descend over bin using Bar30 depth control, downward camera detects bin and divider line
  • YOLO classifies target half (based on chosen animal: reef shark side)
  • vision_align_3d centres AUV horizontally over correct half using downward camera
  • Solenoid dropper releases marker at correct depth above bin
04
Tagging — Torpedoes
Fire torpedoes through target openings on board
[Diagram · torpedoes: target board with openings; ① acoustic pinger guides approach ② vision align → fire]
  • Acoustic pinger (hydrophone) guides initial approach to torpedo board area
  • Forward camera detects target board and classifies openings (YOLO classes: large/small circles)
  • vision_align_3d --axes yaw,depth aligns torpedo tube with target opening
  • vision_hold_distance holds correct standoff for torpedo trajectory
  • Fire slingshot torpedo — repeat for second opening with repositioning
05
Ocean Cleanup — Octagon
Surface inside octagon · face image · collect trash · place in baskets
MAX POINTS
[Diagram · octagon: acoustic pinger → navigate to octagon → surface inside → face image (yaw) → baskets]
  • Acoustic pinger guides navigation to octagon location
  • AUV surfaces inside octagon frame using depth setpoint = 0
  • Forward camera detects the reference image on octagon wall; vision_align_yaw faces it
  • Arm grabber, manoeuvre with vision_align_3d to collect floating trash objects
  • Place collected objects into correct basket (classified by YOLO from visual markers)
  • Maximum bonus points: collect multiple pieces and sort correctly
RoboSub 2026 Task Descriptions — RoboNation Team Handbook §3.2 · robonation.gitbook.io
Competition Sequence of Events — §3.4 · robonation.gitbook.io
09 — Mission Execution

Full run: from CLI to competition completion

A complete RoboSub run as a Mongla mission script. One YAML-like DSL drives all subsystems in sequence.

[Diagram · run timeline: ARM + bringup_check → launch vision pipeline (cameras + YOLO + ByteTrack) → set_depth(-0.8) (ArduSub ALT_HOLD engages) → lock_heading(0°) (BNO085 yaw source, 20 Hz daemon thread starts) → TASK 1 GATE (vision_acquire → vision_align_yaw → lateral offset → move_forward_dist(4 m) with heading lock maintained) → TASK 2 SLALOM (path markers on downward cam, buoy pairs on forward cam) → TASKS 3–5 BIN / TORPEDO / OCTAGON (vision_align_3d + payload actuation) → DISARM + unlock_heading · mission complete]

Mission DSL — actual code

# missions/robosub_prequal.py
def run(duburi, log):
    duburi.arm()
    duburi.set_depth(-0.8)
    duburi.lock_heading(target=0, timeout=180)

    # ── GATE ──────────────────────────────
    log("approaching gate")
    duburi.vision.find(camera='laptop',
                       target='gate',
                       sweep='yaw_right')
    duburi.vision.yaw(target='gate',
                      duration=10, camera='laptop')
    duburi.move_forward_dist(distance_m=4.0, gain=60)

    # ── BIN ───────────────────────────────
    log("finding bin")
    duburi.set_depth(-1.5)
    duburi.vision.lock(axes='yaw,forward,depth',
                       target='bin',
                       camera='downward',
                       duration=12)
    duburi.drop()  # solenoid dropper

    # ── TORPEDO ───────────────────────────
    duburi.set_depth(-0.8)
    duburi.vision.lock(axes='yaw,depth',
                       target='torpedo_board',
                       distance=0.4, duration=15)
    duburi.fire_torpedo(1)

    duburi.unlock_heading()
    duburi.disarm()
Live-add missions: drop any missions/your_name.py exposing def run(duburi, log), rebuild duburi_planner, and it appears in ros2 run duburi_planner mission --list instantly. No registry edit needed.
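A minimal sketch of how such registry-free discovery can work, using importlib; the real duburi_planner implementation may differ:

```python
import importlib.util
import pathlib

def discover_missions(missions_dir: str) -> dict:
    """Map mission name -> run callable for every missions/*.py that
    exposes run(duburi, log), in the spirit of `mission --list`."""
    found = {}
    for path in sorted(pathlib.Path(missions_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)          # import the file by path
        run = getattr(mod, "run", None)
        if callable(run):                     # only files with run(duburi, log)
            found[path.stem] = run
    return found
```

Anything the glob finds with a callable run becomes a selectable mission, so adding a file is enough; no central registry needs editing.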

State during a vision_align_3d

[Diagram · example tick: VisionState (cx_err = −0.3, cy_err = +0.1, dist_err = 0.2) → P gains (yaw = −15 %, depth Δ = −0.05 m, fwd = +14 %) → RC override (Ch4 = 1425, Ch5 = 1570, depth setpoint = −1.55 m) → ArduSub 400 Hz stabilise]
10 — Interactive Simulator

Vision alignment — live control simulator

Drag the target inside the camera frame or use the sliders to see how bounding-box errors map to real RC override values. All maths match the live codebase exactly.

↖ drag the target · or use the sliders on the right
YAW Ch4
0%
FWD Ch5
0%
DEPTH Δ
0m
Ch4 PWM
1500µs
cx_err = (bbox_cx − 0.5) / 0.5
cy_err = (bbox_cy − 0.5) / 0.5
dist_err = target_frac − bbox_h_frac
yaw_pct = Kp_yaw × cx_err (clamped ±18%)
fwd_pct = Kp_fwd × dist_err
dep_Δ   = Kp_dep × cy_err × 0.1
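The percentages above become RC override PWM values in the last step. A sketch assuming the usual ArduSub convention of 1500 µs neutral over a 1100–1900 µs range, so 1 % of command is 4 µs:

```python
def pct_to_pwm(pct: float, neutral: int = 1500, span: int = 400) -> int:
    """Map a signed thrust percentage [-100, +100] to an RC override
    PWM value, clamping out-of-range commands first."""
    pct = max(-100.0, min(100.0, pct))
    return int(round(neutral + span * pct / 100.0))

print(pct_to_pwm(0))     # → 1500 (neutral)
print(pct_to_pwm(-18))   # → 1428 (full heading-lock clamp, one way)
print(pct_to_pwm(60))    # → 1740
```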
Vision align yaw
Centre target horizontally

Only Ch4 is driven. The AUV yaws until |cx_err| < deadband. Heading lock is suspended for the duration; vision becomes the yaw authority.

Vision hold distance
Stand-off via bbox height

Target bbox_h_frac maps to physical distance. Ch5 drives proportionally to dist_err. Approach if bbox too small; back off if too large.

Vision align 3D
All axes simultaneously

Yaw + forward + depth errors computed every 50 ms from the same detection. Each axis has its own deadband and gain — settle condition is all three simultaneously within threshold.

11 — Simulation

Gazebo SITL — BlueROV2 sim target

ArduSub SITL + Gazebo Harmonic gives a faithful vectored_6dof 8-thruster sandbox. Mongla runs identically against sim or real hardware — only the connection profile changes.

[Diagram · sim wiring: ArduSub SITL (sim_vehicle.py -v ArduSub, vectored_6dof, udp:0.0.0.0:14550) ↔ Gazebo Harmonic (bluerov2_gz BlueROV2 model, underwater world, JSON physics plugin) ↔ auv_manager with mode:=sim (auto-detects SITL; all motion modules; yaw_source:=mavlink) over MAVLink UDP 14550. Driven by the duburi CLI (ros2 run duburi_planner: duburi arm · duburi set_depth -0.5 · duburi move_forward) via /duburi/move. mode:=auto probes: UDP 14550 → pool · USB CDC → desk · neither → sim.]

T1 — ArduSub SITL

sim_vehicle.py \
  -L RATBeach \
  -v ArduSub \
  -f vectored_6dof \
  --model=JSON \
  --out=udp:0.0.0.0:14550 \
  --out=udp:127.0.0.1:14551 \
  --console

T2 — Gazebo world

gz sim -v 3 -r \
  bluerov2_underwater.world

# GZ_SIM_RESOURCE_PATH must
# include bluerov2_gz/models
# and bluerov2_gz/worlds

T3 — Manager + drive

ros2 run duburi_manager start

duburi arm
duburi set_depth --target -0.5
duburi move_forward \
  --duration 5 --gain 60
duburi disarm
Sim parity: BlueROV2 Heavy in Gazebo uses the same vectored_6dof 8-thruster ArduSub frame as Duburi 4.2. Mass, hull shape, and payload differ — but all motion verbs, MAVLink messages, and sensor paths work identically. Develop in sim, deploy at pool without code changes.
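The mode:=auto probe order (UDP 14550 → pool, USB CDC → desk, else sim) can be sketched with injectable probes. The UDP bind heuristic and the /dev/ttyACM0 path below are assumptions, not the stack's actual checks:

```python
import os
import socket

def detect_mode(probe_udp=None, probe_serial=None) -> str:
    """Pick a connection profile: pool, desk, or sim. Probes are
    injectable so the policy itself is testable offline."""
    if probe_udp is None:
        def probe_udp():
            # Heuristic: if 0.0.0.0:14550 cannot be bound, something
            # (e.g. a MAVLink router on the tether) already owns it.
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            try:
                s.bind(("0.0.0.0", 14550))
                return False               # port free: no telemetry
            except OSError:
                return True                # port busy: tethered vehicle
            finally:
                s.close()
    if probe_serial is None:
        def probe_serial():
            return os.path.exists("/dev/ttyACM0")   # assumed CDC path
    if probe_udp():
        return "pool"
    if probe_serial():
        return "desk"
    return "sim"

print(detect_mode(probe_udp=lambda: False,
                  probe_serial=lambda: False))   # → sim
```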
12 — Theory & References

Concepts & citations

Mongla is built on well-established robotics, control theory, and computer vision foundations. These are the primary sources for the concepts used.

Control Theory
PID Control

Proportional-Integral-Derivative control — the foundation of depth and heading stabilisation. ArduSub implements cascaded PID (angle → rate) for attitude and altitude.

Åström & Hägglund, "PID Controllers: Theory, Design and Tuning" ISA 1995
ArduPilot attitude control: ardupilot.org
Wikipedia: PID controller
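For reference, the controller shape in its simplest discrete form, with a clamped integrator for basic anti-windup. The gains, the toy plant, and the hold_depth framing are illustrative, not ArduSub's cascaded implementation:

```python
class PID:
    """Discrete PID: u = Kp*e + Ki*integral(e) + Kd*de/dt,
    with the integrator clamped to avoid windup."""
    def __init__(self, kp, ki, kd, dt, i_limit=10.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.i_limit = i_limit
        self.i = 0.0
        self.prev_e = None

    def step(self, setpoint: float, measurement: float) -> float:
        e = setpoint - measurement
        self.i = max(-self.i_limit, min(self.i_limit, self.i + e * self.dt))
        d = 0.0 if self.prev_e is None else (e - self.prev_e) / self.dt
        self.prev_e = e
        return self.kp * e + self.ki * self.i + self.kd * d

# Toy depth hold: u is a thrust percentage and the "plant" maps it
# to a vertical rate (first order, no real hydrodynamics).
pid, depth = PID(kp=2.0, ki=0.5, kd=2.0, dt=0.1), 0.0
for _ in range(600):                 # 60 s simulated
    u = pid.step(-0.8, depth)        # hold_depth-style setpoint
    depth += 0.1 * u * pid.dt        # crude thrust → descent-rate model
# depth has converged close to the -0.8 m setpoint.
```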
State Estimation
EKF3 & Sensor Fusion

Extended Kalman Filter fuses IMU, magnetometer, barometer, and (optionally) DVL to produce the vehicle's pose estimate. ArduSub EKF3 runs at 400 Hz onboard Pixhawk.

Kalman, R.E. "A New Approach to Linear Filtering" ASME 1960
Thrun et al. "Probabilistic Robotics" MIT Press 2005
ArduPilot EKF3: ardupilot.org/dev/docs/ekf3.html
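The core move of any Kalman-style fusion is a variance-weighted blend of the current estimate with a new measurement; in one dimension it fits in a few lines (the numbers are illustrative):

```python
def fuse(est: float, var_est: float, meas: float, var_meas: float):
    """One scalar Kalman update: blend estimate and measurement in
    proportion to their variances; the fused variance always shrinks."""
    k = var_est / (var_est + var_meas)        # Kalman gain
    return est + k * (meas - est), (1 - k) * var_est

yaw, var = 90.0, 4.0                  # prior: gyro-integrated yaw, deg^2
yaw, var = fuse(yaw, var, 94.0, 1.0)  # magnetometer-style correction
print(round(yaw, 1), round(var, 2))   # → 93.2 0.8
```

The full EKF3 does this with a multi-dimensional state, a nonlinear motion model, and per-sensor innovation gating, but the gain-weighted correction is the same idea.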
Computer Vision
YOLO Object Detection

You Only Look Once — single-pass CNN for real-time detection. YOLO v26 on Jetson GPU delivers ~30 Hz detection of gates, buoys, bins, and other RoboSub objects.

Redmon et al., "You Only Look Once" CVPR 2016 · arxiv
Ultralytics YOLO: docs.ultralytics.com
Wikipedia: YOLO
Object Tracking
ByteTrack + Kalman

ByteTrack associates every detection (not just high-confidence ones) to tracks, bridging occlusion gaps. Per-track Kalman smooths bounding box jitter for stable vision control.

Zhang et al., "ByteTrack" ECCV 2022 · arxiv
Filterpy Kalman: filterpy.readthedocs.io
Wikipedia: Kalman filter
Navigation
DVL Dead-Reckoning

Doppler Velocity Log measures velocity relative to seabed via acoustic Doppler shift. Integrating v(t) over time gives position — enabling GPS-denied closed-loop distance moves.

Nortek Nucleus 1000 Technical Manual · nortek.com
Wikipedia: Acoustic Doppler
Dead reckoning: en.wikipedia.org
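The integration step is simple enough to sketch. This version assumes a constant heading per call for brevity, whereas a real implementation would read yaw per sample:

```python
import math

def dead_reckon(samples, yaw_deg: float, dt: float):
    """Integrate body-frame DVL velocities (vx forward, vy starboard)
    into a local-frame (x, y) displacement in metres."""
    x = y = 0.0
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    for vx, vy in samples:
        x += (vx * c - vy * s) * dt   # rotate body → local, then integrate
        y += (vx * s + vy * c) * dt
    return x, y

# 2 s of pure forward motion at 0.5 m/s while heading 90°:
x, y = dead_reckon([(0.5, 0.0)] * 20, yaw_deg=90.0, dt=0.1)
print(round(x, 3), round(y, 3))   # → 0.0 1.0
```

Drift grows with integration time, which is why reset_position() is called before each closed-loop move: each move only has to stay accurate over its own few metres.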
Middleware
ROS 2 & MAVLink

Robot Operating System 2 (Humble) provides pub/sub, actions, and parameters. MAVLink is a lightweight binary protocol used to command ArduSub and receive telemetry.

ROS 2 Humble: docs.ros.org/en/humble
MAVLink 2.0: mavlink.io
ArduSub: ardusub.com
Mongla
MONGLA — duburi_ws
AUV control stack for Duburi 4.2 · ROS2 Humble · ArduSub · YOLO v26 · Nortek Nucleus1000 DVL
EXPLORE · PERCEIVE · AUTONOMOUS
22.5847° N, 89.5485° E
Sundarbans Delta, Bangladesh
Named for the port of Mongla