Series: Building Infrastructure for an Autonomous Drone Fleet (3/4)
Part 1: Device Identity · Part 2: Telemetry · Part 3: Monitoring · Part 4: Battery Management
The Monitoring Challenge
Drones are remote, intermittently connected, and resource-constrained. Monitoring them covers two distinct needs that are often conflated: real-time awareness (“Is it flying right now?”) and historical analysis (“What happened during last Tuesday’s flight?”). Standard observability tools lack domain-specific concepts like flights, battery assignments, and device connectivity patterns.
Two Services, Clear Boundaries
- Ingestion service (Go): real-time data intake, device status, live alerts
- Analysis service (Python): historical storage, categorized logs, flight detection, dashboards
They share no database. The analysis service polls the ingestion API using a cursor. The split gives each service a language suited to its strengths, independent scaling, and independent failure domains.
Cursor-Based Polling — The Glue
The analysis service tracks exactly which data it has already processed per device. Every 5 minutes, it requests “everything since my last cursor.” Logs are routed into category-specific tables. Database constraints handle deduplication, so the design tolerates overlapping fetches. The result is simple, debuggable, and resilient to downtime on either side.
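A minimal sketch of one polling cycle, to make the cursor and deduplication mechanics concrete. It is not the production code: SQLite stands in for the real database, and the table names, the `(device_id, seq, category, payload)` record shape, the integer `seq` cursor, and the `fetch_since` callable (a stand-in for the ingestion API client) are all assumptions made for illustration.

```python
import sqlite3
from typing import Callable, Iterable, Tuple

# Hypothetical record shape: (device_id, seq, category, payload)
LogRecord = Tuple[str, int, str, str]

# Hypothetical mapping from log category to its table.
CATEGORY_TABLES = {"battery": "battery_logs", "gps": "gps_logs"}


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cursors ("
        "device_id TEXT PRIMARY KEY, last_seq INTEGER NOT NULL)"
    )
    for table in CATEGORY_TABLES.values():
        # The UNIQUE constraint is what makes overlapping fetches harmless.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "device_id TEXT NOT NULL, seq INTEGER NOT NULL, "
            "payload TEXT NOT NULL, UNIQUE (device_id, seq))"
        )


def poll_device(
    conn: sqlite3.Connection,
    device_id: str,
    fetch_since: Callable[[str, int], Iterable[LogRecord]],
) -> int:
    """Fetch everything since the stored cursor; return rows newly inserted."""
    row = conn.execute(
        "SELECT last_seq FROM cursors WHERE device_id = ?", (device_id,)
    ).fetchone()
    cursor = row[0] if row else 0
    newest, inserted = cursor, 0

    with conn:  # logs and cursor are committed together, or not at all
        for dev, seq, category, payload in fetch_since(device_id, cursor):
            newest = max(newest, seq)
            table = CATEGORY_TABLES.get(category)
            if table is None:
                continue  # unknown category: skipped in this sketch
            cur = conn.execute(
                f"INSERT OR IGNORE INTO {table} (device_id, seq, payload) "
                "VALUES (?, ?, ?)",
                (dev, seq, payload),
            )
            inserted += cur.rowcount  # 0 when the row was a duplicate
        conn.execute(
            "INSERT OR REPLACE INTO cursors (device_id, last_seq) VALUES (?, ?)",
            (device_id, newest),
        )
    return inserted
```

Because the cursor only advances inside the same transaction as the inserts, a crash mid-cycle means the next cycle refetches the same range, and the unique constraint discards whatever was already stored.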
Automatic Flight Detection
ARM event + GPS movement + DISARM event = flight detected. No pilot input needed — purely derived from the telemetry stream. For each flight: start time, end time, duration, max altitude, distance covered, GPS path. This became the most-used feature: the operations team checks flights, not raw logs.
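The happy path of that rule can be sketched in a few lines. This is an illustration under stated assumptions, not the actual detector: the `Sample` and `Flight` types, the `"ARM"` / `"DISARM"` / `"GPS"` kind strings, and the 10 m `min_distance_m` threshold are all hypothetical, and distance is a plain haversine sum over consecutive GPS fixes.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Sample:
    ts: float            # seconds since epoch
    kind: str            # "ARM", "DISARM", or "GPS" (assumed labels)
    lat: float = 0.0
    lon: float = 0.0
    alt_m: float = 0.0


@dataclass
class Flight:
    start: float
    end: float
    max_alt_m: float
    distance_m: float
    path: List[Tuple[float, float]]

    @property
    def duration_s(self) -> float:
        return self.end - self.start


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_flights(samples: Iterable[Sample], min_distance_m: float = 10.0) -> List[Flight]:
    flights: List[Flight] = []
    start: Optional[float] = None
    path: List[Tuple[float, float]] = []
    alts: List[float] = []

    for s in sorted(samples, key=lambda x: x.ts):
        if s.kind == "ARM":
            # A repeated ARM simply restarts the candidate flight.
            start, path, alts = s.ts, [], []
        elif s.kind == "GPS" and start is not None:
            path.append((s.lat, s.lon))
            alts.append(s.alt_m)
        elif s.kind == "DISARM" and start is not None:
            dist = sum(haversine_m(*a, *b) for a, b in zip(path, path[1:]))
            # ARM/DISARM without real movement (bench test) is not a flight.
            if dist >= min_distance_m:
                flights.append(Flight(start, s.ts, max(alts, default=0.0), dist, path))
            start = None
    return flights
```

The sketch ignores the cases that make real detection hard, such as a missing DISARM after a connectivity drop or GPS jitter inflating the distance of a stationary drone.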
The Operational Dashboard
Device overview with connectivity, battery level, last activity. Flight timeline with drill-down. Telemetry explorer for historical data per device. Designed for operations: “show me problems” rather than “show me metrics.”
Alerts — Work in Progress
Configurable alert rules are evaluated on each polling cycle: battery degradation, device offline, GPS anomaly, unexpected disarm. Triggered alerts are currently stored in the database and displayed in the frontend. Next step: integration with team chat / email / push notifications.
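One way to structure rules that run once per polling cycle is a list of small check functions over the latest known device state. The sketch below is hypothetical throughout: `DeviceState`, `Rule`, `Alert`, and both thresholds are invented for illustration, and the battery rule is a simple level threshold standing in for real degradation tracking.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class DeviceState:
    device_id: str
    battery_pct: float
    seconds_since_seen: float


@dataclass
class Rule:
    name: str
    # Returns a human-readable message when the rule fires, else None.
    check: Callable[[DeviceState], Optional[str]]


@dataclass
class Alert:
    device_id: str
    rule: str
    message: str


def device_offline(max_silence_s: float) -> Rule:
    def check(s: DeviceState) -> Optional[str]:
        if s.seconds_since_seen > max_silence_s:
            return f"no data for {s.seconds_since_seen:.0f}s"
        return None
    return Rule("device_offline", check)


def low_battery(threshold_pct: float) -> Rule:
    def check(s: DeviceState) -> Optional[str]:
        if s.battery_pct < threshold_pct:
            return f"battery at {s.battery_pct:.0f}%"
        return None
    return Rule("low_battery", check)


def evaluate(rules: Iterable[Rule], states: Iterable[DeviceState]) -> List[Alert]:
    """Run every rule against every device; called once per polling cycle."""
    rules = list(rules)
    alerts: List[Alert] = []
    for state in states:
        for rule in rules:
            message = rule.check(state)
            if message is not None:
                alerts.append(Alert(state.device_id, rule.name, message))
    return alerts
```

Keeping `evaluate` free of side effects means the same alert list can be written to the database today and handed to a chat or email notifier later.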
What I Learned
- Separating real-time from historical was the best architectural decision — fundamentally different query patterns and retention needs
- Cursor-based polling is boring but reliable — simpler than WebSockets or event streaming at 5-minute granularity
- Flight detection from telemetry patterns is surprisingly nuanced — the happy path is easy, edge cases are where the complexity lives
- Database migrations on a live system with 8-figure row counts require respect (and statement timeouts)