Sentinel

Agentic Video Intelligence Platform with Confluent + Vertex AI

Vision → Decision → Action: Transform any video feed into governed event streams that drive real-time, explainable actions across manufacturing, healthcare, retail, logistics, and beyond, with an audit-grade evidence trail you can replay.


💡 Inspiration - The Problem We're Solving

Organizations across every industry deploy thousands of cameras - factories, warehouses, hospitals, retail stores, construction sites, farms, energy infrastructure, and smart cities. Yet most footage remains reactive, reviewed only after incidents occur, when the damage is already done.

The cost of waiting is universal and staggering: unplanned downtime runs from $36K to $2.3M per hour depending on the industry, and the average workplace injury costs $43K (see References below).

The opportunity: video is projected to account for ~80% of global data by 2025 (175 zettabytes) - IDC/Seagate 2018 - yet organizations across all industries struggle to operationalize multimodal AI in real time with governance, cost control, and auditability.

Sentinel closes that gap, not for one industry, but for every industry where visual monitoring matters.


🎯 What It Does

Sentinel is an agentic video intelligence platform that continuously converts raw video into governed operational intelligence across any industry:

End-to-End Pipeline

📹 Video Feed
    ↓ (Motion Detection + Sampling)
🔍 Observe (Gemini Multimodal Analysis)
    ↓ (Structured JSON Observations)
🧠 Think (Reasoning + Domain Knowledge Grounding via Vertex AI Search)
    ↓ (Explainable Decisions with Citations)
⚡ Act (Automated Alerts / Actions / webhooks)
    ↓ (Deduped, Cooldown-Protected)
📊 Audit + Real-Time KPIs (BigQuery + Flink SQL)
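
For illustration, here is a minimal sketch of the kind of structured observation event the Observe stage emits; the field names are hypothetical, not the project's exact schema:

```python
from datetime import datetime
from pydantic import BaseModel

class Observation(BaseModel):
    """One structured observation from the Observe stage (illustrative fields)."""
    trace_id: str       # correlates this event across every pipeline stage
    camera_id: str
    clip_start: datetime
    clip_end: datetime
    signal: str         # e.g. "ppe_missing", "spill_detected"
    confidence: float   # per-signal uncertainty, 0.0-1.0
    description: str    # human-readable summary from Gemini

# Serialized to JSON and validated against the registered schema before producing to Kafka
event_json = Observation(
    trace_id="t-123", camera_id="cam-07",
    clip_start=datetime(2025, 1, 1, 8, 0, 0), clip_end=datetime(2025, 1, 1, 8, 0, 30),
    signal="ppe_missing", confidence=0.92,
    description="Worker near press without hard hat",
).model_dump_json()
```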

Cross-Industry Applications

The same architecture solves different problems across verticals:

| Industry | Use Case | Detection | Action | Impact |
|---|---|---|---|---|
| Manufacturing | Equipment anomaly detection | Abnormal vibrations, leaks, smoke | Predictive maintenance alert | Prevent $2M/hour downtime |
| Healthcare | Patient safety monitoring | Fall detection, mobility issues | Immediate staff alert | Reduce adverse events |
| Retail | Queue & service optimization | Long wait times, checkout bottlenecks | Staff reallocation | Improve customer experience |
| Logistics | Loading dock safety | Forklift near-misses, improper stacking | Stop operations, supervisor alert | Prevent $43K injuries |
| Agriculture | Crop & livestock monitoring | Irrigation issues, animal distress | Automated intervention | Prevent yield loss |
| Energy | Infrastructure monitoring | Pipeline leaks, equipment corrosion | Emergency shutdown | Prevent environmental disasters |
| Construction | Site safety compliance | Missing PPE, unsafe scaffolding | Stop work order | Reduce OSHA violations |
| Smart Cities | Traffic & crowd management | Congestion, crowd density | Dynamic signal control | Optimize urban flow |

Demo Implementations (Included)

We've built two use-case examples to showcase the platform's flexibility:

1. Security & Safety Monitoring

  • Detects violations in real-time (PPE missing, unsafe behavior, spills)
  • Evaluates severity with confidence scores
  • Executes stop-line commands or alerts with full evidence chain
  • Shows trace-linked video clips and reasoning

2. Assembly SOP Compliance

  • Sessionizes station workflows into discrete work units
  • Validates completion against SOP requirements
  • Identifies missing steps with citations to procedure documents
  • Provides operator-ready corrective instructions

The key insight: both demos run on the exact same streaming architecture; only the prompts, knowledge bases, and action handlers change. This proves the platform's universality.


🏗️ How We Built It - Architecture

(Architecture diagram) Three-plane design: Streaming (Confluent), Intelligence (Vertex AI), and Audit (BigQuery + Flink)

Three-Plane Design

1. Streaming Plane (Confluent Cloud)

  • Multi-stage event choreography through Kafka topics
  • Schema-governed contracts via Schema Registry (JSON Schema)
  • Independent scaling per agent via consumer groups
  • Replay-first architecture for forensics and iteration

2. Intelligence Plane (Vertex AI)

  • Gemini multimodal: Zero-shot video understanding
  • Gemini reasoning: Severity assessment and action planning
  • Vertex AI Search: RAG-grounded SOP lookups with citations
  • Embeddings API: Semantic retrieval for knowledge base

3. Audit & Analytics Plane

  • BigQuery: Immutable audit logs with correlated trace IDs
  • Flink SQL: Real-time KPIs computed directly over Kafka streams
  • Cloud Storage: Clip archival for evidence replay

Multi-Agent Streaming Architecture

video.clips → [Observer Agent]
    ↓
video.observations → [Sessionizer Agent]
    ↓
station.sessions → [Thinker Agent + SOP Grounding]
    ↓
sop.decisions → [Action Agent + Dedup]
    ↓
workflow.actions → [Audit Sink]
    ↓
audit.events (BigQuery)

Key Innovation: Each agent is an independent service that consumes from and produces to Kafka topics. This enables:

  • Horizontal Scaling where it matters (e.g., 10x Observer instances for 100 cameras)
  • Independent Evolution (swap models/prompts without downstream rewrites)
  • Fault Isolation (one agent failure doesn't crash the pipeline)
  • Clean Contracts (Schema Registry ensures safe evolution)
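
As a sketch of this pattern, an agent is just a consume → process → produce loop. Topic names come from the diagram above; the broker config and the evaluate() helper here are placeholders, not the project's actual code:

```python
import json
from confluent_kafka import Consumer, Producer

conf = {"bootstrap.servers": "<broker>"}  # placeholder; real config includes auth settings
consumer = Consumer({**conf, "group.id": "thinker", "auto.offset.reset": "earliest"})
producer = Producer(conf)
consumer.subscribe(["station.sessions"])  # upstream topic

def evaluate(session: dict) -> dict:
    # Stand-in for the Thinker's Gemini reasoning + SOP grounding (see Vertex AI section)
    return {"trace_id": session["trace_id"], "severity": "review", "rationale": "stub"}

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    session = json.loads(msg.value())
    decision = evaluate(session)
    producer.produce("sop.decisions", key=session["trace_id"], value=json.dumps(decision))
    producer.flush()  # per-message flush keeps the sketch simple; batch in production
```

Scaling the Observer 10x then amounts to running ten instances in the same consumer group; Kafka rebalances partitions across them automatically.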

🔧 Technical Implementation - Why This Works at Scale

1️⃣ Cost-Controlled Multimodal Inference

Video AI inference can bankrupt a deployment. We built multiple cost gates:

Motion Detection Prefilter

  • Pixel-change detection + background subtraction
  • Filters 80-90% of "quiet" clips before inference
  • Turns "impossible economics" into viable deployment
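
One way to implement such a prefilter is OpenCV background subtraction. This is a sketch; the thresholds are illustrative, not the project's tuned values:

```python
import cv2

def has_motion(video_path: str, pixel_frac_threshold: float = 0.01) -> bool:
    """Return True as soon as any frame's foreground mask exceeds the pixel-change threshold."""
    subtractor = cv2.createBackgroundSubtractorMOG2(history=120, detectShadows=False)
    cap = cv2.VideoCapture(video_path)
    motion = False
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)             # foreground pixels become non-zero
        frac = cv2.countNonZero(mask) / mask.size  # fraction of changed pixels
        if frac > pixel_frac_threshold:
            motion = True
            break
    cap.release()
    return motion

# Only motion-positive clips are forwarded for Gemini inference
```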

Smart Sampling & Segmentation

  • Configurable clip length and FPS
  • Prevents redundant processing of static scenes

Streaming Deduplication

  • Cooldown windows prevent alert storms
  • Action-level dedup across topics
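
A minimal sketch of a cooldown gate, kept in memory for illustration (a production version would persist this state, e.g. in a compacted topic or Flink state):

```python
import time

class CooldownGate:
    """Suppress repeat actions for the same (camera, violation) within a cooldown window."""
    def __init__(self, cooldown_seconds: float = 300.0):
        self.cooldown = cooldown_seconds
        self.last_fired: dict[tuple[str, str], float] = {}

    def allow(self, camera_id: str, violation: str) -> bool:
        key = (camera_id, violation)
        now = time.monotonic()
        if now - self.last_fired.get(key, float("-inf")) < self.cooldown:
            return False  # still cooling down: drop the duplicate action
        self.last_fired[key] = now
        return True

gate = CooldownGate()
assert gate.allow("cam-07", "ppe_missing") is True
assert gate.allow("cam-07", "ppe_missing") is False  # suppressed within the 5-minute window
```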

Cost Model:

API calls/day ≈ N_cameras × (1440 minutes/day / T_clip_minutes) × (1 - filter_rate)

Example: 10 cameras, 30-second clips (0.5 min), 80% filtered
Raw clips/day = 10 × (1440 / 0.5) = 28,800
With 80% prefilter → 28,800 × 0.2 = 5,760 calls/day
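
A quick sanity check of the model above, as a small helper (the constants are the example's, nothing more):

```python
def api_calls_per_day(n_cameras: int, clip_minutes: float, filter_rate: float) -> int:
    """Estimate daily inference calls after the motion prefilter."""
    raw_clips = n_cameras * (1440 / clip_minutes)  # clips produced per day
    return round(raw_clips * (1 - filter_rate))    # only motion-positive clips hit the API

# 10 cameras, 30-second clips, 80% of clips filtered out
print(api_calls_per_day(10, 0.5, 0.8))  # -> 5760
```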

2️⃣ Explainability & Governance as First-Class Output

Every decision includes:

  • Evidence: Exact clip timestamp range
  • Rationale: Human-readable explanation
  • Confidence scores: Per-signal uncertainty
  • Citations: When grounded in SOP/policy (via Vertex AI Search)
  • Trace ID: Correlates across all pipeline stages

Operators trust the system because they can see why it decided, not just what it decided.
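
For concreteness, a decision event carrying these fields might look like the following; the values and field names are illustrative, not the project's exact schema:

```python
decision_event = {
    "trace_id": "t-123",  # same ID as the source observation, for end-to-end correlation
    "evidence": {"clip_uri": "gs://<bucket>/cam-07/0800.mp4", "start_s": 12.0, "end_s": 18.5},
    "rationale": "Hard hat absent within 2m of active press; SOP 4.2 requires PPE in this zone.",
    "confidence": 0.92,
    "citations": [{"doc": "sop-press-line.pdf", "section": "4.2"}],  # via Vertex AI Search
    "action": "stop_line",
}
```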

3️⃣ Real-Time KPIs Without Extra Infrastructure

The same Kafka streams that drive actions also power analytics:

Flink SQL Queries (examples included):

  • Rule hit rates by violation type
  • Stop-line frequency trends
  • Confidence distribution analysis
  • P95 end-to-end latency tracking
  • Alert storm detection windows

Value: No separate analytics pipeline; KPIs are computed in-stream.

4️⃣ Production-Grade Replay & Forensics

Kafka's retention + correlated trace_id enables:

  • Incident investigation: Replay exactly what the system saw
  • Model tuning: Re-run decisions with updated prompts
  • Compliance audits: Full evidence chain for regulatory review
  • A/B testing: Compare model outputs on same event history
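
A hedged sketch of replay by trace_id: read a topic from the earliest retained offset under a fresh consumer group and filter. Broker config is a placeholder as before:

```python
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "<broker>",
    "group.id": "forensics-replay",   # fresh group, so offsets follow the reset policy below
    "auto.offset.reset": "earliest",  # replay from the start of retention
})
consumer.subscribe(["sop.decisions"])

target = "t-123"  # trace under investigation
while True:
    msg = consumer.poll(1.0)
    if msg is None:
        break  # caught up, good enough for a sketch; a real tool would track end offsets
    if msg.error():
        continue
    event = json.loads(msg.value())
    if event.get("trace_id") == target:
        print(event)  # reconstruct exactly what the system saw and decided
```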

🚀 Confluent Cloud Usage

This project is Confluent-native by design.

What We Used & Why It Matters

| Confluent Feature | How We Use It | Business Value |
|---|---|---|
| Kafka Topics | Multi-agent backbone with topic-per-stage pattern | Decoupling, fault isolation, clean evolution |
| Consumer Groups | Scale Observer instances (10x) independently from Thinker (2x) | Cost-efficient horizontal scaling |
| Schema Registry | JSON Schema contracts generated from Pydantic models | Safe prompt/model evolution, fewer breakages |
| Replayability | Replay by trace_id for forensics and iteration | Incident investigation, compliance, A/B testing |
| Flink SQL | Real-time KPIs + stream-side cost filters | Operational visibility, upstream cost gates |

Critical Design Decision: We chose event choreography over orchestration. Each agent is autonomous, consuming from upstream topics and producing to downstream topics. This creates natural backpressure, enables independent scaling, and makes the system resilient to partial failures.


🧠 Vertex AI Usage

What We Used & Why It Matters

| Vertex AI Feature | How We Use It | Business Value |
|---|---|---|
| Gemini Multimodal | Observer agent reads video clips, emits structured signals | Zero-shot understanding, no CV pipeline required |
| Gemini Reasoning | Thinker + Doer convert signals → severity → actions | Operational judgment with consistent JSON |
| Vertex AI Search | RAG grounding for SOP compliance checks | Citation-backed decisions, reduced hallucinations |
| Embeddings API | Semantic SOP chunk retrieval | Scalable knowledge grounding |

Critical Design Decision: We use structured output prompting (strict JSON schemas) to ensure Gemini outputs are Kafka-ready events, not unstructured text. This makes the pipeline reliable and testable.
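
A minimal sketch of this pattern with the vertexai SDK; the model name, bucket URI, and response schema here are illustrative assumptions, not the project's exact contract:

```python
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel, Part

vertexai.init(project="<project>", location="us-central1")
model = GenerativeModel("gemini-2.5-flash")

schema = {  # illustrative response schema
    "type": "object",
    "properties": {
        "signal": {"type": "string"},
        "confidence": {"type": "number"},
        "description": {"type": "string"},
    },
    "required": ["signal", "confidence", "description"],
}

response = model.generate_content(
    [Part.from_uri("gs://<bucket>/cam-07/0800.mp4", mime_type="video/mp4"),
     "Report safety-relevant observations in this clip."],
    generation_config=GenerationConfig(
        response_mime_type="application/json",  # force JSON, not free text
        response_schema=schema,                 # constrain output to the event contract
    ),
)
print(response.text)  # Kafka-ready JSON payload
```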


💪 Challenges We Overcame

1. Latency vs. Cost Trade-off

Problem: High-resolution video at 30 FPS = 1,800 frames/minute. At $0.05/frame, that's $90/minute = $129,600/day per camera. Impossible.

Solution:

  • Motion detection cuts inference by 80-90%
  • Segment into 15-30 second clips
  • Sample at 1-2 FPS for analysis
  • Result: ~$10-20/day per camera (economically viable)

2. Multi-Agent Coordination Without Brittle Orchestration

Problem: Centralized orchestrators become single points of failure and bottlenecks.

Solution:

  • Event choreography via Kafka topics
  • Schema Registry enforces contracts between agents
  • Each agent scales independently
  • Natural backpressure prevents cascade failures

3. Trust & Explainability for Compliance Use Cases

Problem: "AI said stop the line" isn't acceptable in regulated environments.

Solution:

  • Vertex AI Search grounds decisions in actual SOP documents
  • Every decision includes citations to specific procedure sections
  • Full audit trail with correlated trace IDs
  • Operators can replay incidents to understand "why"

💼 Potential Value Applications

Manufacturing & Industrial:

  • Downtime prevention: Early detection of equipment issues addresses documented $36K–$2.3M/hour costs
  • Predictive maintenance: Faster awareness of visual anomalies (smoke, leaks, vibrations)
  • Quality control: Real-time visual inspection of assembly processes

Safety & Compliance:

  • Injury prevention: With $43K average cost per workplace injury, early hazard detection has measurable value
  • Compliance monitoring: Automated verification of safety protocols
  • Regulatory support: Documented audit trails for incident investigation

Operational Efficiency:

  • Process monitoring: Visual verification of workflow completion
  • Audit support: Reduced time spent on manual video review
  • Quality feedback: Faster identification of process deviations

Healthcare & Patient Safety:

  • Fall detection: Addressing documented hospital fall costs
  • Early intervention: Real-time alerts for patient mobility issues
  • Staff support: Automated monitoring between routine checks

Retail & Customer Experience:

  • Queue optimization: Visual monitoring of checkout wait times
  • Loss prevention: Automated detection of unusual activity
  • Service quality: Real-time awareness of customer service needs

🎯 What's Next

We built Sentinel as an architectural foundation. The natural next steps for us are:

Near-Term (1-3 Months)

  • Flink-first cost gating: Move motion/signal thresholds into stream processing (materialized views)
  • Connector ecosystem: Slack, PagerDuty, ServiceNow, Jira (action handlers already modular)
  • Policy pack system: Plug-in SOP libraries per station/site/customer with version control

Medium-Term (3-6 Months)

  • Vector Search hardening: Upgrade SOP retrieval to Vertex Vector Search for lower latency
  • Multi-modal expansion: Add audio analysis (machine sounds, alarms) to video
  • Edge deployment: Run Observer agents closer to cameras for ultra-low latency

Long-Term (6-12 Months)

  • Federated learning: Train station-specific anomaly models on local data
  • Predictive maintenance: Correlate visual signals with equipment telemetry
  • Cross-site benchmarking: Compare SOP adherence across facilities

🛠️ Built With

Confluent Cloud (Core Platform)

  • Kafka Topics: Multi-agent event backbone
  • Consumer Groups: Independent scaling per agent type
  • Schema Registry: Governed JSON Schema contracts
  • Flink SQL: Real-time KPIs and stream analytics
  • Replayability: Forensic replay by trace ID

Google Cloud Vertex AI (Intelligence Layer)

  • Gemini 2.5 Flash (Multimodal): Video understanding
  • Gemini 2.5 Pro (Reasoning): Decision synthesis
  • Vertex AI Search: RAG-grounded SOP retrieval
  • Embeddings API: Semantic knowledge base

Google Cloud Infrastructure

  • Cloud Storage: Clip archival
  • BigQuery: Audit log warehouse
  • Cloud Run: FastAPI control plane (demo UI)
  • Secret Manager: API key governance

🎬 Try It Yourself

Demo Video: https://youtu.be/-X6tXlmWlvM

Live Demo: https://sentinel-464199486062.us-central1.run.app/ui

Code Repository: https://github.com/Niket93/sentinel


👥 Team

Niket Shah - LinkedIn


📚 References & Research

  1. IDC/Seagate (2018): "The Digitization of the World - From Edge to Core" - 175 zettabytes of data by 2025, 80% video/video-like
  2. Siemens (2024): "The True Cost of Downtime 2024" - Downtime costs $36K/hour (FMCG) to $2.3M/hour (automotive)
  3. Uptime Institute (2024/2025): "Annual Outage Analysis" - 54% of outages cost >$100K, 20% cost >$1M
  4. ITIC (2024): "Hourly Cost of Downtime Survey" - 90%+ of enterprises estimate >$300K/hour downtime cost
  5. National Safety Council (2023): "Work Injury Costs - Injury Facts" - $176.5B total workplace injury costs, $43K average per injury

🏆 Why This Matters

Real-time video intelligence is hard to operationalize. Most approaches sacrifice at least one of cost-efficiency, explainability, or governance. Sentinel demonstrates that you don't have to choose.

✅ Solves Documented Problems
Addresses $36K–$2.3M/hour downtime costs and $43K workplace injury costs with a practical, economically viable approach.

✅ Production-First Design
Cost controls aren't an afterthought; they're built into the architecture. Motion detection, sampling strategies, and deduplication make multimodal AI inference economically feasible at scale.

✅ Deep Sponsor Integration
This isn't a shallow integration. We use Confluent's event choreography for multi-agent coordination, Schema Registry for safe evolution, and Flink SQL for real-time KPIs. Vertex AI powers zero-shot video understanding, grounded reasoning with RAG, and citation-backed decisions.

✅ Explainability as Output
Every decision includes evidence, rationale, confidence scores, and citations. This isn't a black box; it's a system operators can trust and auditors can verify.

✅ Demonstrates Architectural Thinking
Two different use cases running on the same infrastructure prove the approach is adaptable. The streaming backbone doesn't change; only prompts and knowledge bases do.

Sentinel shows how Confluent and Vertex AI can work together to make video intelligence operationally viable: governed, explainable, cost-controlled, and production-ready.
