AWS Kinesis Video Streams and Real Time Computer Vision Architecture (What to Build First)

Shawn Wilborne
August 27, 2025
4 min read

Key Takeaways

  • Real-time CV reliability depends on ingest, buffering, and event routing.
  • Start with measurable event definitions and latency targets.
  • Use Kinesis Video Streams for ingest and containers for inference.
  • Store events and evidence with enough metadata to audit and improve.

Real-time computer vision projects often fail for non-ML reasons: stream reliability, buffering, latency, and alert routing. This post outlines a practical AWS architecture for analyzing camera streams and producing reliable events.

Start with the outcome, not the model

Define:

  • What event matters (person detected, loitering, PPE compliance)
  • Alert channel (webhook, SMS, incident system)
  • Latency target (p95)
  • False positive tolerance

Without these, tuning is endless.
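
It helps to pin these down as data before touching a model. Here is a minimal sketch; the schema and numbers are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventSpec:
    event_type: str                 # e.g. "person_detected", "ppe_violation"
    alert_channel: str              # "webhook", "sms", or an incident system
    p95_latency_ms: int             # frame capture to alert, 95th percentile
    max_false_positive_rate: float  # tolerated FP rate during review

LOITERING = EventSpec(
    event_type="loitering",
    alert_channel="webhook",
    p95_latency_ms=3000,
    max_false_positive_rate=0.05,
)
```

With targets written down, "is the pipeline good enough?" becomes a measurement rather than a debate.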

Core AWS building blocks

Video ingest

Kinesis Video Streams handles camera ingest, buffering fragments durably with time-indexed retrieval, so consumers can read live video or replay history for incident review.
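
On the camera side, ingest typically runs the KVS producer SDK or its GStreamer plugin; on the AWS side, you provision the stream. A minimal boto3 sketch, with the stream name and retention as assumptions:

```python
import boto3

kv = boto3.client("kinesisvideo")

# 24 hours of durable retention enables the replay capability
# called for under observability below.
kv.create_stream(
    StreamName="front-door-cam",
    DataRetentionInHours=24,
    MediaType="video/h264",
)
```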

Compute

Common options:

  • ECS on Fargate for containerized inference
  • EC2 with GPUs for heavier models
  • EKS when you already run Kubernetes

Event routing

Once inference emits an event, it needs a reliable path to the alert channel (webhook, SMS, incident system), with every delivery attempt recorded so retries can be audited.
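
One common pattern is to publish events to an Amazon SNS topic that fans out to webhook and SMS subscribers; filter policies on message attributes can route by event type. A sketch, with the topic ARN and payload fields as placeholder assumptions:

```python
import json

import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:cv-alerts"  # assumption

# Publish once; SNS fans the event out to every subscriber, and a
# filter policy on event_type can route PPE alerts differently
# from loitering alerts.
sns.publish(
    TopicArn=TOPIC_ARN,
    Message=json.dumps({
        "event_type": "loitering",
        "camera_id": "cam-01",
        "confidence": 0.87,
    }),
    MessageAttributes={
        "event_type": {"DataType": "String", "StringValue": "loitering"},
    },
)
```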

Model layer options

  • Managed: Amazon Rekognition for common detection use cases (see the sketch after this list)
  • Custom: YOLO family models, fine-tuned on your data
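
For the managed path, scoring a sampled frame with Rekognition is a few lines of boto3. A sketch; the frame file and thresholds are assumptions:

```python
import boto3

rekognition = boto3.client("rekognition")

# Score one sampled frame (JPEG bytes) from the decode step.
with open("frame.jpg", "rb") as f:  # assumption: a frame written to disk
    labels = rekognition.detect_labels(
        Image={"Bytes": f.read()},
        MaxLabels=10,
        MinConfidence=70,
    )["Labels"]

# A "person detected" event would key off the Person label,
# its confidence, and its per-instance bounding boxes.
person = next((l for l in labels if l["Name"] == "Person"), None)
if person:
    print(person["Confidence"], [i["BoundingBox"] for i in person["Instances"]])
```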

A minimal production pipeline

  1. Ingest RTSP or camera feed into Kinesis Video Streams
  2. Consumer service pulls fragments (see the consumer sketch after this list)
  3. Decode frames using FFmpeg
  4. Run inference on sampled frames
  5. Apply post-processing rules, tracking, and smoothing
  6. Emit event with confidence and evidence snapshot
  7. Persist events and review artifacts
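
Steps 2 through 4 can be sketched with boto3 and FFmpeg. This simplified consumer assumes a stream named front-door-cam and a run_inference hook you supply; a production consumer would also parse fragment metadata and split JPEG frame boundaries:

```python
import subprocess
import threading

import boto3

STREAM_NAME = "front-door-cam"  # assumption: your KVS stream name

def run_inference(jpeg_bytes: bytes) -> None:
    # Hypothetical hook: swap in your model call (step 4). JPEG frames
    # can span reads, so a real consumer splits on JPEG start/end markers.
    pass

# Resolve the GET_MEDIA endpoint for this stream, then open a live read.
kv = boto3.client("kinesisvideo")
endpoint = kv.get_data_endpoint(
    StreamName=STREAM_NAME, APIName="GET_MEDIA"
)["DataEndpoint"]
media = boto3.client("kinesis-video-media", endpoint_url=endpoint)
payload = media.get_media(
    StreamName=STREAM_NAME,
    StartSelector={"StartSelectorType": "NOW"},
)["Payload"]

# ffmpeg decodes the MKV fragment stream and emits ~1 JPEG per second,
# sampling instead of decoding and scoring every frame.
ffmpeg = subprocess.Popen(
    ["ffmpeg", "-loglevel", "error", "-i", "pipe:0",
     "-vf", "fps=1", "-f", "image2pipe", "-vcodec", "mjpeg", "pipe:1"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)

def feed() -> None:
    # Copy fragments from KVS into ffmpeg on a separate thread so the
    # stdout reader below never deadlocks against a full pipe.
    while chunk := payload.read(32 * 1024):
        ffmpeg.stdin.write(chunk)
    ffmpeg.stdin.close()

threading.Thread(target=feed, daemon=True).start()

while data := ffmpeg.stdout.read(64 * 1024):
    run_inference(data)
```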

Storing events and evidence in MongoDB

MongoDB works well for event records and review workflows:

  • camera_events: event type, timestamps, confidence, references
  • evidence: S3 pointers to snapshots or clips
  • alert_deliveries: webhook attempts and retries
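
A sketch of those collections with pymongo; the connection string, field names, and values are illustrative:

```python
from datetime import datetime, timezone

from pymongo import MongoClient

# Assumption: connection string and database name are placeholders.
db = MongoClient("mongodb://localhost:27017")["vision"]

# The event record carries enough metadata to audit and improve later.
event_id = db.camera_events.insert_one({
    "event_type": "loitering",
    "camera_id": "cam-01",
    "detected_at": datetime.now(timezone.utc),
    "confidence": 0.87,
}).inserted_id

# Evidence stays in S3; MongoDB holds the pointer.
db.evidence.insert_one({
    "event_id": event_id,
    "snapshot_s3": "s3://example-evidence/cam-01/2025-08-27T120000Z.jpg",
})

# Every webhook attempt gets its own record so retries are auditable.
db.alert_deliveries.insert_one({
    "event_id": event_id,
    "channel": "webhook",
    "attempt": 1,
    "status": "pending",
    "attempted_at": datetime.now(timezone.utc),
})
```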

Observability and reliability

Do not ship without:

  • Metrics: ingest lag, frames processed per second, inference latency (see the sketch after this list)
  • Logs with correlation IDs
  • Replay capability for incident review
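
For the metrics item, pushing custom metrics to CloudWatch is one option. A sketch; the namespace, metric names, and dimensions are assumptions:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Emit per-camera pipeline metrics so dashboards and alarms can
# catch ingest lag or a stalled worker before users notice.
cloudwatch.put_metric_data(
    Namespace="VisionPipeline",
    MetricData=[
        {
            "MetricName": "IngestLagMs",
            "Dimensions": [{"Name": "CameraId", "Value": "cam-01"}],
            "Value": 420.0,
            "Unit": "Milliseconds",
        },
        {
            "MetricName": "FramesProcessedPerSecond",
            "Dimensions": [{"Name": "CameraId", "Value": "cam-01"}],
            "Value": 1.0,
            "Unit": "Count/Second",
        },
    ],
)
```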

FAQs

**Q: Do we need to analyze every frame?** Usually no. Sampling plus tracking can meet most outcomes while controlling cost.

**Q: When should we use Rekognition vs a custom model?** Rekognition is a great baseline for common detections. Custom models win when your environment is specific, you need higher accuracy, or you need custom classes.

**Q: How do we handle multiple cameras and scaling?** Use a queue or partitioning strategy, and scale inference workers horizontally. Keep per-camera state separate.
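
For the scaling answer above, a stable hash of the camera ID is one simple way to pin each camera, and its tracking state, to a single worker. A sketch with an illustrative worker count:

```python
import hashlib

NUM_WORKERS = 4  # assumption: size of the horizontal inference pool

def worker_for(camera_id: str) -> int:
    # Stable hash so a camera's frames and tracker state always land
    # on the same worker, keeping per-camera state separate.
    digest = hashlib.sha256(camera_id.encode()).hexdigest()
    return int(digest, 16) % NUM_WORKERS

assert worker_for("cam-01") == worker_for("cam-01")  # deterministic
```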

Written By
Shawn Wilborne
AI Builder