
AI-Enhanced Apps with React and AWS for Vision

Shawn Wilborne
August 27, 2025
5 min read

Modern computer-vision workflows often involve multiple specialized agents (image classifiers, annotation tools, QA reviewers, billing trackers) working together. An LLM-based orchestrator can plan and route tasks across these agents—think cognitive orchestration—where the “master agent” uses context and SOPs to decide the next step. With LangChain routing and LangGraph multi-agent workflows, an image analysis request can be decomposed into subtasks (classification → annotation → review → billing), each handled by an independent service. This yields a decoupled, scalable system and preserves traceability of decisions and data flows. For teams exploring agent platforms, compare LangGraph (stateful orchestration for agents) with OpenAI’s Agents SDK (tool-calling & orchestration primitives) and the evolving Responses/Agents tooling (The Verge coverage). (LangChain, LangChain AI, OpenAI Platform, OpenAI, The Verge)

Reference Architecture (AWS + LangChain + Mongo)
Ingress & Orchestrator. Client calls an API Gateway endpoint → Step Functions state machine. Choice/Map states implement routing and parallelism. The planner is a LangChain LCEL/LangGraph “conductor” deciding which sub-agents to call next (e.g., vision vs. LLM). (AWS Documentation, LangChain, LangChain AI)
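The conductor's routing decision can be sketched framework-free before committing to LCEL or LangGraph. The task kinds and agent names below are hypothetical; in a real build this logic would live in a LangChain router chain or a LangGraph conditional edge:

```python
# Minimal planner sketch: map an incoming task to the next sub-agent.
# Task kinds and agent names are illustrative, not part of any SDK.

def route(task: dict) -> str:
    """Decide which sub-agent handles this task next."""
    if task.get("needs_review"):
        return "human_review"
    kind = task.get("kind", "")
    if kind in {"detect", "classify", "embed"}:
        return "vision_agent"      # GPU inference (ECS/Fargate or SageMaker)
    if kind in {"caption", "summarize"}:
        return "llm_agent"         # LLM API call from a Lambda
    if kind == "annotate":
        return "annotation_agent"
    return "fallback_agent"

print(route({"kind": "detect"}))   # vision_agent
```

In Step Functions terms, the planner's output becomes the input to a Choice state that selects the matching branch.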

Agents as Lambdas/Services. Each agent is an AWS Lambda (for light/CPU work or external API calls) or a containerized service on ECS/Fargate or SageMaker real-time endpoints (for GPU inference like YOLO/CLIP). (Amazon Web Services, Inc., AWS Documentation)
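Agents stay interchangeable when they share a contract: take a task payload, return a result plus confidence and usage metadata that the QA and billing agents can read. A hypothetical handler (field names illustrative):

```python
import json
import time

def handler(event, context):
    """Hypothetical annotation-agent Lambda: validate the task, do the work,
    and return result + confidence + usage for downstream QA/billing agents."""
    task = event.get("task", {})
    if "image_uri" not in task:
        return {"statusCode": 400, "body": json.dumps({"error": "image_uri required"})}

    start = time.monotonic()
    # ... real work here: call a model, draft labels, etc. ...
    labels = [{"class": "placeholder", "confidence": 0.97}]

    return {
        "statusCode": 200,
        "body": json.dumps({
            "labels": labels,
            "confidence": min(l["confidence"] for l in labels),
            "usage": {"duration_s": time.monotonic() - start},
        }),
    }
```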

Shared Memory & Logs. MongoDB Atlas stores agent state, intermediate artifacts, confidence scores, costs, and audit logs for dashboards and billing. Optionally purchase via AWS Marketplace for consolidated billing. (MongoDB, Amazon Web Services, Inc.)

Human-in-the-Loop. If a QA agent’s confidence drops below a threshold, the workflow emits a human review task; otherwise it proceeds autonomously. AWS offers native HITL via Amazon Augmented AI (A2I) and SageMaker Ground Truth. (AWS Documentation)

Why these services?
Step Functions is a managed workflow engine with per-transition pricing that’s easy to model (pricing; see also this costing walkthrough). Lambda offers pay-per-use compute. For heavier GPU models, use Fargate (vCPU/GB-seconds billing) or SageMaker endpoints (hourly instance pricing). Atlas provides a perpetual free tier and flexible paid tiers (Atlas pricing overview; Atlas on AWS). (Amazon Web Services, Inc., MongoDB)

System Diagram
Multi-Agent Orchestration for Vision Systems
Image Overview
Boxes: React/Web client → API Gateway → Step Functions (Planner: LangChain/LangGraph) → Parallel branches: (A) Vision Agent (ECS/SageMaker GPU) (B) LLM Agent (Lambda → OpenAI/Claude) (C) Annotation Agent (Lambda/service) (D) QA Agent (LLM; escalate to Human Review queue on low confidence) (E) Billing Agent → MongoDB Atlas (state, costs, logs) → Response aggregator → Client.
Callouts: Distributed Map for large fan-out; Inline Map/Parallel states for small fan-out. See Distributed Map vs Inline Map (example). (AWS Documentation)

What LangChain/LangGraph Actually Do
Planner & Router. Use LCEL/routing and LangGraph agentic concepts to classify inputs and route them to the right agent.
Multi-Agent Graph. Model agents as nodes with handoffs between them (LangGraph multi-agent).
Alternatives. Prefer a vendor SDK? OpenAI’s Agents SDK provides tool-calling/orchestration primitives; OpenAI has also signaled a shift toward a more unified Responses/Agents stack (The Verge). (LangChain, LangChain AI, OpenAI Platform, OpenAI, The Verge)
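The node-and-handoff idea behind LangGraph can be approximated in a few lines of plain Python: each node is a function that updates shared state and names its successor. This is only a toy (node names and payloads hypothetical); LangGraph layers persistence, streaming, and checkpointing on top of the same shape:

```python
# Toy multi-agent graph: nodes mutate shared state and hand off to the next node.
END = "__end__"

def vision(state):
    state["labels"] = ["cat"]        # stand-in for GPU inference
    return "qa", state

def qa(state):
    state["approved"] = True         # stand-in for an LLM consistency check
    return "billing", state

def billing(state):
    state["cost_usd"] = 0.002        # stand-in for usage accounting
    return END, state

NODES = {"vision": vision, "qa": qa, "billing": billing}

def run(entry, state):
    node = entry
    while node != END:
        node, state = NODES[node](state)
    return state

result = run("vision", {})
```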

Sample Workflow: Auto-Labeling → QA → Billing
Upload & Preprocess. Client uploads to S3; the orchestrator triggers Vision Agent to run detection/embedding (GPU model via SageMaker real-time or Fargate).
Annotation Agent. Draft labels are created (e.g., class labels, bounding boxes, masks).
QA Agent. An LLM agent checks schema/consistency; below a threshold it emits a Human Review task (see A2I).
Billing Agent. Reads per-step usage (GPU seconds, Lambda GB-seconds, LLM tokens) and computes cost.
Persist & Notify. MongoDB stores artifacts: detections, QA verdict, reviewer ID, costs; client gets a job summary. For evented patterns: S3 event notifications trigger downstream processing (e.g., Lambda). (AWS Documentation)
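In the evented variant, the downstream Lambda receives S3's standard notification shape. A sketch of extracting bucket and key, with the actual workflow kickoff left as a stub:

```python
import urllib.parse

def handler(event, context):
    """Triggered by an S3 ObjectCreated notification; queue one job per record."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        jobs.append({"bucket": bucket, "key": key})
        # start_workflow(bucket, key)  # e.g., StartExecution on the state machine
    return {"queued": len(jobs), "jobs": jobs}
```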

Parallelism patterns: For high throughput, use Distributed Map to fan-out per-image tasks at scale. For modest batches, Inline Map (parallel state) in Step Functions works well (example). (AWS Documentation)
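A trimmed Amazon States Language fragment for the fan-out, shown here as a Python dict (state names and paths are illustrative). Switching ProcessorConfig.Mode between DISTRIBUTED and INLINE selects between the two patterns:

```python
# Illustrative ASL Map state for per-image fan-out (names/paths hypothetical).
map_state = {
    "ProcessImages": {
        "Type": "Map",
        "ItemsPath": "$.images",
        "MaxConcurrency": 100,
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "VisionAgent",
            "States": {
                "VisionAgent": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "End": True,
                },
            },
        },
        "End": True,
    }
}
```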

MongoDB as Agent Memory & Ledger
Use Atlas to persist:
Job state: status, agent_step, timestamps, retry counts.
Confidence & Escalations: per-label confidence; escalated: true/false, reviewer_id.
Usage & Cost: tokens_in/out, gpu_seconds, lambda_gb_seconds, cost_usd per step and totals.
Audit Trail: full trace of decisions for compliance. Atlas pricing fits startups through enterprise: free tier for prototyping; Flex and Dedicated tiers scale as needed (pricing; Atlas on AWS). (MongoDB, Amazon Web Services, Inc.)
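One hypothetical shape for a job document in Atlas, with a small helper that rolls per-step usage into a job total (the field names are illustrative, not a required schema):

```python
# Illustrative job document for the agent ledger (schema is hypothetical).
job = {
    "job_id": "job-0001",
    "status": "complete",
    "agent_step": "billing",
    "retry_count": 0,
    "labels": [{"class": "cat", "confidence": 0.94}],
    "escalated": False,
    "reviewer_id": None,
    "steps": [
        {"agent": "vision",  "gpu_seconds": 1.8, "cost_usd": 0.0009},
        {"agent": "llm_qa",  "tokens_in": 1200, "tokens_out": 150, "cost_usd": 0.0021},
        {"agent": "billing", "lambda_gb_seconds": 0.4, "cost_usd": 0.00001},
    ],
}

def total_cost(doc: dict) -> float:
    """Sum per-step costs into the job-level total."""
    return round(sum(step["cost_usd"] for step in doc["steps"]), 6)

job["cost_usd_total"] = total_cost(job)
# With pymongo: MongoClient(uri).vision.jobs.insert_one(job)
```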

Human-in-the-Loop (HITL) Triggers
Policy: If confidence < 0.9 or class in {rare, regulated}, add a Review task.
Routing: Planner writes a “needs_review” flag in Mongo; Step Functions routes to a human review queue/service (consider A2I with private workforce).
Closure: Reviewer decision updates Mongo; the workflow resumes. This preserves automation speed while keeping quality and compliance. (AWS Documentation)
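The escalation policy is small enough to live in a shared module so the planner and the QA agent apply it identically. The 0.9 threshold and flagged classes come from the policy above; everything else is hypothetical:

```python
# Sensitive classes that always require human review (per the policy above).
FLAGGED_CLASSES = {"rare", "regulated"}

def needs_review(confidence: float, label_class: str, threshold: float = 0.9) -> bool:
    """HITL policy: escalate on low confidence or sensitive classes."""
    return confidence < threshold or label_class in FLAGGED_CLASSES
```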

Costs You Can Explain to Stakeholders
Step Functions: $0.025 per 1,000 state transitions (Standard) — simple to estimate by counting steps (official pricing; costing blog).
Lambda: $0.20 per 1M requests + compute billed per GB-second (handy back-of-napkin rate ≈ $0.0000166667/GB-s; see also this calculator explainer).
Fargate: Pay-as-you-go for vCPU-seconds and GB-seconds.
SageMaker GPU endpoints: pay hourly per instance size (e.g., g4dn family) — see pricing and this AWS cost primer.
MongoDB Atlas: free tier for dev; paid tiers scale up (pricing).
Rule of thumb (example): 1,000 images, 5 workflow steps each → ~5,000 Step Functions transitions ≈ $0.125; light Lambda work adds a few cents; MongoDB is a small fixed line item for small workloads. GPU endpoints dominate cost when you keep them warm; use Fargate jobs or SageMaker batch/asynchronous inference for spiky loads. (Amazon Web Services, Inc., Dashbird, MongoDB)
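The back-of-napkin math above as a tiny estimator, using the rates quoted in this post (verify against current AWS pricing pages before relying on them):

```python
# Rates as quoted above (us-east-1-style list prices; check current AWS pricing).
SFN_PER_TRANSITION = 0.025 / 1000        # Step Functions Standard, per transition
LAMBDA_PER_REQUEST = 0.20 / 1_000_000    # per invocation
LAMBDA_PER_GB_SECOND = 0.0000166667      # per GB-second of compute

def estimate(images: int, steps_per_image: int,
             lambda_requests: int = 0, lambda_gb_seconds: float = 0.0) -> dict:
    """Rough workflow cost: orchestration transitions + Lambda compute."""
    transitions = images * steps_per_image
    return {
        "step_functions_usd": transitions * SFN_PER_TRANSITION,
        "lambda_usd": (lambda_requests * LAMBDA_PER_REQUEST
                       + lambda_gb_seconds * LAMBDA_PER_GB_SECOND),
    }

# The worked example: 1,000 images x 5 steps -> ~$0.125 in transitions.
print(estimate(1000, 5)["step_functions_usd"])
```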

Dashboard & Ops (what to show your PM/CTO)
Latency percentiles per agent (LLM vs GPU vs Lambda).
Cost breakdown by step/model/backend (tokens, GPU seconds, Lambda GB-seconds).
HITL queue health (backlog, average review time).
Quality KPIs (auto-accept rate, defect rate by class, drift alarms).
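If step-level usage lives in the ledger documents, the cost breakdown is one aggregation away. A hypothetical pipeline over a `steps` array (field names match the illustrative ledger schema, not a required one):

```python
# Hypothetical Atlas aggregation: cost and GPU time per agent for the dashboard.
cost_by_agent = [
    {"$unwind": "$steps"},
    {"$group": {
        "_id": "$steps.agent",
        "total_cost_usd": {"$sum": "$steps.cost_usd"},
        "gpu_seconds": {"$sum": "$steps.gpu_seconds"},
    }},
    {"$sort": {"total_cost_usd": -1}},
]
# With pymongo: db.jobs.aggregate(cost_by_agent)
```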

When to Choose What
All-serverless (Lambda + Step Functions): best for glue logic, light preprocessing, and LLM API calls where you don’t need GPUs and you value minimal ops.
GPU services (Fargate/SageMaker): best for heavy vision inference (YOLO/CLIP) or when you need warm models & low p95 latency.
LangGraph vs vendor SDKs: choose LangGraph for graph-based, stateful multi-agent flows and deep integration with LangChain tools; consider OpenAI’s Agents SDK if your stack is already centered on OpenAI models and you want their native tools and tracing (recent coverage). (LangChain AI, OpenAI Platform, The Verge)

Written By
Shawn Wilborne
AI Builder