Our Tech Stack, Your Superpower

We build blazing-fast, AI-powered web apps using the latest tech. From React to GPT-4, our stack is built for speed, scale, and serious results.

What Powers Our Projects

  1. React.js, Node.js, MongoDB, AWS
  2. GPT-4, Claude, Ollama, Vector DBs
  3. Three.js, Firebase, Supabase, TailwindCSS

Every project gets a custom blend of tools—no cookie-cutter code here. We pick the right tech for your goals, so your app runs smoothly and grows with you.

“Great tech is invisible—until it blows your mind.”

We obsess over clean code, modular builds, and explainable AI. Weekly updates and async check-ins keep you in the loop, minus the jargon.

Trusted by startups, educators, and SaaS teams who want more than just ‘off-the-shelf’ solutions.

Why Our Stack Stands Out

We don’t just follow trends—we set them. Our toolkit is always evolving, so your product stays ahead of the curve.

From MVPs to full-scale platforms, we deliver fast, flexible, and future-proof solutions. No tech headaches, just results.

Ready to build smarter? Let’s turn your vision into a launch-ready app—powered by the best in AI and web tech.

Lid Vizion: Miami-based, globally trusted, and always pushing what’s possible with AI.

Every pixel, powered by AI & code.

AI Web Apps. Built to Win.

From Miami to the world—Lid Vizion crafts blazing-fast, AI-powered web apps for startups, educators, and teams who want to move fast and scale smarter. We turn your wildest ideas into real, working products—no fluff, just results.

Our Tech Stack Superpowers

  1. React.js, Node.js, MongoDB, AWS
  2. GPT-4, Claude, Ollama, Vector DBs
  3. Three.js, Firebase, Supabase, Tailwind

We blend cutting-edge AI with rock-solid engineering. Whether you need a chatbot, a custom CRM, or a 3D simulation, we’ve got the tools (and the brains) to make it happen—fast.

No cookie-cutter code here. Every project is custom-built, modular, and ready to scale. We keep you in the loop with weekly updates and async check-ins, so you’re never left guessing.

“Tech moves fast. We move faster.”

Trusted by startups, educators, and SaaS teams who want more than just another app. We deliver MVPs that are ready for prime time—no shortcuts, no surprises.

Ready to level up? Our team brings deep AI expertise, clean APIs, and a knack for building tools people actually love to use. Let’s make your next big thing, together.

From edge AI to interactive learning tools, our portfolio proves we don’t just talk tech—we ship it. See what we’ve built, then imagine what we can do for you.

Questions? Ideas? We’re all ears. Book a free consult or drop us a line—let’s build something awesome.

Why Lid Vizion?

Fast MVPs. Modular code. Clear comms. Flexible models. We’re the partner you call when you want it done right, right now.

Startups, educators, agencies, SaaS—if you’re ready to move beyond just ‘playing’ with AI, you’re in the right place. We help you own and scale your tools.

No in-house AI devs? No problem. We plug in, ramp up, and deliver. You get the power of a full-stack team, minus the overhead.

Let’s turn your vision into code. Book a call, meet the team, or check out our latest builds. The future’s waiting—let’s build it.

What We Build

• AI-Powered Web Apps • Interactive Quizzes & Learning Tools • Custom CRMs & Internal Tools • Lightweight 3D Simulations • Full-Stack MVPs • Chatbot Integrations

Frontend: React.js, Next.js, TailwindCSS
Backend: Node.js, Express, Supabase, Firebase, MongoDB
AI/LLMs: OpenAI, Claude, Ollama, Vector DBs
Infra: AWS, GCP, Azure, Vercel, Bitbucket
3D: Three.js, react-three-fiber, Cannon.js


Blogs

Scalable MongoDB Design for Vision Metadata

Shawn Wilborne
August 27, 2025
5 min read

Storing and managing large-scale computer vision metadata (e.g., object detections, classification labels, feature embeddings, and processing jobs) poses unique challenges: handling billions of annotation records, supporting fast similarity searches, and accommodating evolving ML models – all while remaining cost-effective. Below we discuss schema design patterns, sharding strategies, embedding storage options, query optimizations, and trade-offs for small vs. enterprise deployments.

Schema Patterns for Detections, Labels, Embeddings, and Jobs

Designing a MongoDB schema for vision data requires balancing embedding vs. referencing data, guided by the cardinality of relationships and access patterns. Key patterns include:

Detections (Object Annotations)
If each image has only a few detected objects (“one-to-few”), embedding them as an array of sub-documents inside the image document is convenient, per the MongoDB “rules of thumb”. This yields faster reads and avoids extra queries. However, if images can have many annotations or if the total annotations are extremely large (“one-to-many” or “one-to-squillions”), it’s better to store detections as separate documents with a reference to the image ID, aligning to the one-to-squillions pattern with parent references. MongoDB’s rough guidance is to avoid arrays that grow into the hundreds per document, and remember the 16 MB document size limit. For billions of detections overall, a separate Detections collection (fields like image_id, bbox, label, confidence) keeps documents bounded and queries indexable by image_id.
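As a rough sketch of the two layouts (field names like image_id, bbox, and confidence are illustrative, not prescribed), the documents might look like:

```python
# Hypothetical document shapes for the two layouts discussed above,
# written as plain dicts in the shape the driver would send to MongoDB.

# One-to-few: detections embedded as sub-documents in the image document.
image_doc_embedded = {
    "_id": "img_001",
    "uri": "s3://bucket/img_001.jpg",
    "detections": [
        {"bbox": [10, 20, 50, 80], "label": "person", "confidence": 0.97},
        {"bbox": [60, 30, 90, 70], "label": "dog", "confidence": 0.88},
    ],
}

# One-to-squillions: each detection is its own document with a parent
# reference, so the Detections collection stays indexable by image_id
# and individual documents never approach the 16 MB limit.
detection_doc = {
    "_id": "det_000001",
    "image_id": "img_001",  # parent reference back to the image
    "bbox": [10, 20, 50, 80],
    "label": "person",
    "confidence": 0.97,
}
```

The embedded form reads in one round trip; the referenced form trades that for unbounded growth and an index on image_id.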

Labels (Image or Region Labels)
For a few categorical labels per image, embed an array in the image document. If labels are numerous or must be accessed independently of the image, use a separate Labels collection referencing the image, another case where the “don’t embed if you need separate access” rule applies. This lets you query “all labels by annotator” or “all images with label X” efficiently.
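With a separate Labels collection, the two queries above reduce to simple filters, each backed by a matching compound index (field names annotator and image_id are illustrative):

```python
# Hypothetical filters against a separate Labels collection.
all_labels_by_annotator = {"annotator": "alice"}
all_images_with_label = {"label": "car"}  # project image_id from the results

# Compound indexes supporting those filters (pymongo-style key specs;
# 1 = ascending). Each query's filter fields lead the index.
label_indexes = [
    [("annotator", 1), ("label", 1)],
    [("label", 1), ("image_id", 1)],
]
```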

Embeddings
Feature embeddings might be stored per image or per detected object. One small vector per image can be embedded in the image doc, but many or large vectors bloat the doc and risk the 16 MB limit. A scalable approach is a dedicated Embeddings collection with image_id or detection_id, the vector, model_version, and timestamp. This follows the general guidance to embed small, reference large or numerous sub-objects and also prepares you for vector indexing.
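A document in such a dedicated Embeddings collection might look like the following sketch (the model_version string and four-element vector are placeholders; real vectors would have hundreds of dimensions):

```python
from datetime import datetime, timezone

# Hypothetical document in a dedicated Embeddings collection: one vector
# per detection (or per image), tagged with the model that produced it so
# vectors from different models are never compared against each other.
embedding_doc = {
    "detection_id": "det_000001",
    "model_version": "clip-vit-b32-2024-01",   # illustrative version tag
    "vector": [0.12, -0.05, 0.33, 0.08],       # truncated for illustration
    "created_at": datetime.now(timezone.utc),
}
```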

Jobs (Processing/Labeling Tasks)
Jobs are naturally a separate collection. Keep fields like type, state, priority, timestamps, and a payload. MongoDB encourages embedding related payloads in the same doc unless there’s a clear reason to split, and you can implement a simple job queue pattern using findAndModify to atomically claim work.
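A Jobs document carrying those fields, with its payload embedded in the same document, might be shaped like this (all field values are illustrative):

```python
# Hypothetical shape for one document in the Jobs collection. The payload
# lives in the same document, per the embedding guidance above.
job_doc = {
    "_id": "job_42",
    "type": "label_batch",
    "state": "queued",        # queued -> running -> done / failed
    "priority": 5,
    "created_at": "2025-08-27T12:00:00Z",
    "payload": {"image_ids": ["img_001", "img_002"], "model": "detector-v3"},
}
```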

Using Mongoose
In Node.js, Mongoose remains a widely used ODM (community signals, developer discussions). Define schemas with ObjectId references where needed, and enforce application-side validation and types for long-term robustness.

Using MongoDB Sharding for Billions of Annotations

Storing billions of annotation documents is feasible with sharding. MongoDB deployments have handled billions of docs in a single collection when data and load are distributed.

Shard Key Selection
Choose a high-cardinality shard key that spreads inserts evenly. Hashing image or video IDs often prevents hot spots while still keeping some locality. If time-range queries are critical, consider compound keys, but beware monotonic keys that skew writes.
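To see why hashing helps, here is a small stand-alone simulation (not MongoDB's actual hash function, just an illustration of the principle): monotonically increasing image IDs, which would pile onto one chunk under range sharding, scatter evenly once hashed.

```python
import hashlib

def shard_for(image_id: str, num_shards: int) -> int:
    """Illustrative stand-in for hashed sharding: map an ID to a shard by
    hashing it. Monotonic IDs that would all land on the 'last' shard
    under range sharding get spread evenly instead."""
    digest = hashlib.md5(image_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Simulate 10,000 sequential inserts across 4 shards.
counts = [0, 0, 0, 0]
for i in range(10_000):
    counts[shard_for(f"img_{i:08d}", 4)] += 1
# Each shard ends up with roughly 2,500 documents.
```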

Pre-splitting and Balancing
For huge initial loads, pre-split chunks and control the balancer during bulk ingest to avoid massive data movement.

Working Set and Memory
Sharding divides the working set so each shard’s indexes and data are more likely to fit in RAM, and aggregate write throughput can scale roughly linearly as you add shards, a pattern observed in large-scale deployments and community writeups. Real-world users like Craigslist leveraged careful shard keys and chunk management to scale to multi-billion-doc workloads.

Tradeoffs: Storing Embeddings in MongoDB vs. External Vector Databases

MongoDB Atlas Vector Search
Recent MongoDB releases let you keep vectors and metadata together and run combined queries, reducing the “synchronization tax” of maintaining a separate vector store (overview). Atlas Vector Search builds on Lucene’s HNSW for approximate nearest neighbors (HNSW background in MongoDB blog). This is often ideal for moderate scales and simpler ops.

PostgreSQL + pgvector
If you are a SQL-first team, pgvector adds operators and IVF indexes for similarity search. It performs well at moderate scale, though some independent benchmarks show specialized engines outperforming general databases in high-recall, high-QPS settings (Redis benchmark write-up).

Specialized Vector Databases
Managed services like Pinecone and open-source engines like Milvus focus entirely on vector workloads, offering speed and scale advantages for massive corpora and strict latency SLOs (comparison perspective). The tradeoff is extra system complexity and cost to keep vectors in sync with MongoDB.

Query Patterns and Optimizations

Time-Series Annotation Queries
Index timestamps or frame numbers, and include video_id where applicable. Range scans are then efficient. If most queries hit recent data, align shard keys and indexes to avoid concentration on a single shard.
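Concretely, a compound index leading with the scoping field and a matching range filter might look like this sketch (video_id and frame are illustrative field names):

```python
# Hypothetical compound index (pymongo-style key spec) and query for
# time-range lookups scoped to a single video. The index prefix matches
# the equality filter, so the frame range becomes an efficient index scan.
annotation_index = [("video_id", 1), ("frame", 1)]

recent_frames_query = {
    "video_id": "vid_007",
    "frame": {"$gte": 1000, "$lt": 2000},
}
```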

Spatial Overlap Queries
For rectangle-overlap checks, you can filter by axes with range conditions, or model boxes as polygons and use MongoDB’s geospatial indexes with $geoIntersects. Keep per-image searches small when possible and post-filter in memory if each image has few boxes.
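The axis-range approach rests on a simple fact: two axis-aligned boxes overlap iff each one starts before the other ends, on both axes. A sketch of the Mongo filter and the equivalent in-memory post-filter (assuming boxes stored as separate x_min/x_max/y_min/y_max fields, which is an assumption, not the article's schema):

```python
def overlap_filter(x1, y1, x2, y2):
    """Mongo filter matching stored boxes that overlap the query rectangle
    (x1, y1) - (x2, y2). Assumes min/max coordinate fields per box."""
    return {
        "x_min": {"$lt": x2}, "x_max": {"$gt": x1},
        "y_min": {"$lt": y2}, "y_max": {"$gt": y1},
    }

def overlaps(box, x1, y1, x2, y2):
    """Same predicate evaluated in memory, for post-filtering the few
    boxes of a single image after a cheap image_id lookup."""
    return (box["x_min"] < x2 and box["x_max"] > x1
            and box["y_min"] < y2 and box["y_max"] > y1)
```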

Embedding Similarity Search
In Atlas, define a kNN vector index on the vector field and query with an aggregation using HNSW (MongoDB’s explainer). Combine with normal filters and ensure those fields are indexed too. If self-hosting without Atlas Search, consider an external FAISS index in your app layer for candidate generation, then fetch docs from Mongo.
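To make the candidate-generation step concrete, here is a brute-force cosine-similarity stand-in (not HNSW or FAISS, just the same contract): given a query vector, return the IDs of the nearest stored vectors, which you would then use to fetch full documents from Mongo.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, items, k=2):
    """Brute-force nearest neighbors: a stand-in for the ANN candidate-
    generation step. `items` mimics Embeddings docs with detection_id
    and vector fields; returns the k closest detection IDs."""
    scored = sorted(items, key=lambda it: cosine(query, it["vector"]),
                    reverse=True)
    return [it["detection_id"] for it in scored[:k]]
```

At real scale this linear scan is exactly what HNSW-style indexes exist to avoid; the interface (query vector in, candidate IDs out) is the part that carries over.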

Batch Labeling Jobs
Use a clear state field and proper indexes; have workers atomically claim work via findAndModify patterns. Overall, align schema and indexes to the queries you actually run – the core MongoDB data-modeling principle.
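The filter/update pair a worker would hand to findOneAndUpdate (the modern driver form of findAndModify) to claim a single job might look like this sketch (worker name and sort order are illustrative):

```python
# Hypothetical filter/update a worker passes to findOneAndUpdate to
# atomically flip one queued job to running. Because the find and the
# update happen as one operation, two workers can never claim the same job.
claim_filter = {"state": "queued"}
claim_update = {"$set": {"state": "running", "worker": "worker-1"}}
claim_sort = [("priority", -1), ("created_at", 1)]  # highest priority, oldest first

# A compound index matching filter + sort keeps each claim cheap:
jobs_index = [("state", 1), ("priority", -1), ("created_at", 1)]
```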

Embedding Use Cases: Retrieval vs. Model Versioning

Embeddings are model outputs and change with architectures and preprocessing. Treat them as versioned artifacts tied to the model. If you only need “current” similarity, overwrite or replace per item to keep the active index lean. If you need auditability or A/B analyses, store multiple versions with an explicit model_version and query within a single versioned space (practical guidance).
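The two policies can be sketched as a pair of small functions (the in-memory dict stands in for the Embeddings collection; field names follow the earlier examples):

```python
def upsert_current(store, detection_id, model_version, vector):
    """'Current-only' policy: one vector per item, overwritten on
    re-embedding, so the active similarity index stays lean."""
    store[detection_id] = {"model_version": model_version, "vector": vector}

def query_versioned(docs, model_version):
    """'Versioned' policy: keep every version, but always search within a
    single model_version space, never mixing vectors across models."""
    return [d for d in docs if d["model_version"] == model_version]
```

In MongoDB terms, the first is an upsert keyed on detection_id; the second adds model_version to every similarity query's pre-filter.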

Cost and Performance Trade-offs: Small Teams vs. Enterprise Scale

Small teams
Keep it simple. One MongoDB cluster for metadata plus vectors minimizes infra and the need for cross-DB sync, with developer productivity gains noted in practitioner write-ups. Evaluate Atlas tiers and Search node add-ons against your budget using MongoDB’s pricing.

Enterprises
At multi-billion-doc scale, sharding becomes mandatory and you may split responsibilities: MongoDB for operational data and a specialized vector store for high-QPS similarity. Benchmarks suggest specialized engines can deliver much higher throughput at similar accuracy (example results). Expect higher cost and complexity, but better tail latency and headroom.

Scale stepwise
Start unified, profile real bottlenecks, then add shards, read replicas, caches, or specialized engines as needed. MongoDB can handle large workloads when designed around your access patterns; escalate complexity only when the metrics demand it.

URL Index

  1. 6 Rules of Thumb for MongoDB Schema Design
    https://dzone.com/articles/6-rules-thumb-mongodb-schema-1
  2. MongoDB: Billions of documents in a collection
    https://stackoverflow.com/questions/11320907/mongodb-billions-of-documents-in-a-collection
  3. What is the correct Mongo schema design for a job queue
    https://stackoverflow.com/questions/13433942/what-is-the-correct-mongo-schema-design-for-a-job-queue
  4. Mongoosejs post on X
    https://twitter.com/mongoosejs/status/1769899569192591505
  5. Do a lot of teams use Mongoose over the native driver
    https://www.reddit.com/r/node/comments/15v7j6c/do_lot_of_them_use_mongoose_over_mongodb_native/
  6. The Benefits of Using MongoDB as a Vector Database
    https://medium.com/@abuadonald/the-benefits-of-using-mongodb-as-a-vector-database-4e9d1da8251d
  7. Vector Search and LLM Essentials – MongoDB Blog
    https://www.mongodb.com/company/blog/vector-search-llm-essentials-what-when-why
  8. Benchmarking results for vector databases – Redis
    https://redis.io/blog/benchmarking-results-for-vector-databases/
  9. MongoDB Atlas vs Pinecone – Zilliz comparison
    https://zilliz.com/comparison/mongodb-atlas-vs-pinecone
  10. How to store and query embeddings in PostgreSQL without losing your mind
    https://www.postgresql.fastware.com/blog/how-to-store-and-query-embeddings-in-postgresql-without-losing-your-mind
  11. MongoDB Pricing
    https://www.mongodb.com/pricing

Written By
Shawn Wilborne
AI Builder
Lamar Giggetts
Software Architect