AI Web Apps. Built to Win.

From Miami to the world—Lid Vizion crafts blazing-fast, AI-powered web apps for startups, educators, and teams who want to move fast and scale smarter. We turn your wildest ideas into real, working products—no fluff, just results.

Our Tech Stack Superpowers

  1. React.js, Node.js, MongoDB, AWS
  2. GPT-4, Claude, Ollama, Vector DBs
  3. Three.js, Firebase, Supabase, Tailwind

We blend cutting-edge AI with rock-solid engineering. Whether you need a chatbot, a custom CRM, or a 3D simulation, we’ve got the tools (and the brains) to make it happen—fast.

No cookie-cutter code here. Every project is custom-built, modular, and ready to scale. We keep you in the loop with weekly updates and async check-ins, so you’re never left guessing.

“Tech moves fast. We move faster.”

Trusted by startups, educators, and SaaS teams who want more than just another app. We deliver MVPs that are ready for prime time—no shortcuts, no surprises.

Ready to level up? Our team brings deep AI expertise, clean APIs, and a knack for building tools people actually love to use. Let’s make your next big thing, together.

From edge AI to interactive learning tools, our portfolio proves we don’t just talk tech—we ship it. See what we’ve built, then imagine what we can do for you.

Questions? Ideas? We’re all ears. Book a free consult or drop us a line—let’s build something awesome.

Why Lid Vizion?

Fast MVPs. Modular code. Clear comms. Flexible models. We’re the partner you call when you want it done right, right now.

Startups, educators, agencies, SaaS—if you’re ready to move beyond just ‘playing’ with AI, you’re in the right place. We help you own and scale your tools.

No in-house AI devs? No problem. We plug in, ramp up, and deliver. You get the power of a full-stack team, minus the overhead.

Let’s turn your vision into code. Book a call, meet the team, or check out our latest builds. The future’s waiting—let’s build it.

What We Build

• AI-Powered Web Apps
• Interactive Quizzes & Learning Tools
• Custom CRMs & Internal Tools
• Lightweight 3D Simulations
• Full-Stack MVPs
• Chatbot Integrations

Frontend: React.js, Next.js, TailwindCSS
Backend: Node.js, Express, Supabase, Firebase, MongoDB
AI/LLMs: OpenAI, Claude, Ollama, Vector DBs
Infra: AWS, GCP, Azure, Vercel, Bitbucket
3D: Three.js, react-three-fiber, Cannon.js


Efficient Data Retrieval from Images: Using LlamaIndex, Mongo, and AWS

Shawn Wilborne · August 27, 2025 · 6 min read

Image-centric RAG augments (or replaces) text-only retrieval by indexing image embeddings directly. Instead of captioning images first (and losing detail), we embed images (e.g., CLIP) and run vector similarity search to fetch the most relevant visuals for a text or image query. LlamaIndex’s MultiModalVectorStoreIndex can store CLIP/VoyageAI embeddings in MongoDB Atlas, so a plain text query retrieves semantically similar images (and/or their captions) from one vector store—often more accurate than caption-only pipelines (OpenAI Cookbook; LlamaIndex → Mongo).

Atlas Vector Search is built-in (no extra fee for the feature), and even the Free Tier supports vector indexing—making image RAG cost-friendly for startups (Mongo forum; Mongo pricing).
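
For concreteness, here's a minimal sketch of direct image/text embedding with CLIP via Hugging Face transformers (the model ID and helper names are our choices; swap in VoyageAI or another encoder as needed):

# pip install transformers torch pillow
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# ViT-B/32 emits 512-dim vectors; keep one dimension consistent corpus-wide
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> list[float]:
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # unit length for cosine search
    return feats[0].tolist()

def embed_text(query: str) -> list[float]:
    # Text lands in the same space, so one index serves text-to-image queries
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats[0].tolist()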

Architecture at a glance

S3 (images) → Lambda (embeddings/captions) → MongoDB Atlas (vectors + metadata) → LlamaIndex (retriever) → LLM/UI

  • Storage: Amazon S3 holds raw images (≈$0.023/GB-mo for Standard) and triggers processing on upload (S3 pricing guide).
  • Compute: An embedding service (Lambda, SageMaker, or a small GPU container) generates vectors (CLIP, VoyageAI) and optional captions (BLIP → then text-embeddings). Lambda pricing is $0.20/million requests + $0.00001667/GB-s (AWS Lambda pricing); a minimal handler sketch follows this list.
  • Index: MongoDB Atlas with a Vector Search index on embedding. LlamaIndex’s MongoDBAtlasVectorSearch adapter wires it up (LlamaIndex Mongo).
  • Query: A user’s text or image query is embedded in the same space; LlamaIndex retrieves Top-K vectors (with optional metadata filters) and returns images + captions to the app/LLM (LlamaIndex multimodal example; OpenAI Cookbook).
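
To make the Storage → Compute → Index hop concrete, here's a minimal S3-triggered Lambda sketch (illustrative only: the handler shape is standard, but the embed_image helper from the sketch above, the MONGO_URI env var, and the database/collection names are our assumptions):

import os, urllib.parse
import boto3
from pymongo import MongoClient

s3 = boto3.client("s3")
coll = MongoClient(os.environ["MONGO_URI"])["image_search"]["images"]

def handler(event, context):
    # Fired by an S3 ObjectCreated event; one record per uploaded image
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        local = f"/tmp/{os.path.basename(key)}"
        s3.download_file(bucket, key, local)
        coll.replace_one(
            {"_id": key},
            {
                "_id": key,
                "s3_key": key,
                "embedding": embed_image(local),  # CLIP helper from the earlier sketch
                "meta": {"uploadedAt": record["eventTime"]},
            },
            upsert=True,
        )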

Choosing embedding models (CLIP, VoyageAI, BLIP)

  • Direct image embeddings: CLIP (e.g., ViT-B/32 via PyTorch) or VoyageAI multimodal map images and text into a shared vector space—perfect for text→image and image→image search (OpenAI Cookbook; LlamaIndex multimodal example).
  • Caption-then-embed: If you need captions, run BLIP/BLIP-2 to generate one, then embed with a text model (e.g., OpenAI text-embedding). This is flexible, but tends to be lossier than CLIP-style direct image embeddings for nearest-neighbor retrieval (OpenAI Cookbook); a minimal captioning sketch follows the tip below.

Implementation tip: Keep one canonical dimension (e.g., 512 or 768) across the corpus; don’t mix vector sizes in the same index.
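
For the caption-then-embed route, a minimal BLIP captioning sketch (model ID and helper name are our choices, not a prescribed setup):

# pip install transformers torch pillow
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_image(path: str) -> str:
    image = Image.open(path).convert("RGB")
    inputs = blip_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        out = blip_model.generate(**inputs, max_new_tokens=30)
    return blip_processor.decode(out[0], skip_special_tokens=True)

# The caption is then embedded with any text model and stored alongside
# (or instead of) the direct image vector for hybrid search.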

Ingestion pipeline (step-by-step)

  1. Upload to S3 (with metadata)
    Store the image and record metadata (filename, tags, EXIF/GPS). S3 events will kick off embedding. Costs are tiny: 100 GB ≈ $2.30/mo, 1 TB ≈ $23/mo (S3 pricing guide).
  2. Embedding extraction (Lambda or endpoint)
    • S3 event → Lambda pulls the image, calls CLIP/VoyageAI (local PyTorch, SageMaker endpoint, or Bedrock-hosted).
    • Optional: run BLIP to create a caption and a text embedding for hybrid search.
      Cost sanity check: 3 M images @ 120 ms each, 1.5 GB memory → ~540k GB-s. With 400k GB-s free + 1M free requests, net ≈ $2.33 (compute) + $0.40 (2M billable requests) ≈ $2.73 total (AWS Lambda pricing).
  3. Index in MongoDB Atlas (vectors + metadata)
    Configure Vector Search on embedding (cosine or dot-product; an example index definition follows this list). Then store documents like:

    {
     "_id": "img_123",
     "s3_key": "catalog/2025/08/23/img_123.jpg",
     "embedding": [/* d floats */],
     "caption": "vintage red coupe on city street",
     "meta": {"brand":"Acme", "category":"car", "uploadedAt":"2025-08-23T15:12:00Z"}
    }

    Vector Search is included; you pay for the cluster (e.g., Shared/Free, or M20 ≈ $0.08/hr ≈ $60/mo) (Mongo pricing; forum).
  4. Build the LlamaIndex
    Use the MongoDBAtlasVectorSearch adapter and build a MultiModalVectorStoreIndex over your image docs (LlamaIndex Mongo; multimodal example).
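
An example Atlas Vector Search index definition matching that layout (created once in the Atlas UI, CLI, or driver; 512 dimensions assumes CLIP ViT-B/32, and the filter paths are optional):

{
  "fields": [
    { "type": "vector", "path": "embedding", "numDimensions": 512, "similarity": "cosine" },
    { "type": "filter", "path": "meta.category" },
    { "type": "filter", "path": "meta.uploadedAt" }
  ]
}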

Minimal setup (illustrative)

Create Atlas vector index & build LlamaIndex

# pip install llama-index llama-index-vector-stores-mongodb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
from pymongo import MongoClient

MONGO_URI = "mongodb+srv://..."
client = MongoClient(MONGO_URI)

# 1) Configure Atlas Vector Search (one-time, in the Atlas UI or via code;
#    see the example index definition above: path="embedding", 512 dims, cosine)

# 2) Wire the Mongo vector store into LlamaIndex
#    (constructor/argument names vary by version; check the current API docs)
vector_store = MongoDBAtlasVectorSearch(
    mongodb_client=client,
    db_name="image_search",
    collection_name="images",
    vector_index_name="embedding_index",
)

# 3) Embeddings were already written by the Lambda step, so attach the index
#    directly to the populated store instead of re-ingesting documents
index = VectorStoreIndex.from_vector_store(vector_store)

# 4) Query (text -> image); the query embed model must share the image
#    vector space (e.g., a CLIP text encoder), not a text-only model
retriever = index.as_retriever(similarity_top_k=6)
results = retriever.retrieve("red vintage cars at night")
for node in results:
    print(node.node.metadata.get("s3_key"), node.score)

(Exact helpers vary by version; align with the current LlamaIndex API and your Atlas index settings.)
Docs: LlamaIndex Mongo adapter/API and multimodal example (see the URL Index below).

Hybrid retrieval: vector + filters

Blend semantic and structured search in one call:

  • Vector: nearest neighbors in embedding.
  • Filters: Mongo fields—e.g., { "meta.category": "car", "meta.uploadedAt": { "$gte": ... } }.
  • LlamaIndex supports metadata filters + Top-K vector retrieval, e.g., “red sneakers” AND brand=Acme.

This yields precise results without over-fetching and keeps your index compact.
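
A minimal filtered-retrieval sketch, reusing the index from the setup above (the filter key mirrors our document layout; operator support varies by LlamaIndex/Atlas version):

from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Nearest neighbors on "embedding", restricted to category == "car"
filters = MetadataFilters(filters=[ExactMatchFilter(key="meta.category", value="car")])
retriever = index.as_retriever(similarity_top_k=6, filters=filters)
results = retriever.retrieve("red vintage cars at night")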

Serving results

  • Keep originals in S3; front with CloudFront for global, low-latency delivery.
  • Return signed URLs to clients (see the sketch after this list), or pipe results into an LLM for multimodal chat (“show and describe the top-3 images”).
  • For “find similar to this image,” embed the query image client-side and hit the same vector store.
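
Serving a hit is then one pre-signed URL away; a small boto3 sketch (bucket name and expiry are placeholders):

import boto3

s3 = boto3.client("s3")

def signed_url(s3_key: str, expires_s: int = 3600) -> str:
    # Time-limited GET link; clients fetch straight from S3 (or CloudFront)
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-image-bucket", "Key": s3_key},  # placeholder bucket
        ExpiresIn=expires_s,
    )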

Cost & sizing cheat-sheet

  • Atlas Vector Search: feature is free; pay for cluster (Free/Shared → $, M20 ≈ $60/mo). Plenty for 10^5–10^6 images if vectors are small (forum; pricing).
  • S3: $0.023/GB-mo (Standard). 50 GB ≈ $1.15/mo; 1 TB ≈ $23/mo (S3 guide).
  • Lambda embedding jobs: essentially dollars-scale for millions of images, thanks to free-tiers + per-use pricing (Lambda pricing).
  • Throughput: Use S3 events + Lambda concurrency for bursts; fall back to SageMaker or a small GPU service for heavy models/batching.

Best practices

  • Canonicalize embeddings: uniform dims & metric (cosine vs dot).
  • Normalize vectors: improves search stability.
  • Store captions + EXIF: hybrid queries (“red coats” + city=Paris).
  • Chunk big batches: throttle to respect Atlas write limits; use bulk writes.
  • Version your models: keep embedding_v in docs; reindex selectively on upgrades (see the sketch after this list).
  • Test metrics: A/B CLIP vs BLIP-caption+text-embed on your data; CLIP often wins for pure image similarity (OpenAI Cookbook).
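
Two of those practices in miniature (the embedding_v field follows the document layout above; the helper names are ours):

import numpy as np

EMBEDDING_V = 2  # bump whenever the embedding model changes

def normalized(vec: list[float]) -> list[float]:
    v = np.asarray(vec, dtype="float32")
    return (v / np.linalg.norm(v)).tolist()  # unit length → stable cosine scores

def upsert_embedding(coll, key: str, raw_vec: list[float]) -> None:
    coll.update_one(
        {"_id": key},
        {"$set": {"embedding": normalized(raw_vec), "embedding_v": EMBEDDING_V}},
        upsert=True,
    )

# Selective reindex after an upgrade: coll.find({"embedding_v": {"$lt": EMBEDDING_V}})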

TL;DR

  • Image-first RAG with CLIP/VoyageAI embeddings in MongoDB Atlas Vector Search improves accuracy over caption-only pipelines (OpenAI Cookbook).
  • AWS + LlamaIndex gives a tiny, pay-as-you-go stack: S3 → Lambda → Atlas; LlamaIndex handles multimodal retrieval & filters (LlamaIndex Mongo; multimodal example).
  • Costs stay low: Atlas small cluster (~$60/mo), S3 pennies/GB, Lambda dollars for millions of embeddings (Mongo pricing; S3; Lambda).
  • Result: fast, accurate visual search + hybrid filters that make your images as searchable and actionable as text.

URL Index

  1. Multimodal RAG with CLIP (image search) — OpenAI Cookbook
    https://cookbook.openai.com/examples/custom_image_embedding_search
  2. Atlas Vector Search paid or free? — MongoDB Forum
    https://www.mongodb.com/community/forums/t/is-vector-search-feature-paid-or-free/267191
  3. LlamaIndex → MongoDB Atlas Vector Search (API)
    https://docs.llamaindex.ai/en/stable/api_reference/storage/vector_store/mongodb/
  4. MongoDB Pricing (Shared/Dedicated incl. M20)
    https://www.mongodb.com/pricing
  5. S3 Pricing (guide/estimates)
    https://www.cloudzero.com/blog/s3-pricing/
  6. AWS Lambda Pricing (GB-s & free-tier)
    https://aws.amazon.com/lambda/pricing/
  7. LlamaIndex multimodal (VoyageAI + Mongo) example
    https://docs.llamaindex.ai/en/stable/examples/multi_modal/llamaindex_mongodb_voyageai_multimodal/

Written By
Shawn Wilborne
AI Builder