On-device ML lets modern iOS apps analyze and organize photos without sending images to a server. Apple’s own Photos app “uses a number of machine learning algorithms, running privately on-device,” to power features like People and Memories (private knowledge graphs of people/places/things) (Apple ML Research). Keeping inference local means images never leave the device—great for latency, offline use, and privacy/GDPR risk reduction (Fritz: on-device benefits; Apple ML Research). By contrast, cloud-only processing, as in the 2019 FaceApp surge, drew public concern precisely because users’ faces were sent to remote servers (Fritz: FaceApp discussion).
Core idea: train/distill heavy models off-device, ship a compact Core ML model to iOS, compute embeddings locally, and do similarity search & clustering on-device. Optionally sync embeddings/labels (not raw photos) to the cloud for cross-device personalization.
Why on-device?
- Privacy by default: images remain on the phone; only optional features/embeddings may sync. Less GDPR/PII exposure, fewer breach vectors (Fritz; Apple ML Research).
- Latency & offline: Apple Neural Engine accelerates inference with near-instant response and no round-trip delays (Fritz).
- Personalization: build a private, on-device knowledge graph of people/places/things that powers naming, clustering, deduping, and “Memories” (Apple ML Research).
System at a glance
- Teacher model (CLIP/ViT) → Distilled student (MobileNet/Tiny ViT) → Convert to Core ML → On-device embeddings (Vision/Core ML) → Local index & clustering (SQLite/Core Data) → Optional cloud sync (embeddings/labels only).
Distillation: CLIP teacher → tiny student
Compress the teacher’s representational power into a small model that runs great on iPhones. Knowledge distillation trains the student to mimic teacher outputs (logits/embeddings), “compressing and accelerating” without big accuracy loss (Distillation explainer).
- Teacher: CLIP image encoder (e.g., ViT) is powerful but large (≈350 MB FP32 typical for original CLIP artifacts) (PicCollage).
- Student: MobileNet/Tiny-ViT sized for Core ML/ANE. Teams have reported ~7× compression (350 MB → 48 MB FP32, 24 MB FP16) with “negligible” search accuracy loss after Core ML conversion (PicCollage). Apple’s MobileCLIP family shows similar size/quality tradeoffs (e.g., largest variant ≈173 MB) (MobileCLIP overview).
- Training is offline: distill on GPUs in the cloud; only ship the student to devices—no on-device training assumed (distillation workflow).
Cost sanity check: an AWS p3.2xlarge (V100) is ≈$3.06/hr on-demand (spot ≈$0.97/hr). A 5–10 hr distillation job runs roughly $15–$30 on-demand; less on spot (Vantage: p3.2xlarge).
PyTorch → Core ML (and quantize)
Use coremltools to convert PyTorch directly to .mlmodel (TorchScript tracing/scripting), then apply FP16 or even 8-bit post-training quantization to cut size/latency (coremltools: PyTorch conversion). If you hit unsupported ops, ONNX can be a fallback—but Apple notes direct PyTorch conversion is preferred (ONNX notes).
Extract embeddings on iOS (Vision/Core ML)
Two practical options:
- Your distilled Core ML model via VNCoreMLRequest to get a 512/768-d embedding per photo (CLIP-style).
- Vision feature prints: VNGenerateImageFeaturePrintRequest yields normalized 768-d vectors (iOS 17), comparable with Euclidean (≈cosine) distance. In practice, near-duplicate thresholds around ~0.4–0.6 (normalized distance) work well; tune per dataset (Vision feature prints write-up). Both options are sketched after this list.
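A minimal Swift sketch of both options, assuming the distilled student ships as a bundled Core ML model with a single multi-array embedding output (the model URL, output shape, and crop option are assumptions that depend on your export):

```swift
import CoreGraphics
import CoreML
import Vision

// Option A: embedding from the distilled Core ML student.
// The multi-array output (512/768 floats) depends on how you exported the model.
func distilledEmbedding(for image: CGImage, modelURL: URL) throws -> [Float] {
    let vnModel = try VNCoreMLModel(for: MLModel(contentsOf: modelURL))
    let request = VNCoreMLRequest(model: vnModel)
    request.imageCropAndScaleOption = .centerCrop
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    guard let observation = request.results?.first as? VNCoreMLFeatureValueObservation,
          let array = observation.featureValue.multiArrayValue else { return [] }
    return (0..<array.count).map { Float(truncating: array[$0]) }
}

// Option B: Apple's built-in feature print (768-d, normalized on iOS 17).
func featurePrint(for image: CGImage) throws -> VNFeaturePrintObservation? {
    let request = VNGenerateImageFeaturePrintRequest()
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
    return request.results?.first as? VNFeaturePrintObservation
}

// Euclidean distance between two feature prints (≈ cosine on normalized vectors);
// the ~0.4–0.6 near-duplicate threshold above is only a starting point to tune.
func distance(_ a: VNFeaturePrintObservation, _ b: VNFeaturePrintObservation) throws -> Float {
    var d: Float = 0
    try a.computeDistance(&d, to: b)
    return d
}
```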
Clustering & deduping:
- Persist embeddings locally (Core Data/SQLite).
- For small libraries, brute-force NN (cosine/Euclidean) is fine; for larger sets, use product quantization or HNSW (client-side) as needed. A greedy clustering sketch follows this list.
- Apple’s Photos research uses agglomerative clustering on on-device face/body embeddings for People albums—privacy-preserving and effective (Apple ML Research).
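A minimal Swift sketch of the thresholded approach: cosine similarity plus a greedy single-pass merge. This is a simplification of full agglomerative clustering (not Apple's exact method), and the 0.9 similarity threshold is an assumption to tune per model and library:

```swift
import Foundation

// Cosine similarity between two embedding vectors.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    var dot: Float = 0, na: Float = 0, nb: Float = 0
    for i in 0..<min(a.count, b.count) {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / ((na * nb).squareRoot() + 1e-9)
}

// Greedy clustering: assign each photo to the first existing cluster whose
// representative is within the threshold, otherwise start a new cluster.
func clusterEmbeddings(_ embeddings: [(id: String, vector: [Float])],
                       similarityThreshold: Float = 0.9) -> [[String]] {
    var clusters: [(representative: [Float], members: [String])] = []
    for item in embeddings {
        if let idx = clusters.firstIndex(where: {
            cosineSimilarity($0.representative, item.vector) >= similarityThreshold
        }) {
            clusters[idx].members.append(item.id)
        } else {
            clusters.append((item.vector, [item.id]))
        }
    }
    return clusters.map { $0.members }
}
```

Run it over the locally persisted embeddings; clusters whose members are also close in capture time are good duplicate/burst candidates.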
Personalization scenarios (all on-device)
- Duplicate & burst pruning: thresholded NN + lightweight agglomerative clusters to collapse near-duplicates.
- Smart albums: cluster by scene/subject; combine embeddings with EXIF/time/location for “Trip to SF 2024.”
- Semantic search: with a text encoder (distilled CLIP text tower), compare text embeddings to image embeddings for queries like “corgi in the snow” (ranking sketch below). The same vector search idea powers server demos too (MongoDB tutorial examples).
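A small ranking sketch for the semantic-search case, assuming a distilled text tower (run separately through Core ML) produces a query embedding in the same space as the image embeddings; Accelerate's vDSP handles the cosine math:

```swift
import Accelerate

// Rank stored photo embeddings against a text-query embedding and
// return the top-K photo identifiers.
func rankPhotos(queryEmbedding: [Float],
                photoEmbeddings: [(id: String, vector: [Float])],
                topK: Int = 20) -> [String] {
    func cosine(_ a: [Float], _ b: [Float]) -> Float {
        let dot = vDSP.dot(a, b)
        let norm = (vDSP.sumOfSquares(a) * vDSP.sumOfSquares(b)).squareRoot() + 1e-9
        return dot / norm
    }
    return photoEmbeddings
        .map { (id: $0.id, score: cosine($0.vector, queryEmbedding)) }
        .sorted { $0.score > $1.score }
        .prefix(topK)
        .map { $0.id }
}
```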
Optional cloud sync (hybrid)
- On-device only: maximum privacy—no images or vectors leave the phone (Fritz).
- Cloud-optional: sync embeddings/labels only (encrypted) for cross-device search. MongoDB Atlas supports Vector Search: store {"_id": photoId, "embedding": [...]} documents and query via $vectorSearch (Atlas examples); a sync-record sketch follows this list.
- Embedding sizes are tiny (768 FP32 floats ≈ 3 KB each), so even 100k photos is only ≈ 300 MB of vectors.
- Remember: embeddings can leak semantics if compromised; treat as sensitive (encrypt at rest/in transit).
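A minimal Swift sketch of a sync record mirroring the Atlas document shape above; the labels field and type name are assumptions:

```swift
import Foundation

// Sync record mirroring {"_id": photoId, "embedding": [...]}.
// Only vectors/labels leave the device, never pixels; encrypt in
// transit and at rest, since embeddings can leak semantics.
struct EmbeddingRecord: Codable {
    let _id: String          // stable local photo identifier
    let embedding: [Float]   // 512/768-d vector from the distilled model
    let labels: [String]     // optional cluster/user labels, e.g. "dog"
}

// Size check from the text: 768 floats * 4 bytes ≈ 3 KB per photo,
// so 100,000 photos ≈ 300 MB of vectors server-side.
func payload(for record: EmbeddingRecord) throws -> Data {
    try JSONEncoder().encode(record)
}
```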
iOS implementation checklist
- Model: distill, export .mlmodel, FP16 if quality holds.
- Runtime: use Vision for VNCoreMLRequest/VNGenerateImageFeaturePrintRequest; batch over the Photos library with background tasks (indexing sketch after this list).
- Index: Core Data/SQLite with schema: {photoId, ts, exif, embedding, clusters}.
- Clustering: start with thresholded NN + agglomerative; add HNSW if you need faster queries.
- UX: privacy notice + toggles, progress UI for first-run indexing, “review duplicates” surfaces.
- Power: schedule heavy work on charge/Wi-Fi; incremental updates via PhotoKit change events.
- Testing: calibrate distance thresholds per device/photo domain; A/B FP16 vs FP32.
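A Swift sketch of the batching and power items from the checklist, assuming a caller-supplied indexPhoto routine (hypothetical) that runs the embedding request and persists the vector; the background-task identifier is a placeholder that must also be registered in Info.plist:

```swift
import Foundation
import Photos
import BackgroundTasks

// First-run indexing: walk the library in small batches so work can be
// paused and resumed; `indexPhoto` stands in for "embed + persist".
func indexLibrary(batchSize: Int = 50, indexPhoto: @escaping (PHAsset) -> Void) {
    let options = PHFetchOptions()
    options.sortDescriptors = [NSSortDescriptor(key: "creationDate", ascending: false)]
    let assets = PHAsset.fetchAssets(with: .image, options: options)

    var batch: [PHAsset] = []
    assets.enumerateObjects { asset, _, _ in
        batch.append(asset)
        if batch.count == batchSize {
            batch.forEach(indexPhoto)   // embed + persist this batch
            batch.removeAll()
        }
    }
    batch.forEach(indexPhoto)           // flush the final partial batch
}

// Defer heavy work to charge time with a processing task request.
func scheduleIndexing() {
    let request = BGProcessingTaskRequest(identifier: "com.example.photo-indexing")
    request.requiresExternalPower = true        // run while charging
    request.requiresNetworkConnectivity = false // inference is fully local
    try? BGTaskScheduler.shared.submit(request)
}
```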
Architecture (one slide view)
Cloud (offline): Pretrain/Distill CLIP → export student → Core ML convert/quantize → deliver .mlmodel.
Device: iOS app (Swift) → Core ML & Vision infer → store embeddings locally → NN search & clustering → personalization UI.
Optional: Encrypted sync of embeddings/labels to MongoDB Atlas vector index for cross-device search.
tl;dr
- Privacy-first wins: keep photos and inference on-device; build a private knowledge graph (Apple does this in Photos) (Apple ML Research).
- Distill big → small: CLIP-class teacher → compact Core ML student; FP16 cuts size with minimal loss (PicCollage).
- Use Vision/Core ML: 512/768-d embeddings; cluster & dedupe with simple thresholds; agglomerative works well for People-style grouping (Vision feature prints; Apple ML Research).
- Cloud optional: if you must sync, upload embeddings/labels only and treat them as sensitive; Atlas Vector Search can power cross-device queries (MongoDB tutorial).
- Cost: one-time GPU training is the big line item; on-device inference is effectively free at runtime (p3.2xlarge pricing).
URL Index
- Apple Photos research (on-device People/knowledge graph): https://machinelearning.apple.com/research/recognizing-people-photos
- On-device ML benefits, latency, FaceApp discussion: https://fritz.ai/on-device-ml-benefits/
- Vision feature prints & similarity thresholds (768-d): https://medium.com/@MWM.io/apples-vision-framework-exploring-advanced-image-similarity-techniques-f7bb7d008763
- Knowledge distillation explainer: https://medium.com/@nminhquang380/knowledge-distillation-explained-model-compression-49517b039429
- CLIP distillation & Core ML conversion results (48 MB / 24 MB FP16): https://tech.pic-collage.com/distillation-of-clip-model-and-other-experiments-f8394b7321ce?gi=04aec5c36161
- Apple MobileCLIP sizes/overview: https://blog.jacobstechtavern.com/p/offline-ai-clip
- coremltools: converting from PyTorch: https://apple.github.io/coremltools/docs-guides/source/convert-pytorch.html
- coremltools: ONNX conversion notes: https://coremltools.readme.io/v4.0/docs/onnx-conversion
- Build an image search engine with CLIP embeddings (Atlas Vector Search): https://www.mongodb.com/developer/products/atlas/multi-modal-image-vector-search/
- AWS p3.2xlarge pricing (training cost sanity check): https://instances.vantage.sh/aws/ec2/p3.2xlarge