Building a Secure Expungement Automation Tool with Lid Vizion: OCR, Rules, Form Fill, and Secure Storage

Shawn Wilborne
August 27, 2025

Legal and compliance teams don’t drown because they lack OCR. They drown because they have to turn unstructured PDFs into auditable decisions by hand.

If your workflow looks like “download case PDF → copy/paste key fields → check eligibility rules → fill out official forms → store everything somewhere safe,” the real challenge is building a trusted pipeline that:

  • extracts facts deterministically
  • validates them (and routes exceptions)
  • runs a rules engine with versioning and citations
  • produces court-ready outputs
  • logs a complete decision trail

This is exactly the kind of system Lid Vizion is built to accelerate.

Lid Vizion is infrastructure: a set of pre-built platform components (Identity, Event Engine, Rules Engine, API Layer, Analytics) that you wire into your workflow so you can go live quickly without rebuilding the same plumbing for every document-heavy use case.

In this guide, we’ll walk through a Maryland expungement automation MVP architecture:

  • ingest case PDFs + supporting reports
  • OCR and extract structured fields
  • validate and review exceptions
  • determine eligibility via deterministic rules
  • auto-fill official forms
  • store inputs/outputs securely (SharePoint)

…and we’ll map each step to the Lid Vizion platform components that make the system shippable.

The workflow you’re really building (and why it breaks)

For expungement, sequencing matters:

  1. Upload Maryland case PDF → extract case number, charges, dispositions, key dates.
  2. Upload National Criminal Case Search report → extract out-of-state blockers.
  3. Run eligibility logic → produce a decision report with citations and “facts used.”
  4. Generate court-ready PDFs → petitions/forms + supporting packet.
  5. Store the packet securely with RBAC + audit logs.

Most teams get stuck because they treat the problem as “document parsing,” but production success depends on:

  • repeatable IDs across artifacts and events
  • a consistent event log and retry story
  • rule versioning and explainability
  • exception handling (human-in-the-loop)

Lid Vizion reference architecture (components that ship)

Here’s a pragmatic architecture that balances speed-to-MVP with auditability.

1) Identity Layer (IDs that make the system traceable)

Every packet needs durable identifiers:

  • case_id
  • document_id
  • packet_run_id
  • rule_version

With Lid Vizion’s Identity Layer, you generate these IDs consistently and attach them everywhere: logs, database rows, files, and downstream integrations.
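To make the idea concrete, here is a minimal sketch of what "attach the same IDs everywhere" can look like. It is not Lid Vizion's API: `new_id`, `RunContext`, the ID prefixes, and the version string are hypothetical names chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field


def new_id(prefix: str) -> str:
    """Return a prefixed random identifier, e.g. 'doc_3f2a...' (hypothetical format)."""
    return f"{prefix}_{uuid.uuid4().hex}"


@dataclass(frozen=True)
class RunContext:
    """Identifiers shared by every log line, row, and file in one packet run."""
    case_id: str
    rule_version: str
    packet_run_id: str = field(default_factory=lambda: new_id("run"))

    def tag(self, document_id: str) -> dict:
        """Fields to attach to any artifact or event produced in this run."""
        return {
            "case_id": self.case_id,
            "document_id": document_id,
            "packet_run_id": self.packet_run_id,
            "rule_version": self.rule_version,
        }


# Assumed example values; a real system would load case_id from its own records.
ctx = RunContext(case_id=new_id("case"), rule_version="2025.08.0")
print(ctx.tag(new_id("doc")))
```

Making the context immutable is a deliberate choice here: once a run starts, its IDs and rule version should not drift.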

2) Event Engine (make every step observable)

Instead of “a script ran,” you want structured events:

  • document_uploaded
  • ocr_completed
  • extraction_normalized
  • validation_failed / validation_passed
  • eligibility_decided
  • forms_generated
  • packet_uploaded_to_sharepoint

This is your operational backbone for retries, monitoring, and audits.
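A structured event can be as simple as a typed record that serializes to JSON. The sketch below assumes a flat event shape; the `Event` class and its field names are illustrative, not the Event Engine's actual schema.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

# Event names taken from the list above.
EVENT_TYPES = {
    "document_uploaded", "ocr_completed", "extraction_normalized",
    "validation_failed", "validation_passed", "eligibility_decided",
    "forms_generated", "packet_uploaded_to_sharepoint",
}


@dataclass(frozen=True)
class Event:
    event_type: str
    case_id: str
    packet_run_id: str
    payload: dict = field(default_factory=dict)
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        # Reject free-form event names so the log stays queryable.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"Unknown event type: {self.event_type}")

    def to_json(self) -> str:
        """Serialize with sorted keys so log lines are stable and diffable."""
        return json.dumps(asdict(self), sort_keys=True)


evt = Event("ocr_completed", case_id="case_123", packet_run_id="run_456",
            payload={"pages": 4})
print(evt.to_json())
```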

3) Rules Engine (deterministic decisioning + routing)

Two different rule sets matter:

  • Eligibility rules (legal logic + waiting periods + exclusions)
  • Workflow rules (routing, review thresholds, fraud/quality flags)

Lid Vizion’s Rules Engine gives you a clean place to express both.
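As an illustration of a workflow rule (as opposed to an eligibility rule), the sketch below routes a packet to review when any field's OCR confidence falls below a threshold. `WorkflowRule`, `route`, and the 0.90 threshold are assumptions for the example, not Lid Vizion's rule syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowRule:
    name: str
    min_ocr_confidence: float  # hypothetical threshold; tune per document type


def route(field_confidences: dict[str, float], rule: WorkflowRule) -> dict:
    """Return a routing decision plus the fields that triggered it."""
    low = sorted(
        name for name, conf in field_confidences.items()
        if conf < rule.min_ocr_confidence
    )
    return {
        "route": "human_review" if low else "auto",
        "low_confidence_fields": low,
        "rule": rule.name,
    }


rule = WorkflowRule(name="ocr_conf_v1", min_ocr_confidence=0.90)
print(route({"case_number": 0.99, "disposition_date": 0.62}, rule))
```

Returning the triggering fields alongside the route means the reviewer UI can highlight exactly what needs attention.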

4) API Layer (integrations without duct tape)

Your system will need to integrate with:

  • storage (SharePoint)
  • auth/RBAC
  • internal systems
  • notifications

The API Layer + webhooks keep it modular.

5) Analytics & Intelligence Dashboard

Once you have IDs + events, analytics becomes easy:

  • throughput (packets/day)
  • exception rate
  • OCR confidence distribution
  • time-to-decision
  • reviewer edits per packet

That’s how you prove ROI and continuously improve.
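To show why IDs plus events make these metrics cheap, here is a rough sketch that derives exception rate and time-to-decision from a list of event dicts. The event shape (`packet_run_id`, `event_type`, ISO-8601 `occurred_at`) is an assumption carried over for illustration.

```python
from datetime import datetime


def packet_metrics(events: list[dict]) -> dict:
    """Aggregate per-run metrics from a flat event log."""
    runs: dict[str, list[dict]] = {}
    for e in events:
        runs.setdefault(e["packet_run_id"], []).append(e)

    exceptions = 0
    durations = []
    for run_events in runs.values():
        # If an event type repeats within a run, the last occurrence wins.
        times = {
            e["event_type"]: datetime.fromisoformat(e["occurred_at"])
            for e in run_events
        }
        if "validation_failed" in times:
            exceptions += 1
        if "document_uploaded" in times and "eligibility_decided" in times:
            delta = times["eligibility_decided"] - times["document_uploaded"]
            durations.append(delta.total_seconds())

    return {
        "packets": len(runs),
        "exception_rate": exceptions / len(runs) if runs else 0.0,
        "avg_seconds_to_decision": (
            sum(durations) / len(durations) if durations else None
        ),
    }
```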

Ingestion + OCR: treat PDFs as hostile input

In legal workflows, a “PDF” is often:

  • a scanned image
  • inconsistent layouts
  • multi-page bundles
  • tables, stamps, checkboxes

Amazon Textract is a solid default for OCR and structured extraction (forms/tables) and supports image files and PDFs. (Textract docs: https://docs.aws.amazon.com/textract/latest/dg/what-is.html)

Lid Vizion pattern: store originals + artifacts + provenance

For every document:

  • store the original file immutably
  • hash it (integrity)
  • store raw OCR output
  • store normalized facts (schema)
  • store provenance pointers (page + bounding box + extractor version)

The Identity Layer makes every artifact addressable; the Event Engine records each stage.
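One way to represent the pattern above is a fact record that always carries a pointer back to its source. This is a sketch under assumptions: the `Provenance` and `Fact` classes are hypothetical, and the bounding box is assumed to be stored as page-relative ratios (left, top, width, height).

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Integrity hash of the original file bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Provenance:
    document_sha256: str
    page: int
    bbox: tuple[float, float, float, float]  # assumed: left, top, width, height as page ratios
    extractor_version: str


@dataclass(frozen=True)
class Fact:
    name: str
    value: str
    source: Provenance


pdf_bytes = b"%PDF-1.7 example bytes"  # stand-in for a real uploaded file
fact = Fact(
    name="disposition_date",
    value="2019-03-14",  # made-up value for illustration
    source=Provenance(sha256_hex(pdf_bytes), page=2,
                      bbox=(0.12, 0.40, 0.25, 0.03),
                      extractor_version="extract-0.1"),
)
print(fact.source.document_sha256[:12], fact.source.page)
```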

From extraction to truth: validation gates are the product

OCR will be wrong sometimes. The pipeline succeeds when errors are:

  • detected early
  • isolated cleanly
  • corrected efficiently

Gate 1: schema + type validation

  • required fields present
  • dates parse correctly
  • enumerations map to known values

Gate 2: internal consistency checks

  • disposition date >= arrest date
  • waiting period anchors calculated correctly
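Gates 1 and 2 can be sketched as a single function that returns a list of problems instead of raising on the first one, so a reviewer sees everything at once. The field names and the set of known dispositions are illustrative assumptions, and dates are assumed to arrive as ISO strings (YYYY-MM-DD) after normalization.

```python
from datetime import date

REQUIRED = ("case_number", "arrest_date", "disposition_date", "disposition")
# Illustrative enumeration only; a real system would define its own list.
KNOWN_DISPOSITIONS = {"guilty", "not_guilty", "dismissed", "nolle_prosequi"}


def validate(facts: dict) -> list[str]:
    """Return every validation problem found; an empty list means 'passed'."""
    # Gate 1: required fields present.
    errors = [f"Missing: {key}" for key in REQUIRED if not facts.get(key)]

    # Gate 1: dates parse correctly.
    parsed = {}
    for key in ("arrest_date", "disposition_date"):
        raw = facts.get(key)
        if raw:
            try:
                parsed[key] = date.fromisoformat(raw)
            except ValueError:
                errors.append(f"Unparseable date: {key}={raw!r}")

    # Gate 1: enumerations map to known values.
    disposition = facts.get("disposition")
    if disposition and disposition not in KNOWN_DISPOSITIONS:
        errors.append(f"Unknown disposition: {disposition!r}")

    # Gate 2: disposition date must not precede arrest date.
    if len(parsed) == 2 and parsed["disposition_date"] < parsed["arrest_date"]:
        errors.append("Inconsistent: disposition_date is before arrest_date")

    return errors
```

A non-empty result would typically become a validation_failed event; an empty one, validation_passed.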

Gate 3: human-in-the-loop review

Route exceptions to a reviewer UI:

  • show the source PDF page next to extracted fields
  • allow edits
  • require reviewer notes for changes

In Lid Vizion terms:

  • validation failures become Events
  • routing is handled by Rules Engine
  • reviewer actions are logged for audit + analytics

Eligibility engine: deterministic rules + versioning + citations

For expungement, treat decisioning like production code:

  • version the eligibility rules (not just the engine code)
  • store rule_version with every outcome
  • persist the exact “facts used”
  • generate an explanation report with citations

The anti-hallucination contract

Whether you use an LLM later to write a narrative summary or not, the system should enforce:

  • the engine may only reference facts present in your normalized schema
  • missing facts must be surfaced explicitly (e.g., “Missing: disposition_date”) and routed to review

That’s how you produce decisions you can defend.
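Here is a minimal sketch of that contract in code. Everything legal in it is a placeholder: `WAITING_DAYS` and `CITATION` are made-up stand-ins and do not describe Maryland law, and the two input facts are simplified assumptions. The point is the shape: the engine reads only named facts, reports what it used, and refuses to decide when something is missing.

```python
from dataclasses import dataclass, field
from datetime import date

RULE_VERSION = "example-0.1"                   # hypothetical version label
WAITING_DAYS = 1095                            # PLACEHOLDER, not a statement of Maryland law
CITATION = "<statute citation goes here>"      # PLACEHOLDER citation


@dataclass
class Decision:
    outcome: str                               # "eligible" | "not_eligible" | "needs_review"
    rule_version: str
    facts_used: dict = field(default_factory=dict)
    missing: list = field(default_factory=list)
    citations: list = field(default_factory=list)


def decide(facts: dict, today: date) -> Decision:
    """Decide from normalized facts only; never guess at missing values."""
    needed = ("disposition_date", "has_out_of_state_blocker")
    missing = [f"Missing: {key}" for key in needed if facts.get(key) is None]
    if missing:
        return Decision("needs_review", RULE_VERSION, missing=missing)

    used = {key: facts[key] for key in needed}
    waited = (today - date.fromisoformat(facts["disposition_date"])).days
    eligible = waited >= WAITING_DAYS and not facts["has_out_of_state_blocker"]
    return Decision("eligible" if eligible else "not_eligible",
                    RULE_VERSION, facts_used=used, citations=[CITATION])


print(decide({"has_out_of_state_blocker": False}, today=date(2025, 8, 27)))
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the decision reproducible when you replay it during an audit.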

Forms: deterministic mapping beats clever rendering

Court forms should be generated from deterministic mappings:

  • map each extracted fact to a specific form field
  • version the mapping
  • generate preview + final PDFs

Deliverables typically include:

  • eligibility_report.pdf (facts + decision + citations)
  • petition_form_filled.pdf
  • supporting exhibits
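The deterministic mapping described above can be as plain as a versioned dictionary. In this sketch the form field names (`CaseNo`, `PetitionerName`, `DispDate`) are invented for illustration; real names come from the official PDF's form fields, and the step that writes values into the PDF is left out.

```python
MAPPING_VERSION = "example-map-1"  # hypothetical version label

# fact name -> form field name (hypothetical field names)
FIELD_MAP = {
    "case_number": "CaseNo",
    "petitioner_name": "PetitionerName",
    "disposition_date": "DispDate",
}


def build_form_values(facts: dict) -> dict:
    """Translate normalized facts into form field values, or fail loudly."""
    missing = [key for key in FIELD_MAP if key not in facts]
    if missing:
        raise KeyError(f"Cannot fill form, missing facts: {missing}")
    return {
        "mapping_version": MAPPING_VERSION,
        "fields": {FIELD_MAP[key]: str(facts[key]) for key in FIELD_MAP},
    }


# Made-up example values.
print(build_form_values({
    "case_number": "EX-0001",
    "petitioner_name": "Jane Example",
    "disposition_date": "2019-03-14",
}))
```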

Secure storage in SharePoint (Graph API)

Many orgs already standardize on Microsoft 365. Microsoft Graph supports SharePoint sites, lists, and document libraries. (Graph SharePoint overview: https://learn.microsoft.com/en-us/graph/api/resources/sharepoint?view=graph-rest-1.0)

For uploads, Graph supports a single-request PUT .../content for files (the docs note a limit of 250 MB for this method; larger files need an upload session). (Upload method: https://learn.microsoft.com/en-us/graph/api/driveitem-put-content?view=graph-rest-1.0&tabs=http)
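As a rough sketch, the request for that PUT can be built with the standard library alone. The drive ID, folder ID, and token are placeholders you would obtain from your own tenant and auth flow; the byte limit is an approximation of the 250 MB figure, and the request is built but not sent here.

```python
import urllib.request
from urllib.parse import quote

GRAPH = "https://graph.microsoft.com/v1.0"
MAX_SIMPLE_UPLOAD = 250 * 1024 * 1024  # approximation of the documented 250 MB limit


def build_upload_request(drive_id: str, parent_id: str, filename: str,
                         content: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a PUT .../content request for one file."""
    if len(content) > MAX_SIMPLE_UPLOAD:
        raise ValueError("Too large for a single PUT; use an upload session")
    url = (f"{GRAPH}/drives/{drive_id}/items/{parent_id}:"
           f"/{quote(filename)}:/content")
    return urllib.request.Request(
        url,
        data=content,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/pdf",
        },
    )


# Placeholder IDs and token, for illustration only.
req = build_upload_request("DRIVE_ID", "FOLDER_ID", "petition form.pdf",
                           b"%PDF-", token="ACCESS_TOKEN")
print(req.get_method(), req.full_url)
```

Sending it would be a call to `urllib.request.urlopen(req)` with error handling and retries, ideally emitting a packet_uploaded_to_sharepoint event on success.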

Where Lid Vizion helps:

  • Identity Layer ensures every uploaded artifact has a stable ID and naming convention
  • Event Engine logs each upload/download action
  • Rules Engine can enforce retention policies or route sensitive packets

Security: encrypt, isolate, audit

A legal workflow isn’t secure just because it uses TLS. You need:

  • encryption at rest for DB + artifacts
  • strict access controls
  • audit logs for uploads, edits, and packet generation

AWS KMS is commonly used for key management and access controls around encryption keys. (KMS overview: https://docs.aws.amazon.com/kms/latest/developerguide/overview.html)

Operational checklist (what makes it production)

  • IDs everywhere (case_id, document_id, packet_run_id)
  • Event log for every stage + retries
  • Normalized facts + provenance
  • Validation gates + review UI
  • Versioned eligibility rules + versioned form mappings
  • Secure storage + audit trail
  • Analytics for throughput, exception rate, time-to-decision

Key takeaways

  • OCR is table stakes; validation + auditability is the product.
  • Durable IDs + event logs turn a brittle script into a shippable system.
  • Deterministic rules + “facts used” reporting prevent hallucinated facts from reaching a decision.
  • Lid Vizion accelerates delivery by providing pre-built components:
    • Identity Layer
    • Event Engine
    • Rules Engine
    • API Layer
    • Analytics Dashboard

FAQs

Q: Can we use an LLM to decide eligibility? A: You can, but you usually shouldn’t. Use deterministic rules for the decision. If you use an LLM, constrain it to summarizing outcomes from facts already in your schema.

Q: What if OCR confidence is low? A: Route it to review. Treat confidence thresholds as workflow rules (Rules Engine) and log the exception as an event.

Q: How do we make this defensible in an audit? A: Store provenance (page + bounding box), rule version, mapping version, and a complete event trail of who changed what and why.

Q: Why SharePoint instead of S3? A: If your org already runs on Microsoft 365, SharePoint can reduce adoption friction. The key is tight permissions + logging. (Graph APIs support document library access.)

Q: How do we prove ROI? A: Track cycle time, exception rate, and reviewer touches per packet. Put those metrics in an analytics dashboard and compare baseline vs post-automation.

Written By
Shawn Wilborne
AI Builder