Document processing is one of the most practical jobs for a self-hosted AI agent. It is repetitive, context-heavy, privacy-sensitive, and usually connected to follow-up work that lives outside the document itself.
A team may receive invoices, contracts, purchase orders, onboarding forms, support attachments, receipts, research PDFs, screenshots, reports, or scanned letters. The documents need to be saved, named, classified, summarized, checked against rules, and routed to the right person. Humans can do this. Humans should not have to do all of it manually.
OpenClaw works well for document processing automation because it can combine local files, scheduled tasks, OCR tools, private model routing, structured extraction, message channels, and proof files. The goal is not to let an agent approve a contract or pay an invoice by itself. The goal is to remove the boring handling layer and create a clean review queue.
This guide shows how to design a self-hosted AI agent workflow for private document processing with OpenClaw.
Why document processing is a good self-hosted workflow
Documents often contain data you do not want to paste into random cloud tools. Invoices include vendor names, tax numbers, addresses, bank details, item lines, and payment terms. Contracts include private pricing, customer names, renewal dates, and legal obligations. Internal reports include strategy, financial data, or operational weak spots.
A self-hosted agent gives you three advantages.
First, the files can stay on your own machine or server. The workflow can use a local model for classification and extraction, then escalate only low-risk summaries to a stronger cloud model when the policy allows it.
Second, the workflow can be built around local proof. Every processed file can produce a small record showing the source path, checksum, extraction result, confidence, reviewer, and next action.
Third, the system can connect document handling to operations. A PDF is rarely the endpoint. It usually creates a task, a calendar reminder, a support reply, a payment review, a CRM note, or a knowledge base update.
The workflow you should build first
Do not start with a universal document brain. Start with one narrow lane.
A good first workflow is invoice intake because the structure is predictable and the stakes are clear. The agent should:
- Watch one intake folder
- Detect new PDFs or images
- Extract plain text
- Classify the document type
- Pull key fields into a structured file
- Flag missing or unusual fields
- Save a proof note
- Send a short review message to a human
- Move the file into a dated archive only after review
This is enough to save time without pretending that automation has solved accounting.
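The first two steps of that list (watch one folder, detect new PDFs or images) can be sketched as a simple polling helper. This is plain Python rather than an OpenClaw API; the function name `find_new_documents`, the suffix list, and the idea of tracking seen file names in a set are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of intake formats; adjust to what the team actually receives.
INTAKE_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_new_documents(inbox: Path, already_seen: set) -> list:
    """Return intake files in the inbox that have not been picked up yet."""
    new_files = []
    for path in sorted(inbox.iterdir()):
        if not path.is_file() or path.name.startswith("."):
            continue  # skip subfolders and hidden or temporary files
        if path.suffix.lower() not in INTAKE_SUFFIXES:
            continue  # not a PDF or image
        if path.name not in already_seen:
            new_files.append(path)
    return new_files
```

A scheduled task could call this every few minutes and add each returned name to the seen set once the file has been copied into processing.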
Once that lane works, you can add contracts, receipts, policy documents, research PDFs, or support attachments. The order matters. Reliable small workflows beat clever universal workflows.
A practical folder structure
Keep the first version boring:
documents/
    inbox/
    processing/
    needs-review/
    approved/
    archived/
    rejected/
    proofs/
    extracted/
The agent should never edit the original file in place. It should copy or move the file through states and write sidecar files for extraction.
For example:
documents/extracted/2026-06-25_vendor-invoice-1842.json
documents/proofs/2026-06-25_vendor-invoice-1842.md
This gives a human a trail. If an extraction is wrong, you can inspect the original document, the OCR output, the structured data, and the agent note.
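One way to keep that trail consistent is to derive every sidecar path from the original file name plus the processing date, and to move files between state folders without ever overwriting anything. The sketch below assumes the folder layout described above; the helper names `ensure_layout`, `sidecar_paths`, and `move_to_state` are illustrative and not part of OpenClaw.

```python
import shutil
from datetime import date
from pathlib import Path

STATES = ("inbox", "processing", "needs-review", "approved", "archived", "rejected")


def ensure_layout(root: Path) -> None:
    """Create the state folders plus the two sidecar folders if missing."""
    for name in STATES + ("proofs", "extracted"):
        (root / name).mkdir(parents=True, exist_ok=True)


def sidecar_paths(root: Path, source: Path, processed_on: date) -> dict:
    """Derive the extraction and proof paths from the original file name."""
    stem = f"{processed_on.isoformat()}_{source.stem}"
    return {
        "extracted": root / "extracted" / f"{stem}.json",
        "proof": root / "proofs" / f"{stem}.md",
    }


def move_to_state(root: Path, source: Path, state: str) -> Path:
    """Move a file into another state folder without touching its contents."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    target = root / state / source.name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.move(str(source), str(target))
    return target
```

Refusing to overwrite an existing target is a deliberate choice here: a name collision is better surfaced to a human than silently resolved.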
Step 1: define document types
The first design choice is not the model. It is the taxonomy.
For invoice processing, use a small set:
- Vendor invoice
- Credit note
- Receipt
- Statement
- Unknown finance document
- Not finance
For contracts, use a different set:
- New customer agreement
- Vendor agreement
- Renewal notice
- Data processing agreement
- Amendment
- Termination notice
- Unknown legal document
The agent should be allowed to say unknown. That is a feature, not a failure. Most mistakes in document automation come from forcing a confident label when the input is messy.
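A small enum can make the "unknown is allowed" rule concrete. In this sketch, any label the model returns that is not in the finance taxonomy falls back to the unknown category instead of being forced into a confident one. Only `vendor_invoice` appears as an identifier elsewhere in this guide; the other string values and the `parse_doc_type` helper are assumed names.

```python
from enum import Enum


class FinanceDocType(str, Enum):
    VENDOR_INVOICE = "vendor_invoice"
    CREDIT_NOTE = "credit_note"
    RECEIPT = "receipt"
    STATEMENT = "statement"
    UNKNOWN_FINANCE = "unknown_finance_document"
    NOT_FINANCE = "not_finance"


def parse_doc_type(raw_label: str) -> FinanceDocType:
    """Map a model's free-text label onto the taxonomy, defaulting to unknown."""
    normalized = raw_label.strip().lower().replace(" ", "_").replace("-", "_")
    try:
        return FinanceDocType(normalized)
    except ValueError:
        # Anything outside the taxonomy is treated as unknown, never guessed.
        return FinanceDocType.UNKNOWN_FINANCE
```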
Step 2: extract text before asking the model
Use deterministic tools before language models. PDFs with embedded text should be converted to text directly. Scanned PDFs and photos need OCR. The language model should receive the best available plain text plus basic metadata.
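The choice between direct text extraction and OCR can itself be deterministic. The sketch below is one possible heuristic, not a standard rule: it assumes you have already tried to read the embedded text layer with whatever PDF tool you use, and it sends the document to OCR when that text is too thin. The threshold and the function name `choose_extraction_method` are assumptions.

```python
# Assumed threshold: fewer characters per page than this suggests a scanned page.
MIN_CHARS_PER_PAGE = 50
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def choose_extraction_method(suffix: str, embedded_text: str, page_count: int) -> str:
    """Return "embedded_text" or "ocr" for a single document."""
    if suffix.lower() in IMAGE_SUFFIXES:
        return "ocr"  # image files carry no text layer
    # Count non-whitespace characters so blank pages do not look like text.
    usable_chars = len("".join(embedded_text.split()))
    if usable_chars >= MIN_CHARS_PER_PAGE * max(page_count, 1):
        return "embedded_text"
    return "ocr"
```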
The input packet can include:
- File name
- File size
- Page count
- OCR confidence if available
- Extracted plain text
- Source folder
- Received timestamp
This keeps the model focused on interpretation instead of file handling.
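The packet above maps naturally onto a small data class. This is a sketch under a few assumptions: the text has already been extracted, the received timestamp is approximated by the file's modification time, and a SHA-256 checksum is added so the packet can be tied back to the original file. The names `InputPacket` and `build_packet` are illustrative.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class InputPacket:
    file_name: str
    file_size: int
    page_count: Optional[int]
    ocr_confidence: Optional[float]  # None when the text came from a text layer
    text: str
    source_folder: str
    received_at: str  # assumption: file modification time, in UTC
    sha256: str  # assumption: checksum added for the proof trail


def build_packet(path: Path, text: str, page_count: Optional[int] = None,
                 ocr_confidence: Optional[float] = None) -> InputPacket:
    """Bundle already-extracted text with basic file metadata."""
    data = path.read_bytes()
    modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return InputPacket(
        file_name=path.name,
        file_size=len(data),
        page_count=page_count,
        ocr_confidence=ocr_confidence,
        text=text,
        source_folder=path.parent.name,
        received_at=modified.isoformat(),
        sha256=hashlib.sha256(data).hexdigest(),
    )
```

Calling `asdict(packet)` gives a plain dictionary that can be serialized and handed to the model alongside the prompt.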
Step 3: write a strict output schema
Agents become easier to trust when they produce predictable outputs. For invoices, use a JSON schema like this:
{
  "document_type": "vendor_invoice",
  "vendor_name": "",
  "invoice_number": "",
  "invoice_date": "",
  "due_date": "",
  "currency": "",
  "total_amount": "",
  "tax_amount": "",
  "payment_terms": "",
  "line_items_summary": "",
  "risk_flags": [],
  "confidence": "low|medium|high",
  "needs_human_review": true
}
The schema should not be too large. If you ask for 60 fields, you will spend more time debugging missing values than processing documents. Start with the fields that decide routing.
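A strict schema only helps if the agent's output is actually checked against it. The following sketch validates a model response against the invoice fields listed above using only the standard library. The function name `validate_extraction` and the wording of the error messages are assumptions; a JSON Schema validator would be a reasonable alternative.

```python
import json

# Field names and types taken from the invoice schema in this guide.
SCHEMA = {
    "document_type": str, "vendor_name": str, "invoice_number": str,
    "invoice_date": str, "due_date": str, "currency": str,
    "total_amount": str, "tax_amount": str, "payment_terms": str,
    "line_items_summary": str, "risk_flags": list,
    "confidence": str, "needs_human_review": bool,
}
ALLOWED_CONFIDENCE = {"low", "medium", "high"}


def validate_extraction(raw_json: str) -> list:
    """Return a list of problems; an empty list means the output fits the schema."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for name, expected in SCHEMA.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            problems.append(f"wrong type for {name}")
    for name in sorted(data.keys() - SCHEMA.keys()):
        problems.append(f"unexpected field: {name}")
    confidence = data.get("confidence")
    if isinstance(confidence, str) and confidence not in ALLOWED_CONFIDENCE:
        problems.append("confidence must be low, medium, or high")
    return problems
```

Any non-empty result is a good reason to keep the document in the review queue rather than retrying silently.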
Step 4: add risk flags
Risk flags are the reason the workflow becomes useful. A summary is nice. A clear exception queue is better.
Useful invoice flags include:
- Missing invoice number
- Missing due date
- Bank details present
- Total amount over review threshold
- Vendor not recognized
- Currency mismatch
- Duplicate invoice number
- Due date within seven days
- Low OCR confidence
- Handwritten or scanned source
- Tax amount missing
The agent does not need to decide whether the invoice is valid. It needs to say which invoices need attention first.
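Most of these flags can be computed with ordinary code once the fields are extracted, which keeps them reproducible. The sketch below covers a subset of the list. It assumes the field values are strings as in the schema, that dates are ISO formatted, that vendor names in `known_vendors` are lowercase, and that a crude keyword check is good enough to spot bank details. The thresholds, the `total_amount_unreadable` flag, and the function name are assumptions added for illustration.

```python
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Optional


def compute_risk_flags(fields: dict, text: str, today: date,
                       known_vendors: set, seen_invoice_numbers: set,
                       review_threshold: Decimal,
                       ocr_confidence: Optional[float] = None) -> list:
    """Return risk flag names for one extracted invoice."""
    flags = []
    invoice_number = fields.get("invoice_number", "").strip()
    if not invoice_number:
        flags.append("missing_invoice_number")
    elif invoice_number in seen_invoice_numbers:
        flags.append("duplicate_invoice_number")
    try:
        due = date.fromisoformat(fields.get("due_date", "").strip())
    except ValueError:
        flags.append("missing_due_date")  # empty or not an ISO date
    else:
        if (due - today).days <= 7:  # also catches dates already overdue
            flags.append("due_date_within_7_days")
    if fields.get("vendor_name", "").strip().lower() not in known_vendors:
        flags.append("vendor_not_recognized")
    try:
        if Decimal(fields.get("total_amount", "").strip()) > review_threshold:
            flags.append("total_over_review_threshold")
    except InvalidOperation:
        flags.append("total_amount_unreadable")  # assumed extra flag
    if not fields.get("tax_amount", "").strip():
        flags.append("tax_amount_missing")
    lowered = text.lower()
    if "iban" in lowered or "swift" in lowered:  # crude keyword heuristic
        flags.append("bank_details_present")
    if ocr_confidence is not None and ocr_confidence < 0.80:  # assumed cutoff
        flags.append("low_ocr_confidence")
    return flags
```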
Step 5: route by confidence and risk
A simple routing policy is enough:
- High confidence, no risk flags: move to needs-review with a short summary
- Medium confidence or minor flags: move to needs-review and tag as priority
- Low confidence or critical flags: keep in processing and alert a human
- Unknown document type: move to needs-review and ask for manual classification
Do not auto-approve payments. Do not auto-sign contracts. Do not auto-delete documents. The agent can prepare the queue. A human should own irreversible decisions.
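The routing policy above is small enough to express as one function. This sketch makes two assumptions the policy does not spell out: which flags count as critical (the set below is illustrative), and that an unknown document type is checked before confidence. Unexpected confidence values are treated like low confidence. The names `Route` and `route_document` are hypothetical.

```python
from typing import NamedTuple

# Assumption: which flags are critical is a local policy decision.
CRITICAL_FLAGS = {"duplicate_invoice_number", "currency_mismatch", "low_ocr_confidence"}


class Route(NamedTuple):
    folder: str
    tag: str
    alert_human: bool


def route_document(document_type: str, confidence: str, risk_flags: list) -> Route:
    """Pick the next folder for a document; never approves or deletes anything."""
    if document_type.startswith("unknown"):
        return Route("needs-review", "manual-classification", False)
    if confidence not in ("medium", "high") or CRITICAL_FLAGS.intersection(risk_flags):
        return Route("processing", "blocked", True)
    if confidence == "medium" or risk_flags:
        return Route("needs-review", "priority", False)
    return Route("needs-review", "summary", False)
```

Note that every branch ends in a folder a human looks at. There is no path to approved, archived, or rejected from here.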
Step 6: create proof notes
Every document should get a proof note. It does not need to be long.
# Document Processing Proof
- Source file: documents/inbox/vendor-invoice-1842.pdf
- Processed at: 2026-06-25 10:00
- Document type: vendor_invoice
- Confidence: medium
- Risk flags: due_date_within_7_days, bank_details_present
- Output JSON: documents/extracted/2026-06-25_vendor-invoice-1842.json
- Human review required: yes
- Agent action: moved to needs-review
This proof file is what turns an AI workflow from a mystery box into an operations system.
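Generating the note from a template keeps every proof file in the same shape. The sketch below reproduces the format shown above; the function name `render_proof_note` and its parameters are assumptions.

```python
from datetime import datetime


def render_proof_note(source_file: str, processed_at: datetime, document_type: str,
                      confidence: str, risk_flags: list, output_json: str,
                      needs_human_review: bool, agent_action: str) -> str:
    """Render one proof note in the format used in this guide."""
    flags = ", ".join(risk_flags) if risk_flags else "none"
    review = "yes" if needs_human_review else "no"
    lines = [
        "# Document Processing Proof",
        f"- Source file: {source_file}",
        f"- Processed at: {processed_at:%Y-%m-%d %H:%M}",
        f"- Document type: {document_type}",
        f"- Confidence: {confidence}",
        f"- Risk flags: {flags}",
        f"- Output JSON: {output_json}",
        f"- Human review required: {review}",
        f"- Agent action: {agent_action}",
    ]
    return "\n".join(lines) + "\n"
```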
Step 7: use local and cloud models carefully
Document workflows are a natural fit for model routing.
Use a local model when:
- The document contains private data
- The task is classification
- The extraction schema is simple
- The document is short
- The result will be reviewed by a human
Use a stronger cloud model only when at least one of the first three conditions applies and both of the last two hold:
- The document is long or legally complex
- The text quality is poor
- The summary must be polished
- The workflow policy allows external processing
- Sensitive fields have been redacted first
The safest default is local classification and extraction, then optional human-approved escalation for difficult documents.
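Making the routing rule explicit in code is one way to stop it drifting. The sketch below encodes one reading of the lists above: a document goes to the cloud only if the task needs a stronger model, the policy allows external processing, sensitive fields are redacted (or there is no private data), and a human has approved the escalation. The `RoutingContext` fields and the `choose_model` name are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RoutingContext:
    contains_private_data: bool
    sensitive_fields_redacted: bool
    policy_allows_external: bool
    human_approved_escalation: bool
    long_or_legally_complex: bool = False
    poor_text_quality: bool = False
    needs_polished_summary: bool = False


def choose_model(ctx: RoutingContext) -> str:
    """Return "local" or "cloud"; local is the default whenever in doubt."""
    needs_stronger_model = (
        ctx.long_or_legally_complex
        or ctx.poor_text_quality
        or ctx.needs_polished_summary
    )
    safe_to_send = ctx.policy_allows_external and (
        ctx.sensitive_fields_redacted or not ctx.contains_private_data
    )
    if needs_stronger_model and safe_to_send and ctx.human_approved_escalation:
        return "cloud"
    return "local"
```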
Step 8: build a review message
The agent should send concise review messages, not long essays.
Example:
Invoice needs review:
- Vendor: Acme Hosting
- Total: EUR 842.00
- Due: 2026-07-01
- Flags: new vendor, bank details present, due within seven days
- File: documents/needs-review/acme-hosting-842.pdf
- Proof: documents/proofs/2026-06-25_acme-hosting-842.md
This is enough for a human to decide what to inspect next.
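A fixed template keeps these messages short no matter what the model produced. The sketch below builds the message from extracted fields and turns flag identifiers into readable phrases. The function name `format_review_message` and its parameters are assumptions, and sending the message through a channel is left out.

```python
from decimal import Decimal


def format_review_message(vendor: str, currency: str, total: Decimal, due_date: str,
                          flags: list, file_path: str, proof_path: str) -> str:
    """Build a short, fixed-shape review message for one invoice."""
    readable_flags = ", ".join(flag.replace("_", " ") for flag in flags) or "none"
    lines = [
        "Invoice needs review:",
        f"- Vendor: {vendor}",
        f"- Total: {currency} {total:.2f}",
        f"- Due: {due_date}",
        f"- Flags: {readable_flags}",
        f"- File: {file_path}",
        f"- Proof: {proof_path}",
    ]
    return "\n".join(lines)
```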
Common mistakes
The first mistake is trying to process every document type on day one. Build one lane, measure it, then expand.
The second mistake is treating OCR output as truth. OCR can miss numbers, swap characters, and drop table structure. Keep confidence visible.
The third mistake is mixing extraction and approval. Extraction is an agent job. Approval is a human job unless the organization has a mature control system.
The fourth mistake is failing to preserve originals. Never overwrite the source file. Every processing step should be recoverable.
The fifth mistake is sending private documents to cloud models without a policy. Model routing should be explicit.
A 30-day implementation plan
Week one: build the intake folder, OCR step, document classifier, and proof file format.
Week two: add structured extraction for one document type, usually vendor invoices.
Week three: add risk flags, duplicate checks, and a human review message.
Week four: measure accuracy, failure types, review time saved, and false confidence (high-confidence results that a reviewer still had to correct). Tighten the schema before adding more document types.
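The week four measurements can start from a very simple log of review outcomes. The sketch below assumes each reviewed document is recorded as a dictionary with the agent's `confidence` and a `corrected` boolean set by the reviewer. It treats false confidence as the share of high-confidence documents that still needed a correction, which is one possible definition rather than a standard metric.

```python
def review_metrics(records: list) -> dict:
    """Summarize reviewer outcomes for a batch of processed documents."""
    total = len(records)
    if total == 0:
        return {"documents": 0, "accuracy": None, "false_confidence_rate": None}
    uncorrected = sum(1 for r in records if not r["corrected"])
    high = [r for r in records if r["confidence"] == "high"]
    high_but_corrected = sum(1 for r in high if r["corrected"])
    return {
        "documents": total,
        # Share of documents the reviewer accepted without changes.
        "accuracy": uncorrected / total,
        # Share of high-confidence documents that were wrong anyway.
        "false_confidence_rate": high_but_corrected / len(high) if high else None,
    }
```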
Final recommendation
Self-hosted AI agent document processing works when the workflow is narrow, stateful, and proof-driven. OpenClaw should watch the files, extract the text, classify the document, write structured data, flag risk, and prepare human review.
Do not start with full autonomy. Start with a clean queue.
That is where document automation starts paying back quickly.