Blog

Guides for AI-ready documents

Practical writing on converting documents for LLMs: layout-aware parsing, token-aware chunking, and building retrieval pipelines.

Docling vs Pandoc for AI context

Layout-aware parsing vs linear extraction, compared head to head for RAG and LLM context.

How to split Markdown into token-aware chunks for RAG and long-context models, with a recommended default.

A pragmatic seven-step pipeline for parsing, cleaning, chunking, and embedding documents for production retrieval.