Blog
Guides for AI-ready documents
Practical writing on converting documents for LLMs: layout-aware parsing, token-aware chunking, and building retrieval pipelines that actually work.
- Comparison
Docling vs Pandoc for AI context - which wins?
Layout-aware parsing vs linear extraction, compared head to head for RAG and LLM context.
Read the post
- Guide
Markdown chunking for LLMs - a practical guide
How to split Markdown into token-aware chunks for RAG and long-context models, with a recommended default.
Read the post
- Guide
RAG document processing - from PDF to embeddings
A pragmatic seven-step pipeline for parsing, cleaning, chunking, and embedding documents for production retrieval.
Read the post