Compile your files into AI-ready context.

Upload PDFs, DOCX files, or folders. DocDigest turns them into one clean, structured, token-aware Markdown digest for Claude, ChatGPT, Gemini, Cursor, and RAG workflows.


Free tier · 200 pages / month · No credit card

docling-technical-report.pdf → digest.md

Source

Docling Technical Report

p.1 / 9

Two-column academic PDF with tables, figures, code, and references (arXiv:2408.09869).

Output

43 KB, UTF-8

# DocDigest Output
 
- Digest ID: `0f3860bd-dd07-403e-8284-2a0ae9dc1964`
- Sources: 1
- Target model: `claude-3-5-sonnet`
 
## Table of sources
 
1. **docling-technical-report.pdf** - 9,782 tokens - `ready`
 
---
 
## Source 1 - docling-technical-report.pdf
 
```yaml
filename: docling-technical-report.pdf
mime: application/pdf
tokens: 9782
status: ready
```
 
## Docling Technical Report
 
**Abstract.** This technical report introduces Docling, an easy to use,
self-contained, MIT-licensed open-source package for PDF document conversion.


9 pages in · 9,782 tokens · 43 KB Markdown out

Built on Docling

Every extraction runs on Docling, IBM Research's open-source document parser: layout-aware PDF understanding, table structure, and reading order. MIT-licensed.
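
For orientation, the extraction step can be approximated by calling Docling's public Python API directly. This is a minimal sketch, not DocDigest's own pipeline: it assumes the `docling` package is installed, and the helper name `pdf_to_markdown` is hypothetical.

```python
def pdf_to_markdown(source: str) -> str:
    """Convert one document (local path or URL) to Markdown with Docling.

    The import is deferred so this module still loads when docling
    is not installed; the conversion itself requires the package.
    """
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    # The parsed document keeps layout, table structure, and reading order;
    # export_to_markdown() serializes it as Markdown text.
    return result.document.export_to_markdown()
```

Calling `pdf_to_markdown("docling-technical-report.pdf")` would return the Markdown body for that file. The digest header, token counts, and status fields shown in the sample come from DocDigest and are not produced by this call.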

GitHub Docs Technical report


Three reasons

A precise tool for serious AI work.

Combine files

Many files and formats - PDF, DOCX, Markdown, ZIP folders - become one coherent, source-aware output.
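
To illustrate what "one coherent, source-aware output" can look like, here is a rough sketch that stitches already-extracted Markdown into a single digest laid out like the sample on this page. The `Source` class, `build_digest`, and the 4-characters-per-token estimate are all assumptions made for illustration, not DocDigest's actual implementation or tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Source:
    filename: str
    markdown: str  # Markdown already extracted from this file


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def build_digest(sources: list[Source]) -> str:
    """Join several sources into one Markdown digest with a source table."""
    lines = [
        "# DocDigest Output",
        "",
        f"- Sources: {len(sources)}",
        "",
        "## Table of sources",
        "",
    ]
    for i, src in enumerate(sources, start=1):
        tokens = estimate_tokens(src.markdown)
        lines.append(f"{i}. **{src.filename}** - {tokens:,} tokens")
    for i, src in enumerate(sources, start=1):
        # Each section keeps a header naming the file it came from.
        lines += ["", "---", "", f"## Source {i} - {src.filename}", "", src.markdown.strip()]
    return "\n".join(lines) + "\n"
```

Keeping the file name in every section header is what lets a model, or a reader, trace a passage back to its source after the files are merged.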

See tokens and structure clearly

File names, hierarchy, token counts, parsing status, warnings, and context-window fit - visible at a glance.
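
A context-window fit check comes down to simple arithmetic on the token count. The sketch below is illustrative only: `context_fit` is a hypothetical helper, and the 200,000-token window and 4,096-token reply reserve are example figures, so check your model's documentation for the real limits.

```python
def context_fit(tokens: int, context_window: int, reserve: int = 0) -> dict:
    """Report how much of a model's context window a digest would use.

    `reserve` is space held back for the prompt and the model's reply.
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    budget = context_window - reserve
    return {
        "tokens": tokens,
        "budget": budget,
        "fits": tokens <= budget,
        "used_pct": round(100 * tokens / context_window, 1),
    }


# The 9,782-token digest from the sample, against an assumed 200,000-token window.
report = context_fit(9782, context_window=200_000, reserve=4_096)
print(report)
```

With these example numbers the digest uses about 4.9% of the window and fits comfortably.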

Improve quality

Docling preserves the headings, tables, layout, and code blocks that are usually lost when you copy and paste from a PDF viewer.

Outputs

Clean Markdown. JSON.

Each digest ships with a full token report and parse warnings so you know exactly what you're feeding the model. Export anywhere.

Source headers preserved on every section
Per-file token counts for Claude, GPT, and Gemini
Tables extracted as Markdown grids, not flattened OCR text
Optional raw Docling JSON for downstream tooling

digest.md · 9,782 tokens

# DocDigest Output
 
- Sources: 1
- Target model: claude-3-5-sonnet
 
## Table of sources
 
1. docling-technical-report.pdf - 9,782 tokens - ready
 
## Source 1 - docling-technical-report.pdf
 
```yaml
filename: docling-technical-report.pdf
tokens: 9782
status: ready
```

FAQ

Common questions

Is this a chat-with-PDF tool?

No. DocDigest is a context preparation tool. It compiles your files into one clean Markdown digest you can paste into any LLM.

What about scanned documents?

Enable OCR in the advanced options. It is included on every plan, including Free.

Where do my files live?

Files are uploaded over TLS and processed in isolation. They are kept until you delete them from your workspace.

What can I export?

Every digest exports as clean Markdown or raw Docling JSON, with per-file token counts and parse warnings. Create and download digests straight from the web app.
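
Because the exported Markdown follows a regular layout, downstream scripts can read the per-file token counts back out of it. The following is a sketch that assumes the "Table of sources" line format shown in the sample digest on this page; `parse_sources` is a hypothetical helper, not part of DocDigest.

```python
import re

# Matches lines such as:
#   1. **docling-technical-report.pdf** - 9,782 tokens - `ready`
# Bold markers and backticks are optional, so the plain variant also matches.
SOURCE_LINE = re.compile(
    r"^\d+\.\s+\*{0,2}(?P<filename>.+?)\*{0,2}"
    r"\s+-\s+(?P<tokens>[\d,]+)\s+tokens"
    r"\s+-\s+`?(?P<status>\w+)`?\s*$"
)


def parse_sources(digest: str) -> list[dict]:
    """Extract filename, token count, and status from a digest's source table."""
    sources = []
    for line in digest.splitlines():
        match = SOURCE_LINE.match(line.strip())
        if match:
            sources.append({
                "filename": match["filename"],
                "tokens": int(match["tokens"].replace(",", "")),
                "status": match["status"],
            })
    return sources


sample = "1. **docling-technical-report.pdf** - 9,782 tokens - `ready`"
print(parse_sources(sample))
```

If the layout of the source table changes, the pattern would need to be adjusted to match.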