Free PDF OCR
Turn scanned PDFs and image-based documents into searchable, copyable text. Upload a scanned PDF and the tool renders each page with PDF.js, then runs Tesseract, an open-source OCR engine, to extract the text. Download the extracted text or a searchable PDF. For the best performance, use a desktop version of Chrome, Firefox, or Edge. Files never leave your browser.
Frequently Asked Questions
Are my PDF files uploaded to a server?
No. All OCR processing happens inside your browser using Tesseract.js, a WebAssembly port of the open-source Tesseract OCR engine. Your files never leave your device.
Why is desktop recommended for OCR?
OCR is computationally intensive. Tesseract.js downloads its WASM module and language data (about 10-15 MB together on the first run) and processes each page independently. Desktop browsers have more memory and CPU available, so OCR runs faster and more reliably. On mobile, processing is slower and may fail on very large PDFs.
What is the quality of the OCR output?
Tesseract.js (version 5) is built on the Tesseract engine's LSTM neural-network recognizer, which was introduced in Tesseract 4. For clean, well-scanned documents at 150+ DPI, accuracy is typically 95-99%. Handwritten text, poor scan quality, or unusual fonts reduce accuracy.
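The page does not say which metric these percentages use. A common choice is character accuracy: 1 minus the character error rate (CER), where CER is the Levenshtein edit distance between the OCR output and a hand-checked reference transcript, divided by the reference length. The sketch below assumes that definition; `levenshtein` and `characterAccuracy` are illustrative helpers, not part of Tesseract.js.

```typescript
// Illustrative helpers (not part of Tesseract.js): score OCR output against a
// hand-checked reference transcript using character-level edit distance.
// Strings are compared as UTF-16 code units, which is fine for most Latin text.

/** Levenshtein distance: minimum insertions, deletions, and substitutions. */
function levenshtein(a: string, b: string): number {
  // prev[j] holds the distance between a[0..i) and b[0..j) for the current row.
  const prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = prev[0]; // distance for (i-1, j-1)
    prev[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = prev[j]; // distance for (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      prev[j] = Math.min(above + 1, prev[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return prev[b.length];
}

/** Character accuracy = 1 - CER, where CER = edits / reference length. */
function characterAccuracy(ocrText: string, reference: string): number {
  if (reference.length === 0) return ocrText.length === 0 ? 1 : 0;
  const cer = levenshtein(ocrText, reference) / reference.length;
  return Math.max(0, 1 - cer); // CER can exceed 1 when OCR emits lots of junk
}

// "rn" misread for "m" (2 edits) and letter O for digit 0 (1 edit):
// 3 edits over 19 reference characters, about 0.84.
console.log(characterAccuracy("Invoice nurnber 1O42", "Invoice number 1042"));
```

By this measure, 95% accuracy still means roughly one wrong character in every twenty, so proofreading numbers and names in the output is worthwhile.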
What languages are supported?
English is the default. Additional languages include French, German, Spanish, Italian, Portuguese, and others. Select your document's language before starting OCR for the best accuracy. Mixed-language documents may require running OCR once per language.
How long does OCR take?
The first run downloads the Tesseract WASM module and language data (~10-15 MB total). After that, each page takes approximately 3-10 seconds depending on page complexity and your device. A 10-page document typically takes 1-2 minutes total.
What can I do with the OCR output?
Download the extracted text and paste it into a Word document or Google Doc. If you download the searchable PDF, you can run it through the PDF to Word or PDF to Excel tools, which need the text layer the original scan lacked. You can also search, summarize, or translate the extracted content.
How OCR Works in the Browser
OCR (Optical Character Recognition) converts images of text into machine-readable text. This tool uses Tesseract.js — a WebAssembly port of the Tesseract OCR engine — running entirely in your browser. Your scanned PDF is rendered page by page using PDF.js, each page becomes a canvas image, and Tesseract analyzes each canvas to produce a text transcript. Nothing is sent to a server.
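How sharp each canvas image is depends on the scale used when PDF.js renders the page. PDF page sizes are measured in points (1/72 inch), and a PDF.js viewport created with scale 1 measures the page in those units, so rendering at a target resolution means scaling by DPI / 72. The sketch below computes that scale and the resulting canvas size; `planRender` and the 300 DPI example are illustrative assumptions, not settings this tool is known to use.

```typescript
// Hypothetical helper: pick a PDF.js render scale for a target resolution.
// PDF user space is 1/72 inch per unit, and page.getViewport({ scale: 1 })
// measures the page in those units, so scale = targetDpi / 72.
const POINTS_PER_INCH = 72;

interface RenderPlan {
  scale: number;     // value to pass as page.getViewport({ scale })
  widthPx: number;   // canvas width in pixels
  heightPx: number;  // canvas height in pixels
  rgbaBytes: number; // raw canvas memory at 4 bytes (RGBA) per pixel
}

function planRender(pageWidthPt: number, pageHeightPt: number, targetDpi: number): RenderPlan {
  const scale = targetDpi / POINTS_PER_INCH;
  // Multiply before dividing to avoid rounding drift from the repeating decimal.
  const widthPx = Math.ceil((pageWidthPt * targetDpi) / POINTS_PER_INCH);
  const heightPx = Math.ceil((pageHeightPt * targetDpi) / POINTS_PER_INCH);
  return { scale, widthPx, heightPx, rgbaBytes: widthPx * heightPx * 4 };
}

// US Letter is 612 x 792 pt (8.5 x 11 in); at 300 DPI that is 2550 x 3300 px.
const letterAt300 = planRender(612, 792, 300);
console.log(letterAt300.widthPx, letterAt300.heightPx, letterAt300.rgbaBytes);
```

At 300 DPI a single Letter page is about 33.7 MB of raw RGBA pixels, which helps explain why large scans strain mobile browsers and why processing one page at a time matters.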
When to Use OCR
Use OCR when your PDF was created by a scanner, a mobile camera app, or any process that produced an image rather than text. Signs that OCR is needed: you cannot select or copy text in the PDF; searching the PDF returns no results; the file size is unusually large relative to page count. If your PDF already has selectable text, the PDF to Word converter will give faster and more accurate results without OCR overhead.
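The first sign above can also be checked programmatically before deciding to OCR. With PDF.js, `page.getTextContent()` returns the text items on a page, and a bare scan typically yields none or only a few stray characters. The sketch below takes the text already extracted for each page and flags pages below a minimum character count; `pagesNeedingOcr` and its 20-character threshold are assumptions for illustration, not values this tool is known to use.

```typescript
// Hypothetical helper: given the text extracted from each page (for example by
// joining the `str` fields of the items from PDF.js page.getTextContent()),
// return the 1-based numbers of pages that appear to have no usable text layer.
function pagesNeedingOcr(pageTexts: string[], minChars = 20): number[] {
  const result: number[] = [];
  pageTexts.forEach((text, index) => {
    // Ignore whitespace so pages holding only spaces or line breaks count as empty.
    const visibleChars = text.replace(/\s+/g, "").length;
    if (visibleChars < minChars) result.push(index + 1);
  });
  return result;
}

// Page 2 has a real text layer; pages 1 and 3 look like bare scans.
console.log(pagesNeedingOcr(["", "Quarterly report for the period ending June 30", "  \n "]));
// → [1, 3]
```

A per-page check like this also handles mixed files, where only some pages were scanned and the rest already carry selectable text.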
Accuracy Factors
OCR accuracy depends on scan quality, font clarity, and page orientation. For best results: use PDFs scanned at 200 DPI or higher; ensure pages are not rotated or skewed; use clear, printed fonts rather than handwriting. Handwritten text is recognized poorly by Tesseract and is not a supported use case. Printed text in standard fonts at adequate resolution typically achieves 95%+ accuracy on English documents.
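To check the 200 DPI guideline for an existing scan, compare the pixel width of the scanned image with the page width in points (72 points per inch). The sketch below assumes you already know both numbers, for example from the image's properties and the page size; `effectiveDpi` and `meetsOcrGuideline` are hypothetical helpers.

```typescript
// Hypothetical helpers: effective resolution of a scan that fills a PDF page.
// A page W points wide is W / 72 inches wide, so an image spanning it with
// P pixels has P / (W / 72) = P * 72 / W pixels per inch.
function effectiveDpi(imageWidthPx: number, pageWidthPt: number): number {
  if (pageWidthPt <= 0) throw new RangeError("page width must be positive");
  return (imageWidthPx * 72) / pageWidthPt;
}

function meetsOcrGuideline(imageWidthPx: number, pageWidthPt: number, minDpi = 200): boolean {
  return effectiveDpi(imageWidthPx, pageWidthPt) >= minDpi;
}

// A 1700-pixel-wide scan on a US Letter page (612 pt = 8.5 in) is exactly 200 DPI.
console.log(effectiveDpi(1700, 612), meetsOcrGuideline(1700, 612));
// The same page captured at 1200 px across is only about 141 DPI.
console.log(meetsOcrGuideline(1200, 612));
```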
Choosing a Language
Select the primary language of your document before starting OCR. Tesseract loads a separate trained data file for each language — selecting the correct language significantly improves accuracy for accented characters, ligatures, and language-specific word patterns. For documents mixing two languages, choose the language that makes up the majority of the text.
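Tesseract names each trained data file by a three-letter code (for example `eng.traineddata`, `fra.traineddata`), and it also accepts several codes joined with "+" (such as `eng+fra`) to load more than one file in a single run. Whether this tool exposes multi-language runs is not stated here. The sketch below maps the language names listed on this page to those codes; the `TESSERACT_CODES` table and `tesseractLangString` helper are illustrative assumptions.

```typescript
// Hypothetical mapping from the language names listed on this page to the
// Tesseract trained-data codes (eng.traineddata, fra.traineddata, ...).
const TESSERACT_CODES = {
  English: "eng",
  French: "fra",
  German: "deu",
  Spanish: "spa",
  Italian: "ita",
  Portuguese: "por",
} as const;

type LanguageName = keyof typeof TESSERACT_CODES;

// Build a Tesseract language string with the primary language first. Extra
// languages are appended with "+", and duplicate codes are dropped.
function tesseractLangString(primary: LanguageName, ...extra: LanguageName[]): string {
  const codes = [primary, ...extra].map((name) => TESSERACT_CODES[name]);
  return Array.from(new Set(codes)).join("+");
}

console.log(tesseractLangString("German"));           // deu
console.log(tesseractLangString("French", "English")); // fra+eng
```

Each extra code means another trained data file to download and another model to evaluate, so adding languages the document does not contain costs time without improving accuracy.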
After OCR: Convert to Word or Excel
Once you have the extracted text file, paste its contents into a Word document. To convert the scan itself, run the searchable PDF through the PDF to Word tool for document formatting, or the PDF to Excel tool if the scanned document contained tables. For large documents, use the PDF Splitter to break the scan into smaller sections before OCR; this reduces per-run processing time and lets you retry individual pages that produced poor results.
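Splitting a long scan into fixed-size sections and re-running only the weak pages can be planned as below. Tesseract.js results include a mean confidence score (0 to 100) for the recognized text, which is one reasonable signal for "poor results"; the chunk size, the threshold of 60, and the helper names `chunkPages` and `pagesToRetry` are assumptions for illustration.

```typescript
// Hypothetical planning helpers for OCR-ing a long scan in sections.

/** Split pages 1..totalPages into inclusive [first, last] ranges of at most chunkSize pages. */
function chunkPages(totalPages: number, chunkSize: number): Array<[number, number]> {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const ranges: Array<[number, number]> = [];
  for (let first = 1; first <= totalPages; first += chunkSize) {
    ranges.push([first, Math.min(first + chunkSize - 1, totalPages)]);
  }
  return ranges;
}

/** Return page numbers whose OCR confidence (0-100) is below minConfidence, in ascending order. */
function pagesToRetry(confidenceByPage: Map<number, number>, minConfidence = 60): number[] {
  const retry: number[] = [];
  confidenceByPage.forEach((confidence, page) => {
    if (confidence < minConfidence) retry.push(page);
  });
  return retry.sort((a, b) => a - b);
}

console.log(chunkPages(23, 10)); // [[1, 10], [11, 20], [21, 23]]
console.log(pagesToRetry(new Map([[1, 91], [2, 42], [3, 88], [4, 57]]))); // [2, 4]
```

Retrying only the flagged pages (perhaps after rescanning them at a higher resolution) avoids paying the full per-page cost again for pages that already came out clean.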