⚡ Tablers

A blazingly fast PDF table extraction library with a Python API, powered by Rust.

Features

🚀 Blazingly Fast - Core algorithms written in Rust for maximum performance
🐍 Pythonic API - Easy-to-use Python interface with full type hints
📄 Edge Detection - Accurate table detection using line and rectangle edge analysis
📝 Text Extraction - Extract text content from table cells with configurable settings
📤 Multiple Export Formats - Export tables to CSV, Markdown, and HTML
🔐 Encrypted PDFs - Support for password-protected PDF documents
💾 Memory Efficient - Lazy page loading for handling large PDF files
🖥️ Cross-Platform - Works on Windows, Linux, and macOS

Why Tablers?

This project draws significant inspiration from the table extraction modules of pdfplumber and PyMuPDF. Compared to pdfplumber and PyMuPDF, tablers has the following advantages:

High Performance: Utilizes Rust for high-performance PDF processing
More Configurable: Supports customizable table filter settings such as min_rows, min_columns, and include_single_cell (for an example, see this issue)
Clean Python Dependencies: No external Python dependencies required
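
To make the filter settings concrete, here is a minimal sketch of what such a filter could decide. It is not tablers' implementation: `TableShape` and `keep_table` are hypothetical names, and the defaults and the exact meaning of `min_rows`, `min_columns`, and `include_single_cell` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableShape:
    """Hypothetical stand-in for a detected table: just its row and column counts."""
    rows: int
    columns: int


def keep_table(shape, min_rows=1, min_columns=1, include_single_cell=False):
    """Return True if a detected table should be kept.

    Assumed semantics (not confirmed by the tablers docs):
    - a 1x1 table is kept only when include_single_cell is True;
    - any other table must have at least min_rows rows and min_columns columns.
    """
    if shape.rows == 1 and shape.columns == 1:
        return include_single_cell
    return shape.rows >= min_rows and shape.columns >= min_columns


# Example: keep only tables that are at least 2x2.
candidates = [TableShape(1, 1), TableShape(3, 2), TableShape(1, 4)]
kept = [s for s in candidates if keep_table(s, min_rows=2, min_columns=2)]
```

Filters of this kind are mainly useful for discarding stray boxes and single-line rules that edge detection can mistake for tables.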

Benchmark

Performance comparison of tablers, pymupdf and pdfplumber for PDF table extraction:

[Figure: Table Extraction Benchmark]

For more details, please refer to the tablers-benchmark repository.

Note

This solution is primarily designed for text-based PDFs and does not support scanned PDFs.

Installation

pip install tablers

Quick Start

Basic Table Extraction

from tablers import Document, find_tables

# Open a PDF document
doc = Document("example.pdf")

# Extract tables from each page
for page in doc.pages():
    tables = find_tables(page, extract_text=True)
    for table in tables:
        print(f"Found table with {len(table.cells)} cells")
        for cell in table.cells:
            print(f"  Cell: {cell.text} at {cell.bbox}")

doc.close()
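
The loop above prints cells one by one. As a rough sketch of how those cells could be arranged into rows, the helper below groups them by the top edge of `cell.bbox` and writes the result with the standard `csv` module. It relies only on the `text` and `bbox` attributes shown above and makes two assumptions: that `bbox` is an `(x0, y0, x1, y1)` tuple, and that `y` grows downward. `cells_to_rows`, `rows_to_csv`, and the `tolerance` parameter are hypothetical names, and this is not the library's own CSV export.

```python
import csv
import io


def cells_to_rows(cells, tolerance=2.0):
    """Group cells into rows of text, top to bottom and left to right.

    Assumes cell.bbox is (x0, y0, x1, y1) with y increasing downward.
    Cells whose top edges lie within `tolerance` of the row's first cell
    are treated as belonging to the same row.
    """
    ordered = sorted(cells, key=lambda c: (c.bbox[1], c.bbox[0]))
    rows = []
    current = []
    current_top = None
    for cell in ordered:
        top = cell.bbox[1]
        if current_top is None or abs(top - current_top) <= tolerance:
            if current_top is None:
                current_top = top
            current.append(cell)
        else:
            rows.append(current)
            current = [cell]
            current_top = top
    if current:
        rows.append(current)
    # Sort each row by its left edge; treat missing text as an empty string.
    return [
        [c.text or "" for c in sorted(row, key=lambda c: c.bbox[0])]
        for row in rows
    ]


def rows_to_csv(rows):
    """Serialize rows of strings to CSV text using the standard library."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

With a table from `find_tables`, this would be called as `rows_to_csv(cells_to_rows(table.cells))`. Merged cells and tables in a bottom-up coordinate system would need extra handling.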

Using a Context Manager

from tablers import Document, find_tables

with Document("example.pdf") as doc:
    page = doc.get_page(0)  # Get first page
    tables = find_tables(page, extract_text=True)

    for table in tables:
        print(f"Table bbox: {table.bbox}")

For more advanced usage, see the documentation.

Requirements

Python >= 3.10
Supported platforms: Windows (x64), Linux (x64) with glibc >= 2.34, macOS (ARM64)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

pdfium-render - Rust bindings for PDFium
PyO3 - Rust bindings for Python