Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.3.0] - 2025-01-13

Add Python Edge constructor for programmatic edge creation with orientation, x1, y1, x2, y2, width, and color parameters
Add explicit strategy for table detection, allowing the use of explicitly provided edges (#7)
Add explicit_h_edges and explicit_v_edges settings to TfSettings for providing explicit edges
Allow page parameter to be None in find_tables, find_all_cells_bboxes, and get_edges when both strategies are explicit (and extract_text is False for find_tables)
Add plumber_edge_to_tablers_edge function for converting pdfplumber edges to tablers edges
Add documentation and doc workflow with Material-for-MkDocs (#6)

Add CSV export for tables (to_csv) (#5)
Add Markdown export for tables (to_markdown)
Add HTML export for tables (to_html)
Add min_rows and min_columns settings for table filtering (default: None, no filter)
Add include_single_cell setting to configure whether to include tables with only one cell (default: false)
Add need_strip option to table extraction functions for whitespace and line feed handling (default: true)
Add rows and columns properties for Python bindings

Fix a bug where the Linux wheel did not contain libpdfium.so (fixed by renaming it to libpdfium.so.1)

Update TfSettings default strategies from Lines to LinesStrict
Replace horizontal_ltr and vertical_ttb with text_read_in_clockwise to handle text with rotation_degrees 90 and 270 simultaneously
Support PDFs with page_count > 65535 by updating pdfium-render
Use a global pdfium runtime