Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.3.0] - 2025-01-13
Added
- Add Python `Edge` constructor for programmatic edge creation with `orientation`, `x1`, `y1`, `x2`, `y2`, `width`, and `color` parameters
- Add `explicit` strategy for table detection, allowing the use of explicitly provided edges (#7)
- Add `explicit_h_edges` and `explicit_v_edges` settings to `TfSettings` for providing explicit edges
- Allow `page` parameter to be `None` in `find_tables`, `find_all_cells_bboxes` and `get_edges` when both strategies are `explicit` (and `extract_text` is `False` for `find_tables`)
- Add `plumber_edge_to_tablers_edge` function for converting `pdfplumber` edges to `tablers` edges
- Add documentation and doc workflow with Material-for-MkDocs (#6)
Changed
- Change `Edge` invalid orientation error from Rust panic to Python `ValueError`
- Change `get_edges` function signature and API
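As a rough illustration of the `Edge` entries above, the sketch below models an edge record with the listed parameters and raises `ValueError` on a bad orientation. `SketchEdge`, `try_build`, the `"h"`/`"v"` spellings, and the default `width` and `color` values are assumptions made for this example only; it is a stand-in written in plain Python, not the `tablers` class or its actual signature.

```python
from dataclasses import dataclass

# Hypothetical stand-in for illustration; NOT the tablers Edge class.
# The accepted orientation spellings are an assumption.
VALID_ORIENTATIONS = ("h", "v")


@dataclass(frozen=True)
class SketchEdge:
    orientation: str
    x1: float
    y1: float
    x2: float
    y2: float
    width: float = 1.0               # assumed default
    color: tuple = (0, 0, 0, 255)    # assumed RGBA default

    def __post_init__(self):
        # Report a bad orientation as an ordinary Python exception,
        # mirroring the "panic -> ValueError" change described above.
        if self.orientation not in VALID_ORIENTATIONS:
            raise ValueError(f"invalid orientation: {self.orientation!r}")


def try_build(orientation):
    """Return a SketchEdge, or the error message if validation fails."""
    try:
        return SketchEdge(orientation, 0.0, 10.0, 100.0, 10.0)
    except ValueError as exc:
        return str(exc)
```

Because the failure surfaces as `ValueError`, callers can handle it with a normal `try`/`except` instead of the process aborting.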
[0.2.0] - 2025-01-05
Added
- Add CSV export for tables (`to_csv`) (#5)
- Add Markdown export for tables (`to_markdown`)
- Add HTML export for tables (`to_html`)
- Add `min_rows` and `min_columns` settings for table filtering (default: None, no filter)
- Add `include_single_cell` setting to configure whether to include tables with only one cell (default: false)
- Add `need_strip` option to table extraction functions for whitespace and line feed handling (default: true)
- Add `rows` and `columns` properties for Python bindings
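One possible reading of how the filtering settings above interact is sketched below. `keep_table` is a hypothetical helper, not part of the library; it assumes that a "single cell" table means one row by one column and that `min_rows` and `min_columns` are inclusive lower bounds.

```python
def keep_table(n_rows, n_cols, min_rows=None, min_columns=None,
               include_single_cell=False):
    """Hypothetical filter mirroring the documented defaults."""
    # Assumption: a single-cell table is exactly 1 row x 1 column.
    if n_rows == 1 and n_cols == 1 and not include_single_cell:
        return False
    # None means "no filter"; otherwise treat the value as an inclusive minimum.
    if min_rows is not None and n_rows < min_rows:
        return False
    if min_columns is not None and n_cols < min_columns:
        return False
    return True
```

With the defaults, only single-cell tables are dropped; setting `min_rows=3` would additionally drop any table with fewer than three rows.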
Fixed
- Fix handling of multiple MoveTo commands in one path segment
- Improve rectangle detection with better path segment type handling
[0.1.1] - 2025-12-30
Fixed
- Fix the Linux wheel not containing `libpdfium.so` (fixed by renaming it to `libpdfium.so.1`)
[0.1.0] - 2025-12-30
Added
- Add NonNegative validations for settings
- Add context manager support to Document class for Python
- Add table finding and text extraction settings with new API functions
- Add comprehensive README with features and usage examples
- Add comprehensive docstrings to Python modules and Rust code
- Add tests
- Add CI workflow
- Add pre-commit hooks
Changed
- Update TfSettings default strategies from Lines to LinesStrict
- Replace `horizontal_ltr` and `vertical_ttb` with `text_read_in_clockwise` to handle text with rotation_degrees 90 and 270 simultaneously
- Support PDFs with page_count > 65535 by updating pdfium-render
- Use global pdfium runtime
Fixed
- Fix cargo clippy errors and update lint scripts
- Replace macOS pdfium dylib with arm64 version
[0.0.0] - 2025-12-25
Added
- lines / lines_strict / text strategies for extracting tables in a pdf page