API Reference#

This page provides detailed documentation for all public classes and functions in Tablers.

Functions#

find_tables#

Find all tables in a PDF page or from explicit edges.

def find_tables(
    page: Page | None = None,
    extract_text: bool = True,
    tf_settings: TfSettings | None = None,
    **kwargs: Unpack[TfSettingItems]
) -> list[Table]

Parameters:

Parameter	Type	Default	Description
`page`	`Optional[Page]`	`None`	The PDF page to analyze. Can be `None` only if both strategies are `"explicit"` and `extract_text` is `False`
`extract_text`	`bool`	`True`	Whether to extract text content from table cells
`tf_settings`	`Optional[TfSettings]`	`None`	TableFinder settings object. If not provided, default settings are used
`**kwargs`	`Unpack[TfSettingItems]`	-	Additional keyword arguments passed to TfSettings

Returns: list[Table] - A list of Table objects found in the page.

Raises:

ValueError - If page is None and extract_text is True.
ValueError - If page is None and either strategy is not "explicit".

Example:

from tablers import Document, find_tables

with Document("example.pdf") as doc:
    page = doc.get_page(0)
    tables = find_tables(page, extract_text=True)
    for table in tables:
        print(f"Table with {len(table.cells)} cells at {table.bbox}")

Example with explicit edges (no page required):

from tablers import Edge, TfSettings, find_tables

h_edges = [Edge("h", 0.0, 0.0, 100.0, 0.0), Edge("h", 0.0, 100.0, 100.0, 100.0)]
v_edges = [Edge("v", 0.0, 0.0, 0.0, 100.0), Edge("v", 100.0, 0.0, 100.0, 100.0)]

settings = TfSettings(
    horizontal_strategy="explicit",
    vertical_strategy="explicit",
    explicit_h_edges=h_edges,
    explicit_v_edges=v_edges,
)

tables = find_tables(page=None, extract_text=False, tf_settings=settings)

find_all_cells_bboxes#

Find all table cell bounding boxes in a PDF page or from explicit edges.

def find_all_cells_bboxes(
    page: Page | None = None,
    tf_settings: TfSettings | None = None,
    **kwargs: Unpack[TfSettingItems]
) -> list[tuple[float, float, float, float]]

Parameters:

Parameter	Type	Description
`page`	`Optional[Page]`	The PDF page to analyze. Can be `None` only if both strategies are `"explicit"`
`tf_settings`	`Optional[TfSettings]`	TableFinder settings object
`**kwargs`	`Unpack[TfSettingItems]`	Additional keyword arguments passed to TfSettings

Returns: list[BBox] - A list of bounding boxes (x1, y1, x2, y2) for each detected cell.

Raises: RuntimeError - If page is None and either strategy is not "explicit".

Example:

from tablers import Document, find_all_cells_bboxes

with Document("example.pdf") as doc:
    page = doc.get_page(0)
    cells = find_all_cells_bboxes(page)
    print(f"Found {len(cells)} cells")

Example with explicit edges (no page required):

from tablers import Edge, TfSettings, find_all_cells_bboxes

h_edges = [Edge("h", 0.0, 0.0, 100.0, 0.0), Edge("h", 0.0, 100.0, 100.0, 100.0)]
v_edges = [Edge("v", 0.0, 0.0, 0.0, 100.0), Edge("v", 100.0, 0.0, 100.0, 100.0)]

settings = TfSettings(
    horizontal_strategy="explicit",
    vertical_strategy="explicit",
    explicit_h_edges=h_edges,
    explicit_v_edges=v_edges,
)

cells = find_all_cells_bboxes(None, tf_settings=settings)

find_tables_from_cells#

Construct tables from a list of cell bounding boxes.

def find_tables_from_cells(
    cells: list[tuple[float, float, float, float]],
    extract_text: bool,
    page: Page | None = None,
    tf_settings: TfSettings | None = None,
    **kwargs: Unpack[TfSettingItems]
) -> list[Table]

Parameters:

Parameter	Type	Description
`cells`	`list[BBox]`	A list of cell bounding boxes to group into tables
`extract_text`	`bool`	Whether to extract text content from cells
`page`	`Optional[Page]`	The PDF page (required if extract_text is True)
`tf_settings`	`Optional[TfSettings]`	Table finder settings
`**kwargs`	`Unpack[TfSettingItems]`	Additional keyword arguments for settings

Returns: list[Table] - A list of Table objects constructed from the cells.

Raises: RuntimeError - If extract_text is True but page is not provided.

get_edges#

Extract edges (lines and rectangle borders) from a PDF page or from explicit edges.

def get_edges(
    page: Page | None = None,
    tf_settings: TfSettings | None = None,
    **kwargs: Unpack[TfSettingItems]
) -> dict[str, list[Edge]]

Parameters:

Parameter	Type	Description
`page`	`Optional[Page]`	The PDF page to extract edges from. Can be `None` only if both strategies are `"explicit"`
`tf_settings`	`Optional[TfSettings]`	TableFinder settings object
`**kwargs`	`Unpack[TfSettingItems]`	Additional keyword arguments passed to TfSettings

Returns: dict - A dictionary with keys "h" (horizontal edges) and "v" (vertical edges).

Raises: RuntimeError - If page is None and either strategy is not "explicit".

plumber_edge_to_tablers_edge#

Convert a pdfplumber edge dictionary to a Tablers Edge object.

from tablers.edges import plumber_edge_to_tablers_edge

def plumber_edge_to_tablers_edge(
    plumber_edge: dict[str, Any],
    page_rotation: float,
    page_height: float,
    page_width: float,
) -> Edge

Parameters:

Parameter	Type	Description
`plumber_edge`	`dict[str, Any]`	A pdfplumber edge dictionary containing `orientation`, `x0`, `y0`, `x1`, `y1`, `linewidth`, and `stroking_color`
`page_rotation`	`float`	The rotation of the page in degrees
`page_height`	`float`	The height of the page
`page_width`	`float`	The width of the page

Returns: Edge - A Tablers Edge object.

Tip

This function can serve as a reference for writing conversion functions for other PDF libraries. See Using Edges from Other Libraries for more details.

Classes#

Document#

Represents an opened PDF document.

class Document:
    def __init__(
        self,
        path: Path | str | None = None,
        bytes: bytes | None = None,
        password: str | None = None
    )

Parameters:

Parameter	Type	Description
`path`	`Union[Path, str, None]`	File path to the PDF document
`bytes`	`Optional[bytes]`	PDF content as bytes
`password`	`Optional[str]`	Password for encrypted PDFs

Note

Either path or bytes must be provided, but not both. If both are provided, only path is used.

Methods:

Method	Returns	Description
`page_count()`	`int`	Get the total number of pages
`get_page(page_num)`	`Page`	Retrieve a specific page by index (0-based)
`pages()`	`PageIterator`	Get an iterator over all pages
`close()`	`None`	Close the document and release resources
`is_closed()`	`bool`	Check if the document has been closed

Context Manager:

with Document("example.pdf") as doc:
    for page in doc:
        print(page.width, page.height)

Page#

Represents a single page in a PDF document.

Attributes:

Attribute	Type	Description
`width`	`float`	The width of the page in points
`height`	`float`	The height of the page in points
`objects`	`Optional[Objects]`	Extracted objects, or None if not extracted

Methods:

Method	Returns	Description
`is_valid()`	`bool`	Check if the page reference is still valid
`extract_objects()`	`None`	Extract all objects from the page
`clear()`	`None`	Clear cached objects to free memory

Table#

Represents a table extracted from a PDF page.

Attributes:

Attribute	Type	Description
`bbox`	`tuple[float, float, float, float]`	Bounding box (x1, y1, x2, y2)
`cells`	`list[TableCell]`	All cells in the table
`rows`	`list[CellGroup]`	All rows in the table
`columns`	`list[CellGroup]`	All columns in the table
`page_index`	`int`	Index of the page containing this table
`text_extracted`	`bool`	Whether text has been extracted

Methods:

Method	Returns	Description
`to_csv()`	`str`	Convert to CSV format
`to_markdown()`	`str`	Convert to Markdown table format
`to_html()`	`str`	Convert to HTML table format

Warning

Export methods raise ValueError if text has not been extracted.

TableCell#

Represents a single cell in a table.

Attributes:

Attribute	Type	Description
`bbox`	`tuple[float, float, float, float]`	Bounding box (x1, y1, x2, y2)
`text`	`str`	Text content of the cell

CellGroup#

Represents a group of table cells arranged in a row or column.

Attributes:

Attribute	Type	Description
`cells`	`list[Optional[TableCell]]`	Cells in this group, with None for empty positions
`bbox`	`tuple[float, float, float, float]`	Bounding box of the entire group

Objects#

Container for all extracted objects from a PDF page.

Attributes:

Attribute	Type	Description
`rects`	`list[Rect]`	All rectangles found in the page
`lines`	`list[Line]`	All line segments found in the page
`chars`	`list[Char]`	All text characters found in the page

Rect#

Represents a rectangle extracted from a PDF page.

Attributes:

Attribute	Type	Description
`bbox`	`tuple[float, float, float, float]`	Bounding box
`fill_color`	`tuple[int, int, int, int]`	Fill color (RGBA)
`stroke_color`	`tuple[int, int, int, int]`	Stroke color (RGBA)
`stroke_width`	`float`	Stroke width

Line#

Represents a line segment extracted from a PDF page.

Attributes:

Attribute	Type	Description
`line_type`	`Literal["straight", "curve"]`	Type of line
`points`	`list[tuple[float, float]]`	Points defining the line path
`color`	`tuple[int, int, int, int]`	Color (RGBA)
`width`	`float`	Line width

Char#

Represents a text character extracted from a PDF page.

Attributes:

Attribute	Type	Description
`unicode_char`	`Optional[str]`	Unicode character
`bbox`	`tuple[float, float, float, float]`	Bounding box
`rotation_degrees`	`float`	Clockwise rotation in degrees
`upright`	`bool`	Whether the character is upright

Edge#

Represents a line edge extracted from a PDF page or created programmatically.

class Edge:
    def __init__(
        self,
        orientation: Literal["h", "v"],
        x1: float,
        y1: float,
        x2: float,
        y2: float,
        width: float = 1.0,
        color: Color = (0, 0, 0, 255),
    ) -> None

Constructor Parameters:

Parameter	Type	Default	Description
`orientation`	`Literal["h", "v"]`	-	"h" for horizontal, "v" for vertical
`x1`	`float`	-	Left x-coordinate
`y1`	`float`	-	Top y-coordinate
`x2`	`float`	-	Right x-coordinate
`y2`	`float`	-	Bottom y-coordinate
`width`	`float`	`1.0`	Stroke width
`color`	`Color`	`(0, 0, 0, 255)`	Stroke color (RGBA)

Raises: ValueError - If orientation is not "h" or "v".

Example:

from tablers import Edge

# Create a horizontal edge
h_edge = Edge("h", 0.0, 50.0, 100.0, 50.0)

# Create a vertical edge with custom width and color
v_edge = Edge("v", 50.0, 0.0, 50.0, 100.0, width=2.0, color=(255, 0, 0, 255))

Attributes:

Attribute	Type	Description
`orientation`	`Literal["h", "v"]`	"h" for horizontal, "v" for vertical
`x1`	`float`	Left x-coordinate
`y1`	`float`	Top y-coordinate
`x2`	`float`	Right x-coordinate
`y2`	`float`	Bottom y-coordinate
`width`	`float`	Stroke width
`color`	`tuple[int, int, int, int]`	Stroke color (RGBA)

Type Aliases#

Alias	Definition	Description
`Point`	`tuple[float, float]`	A 2D point (x, y)
`BBox`	`tuple[float, float, float, float]`	Bounding box (x1, y1, x2, y2)
`Color`	`tuple[int, int, int, int]`	RGBA color (0-255 each)