Skip to content

Settings Reference#

This page documents all configuration settings available in Tablers.

TfSettings#

Table finder settings control how tables are detected and extracted.

from tablers import TfSettings

settings = TfSettings(
    vertical_strategy="lines_strict",
    horizontal_strategy="lines_strict",
    snap_x_tolerance=3.0,
    # ... other options
)

Detection Strategy#

Parameter Type Default Description
vertical_strategy Literal["lines", "lines_strict", "text", "explicit"] "lines_strict" Strategy for detecting vertical edges
horizontal_strategy Literal["lines", "lines_strict", "text", "explicit"] "lines_strict" Strategy for detecting horizontal edges

Strategy Options:

  • "lines_strict" - Only uses explicit line objects. Best for tables with clear borders.
  • "lines" - Uses lines and rectangle borders. Good for most common tables.
  • "text" - Uses text alignment to infer edges. Best for borderless tables.
  • "explicit" - Uses only explicitly provided edges via explicit_h_edges and explicit_v_edges. Best for programmatic table creation.

Tolerance Settings#

Parameter Type Default Description
snap_x_tolerance float 3.0 Tolerance for snapping vertical edges together
snap_y_tolerance float 3.0 Tolerance for snapping horizontal edges together
join_x_tolerance float 3.0 Tolerance for joining horizontal edge segments
join_y_tolerance float 3.0 Tolerance for joining vertical edge segments
intersection_x_tolerance float 3.0 X-tolerance for detecting edge intersections
intersection_y_tolerance float 3.0 Y-tolerance for detecting edge intersections

Edge Detection#

Parameter Type Default Description
edge_min_length float 3.0 Minimum length for edges to be included in final detection
edge_min_length_prefilter float 1.0 Minimum length for edges before merging operations
min_words_vertical int 3 Minimum words required for vertical text-based edge detection
min_words_horizontal int 1 Minimum words required for horizontal text-based edge detection

Explicit Edges#

Parameter Type Default Description
explicit_h_edges Optional[list[Edge]] None Explicit horizontal edges to include in table detection
explicit_v_edges Optional[list[Edge]] None Explicit vertical edges to include in table detection

When using "explicit" strategy, you must provide edges via these parameters. This allows programmatic table creation without requiring a PDF page:

from tablers import Edge, TfSettings, find_all_cells_bboxes

# Create edges for a 2x2 grid
h_edges = [
    Edge("h", 0.0, 0.0, 100.0, 0.0),
    Edge("h", 0.0, 50.0, 100.0, 50.0),
    Edge("h", 0.0, 100.0, 100.0, 100.0),
]
v_edges = [
    Edge("v", 0.0, 0.0, 0.0, 100.0),
    Edge("v", 50.0, 0.0, 50.0, 100.0),
    Edge("v", 100.0, 0.0, 100.0, 100.0),
]

settings = TfSettings(
    horizontal_strategy="explicit",
    vertical_strategy="explicit",
    explicit_h_edges=h_edges,
    explicit_v_edges=v_edges,
)

# No page required when both strategies are explicit
cells = find_all_cells_bboxes(None, tf_settings=settings)

Table Filtering#

Parameter Type Default Description
include_single_cell bool False Whether to include tables with only a single cell
min_rows Optional[int] None Minimum number of rows required. None means no filtering
min_columns Optional[int] None Minimum number of columns required. None means no filtering

Text Extraction (within TfSettings)#

Parameter Type Default Description
text_x_tolerance float 3.0 X-tolerance for text extraction
text_y_tolerance float 3.0 Y-tolerance for text extraction
text_keep_blank_chars bool False Whether to keep blank characters
text_use_text_flow bool False Whether to use PDF text flow order
text_read_in_clockwise bool True Whether text reads in clockwise direction
text_split_at_punctuation Union[Literal["all"], str, None None Punctuation splitting configuration
text_expand_ligatures bool True Whether to expand ligatures
text_need_strip bool True Whether to strip whitespace from cell text

Complete Example#

from tablers import TfSettings

settings = TfSettings(
    # Detection strategy
    vertical_strategy="lines",
    horizontal_strategy="lines",

    # Tolerance settings
    snap_x_tolerance=5.0,
    snap_y_tolerance=5.0,
    join_x_tolerance=3.0,
    join_y_tolerance=3.0,
    intersection_x_tolerance=3.0,
    intersection_y_tolerance=3.0,

    # Edge detection
    edge_min_length=10.0,
    edge_min_length_prefilter=5.0,
    min_words_vertical=3,
    min_words_horizontal=1,

    # Table filtering
    include_single_cell=False,
    min_rows=2,
    min_columns=2,

    # Text extraction
    text_x_tolerance=3.0,
    text_y_tolerance=3.0,
    text_need_strip=True,
)

WordsExtractSettings#

Settings for text/word extraction from PDF pages.

from tablers import WordsExtractSettings

we_settings = WordsExtractSettings(
    x_tolerance=3.0,
    y_tolerance=3.0,
)

Parameters#

Parameter Type Default Description
x_tolerance float 3.0 Horizontal tolerance for grouping characters into words
y_tolerance float 3.0 Vertical tolerance for grouping characters into lines
keep_blank_chars bool False Whether to preserve blank/whitespace characters
use_text_flow bool False Whether to use the PDF's text flow order
text_read_in_clockwise bool True Whether text reads in clockwise direction
split_at_punctuation Union[Literal["all"], str, None] None Punctuation splitting configuration
expand_ligatures bool True Whether to expand ligatures into individual characters
need_strip bool True Whether to strip leading/trailing whitespace from cell text

Punctuation Splitting#

The split_at_punctuation parameter controls how text is split at punctuation:

  • None - No splitting at punctuation
  • "all" - Split at all punctuation characters
  • str - Split at specific characters (e.g., ".,;:")

Complete Example#

from tablers import WordsExtractSettings

we_settings = WordsExtractSettings(
    x_tolerance=3.0,
    y_tolerance=3.0,
    keep_blank_chars=False,
    use_text_flow=False,
    text_read_in_clockwise=True,
    split_at_punctuation=None,
    expand_ligatures=True,
    need_strip=True,
)

Using Settings with Functions#

With find_tables#

from tablers import Document, find_tables, TfSettings

settings = TfSettings(
    vertical_strategy="lines",
    min_rows=2,
)

with Document("example.pdf") as doc:
    page = doc.get_page(0)
    tables = find_tables(page, extract_text=True, tf_settings=settings)

With Keyword Arguments#

You can also pass settings as keyword arguments directly:

from tablers import Document, find_tables

with Document("example.pdf") as doc:
    page = doc.get_page(0)
    tables = find_tables(
        page,
        extract_text=True,
        vertical_strategy="lines",
        horizontal_strategy="lines",
        min_rows=2,
        snap_x_tolerance=5.0,
    )

With find_tables_from_cells#

from tablers import (
    Document,
    find_all_cells_bboxes,
    find_tables_from_cells,
    WordsExtractSettings
)

we_settings = WordsExtractSettings(
    x_tolerance=5.0,
    y_tolerance=5.0,
)

with Document("example.pdf") as doc:
    page = doc.get_page(0)
    cells = find_all_cells_bboxes(page)
    tables = find_tables_from_cells(
        cells,
        extract_text=True,
        page=page,
        we_settings=we_settings,
    )