Model path
The extraction path starts with a vision-language model in the Qwen2-VL and olmOCR direction: document pixels are turned into a representation the model can read, and the prompt path is narrowed toward page reading. The result stays attached to a file, page, row, and reviewable output.
Vision pass
Document pages become visual tokens after rotation, contrast, patch, and page-shape preparation.
Document prompt
A 7B-class Qwen2-VL and olmOCR-style extraction path is guided toward handwriting, forms, and tables.
Cell graph
Headers, rows, totals, merged areas, and column relationships are rebuilt before export.
Owned job flow
Queue state, file metadata, retries, and downloads sit around the model so a batch can survive real use.
Table structure
A workbook is more demanding than plain OCR text. The pipeline has to preserve the idea of a header, detect when writing belongs to the next row, keep totals attached to their column, and avoid turning a ruled table into a paragraph. Preview, editable cells, corrected downloads, and batch comparison all depend on that consistent schema.
Batch layer
Queue admission, workers, storage metadata, file ownership, and retry-safe outputs carry the run beyond one web request. Job state, result files, and review actions stay together because the time saved comes from the whole group, not only the first image.

Excel review
The comparison view puts the source beside the table so teams can correct cells, mark reviewed files, and download the batch. The target is fewer repeated keystrokes and editable spreadsheets that remain useful after they leave AxLiner.

