Extraction tips

Notes on extracting text-based tables from PDFs with fewer broken rows and columns. The goal is clean table data, not a reproduction of the PDF's overall appearance.

Check first

Check whether you can select text in the PDF by dragging over it.
If the text cannot be selected, the file is most likely an image-based PDF, and this tool cannot extract it.
Rotated or skewed tables, tables laid out in multiple tiers, and heavily decorated tables are prone to breaking.

Try Auto first

Run Auto and check whether dates, descriptions, and amounts become separate columns.
Adjust the row proximity setting when rows split or merge incorrectly.
Adjust the column gap setting when columns run together.
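
To make the two settings concrete, here is a minimal Python sketch of how a row proximity tolerance and a column gap threshold can turn positioned words into table cells. It is an illustration only, not this tool's actual implementation: the Word class, the function names, and the parameters row_tolerance and column_gap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record; coordinates are in arbitrary page units.
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical position of the text line


def group_rows(words, row_tolerance):
    """Put a word in the current row if its y is within row_tolerance of the row's first word."""
    rows = []
    for w in sorted(words, key=lambda w: (w.y, w.x0)):
        if rows and abs(w.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(w)
        else:
            rows.append([w])
    return rows


def split_cells(row, column_gap):
    """Join neighbouring words into one cell unless the horizontal gap exceeds column_gap."""
    cells = []
    prev = None
    for w in sorted(row, key=lambda w: w.x0):
        if prev is not None and w.x0 - prev.x1 <= column_gap:
            cells[-1] += " " + w.text
        else:
            cells.append(w.text)
        prev = w
    return cells


def extract_table(words, row_tolerance=3.0, column_gap=10.0):
    return [split_cells(row, column_gap) for row in group_rows(words, row_tolerance)]


# Made-up sample: two table rows whose words sit at slightly different heights.
words = [
    Word("2024-01-05", 10, 60, 100.0),
    Word("Office", 80, 110, 100.5),
    Word("supplies", 113, 150, 100.5),
    Word("1,200", 200, 230, 101.0),
    Word("2024-01-06", 10, 60, 115.0),
    Word("Taxi", 80, 100, 115.0),
    Word("3,400", 200, 230, 115.0),
]
table = extract_table(words)
```

In this sketch, a row tolerance that is too small (for example 0.2) splits the first line into several rows, and a column gap that is too large (for example 25) merges the date into the description. Those are the same two symptoms the tips above describe.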

When to use Manual

When you only need the line-item table in an invoice.
When extracting the whole page pulls in surrounding content such as company names, addresses, or totals.
When Auto breaks the columns and you want to restrict extraction to the table area.
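
The idea behind Manual can be sketched as a simple rectangle filter: keep only the words that fall inside the selected area before building rows and columns. The Python below is a hypothetical illustration, not the tool's code. The tuple layout (text, x0, y0, x1, y1) and a top-left coordinate origin are assumptions.

```python
def words_in_region(words, left, top, right, bottom):
    """Keep only words whose bounding box lies entirely inside the selected rectangle.

    Each word is a (text, x0, y0, x1, y1) tuple; y is assumed to grow downwards.
    """
    return [
        w for w in words
        if w[1] >= left and w[3] <= right and w[2] >= top and w[4] <= bottom
    ]


# Made-up invoice page: a company name, one line item, and a total.
page_words = [
    ("ACME Corp.", 40, 30, 140, 45),
    ("Widget", 40, 200, 90, 212),
    ("1,200", 400, 200, 440, 212),
    ("Total", 300, 500, 340, 512),
]

# Select only the line-item area; the header and the total fall outside it.
selected = words_in_region(page_words, left=30, top=180, right=460, bottom=400)
```

Filtering first means that headers and totals never reach the row and column detection, so they cannot distort it.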

Before saving

Check amount columns, date columns, blanks, and duplicate rows.
If the CSV shows garbled characters when opened in Excel, turn BOM on.
Excel export saves extracted results as XLSX; it does not recreate the PDF layout.
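
For readers who post-process the extracted rows themselves, the blank and duplicate checks and the BOM option can be sketched in Python as follows. This is a minimal example under stated assumptions, not the tool's own export code: check_rows and to_csv_bytes are hypothetical helper names, and the CSV is assumed to be UTF-8. Python's "utf-8-sig" codec prepends the byte order mark that Excel uses to recognise UTF-8.

```python
import csv
import io
from collections import Counter


def check_rows(rows):
    """Return (blank_cells, duplicate_rows) for a list of string rows."""
    # Positions of empty or whitespace-only cells as (row_index, column_index).
    blanks = [
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if not cell.strip()
    ]
    # Rows that appear more than once, each reported a single time.
    counts = Counter(tuple(row) for row in rows)
    duplicates = [list(row) for row, n in counts.items() if n > 1]
    return blanks, duplicates


def to_csv_bytes(rows, bom=True):
    """Serialise rows to CSV bytes, optionally prefixed with a UTF-8 BOM."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8-sig" if bom else "utf-8")


# Made-up sample with one blank cell and one repeated row.
rows = [
    ["2024-01-05", "Café", "1,200"],
    ["2024-01-06", "", "3,400"],
    ["2024-01-05", "Café", "1,200"],
]
blanks, duplicates = check_rows(rows)
data = to_csv_bytes(rows, bom=True)
```

A reported duplicate is not necessarily an error, since two genuine line items can be identical, so the result is a list of rows to review rather than to delete automatically.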
