Thursday, February 18, 2021

OPEN TALK: Search and Extract: Optimized Document Processing with iText pdf2Data and pdfOCR
Cal Reynolds
iText, Software Engineer
Michael Demey
iText Software, Research Engineer

In this talk, we will demo an optimized PDF workflow that uses pdfOCR to recognize data in PDF documents and pdf2Data to extract selected data from the OCR results. The beauty of using pdf2Data in this way is that it can pick up exactly where pdfOCR leaves off, allowing you to both recognize and extract all kinds of data from PDF documents that would otherwise be inaccessible.

pdf2Data is our iText 7 add-on for smart data extraction from PDF documents. It’s tailored especially for extracting hard-to-reach data locked inside PDFs, and it fits neatly into the iText 7 ecosystem. The cherry on top? Anyone can quickly create a template for data extraction using the sleek user interface, with no need to tediously define document structures programmatically. Let the template designer assist you in creating your data extraction templates; no coding required!

If you haven’t tried it already, we’d like to give you a quick tour of its capabilities and show why it’s a great companion for our pdfOCR add-on.