
OPEN TALK: Search and Extract: Optimized Document Processing with iText pdf2Data and pdfOCR


Cal Reynolds
iText, Software Engineer

Cal joined iText in 2020 as a pre-sales engineer (software engineer) with a passion for helping customers understand how to use and optimize our products. He holds a bachelor's degree in computer science from Hamilton College. When he’s not helping iText customers or working with OCR technology, you can find him traveling or at the movie theater seeing what’s new.

Michaël Demey
iText Software, Research Engineer


With interests including Open Source software and licenses, Michaël has been a developer at iText Software since 2011. After almost a decade of working closely with PDF, he has a keen insight into its uses in the real world. When he's not looking at PDF syntax, he likes to play music and (tries to) develop games.


In this talk, we will demo an optimized PDF workflow that uses pdfOCR to recognize data in PDF documents and pdf2Data to extract selected data from your OCR search. The beauty of using pdf2Data in this way is that it can pick up exactly where pdfOCR leaves off, allowing you to both recognize and extract all kinds of data from PDF documents that would otherwise be inaccessible.

pdf2Data is our iText 7 add-on for smart data extraction from PDF documents. It’s tailored especially for extracting hard-to-reach data locked inside PDFs, and it fits neatly into the iText 7 ecosystem. The cherry on top? Anyone can quickly create a template for data extraction using the sleek user interface, with no need to tediously define document structures programmatically. Let the template designer assist you in creating your data extraction templates; no coding required!

If you haven’t tried it already, we’d like to give you a quick tour of its capabilities, while also demonstrating how it’s a great companion for our pdfOCR add-on.