OCR Text Recognition
UPDF's OCR feature converts scanned PDF documents into searchable and editable content. Text that appears inside images is recognized as well, so it can also be searched and edited.
(On Mac, the OCR feature is available in the Apple chip version downloaded from the official website. The Intel chip version and the Mac App Store version do not include the OCR feature yet.)
How to Download and Install OCR
Open the document and click the "Recognize Text Using OCR" button on the right toolbar.
If you are using this feature for the first time, you must download it as a UPDF plugin. Click the "Install" button in the pop-up window to continue.
The next window opens automatically and shows the installation progress. Wait for the installation to finish on your Windows device before using the feature.
How to OCR PDFs
After installation, close the window and click the same button to open the OCR tool. It offers two document type options: "Searchable PDF" and "Image-only PDF".
- Searchable PDF: Converts scanned PDF documents into searchable and editable documents.
- Image-only PDF: Converts a searchable and editable document into an image-based PDF document, which is neither searchable nor editable.
Document Type: Searchable PDF
The "Searchable PDF" option converts your scanned PDF documents into editable and searchable documents.
To set this up, first choose the correct "Layout" from the drop-down menu. There are three options:
- Text and pictures only: Only the recognized text and images are saved in the output PDF document. The resulting file is smaller and may have a different visual structure than the original.
- Text over the page image: Preserves the background images and illustrations of the source document and places the recognized text over them. The resulting file is larger and may still differ visually from the original.
- Text under the page image: Preserves the page image and places the recognized text in an invisible layer beneath it. The resulting file looks the same as the original PDF file.
Click the "Gear" icon to access more layout settings. Here you can choose whether to "Keep Pictures" and select "Low", "Balanced", or "High" picture quality to balance file size against image quality.
Document Language, Image Resolution, and Page Range:
Select the appropriate "Document Language" from the 38 languages available in the drop-down menu. Choosing the correct language helps UPDF identify the text in the document more accurately.
You can also use the "Image Resolution" option to specify an appropriate resolution for the image. Set the "Page Range" and click "Perform OCR" to run OCR on the file with the defined settings.
Document Type: Image-only PDF
The "Image-only PDF" option converts your searchable and editable documents into image-based PDF documents that are neither searchable nor editable.
- Set the image quality under the "Keep Picture" section by selecting "Low", "Balanced", or "High".
- Decide whether to compress your images using MRC (Mixed Raster Content).
- Set the "Page Range" and click "Perform OCR" to run the conversion. Select a destination folder, and the image-only PDF document is saved there.