The Main Process of OCR
Creating a searchable PDF requires running your documents through OCR (optical character recognition). Assuming a document has already been captured with a scanner or digital camera, and the image is free of obvious stains, blurring, and other optical defects, an OCR application processes the image as follows:
Correcting skew
If the image is skewed, the OCR application automatically straightens it so that the text is aligned horizontally or vertically.
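Skew correction is commonly done by searching for the rotation angle at which the text lines are most level. The sketch below uses a projection-profile search, which is one well-known technique. The OCR application's actual method is not described here, so the approach, the function names `estimate_skew` and `deskew`, and the default search range (±10 degrees in 0.5-degree steps) are all assumptions. Ink pixels are given as (x, y) coordinates.

```python
import math
from collections import Counter


def estimate_skew(pixels, max_angle=10.0, step=0.5):
    """Estimate the skew angle (degrees) of text from its ink pixels.

    pixels: (x, y) coordinates of ink pixels. A text line following
    y = y0 + x * tan(angle) has skew `angle`. For each candidate angle the
    points are rotated by -angle and counted per row; level text lines pile
    ink into a few rows, so the sum of squared row counts peaks there.
    (Projection-profile search; an illustrative assumption, not the
    application's documented method.)
    """
    pixels = list(pixels)
    best_angle, best_score = 0.0, -1
    n = int(round(max_angle / step))
    for i in range(-n, n + 1):
        angle = i * step
        theta = math.radians(-angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        # Row index of every point after rotating it by -angle.
        rows = Counter(round(x * sin_t + y * cos_t) for x, y in pixels)
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def deskew(pixels, angle):
    """Rotate ink pixels by -angle degrees so the text lines become level."""
    theta = math.radians(-angle)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    return [(round(x * cos_t - y * sin_t), round(x * sin_t + y * cos_t))
            for x, y in pixels]


# Usage: three synthetic text lines tilted by 3 degrees.
tilted = [(x, round(y0 + x * math.tan(math.radians(3.0))))
          for y0 in (10, 30, 50) for x in range(200)]
angle = estimate_skew(tilted)
level = deskew(tilted, angle)
print(angle, len({y for _, y in level}))
```

Squaring the row counts rewards profiles where ink is concentrated in a few rows; a smaller `step` gives a more precise angle at the cost of more candidates to score.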
Analyzing the layout
The OCR application analyzes the scanned image to identify the regions that contain text.
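One simple way to locate candidate text regions is to split the page at wide bands of whitespace, first between rows and then between columns within each band (a single level of the recursive X-Y cut idea). This is an illustrative sketch, not the application's documented algorithm: the helpers `find_runs` and `text_blocks`, the gap thresholds, and the '#'/'.' pixel encoding are hypothetical, and a real layout analyzer would also classify each block as text, picture, or table.

```python
def find_runs(flags, min_gap):
    """Group indices where flags[i] is True into inclusive (start, end) runs.

    Runs separated by fewer than min_gap False entries are merged, so small
    gaps (between letters or words) do not split a block.
    """
    runs, start, end, gap = [], None, None, 0
    for i, has_ink in enumerate(flags):
        if has_ink:
            if start is None:
                start = i
            end, gap = i, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, end))
                start = None
    if start is not None:
        runs.append((start, end))
    return runs


def text_blocks(image, min_row_gap=2, min_col_gap=3):
    """Return inclusive (top, bottom, left, right) boxes of ink blocks.

    image: list of strings where '#' is ink and '.' is background.
    Rows are split at horizontal whitespace bands first, then each band
    is split at vertical whitespace bands.
    """
    blocks = []
    row_has_ink = ["#" in row for row in image]
    for top, bottom in find_runs(row_has_ink, min_row_gap):
        band = image[top:bottom + 1]
        width = max(len(row) for row in band)
        col_has_ink = [any(c < len(row) and row[c] == "#" for row in band)
                       for c in range(width)]
        for left, right in find_runs(col_has_ink, min_col_gap):
            blocks.append((top, bottom, left, right))
    return blocks


page = [
    "..........................",
    ".####.###....#####.##.....",
    ".##.#####....###.####.....",
    "..........................",
    "..........................",
    ".#######.##...............",
    "..........................",
]
print(text_blocks(page))
```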
Detecting orientation
A scanned image may be rotated by 0, 90, 180, or 270 degrees. The OCR application runs recognition on a small portion of the image in each of these four orientations and uses the one with the highest recognition rate.
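The four-way orientation test can be sketched as follows. The `recognition_rate` callable stands in for a real recognizer and is an assumption, as are `rotate90`, `detect_orientation`, the square patch size, and the toy word-matching scorer used for illustration. The function returns the clockwise rotation that makes the sample read best, which the caller would then apply to the whole image.

```python
def rotate90(image):
    """Rotate a grid of equal-length strings 90 degrees clockwise."""
    return ["".join(row[c] for row in reversed(image))
            for c in range(len(image[0]))]


def detect_orientation(image, recognition_rate, patch_size=32):
    """Return the clockwise rotation (0, 90, 180 or 270) that reads best.

    Only a small top-left patch is tested; recognition_rate(patch) must
    return a score where higher means better recognition.
    """
    patch = [row[:patch_size] for row in image[:patch_size]]
    best_angle, best_rate = 0, float("-inf")
    for angle in (0, 90, 180, 270):
        rate = recognition_rate(patch)
        if rate > best_rate:
            best_angle, best_rate = angle, rate
        patch = rotate90(patch)  # try the next quarter turn
    return best_angle


# Hypothetical stand-in for a recognizer: the share of rows that are
# known words. A real application would run its character recognizer here.
KNOWN_WORDS = {"HELLO", "WORLD"}


def toy_recognition_rate(patch):
    return sum(row in KNOWN_WORDS for row in patch) / len(patch)


upside_down = rotate90(rotate90(["HELLO", "WORLD"]))
print(upside_down, detect_orientation(upside_down, toy_recognition_rate))
```

Testing only a patch keeps the cost of the four trial recognitions small, which matches the idea of checking "a small portion of the image".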
Separating single characters
The OCR application isolates every letter, digit, and punctuation mark in the image as a separate unit.
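A common way to cut characters out of a binarized text line is connected-component labeling: every group of touching ink pixels becomes one candidate character. The sketch below, built around a hypothetical function `segment_characters`, uses 8-connectivity and returns bounding boxes from left to right. Whether the application works this way is an assumption. Note that a colon or the letter i produces two components, and touching characters produce one, so practical segmenters add merging and splitting steps that are not shown here.

```python
from collections import deque


def segment_characters(line):
    """Split a binarized text line into connected ink components.

    line: list of equal-length strings, '#' = ink, '.' = background.
    Returns inclusive (left, top, right, bottom) boxes sorted left to right.
    """
    h, w = len(line), len(line[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if line[y][x] != "#" or seen[y][x]:
                continue
            # Breadth-first flood fill of this component (8-connectivity),
            # tracking the extent of the pixels it reaches.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, bottom, left, right = y, y, x, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and line[ny][nx] == "#"):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)


# An "H", an "I", and a colon-like mark split into two dots.
line = [
    "#.#..#...",
    "###..#..#",
    "#.#..#...",
    "........#",
]
print(segment_characters(line))
```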
Extracting and comparing character features
Using various techniques, the OCR application extracts the most distinctive, well-defined features of each character, the ones that set it apart from other characters. These features are then compared with a character database to determine which character each image should be converted into.
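As an illustration of feature extraction and database comparison, the sketch below uses zoning features (the ink density in each cell of a 3×3 grid laid over the glyph) and a nearest-neighbor match against a tiny template database. Zoning is just one classic feature type; the application's own features are not specified, and `zoning_features`, `classify`, and the three templates are hypothetical.

```python
def zoning_features(glyph, zones=3):
    """Ink density (0..1) in each cell of a zones x zones grid over the glyph.

    glyph: list of equal-length strings, '#' = ink, '.' = background.
    """
    h, w = len(glyph), len(glyph[0])
    features = []
    for zy in range(zones):
        for zx in range(zones):
            y0, y1 = zy * h // zones, (zy + 1) * h // zones
            x0, x1 = zx * w // zones, (zx + 1) * w // zones
            cells = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(c == "#" for c in cells) / max(len(cells), 1))
    return features


def classify(glyph, database):
    """Return the label whose stored features are nearest (squared Euclidean)."""
    f = zoning_features(glyph)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(f, database[label]))

    return min(database, key=distance)


# Hypothetical template database: one clean 5x3 glyph per character.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}
DATABASE = {label: zoning_features(g) for label, g in TEMPLATES.items()}

noisy_l = ["#..", "#..", "##.", "#..", "###"]  # an "L" with one stray ink pixel
print(classify(noisy_l, DATABASE))
```

Because zoning averages ink over regions, a single stray pixel shifts the features only slightly, so the noisy glyph still lands closest to the "L" template.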
Outputting the recognition result
Once every character has been recognized, the OCR application generates a searchable PDF document.
- If recognition accuracy is your priority, we recommend digitizing documents in Black and White mode (Text mode) at a relatively high resolution (300 dpi or higher).
- If you want to preserve the documents’ original appearance while making them searchable, we recommend using the “Searchable PDF” file format when digitizing them.
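In Black and White mode each pixel is stored as either ink or paper. To illustrate how grayscale values can be reduced to those two classes, the sketch below implements Otsu's global thresholding method, which picks the gray level that best separates dark and light pixels. Whether a given scanner or OCR application uses Otsu's method is an assumption, and the names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(gray):
    """Return threshold t (0..255) maximizing between-class variance.

    gray: iterable of 8-bit gray levels. Pixels <= t form the dark class.
    """
    gray = list(gray)
    hist = [0] * 256
    for value in gray:
        hist[value] += 1
    total = len(gray)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(gray, t):
    """Map each gray level to '#' (ink) if <= t, else '.' (paper)."""
    return ["#" if value <= t else "." for value in gray]


# Synthetic scan: dark ink (20-30) on light paper (200-220).
scan = [20] * 50 + [30] * 50 + [200] * 100 + [220] * 100
t = otsu_threshold(scan)
print(t, binarize(scan, t).count("#"))
```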