OpenCV and Tesseract OCR - text recognition


Today's photos are made up of tens of millions of pixels, holding a wealth of information. This might make us wonder: how do we get access to it? In this post, I will show you, step by step, how to read text data from a photo using the libraries in the title of this article: OpenCV and Tesseract OCR.

Installing libraries

At the very beginning, let's prepare the development environment. To do so, we have to install:

  • OpenCV; in the case of Python, this is the opencv-python package:
    pip install opencv-python
  • matplotlib, to display what we do with the input image:
    pip install matplotlib

  • tesseract; if you are a Linux user, type in the terminal:
    sudo apt-get install -y tesseract-ocr
    Users of other operating systems can follow the instructions on the tesseract GitHub page.
  • We also need to link our Python code to tesseract; we'll use pytesseract for that:
    pip install pytesseract
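If pytesseract cannot find the tesseract binary on its own (a common situation on Windows), you can point it at the executable yourself. This is a configuration sketch; the path below is only an example and must be adjusted to your installation:

```python
import pytesseract

# Example path only (hypothetical): adjust to wherever tesseract was installed
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
```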


Preparation of the photo

The most important task is to separate the text from the background. If we properly filter a given image or photo to change the background to white pixels, the tesseract will have no problem reading the text. Let's start with loading the photo and performing a few operations on it.

import cv2
import matplotlib.pyplot as plt

img = cv2.imread('example.jpg') # load the image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # grayscale conversion
thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)[1] # all values over 127 converted to 255 (white)
plt.imshow(thresh, 'gray') # display the picture
plt.show()

A sample photo of the inscription on a pack of glasses-cleaning wipes I found on the cupboard, before and after applying the above operations.
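To make the thresholding step concrete, here is a minimal NumPy sketch of what cv2.THRESH_BINARY does to each pixel. This is an illustration, not the author's code; the tiny `gray` array below stands in for a real grayscale image:

```python
import numpy as np

# A tiny synthetic "grayscale image": dark text pixels on a lighter background
gray = np.array([[30, 200, 128],
                 [126, 127, 255]], dtype=np.uint8)

# Equivalent of cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)[1]:
# every value strictly above 127 becomes 255 (white), the rest become 0 (black)
thresh = np.where(gray > 127, 255, 0).astype(np.uint8)
print(thresh)
# -> [[  0 255 255]
#     [  0   0 255]]
```

After this step the image contains only pure black and pure white pixels, which is exactly the high-contrast input tesseract reads best.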

Reading text from an image

Now we need to provide our resulting image to be processed by tesseract.


from pytesseract import image_to_string
output = image_to_string(thresh, lang='eng', config='--psm 7')
print('Output: ', output)

The output, in this case, is "Output: GLASSES WIPES", so it's all right. As for the value of config='--psm 7', the page segmentation modes differ depending on what exactly we want to achieve and look like this:

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
                        bypassing hacks that are Tesseract-specific.

Let's try with text on multiple lines (config='--psm 6').

Also, this time, the result turned out to be correct:

Output:  Ala ma kota,
kot ma Ale,
ale Ala,
nie ma psa.

To sum up

As you can see, reading text from photos or pictures is not difficult. Of course, you can do more with the photo, such as stretching the pixel values over the entire scale range ([0, 255]), removing elements that we consider background contamination, or even filling gaps in the text area. It is worth noting that our "home OCR" can even handle handwriting, provided it is at least somewhat legible.
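The contrast stretching mentioned above (spreading pixel values over the full [0, 255] range) can be sketched in a few lines of NumPy. This is a minimal illustration assuming a grayscale image as input, not a production-ready routine:

```python
import numpy as np

def stretch_contrast(gray):
    """Linearly rescale pixel values so the darkest becomes 0 and the brightest 255."""
    lo, hi = int(gray.min()), int(gray.max())
    if lo == hi:  # flat image: nothing to stretch
        return np.zeros_like(gray)
    stretched = (gray.astype(np.float32) - lo) * 255.0 / (hi - lo)
    return stretched.astype(np.uint8)

# A low-contrast image squeezed into the [100, 150] range...
gray = np.array([[100, 125, 150]], dtype=np.uint8)
print(stretch_contrast(gray))  # -> [[  0 127 255]]
```

A stretch like this is often worth trying before thresholding, since it pushes faint text and a murky background further apart on the intensity scale.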
