How to Create an Image-to-Text Converter in Python


You might have heard of image-to-text converter tools, which extract text from an image instantly. But have you wondered how these tools work and how you can make one of your own?

If yes, then this blog post is for you. In this post, we are going to tell you how you can create an image-to-text converter using Python. Don’t worry, it is not that difficult. 

We will not spend time on basics such as what Python is, because if you are searching for this topic, you most likely know them already.

So, let’s jump straight into building the tool and break everything down step by step. Before that, take a quick look at the prerequisites.

Prerequisites

Before you jump into the steps to create the tool, let’s make sure you have the prerequisites installed on your device. 

Install Libraries

To get started, you’ll need Python installed on your device. If you have not installed it yet, head over to the official Python website and download the latest available version.

After installing Python, the next step is to install the libraries. As we are creating an image-to-text converter, we are going to use three of them: Pytesseract, Pillow, and OpenCV.

Here is what each one does.

  • Pytesseract will help us with text extraction
  • Pillow allows us to open and save images in multiple formats
  • OpenCV is for image processing. It will help in tasks like resizing or adjusting images before feeding them to Pytesseract. 

To install the above libraries, open your command line or terminal (you can search for it in the Start menu on Windows, or use the Terminal app on macOS) and run the command below. It will download and install all three libraries.



pip install pytesseract pillow opencv-python
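
If you want to confirm that the installation worked before moving on, a small check like the one below can help. It is only a sketch: the function name missing_packages and the REQUIRED mapping are our own illustrative names and are not part of any of the three libraries.

```python
import importlib.util

# Import name -> pip package name (the two differ for Pillow and OpenCV).
REQUIRED = {
    "pytesseract": "pytesseract",
    "PIL": "pillow",
    "cv2": "opencv-python",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All three libraries are installed.")
```

If the script reports a missing package, run the pip command again for that package.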

Install Tesseract OCR Engine

This is the critical part. The Pytesseract library is only a wrapper: it relies on the Tesseract OCR engine, which is a separate program, to extract text from images.

To install the OCR engine, follow the steps below.

  • Go to Tesseract’s GitHub Page and download the version that is compatible with your operating system. 
  • Once the download is completed, run the installer. Follow the instructions appearing on the screen carefully for a successful installation. 

After the installation is complete, make sure Python can find the engine. To do this, add the code below at the beginning of your Python script:



import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

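Note: If you’re using macOS or Linux, the path will be different, so adjust it accordingly. If the tesseract command is already on your system PATH, Pytesseract finds it on its own and you can leave this line out.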
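
If you would rather not hard-code the path, one option is to look the executable up at runtime. The sketch below assumes the Windows location shown above as a fallback; find_tesseract and configure_pytesseract are hypothetical helper names we made up for illustration.

```python
import shutil
from pathlib import Path

# Location used in this post for Windows; adjust it if yours differs.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=WINDOWS_DEFAULT):
    """Return the path to the tesseract executable, or None if not found."""
    on_path = shutil.which("tesseract")  # search the system PATH first
    if on_path:
        return on_path
    if fallback and Path(fallback).is_file():
        return fallback
    return None


def configure_pytesseract():
    """Point Pytesseract at the engine, or raise a clear error."""
    import pytesseract  # imported here so the lookup needs only the standard library

    path = find_tesseract()
    if path is None:
        raise RuntimeError("Tesseract was not found. Install it or set the path manually.")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_pytesseract() once at the top of your script would then replace the hard-coded assignment.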

Step-by-Step Process to Create an Image-to-Text Converter

If you have installed the above libraries then it is time to start creating your image-to-text converter. Follow the steps we have mentioned below carefully. 

1. Importing Libraries

The first thing that you have to do is to bring in the libraries you have installed previously. They will do all the heavy lifting for you. Below is the code you can use to import them. 



import pytesseract
from PIL import Image
import cv2

2. Loading Image

After importing the libraries, the next step is to load the image from which you want to extract text. You can use either Pillow or OpenCV for this.

Code for Using Pillow



image = Image.open('image_path.jpg')

Code for Using OpenCV



image = cv2.imread('image_path.jpg')

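Do not forget to replace 'image_path.jpg' with the actual path of the file that you want to load. The preprocessing code in the next step uses OpenCV functions, which expect the array returned by cv2.imread(), so load the image with OpenCV if you plan to follow along.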
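
One thing to watch out for is that cv2.imread() does not raise an error when the path is wrong. It simply returns None, and the failure only shows up later. A minimal sketch of a safer loader could look like this (load_image is our own illustrative name, not an OpenCV function):

```python
from pathlib import Path


def load_image(path):
    """Load an image with OpenCV and fail early with a clear message."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such image file: {path}")

    import cv2  # imported here so the path check works even before OpenCV is installed

    image = cv2.imread(str(path))
    if image is None:
        # The file exists but OpenCV could not decode it as an image.
        raise ValueError(f"OpenCV could not read this file as an image: {path}")
    return image
```

You would then write image = load_image('image_path.jpg') instead of calling cv2.imread() directly.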

3. Preprocessing the Image

Before moving to the text extraction, preprocessing the image is considered a good idea. By doing this, you can make the text easier to read and improve the accuracy of the OCR process. 

Let us walk you through the basic preprocessing steps. 

  • Resizing: Very small or very large images can hurt accuracy, so resize the image to a manageable size.
  • Grayscale Conversion: This removes unnecessary color information so that Tesseract can detect the text more easily.
  • Thresholding: This converts the grayscale image into pure black and white, which further helps Tesseract recognize the text.

Below we have shared the code that you can apply for these steps.



# Convert to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply thresholding
_, threshold_image = cv2.threshold(gray_image, 150, 255, cv2.THRESH_BINARY)

# Resize the image (optional, adjust size as needed)
resized_image = cv2.resize(threshold_image, (800, 600))

Note: We have used an image size of 800×600. You can adjust it as per your needs, but keep in mind that a fixed size stretches any image whose proportions are not 4:3.
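
If stretching is a problem for your images, a variation worth trying is to resize by width only and let the height follow. The sketch below also swaps the fixed threshold of 150 for Otsu’s method, which lets OpenCV pick the threshold from the image itself. This is an alternative to the code above, not a replacement you must use; scaled_size, preprocess, and the default target width of 1600 are our own assumptions.

```python
def scaled_size(width, height, target_width):
    """Return (width, height) scaled to target_width, keeping the aspect ratio."""
    scale = target_width / width
    return target_width, max(1, round(height * scale))


def preprocess(image, target_width=1600):
    """Grayscale, resize proportionally, then binarize with Otsu's method."""
    import cv2  # imported here so scaled_size can be used without OpenCV

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    height, width = gray.shape[:2]
    # cv2.resize expects the new size as (width, height).
    resized = cv2.resize(
        gray,
        scaled_size(width, height, target_width),
        interpolation=cv2.INTER_CUBIC,
    )
    # With THRESH_OTSU the threshold argument (0 here) is ignored and computed automatically.
    _, binary = cv2.threshold(resized, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

Here the image is resized before thresholding, so the black-and-white result is not blurred again by the resize. Whether this helps depends on your images, so compare the output of both versions.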

4. Extracting Text

Now comes the most important part: extracting text from the image. For this, you use the Pytesseract library, which passes the image to Tesseract and returns the recognized text.

Below is the code that you are going to need for text extraction. 



# Extract text from the image
extracted_text = pytesseract.image_to_string(resized_image)

This line uses pytesseract.image_to_string() to extract the text from the image and store it in the extracted_text variable. 

Easy, right?
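
If the default result is not good enough, image_to_string() also accepts a lang argument and a config string that is passed on to Tesseract. The sketch below builds such a config string from Tesseract’s --psm (page segmentation mode) and --oem (engine mode) options. The helper names tesseract_config and extract_text are hypothetical, and the best values depend on your images.

```python
def tesseract_config(psm=3, oem=None):
    """Build a Tesseract config string such as '--oem 3 --psm 6'."""
    parts = []
    if oem is not None:
        parts.append(f"--oem {oem}")
    parts.append(f"--psm {psm}")
    return " ".join(parts)


def extract_text(image, lang="eng", psm=3):
    """Run OCR with an explicit language and page segmentation mode."""
    import pytesseract  # imported here so tesseract_config works on its own

    return pytesseract.image_to_string(image, lang=lang, config=tesseract_config(psm))
```

For example, extract_text(resized_image, psm=6) tells Tesseract to treat the image as a single uniform block of text, which can suit cropped paragraphs. Languages other than English need the matching Tesseract language data installed.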

5. Displaying and Saving Extracted Text

Once you have extracted the text, the next step is to display it on the screen. You can also save it in a .txt file.

To display the extracted text, run the code below.

print(extracted_text)

This will print the extracted text in your console.

To save the text to a file, run this code.

with open('extracted_text.txt', 'w', encoding='utf-8') as file:
    file.write(extracted_text)

This will create a new file called extracted_text.txt and save all the extracted text inside it.
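
OCR output often contains trailing spaces, runs of blank lines, and sometimes a form-feed character at the end. If you want tidier files, you could clean the text before saving it. The following is a minimal sketch; clean_text and save_text are our own illustrative helpers.

```python
from pathlib import Path


def clean_text(raw):
    """Drop form feeds and trailing spaces, and collapse repeated blank lines."""
    lines = [line.rstrip() for line in raw.replace("\f", "").splitlines()]
    cleaned = []
    for line in lines:
        # Keep a blank line only if the previous kept line was not blank.
        if line or (cleaned and cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned).strip()


def save_text(text, path):
    """Write the text as UTF-8 so non-English characters are stored correctly."""
    Path(path).write_text(text + "\n", encoding="utf-8")
```

With these in place, save_text(clean_text(extracted_text), 'extracted_text.txt') does the same job as the code above with a cleaner result.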

You’ve successfully created your own image-to-text converter. Now all you need to do is to change the image path, run the same commands, and start extracting the text. 

Enhancing the Converter

Now that you have built a simple image-to-text converter, let’s enhance it further. Below, we’ll walk you through a few ways to improve your tool.

Adding GUI Support

Working with a command-line tool is a bit technical. A graphical user interface (GUI) can make the process easier. For example, look at the image below.

It shows the interface of an image-to-text converter. As you can see, it is easier for a user to interact with the tool: they can extract text by simply clicking buttons, with no need to type commands.

Libraries like Tkinter and PyQt5 can help you create a GUI. Here is a simple example of using Tkinter to create a basic GUI for uploading an image and displaying the extracted text:

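Tkinter is part of Python’s standard library and ships with the official installers for Windows and macOS, so there is usually nothing to install. Note that pip install tk does not install Tkinter. On some Linux distributions Tkinter is packaged separately, for example as python3-tk on Debian and Ubuntu. To check that it is available, run:

python -m tkinter

If a small test window opens, Tkinter is ready. You can then run the code below for the GUI.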



import tkinter as tk
from tkinter import filedialog

import cv2
import pytesseract

# Create the main window
root = tk.Tk()
root.title("Image-to-Text Converter")

# Function to browse and load an image
def upload_image():
    file_path = filedialog.askopenfilename(
        title="Select an Image",
        # Space-separated patterns work on Windows, macOS, and Linux
        filetypes=[("Image files", "*.jpg *.jpeg *.png")],
    )
    if file_path:
        img = cv2.imread(file_path)
        if img is None:  # cv2.imread returns None when the file cannot be read
            return
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Convert to grayscale
        text = pytesseract.image_to_string(img)  # Extract text

        # Display the extracted text in a text box
        text_box.delete("1.0", tk.END)
        text_box.insert(tk.END, text)

# Create buttons and text area for GUI
upload_btn = tk.Button(root, text="Upload Image", command=upload_image)
upload_btn.pack(pady=10)

text_box = tk.Text(root, height=10, width=50)
text_box.pack(pady=20)

# Run the Tkinter event loop
root.mainloop()

A few things to know about the above code:

  • It creates a simple window (root) with a button to upload images. The uploaded image is processed, and the extracted text is displayed in a text box.
  • The filedialog.askopenfilename() call lets the user select an image file from their system.
  • After processing, the extracted text appears in the text box.

Batch Processing

You can also make your tool process multiple images in one go. To do this, modify your script so that it can handle batch processing, as in the code below.



import os

import cv2
import pytesseract

# Function to process all images in a folder
def process_images_in_folder(folder_path):
    for filename in os.listdir(folder_path):
        # Compare in lowercase so that files like PHOTO.JPG are included
        if filename.lower().endswith(('.jpg', '.jpeg', '.png')):
            image_path = os.path.join(folder_path, filename)
            img = cv2.imread(image_path)
            if img is None:  # Skip files that OpenCV cannot read
                continue
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            text = pytesseract.image_to_string(img)

            # Save the extracted text to a file
            with open(f"{filename}_extracted.txt", 'w', encoding='utf-8') as file:
                file.write(text)

# Specify folder path
folder_path = "path/to/your/folder"

# Call the function to process images in the folder
process_images_in_folder(folder_path)

The above code makes your script go through each image file in the folder. It processes each image, extracts the text using Tesseract, and saves the text as a separate .txt file in the current working directory, named after the image (for example, photo.jpg_extracted.txt).
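
If you prefer the text files to land in a folder of their own with cleaner names, the same idea can be written with pathlib. This is a sketch under our own assumptions: find_images, output_path_for, process_folder, and the IMAGE_SUFFIXES set are illustrative names, and the output folder is whatever you pass in.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(folder):
    """Return the image files in a folder, sorted, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def output_path_for(image_path, output_dir):
    """Map photo.jpg to <output_dir>/photo.txt."""
    return Path(output_dir) / (Path(image_path).stem + ".txt")


def process_folder(folder, output_dir):
    """Extract text from every image in folder and save it under output_dir."""
    import cv2  # imported here so the path helpers work without OpenCV
    import pytesseract

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for image_path in find_images(folder):
        img = cv2.imread(str(image_path))
        if img is None:  # skip files that OpenCV cannot read
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        text = pytesseract.image_to_string(gray)
        output_path_for(image_path, output_dir).write_text(text, encoding="utf-8")
```

One trade-off to be aware of: two images that share a name but differ in extension, such as page.jpg and page.png, would write to the same page.txt with this naming scheme.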

Key Takeaways

In this blog post, we have shared the complete process of building an image-to-text converter using Python. Try implementing the steps and start creating your own image-to-text conversion tool. Experiment, learn, and create something extraordinary.


