Unlocking Code from Images: A Practical Guide to OCR with Tesseract and Flask with ChatGPT

Published in Python in Plain English

Feb 22, 2023

Optical Character Recognition (OCR) converts images of text into machine-readable text. One practical use is turning a screenshot of source code back into code that can be edited. The code presented in this article is a Flask application that extracts code from an uploaded screenshot using the Tesseract OCR engine.

This article walks through that code and explains how each part works.

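The application uses four Python libraries: Flask, OpenCV, NumPy, and pytesseract. pytesseract is only a wrapper around the tesseract command-line program, so the Tesseract engine itself must be installed separately through the operating system (for example, the tesseract-ocr package on Debian/Ubuntu, or brew install tesseract on macOS). The Python libraries can then be installed with pip, which comes bundled with Python: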

pip install flask opencv-python numpy pytesseract
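pytesseract calls the tesseract executable each time it runs. If the engine is missing, the error (pytesseract's TesseractNotFoundError) only appears on the first OCR call. The sketch below uses only the standard library to check for the binary up front. It assumes the executable is named tesseract and is on PATH, which is pytesseract's default. If pytesseract.pytesseract.tesseract_cmd has been set to another location, check that path instead. The function name tesseract_version is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if it cannot be run."""
    # Look the binary up on PATH, the same default location pytesseract uses
    path = shutil.which("tesseract")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    # Some Tesseract builds print the version banner to stderr instead of stdout
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version or "tesseract not found: install it before running the app")
```

Running this once before starting the server separates a missing-engine problem from an OCR-quality problem.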

The application starts by creating a Flask instance. No template_folder argument is given, so Flask uses its default template folder when looking for HTML files to render. That folder is a directory named templates next to the application module:

app = Flask(__name__)

__name__ is a built-in variable that holds the name of the current module (it is "__main__" when the file is run directly). Flask uses it to work out the application's root path, and it locates the templates and static folders from that path.
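A quick way to see what Flask derives from __name__ is to inspect a few attributes of the app object. The printed paths depend on where the file is saved, so no exact output is shown. Only the default template_folder value, "templates", is fixed.

```python
import os

from flask import Flask

app = Flask(__name__)

# import_name is the __name__ value passed in ("__main__" when run as a script)
print("import name:", app.import_name)
# root_path is the directory Flask derived from that module name
print("root path:", app.root_path)
# template_folder defaults to "templates" and is resolved relative to root_path
print("templates:", os.path.join(app.root_path, app.template_folder))
```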

The application has two routes, one for the index page and another for the code extraction process. The index route simply renders an HTML file, while the code extraction route receives a screenshot from the user, processes it, extracts the code, and returns it to the user.

The index route is defined as follows:

@app.route('/')
def index():
    return render_template('index.html')

render_template is a Flask function that looks up a template by name in the template folder, renders it with Jinja2, and returns the resulting HTML. Here, index.html is rendered whenever the index route is requested.
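Flask's built-in test client can exercise the template lookup without a browser. In this sketch, the temporary folder and the one-line index.html are stand-ins created only so the example is self-contained. root_path is passed explicitly so Flask looks in that folder instead of the module's own directory.

```python
import os
import tempfile

from flask import Flask, render_template

# Throwaway project folder with a templates/index.html, created only for this demo
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "templates"))
with open(os.path.join(root, "templates", "index.html"), "w", encoding="utf-8") as f:
    f.write("<h1>Screenshot to Code</h1>")

# root_path points Flask at the temporary folder instead of this module's directory
app = Flask(__name__, root_path=root)


@app.route("/")
def index():
    return render_template("index.html")


client = app.test_client()
response = client.get("/")
print(response.status_code)             # 200
print(response.get_data(as_text=True))  # <h1>Screenshot to Code</h1>
```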

The second route is defined as follows:

@app.route('/extract', methods=['POST'])
def extract():
    # Get the uploaded file from the form data; the key matches the input's name attribute
    file = request.files.get('screenshot')
    if file is None or file.filename == '':
        return 'No screenshot uploaded', 400

    # Decode the raw bytes into an OpenCV image (imdecode returns None for unreadable data)
    data = file.read()
    screenshot = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR) if data else None
    if screenshot is None:
        return 'The uploaded file is not a readable image', 400

    # Preprocess the screenshot image to improve text extraction accuracy
    gray = cv2.cvtColor(screenshot, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    _, binary = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Use Tesseract to extract text; --psm 6 treats the image as one uniform block of text
    code = pytesseract.image_to_string(binary, lang='eng', config='--psm 6')

    # Collapse blank lines and remove leading and trailing whitespace
    code = '\n'.join(line for line in code.splitlines() if line.strip()).strip()

    # Return the extracted code to the result template
    return render_template('result.html', code=code)

The route handles a POST request containing the screenshot uploaded by the user. The raw bytes are read from the form data, wrapped in a NumPy array with np.frombuffer, and decoded into an image with cv2.imdecode. OpenCV, a widely used image-processing library, then preprocesses the image to make text extraction more accurate:

- The image is converted to grayscale.
- It is smoothed with a 5x5 Gaussian blur to suppress noise.
- It is binarized with Otsu's method (cv2.THRESH_OTSU), which chooses the threshold automatically from the image's histogram instead of using a fixed value. The threshold value 0 passed to cv2.threshold is ignored when THRESH_OTSU is set.
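Here is a small pure-NumPy sketch of Otsu's method that shows what cv2.THRESH_OTSU does. It is an illustration of the technique, not OpenCV's implementation, and OpenCV may break ties between equally good thresholds differently. For every candidate threshold t, it computes the between-class variance and keeps the t that maximizes it. The names otsu_threshold and binarize are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Pick the threshold t that maximizes the between-class variance of a uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)          # fraction of pixels with value <= t
    mu = np.cumsum(prob * levels)    # cumulative mean intensity up to t
    mu_total = mu[-1]                # mean intensity of the whole image
    denom = omega * (1.0 - omega)
    # sigma_b^2(t) = (mu_total * omega - mu)^2 / (omega * (1 - omega)); 0 where a class is empty
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray, t: int) -> np.ndarray:
    # Same rule as cv2.THRESH_BINARY with maxval 255: pixels above t become 255, the rest 0
    return np.where(gray > t, 255, 0).astype(np.uint8)


# Synthetic 10x10 image: dark "text" pixels (40) on a light background (200)
img = np.full((10, 10), 200, dtype=np.uint8)
img[:, :5] = 40
t = otsu_threshold(img)
mask = binarize(img, t)
print(t)  # 40: any t in [40, 199] separates the classes; argmax returns the first
```

cv2.threshold also returns the threshold it picked as its first value. The article's code discards it with _, but it can be useful to log when debugging poor OCR results.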

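Once preprocessing is done, pytesseract passes the processed image to Tesseract to recognize the text. Tesseract is an open-source OCR engine that supports more than 100 languages. It was originally developed at Hewlett-Packard and later developed with Google's sponsorship. The call uses lang='eng' for English and config='--psm 6'. PSM 6 is a page segmentation mode that treats the image as a single uniform block of text, which suits a screenshot of a code editor.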

After extraction, the text is cleaned up to improve readability: blank lines are collapsed and leading and trailing whitespace is removed. This also discards any blank lines that were intentionally part of the original code.
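A small input shows the difference between str.replace and a pattern that handles runs of any length. str.replace handles one pair of newlines at a time, so three newlines in a row still leave a blank line. The sketch below uses a regular expression instead. This is an alternative to the article's replace call, not the article's own code. It collapses any run of blank lines and leaves the indentation of the next line intact. clean_ocr_text is a hypothetical name.

```python
import re


def clean_ocr_text(text: str) -> str:
    """Collapse runs of blank or whitespace-only lines into one newline, then trim the ends."""
    # "\n" followed by one or more (optional spaces/tabs + "\n") groups; the indentation
    # of the next non-blank line is not part of the match, so it is preserved
    return re.sub(r"\n(?:[ \t]*\n)+", "\n", text).strip()


raw = "def add(a, b):\n\n\n    return a + b\n\n"
# str.replace handles one pair at a time, so a run of three newlines leaves a blank line
print(repr(raw.replace("\n\n", "\n").strip()))  # 'def add(a, b):\n\n    return a + b'
print(repr(clean_ocr_text(raw)))                # 'def add(a, b):\n    return a + b'
```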

Finally, render_template renders result.html and passes the extracted text as the code variable, so the template can display it to the user.

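The Flask application uses two HTML templates, index.html and result.html. Both live in the default templates folder and contain the page markup. The CSS and JavaScript they reference (style.css and, for result.html, script.js) are served from Flask's static folder through url_for('static', ...). Those static files are not shown in this article.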

The index.html template contains a simple form with a file input field and a submit button, which the user uses to upload a screenshot of the code to extract. The template is defined as follows:

<!DOCTYPE html>
<html>
  <head>
    <title>Screenshot to Code</title>
    <link href="https://fonts.googleapis.com/css2?family=Roboto:wght@400;700&display=swap" rel="stylesheet">
    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>
    <header>
      <h1>Screenshot to Code</h1>
    </header>
    <main>
      <h2>Upload a Screenshot</h2>
      <form action="{{ url_for('extract') }}" method="post" enctype="multipart/form-data">
        <input type="file" name="screenshot" accept="image/*" required>
        <button type="submit">Extract Code</button>
      </form>
    </main>
  </body>
</html>

This is the index.html template, the first page the user sees. It is a standard HTML5 document with head and body sections. The head sets the page title to "Screenshot to Code" and loads the Roboto font from Google Fonts and the application's stylesheet. The screenshot is uploaded through a form that uses the POST method. Its action attribute is generated with url_for('extract'), which resolves to the /extract URL. The enctype attribute is set to multipart/form-data, which is required for file uploads. Without it, the browser sends only the file name instead of the file contents.

The form tag contains an input tag with a type of file, which allows the user to browse their computer for a file to upload. The name attribute of the input tag is set to "screenshot", which matches the name used in the Flask application to access the file.
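Flask's test client can check the connection between the name attribute and request.files. It builds the same kind of multipart/form-data request a browser sends. In this sketch, the route body is a hypothetical stand-in that only reports what it received, and the fake bytes are not a real image.

```python
import io

from flask import Flask, request

app = Flask(__name__)


@app.route("/extract", methods=["POST"])
def extract():
    # The key must match the name attribute of the <input type="file"> element
    upload = request.files.get("screenshot")
    if upload is None or upload.filename == "":
        return "No screenshot uploaded", 400
    return f"received {upload.filename} ({len(upload.read())} bytes)"


client = app.test_client()
# A (file object, filename) tuple is sent as a file field, like a browser upload
ok = client.post(
    "/extract",
    data={"screenshot": (io.BytesIO(b"fake image bytes"), "code.png")},
    content_type="multipart/form-data",
)
missing = client.post("/extract")
print(ok.status_code, ok.get_data(as_text=True))  # 200 received code.png (16 bytes)
print(missing.status_code)                        # 400
```

If the form field were renamed, for example to name="image", request.files.get("screenshot") would return None and the route would answer 400.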

Lastly, a button tag submits the form. When the user clicks "Extract Code", the form data is sent to /extract with the POST method.

[Figure: the home page]

[Figure: the uploaded screenshot]

The result.html template displays the extracted code to the user. It places the code inside a <pre> tag, which renders text in a monospace font and preserves whitespace. The template is defined as follows:

<!DOCTYPE html>
<html>
  <head>
    <title>Extracted Code</title>
    <link href="https://fonts.googleapis.com/css2?family=Roboto:wght@400;700&display=swap" rel="stylesheet">
    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
  </head>
  <body>
    <header>
      <h1>Screenshot to Code</h1>
    </header>
    <main>
      <h2>Extracted Code</h2>
      <pre><code>{{ code }}</code></pre>
    </main>
    <footer>
      <p>Powered by Flask and Tesseract</p>
    </footer>
    <script src="{{ url_for('static', filename='script.js') }}"></script>
  </body>
</html>

This is the result.html template, which is displayed after the code has been extracted from the uploaded screenshot. It mirrors index.html but has a different title and h2 heading. The extracted code is rendered inside a pre element that wraps a code element, and the {{ code }} placeholder inserts the text passed from render_template. Flask enables Jinja2 autoescaping for .html templates, so characters such as < and & in the extracted code are escaped and displayed literally instead of being interpreted as HTML. Finally, the footer displays "Powered by Flask and Tesseract", naming the technologies behind the application. The application uses Tesseract on the server through pytesseract, not the JavaScript port Tesseract.js.
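Autoescaping matters here because source code is full of characters like < and &. Flask enables it for templates ending in .html and for strings rendered with render_template_string. The sketch below renders just the pre/code fragment of result.html with render_template_string to show the escaping. TEMPLATE is a stand-in for the real file.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Stand-in for the part of result.html that displays the code
TEMPLATE = "<pre><code>{{ code }}</code></pre>"

# Rendering a template needs an application context
with app.app_context():
    html = render_template_string(TEMPLATE, code="if a < b && c > d:")

print(html)  # <pre><code>if a &lt; b &amp;&amp; c &gt; d:</code></pre>
```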

[Figure: the code extracted from the uploaded screenshot]

Overall, the Flask application presented here is a simple tool for extracting code from a screenshot. It uses OpenCV to preprocess the image and Tesseract to recognize the text, then presents the result in a readable format. OCR output for source code still needs careful review. Indentation is often lost, and look-alike characters such as l, 1, and I, or O and 0, can be confused. The application could be extended to support multiple languages and to further improve the accuracy of text extraction.

Written by Sohail Hosseini