Unlocking Code from Images: A Practical Guide to OCR with Tesseract and Flask with ChatGPT
Advances in technology keep producing practical solutions to everyday problems. One useful example is turning an image of code back into editable text, a task handled by Optical Character Recognition (OCR). The code presented in this article is a Flask application that extracts code from a screenshot using the Tesseract OCR engine.
This article explains that code in detail and shows how each part works.
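To install the Python libraries used in this application, use pip, the package manager that ships with Python. The application uses Flask, OpenCV, NumPy, and Pytesseract:

```bash
pip install flask opencv-python numpy pytesseract
```

Pytesseract is only a wrapper around the Tesseract command-line program, so the Tesseract engine itself must be installed separately (for example, through your operating system's package manager).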
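Because Pytesseract calls the `tesseract` executable behind the scenes, it can help to confirm that the executable is reachable before starting the app. The minimal sketch below uses only the standard library; `find_tesseract` is a hypothetical helper name. If Tesseract is installed somewhere outside your `PATH`, Pytesseract lets you point to it by setting `pytesseract.pytesseract.tesseract_cmd`.

```python
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the `tesseract` executable, or None if it is not on PATH."""
    # Hypothetical helper: shutil.which searches the directories listed in PATH
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it or set "
              "pytesseract.pytesseract.tesseract_cmd to its location")
    else:
        print(f"tesseract found at {path}")
```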
The code creates an instance of the Flask application. Because no `template_folder` argument is passed, Flask uses its default template folder, `templates`, which is where it looks for the HTML files it renders:

```python
from flask import Flask

app = Flask(__name__)
```
The `__name__` argument is a built-in Python variable that holds the name of the current module. Passing it to the Flask constructor tells Flask where the application lives, so it can locate the `templates` and `static` folders relative to it.
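To make the template-folder behaviour concrete, the short sketch below creates the app and computes the directory Flask will search for templates. It assumes the code lives in a module run directly as a script; to actually serve the app during development you would call `app.run(debug=True)`, which starts Flask's development server (intended for local testing only).

```python
import os

from flask import Flask

# __name__ tells Flask which module, and therefore which directory, the app belongs to
app = Flask(__name__)

# Templates are resolved relative to the application's root path
template_dir = os.path.join(app.root_path, app.template_folder)

if __name__ == "__main__":
    print("Looking for templates in:", template_dir)
```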
The application has two routes, one for the index page and another for the code extraction process. The index route simply renders an HTML file, while the code extraction route receives a screenshot from the user, processes it, extracts the code, and returns it to the user.
The index route is defined as follows:

```python
@app.route('/')
def index():
    return render_template('index.html')
```
The `render_template` function looks up an HTML file in the template folder, renders it with Jinja2, and returns the resulting HTML as a string, which the view returns as the response. Here, `index.html` is rendered whenever the index route is accessed.
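One way to watch `render_template` at work without a browser is Flask's built-in test client. The sketch below builds a throwaway project in a temporary directory (the constructor's `root_path` argument points Flask at it) containing a placeholder `templates/index.html`; the page content and the `make_app` factory are hypothetical stand-ins, not the article's real template or code.

```python
import os
import tempfile

from flask import Flask, render_template


def make_app(root: str) -> Flask:
    # Hypothetical factory; root_path overrides where Flask looks for templates/
    app = Flask(__name__, root_path=root)

    @app.route("/")
    def index():
        return render_template("index.html")

    return app


with tempfile.TemporaryDirectory() as root:
    # Placeholder template standing in for the real index.html
    os.makedirs(os.path.join(root, "templates"))
    with open(os.path.join(root, "templates", "index.html"), "w") as f:
        f.write("<h1>Screenshot to Code</h1>")

    response = make_app(root).test_client().get("/")
    status = response.status_code
    body = response.get_data(as_text=True)

print(status, body)  # 200 <h1>Screenshot to Code</h1>
```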
The second route is defined as follows:
```python
import re

import cv2
import numpy as np
import pytesseract
from flask import render_template, request


@app.route('/extract', methods=['POST'])
def extract():
    # Get the screenshot image from the form data
    screenshot = request.files['screenshot'].read()
    screenshot = np.frombuffer(screenshot, np.uint8)
    screenshot = cv2.imdecode(screenshot, cv2.IMREAD_COLOR)
    if screenshot is None:  # imdecode returns None for data it cannot decode
        return 'The uploaded file is not a readable image.', 400
    # Preprocess the screenshot image to improve text extraction accuracy
    gray = cv2.cvtColor(screenshot, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    _, binary = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Use Tesseract to extract text from the screenshot
    code = pytesseract.image_to_string(binary, lang='eng', config='--psm 6')
    # Collapse runs of newlines into one and trim surrounding whitespace
    code = re.sub(r'\n{2,}', '\n', code).strip()
    # Return the extracted code to the result template
    return render_template('result.html', code=code)
```
The route accepts only POST requests, which carry the screenshot uploaded by the user. The uploaded bytes are read from the form data, converted to a NumPy array, and decoded into an image with OpenCV, a widely used library for image and video processing. OpenCV then preprocesses the image to make the text easier to recognize: it converts the image to grayscale, applies a Gaussian blur to reduce noise, and binarizes it with Otsu's method, which chooses the threshold automatically from the image's histogram.
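Once preprocessing is done, Tesseract extracts the text from the binarized image. Tesseract is an open-source OCR engine, originally developed at Hewlett-Packard and later sponsored by Google, that can recognize text in more than 100 languages. Here it is called with `lang='eng'` and `--psm 6`, a page segmentation mode that treats the image as a single uniform block of text, which suits a screenshot of a code snippet.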
After extraction, the text is cleaned up: runs of consecutive newlines are collapsed into a single line break, and leading and trailing whitespace is removed. Note that this also removes any blank lines that were intentionally part of the code.
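The choice between a single `str.replace` pass and a regular expression matters for this cleanup: `replace('\n\n', '\n')` scans the string once, so a run of four newlines only shrinks to two, while a pattern such as `\n{2,}` collapses a run of any length in one step. The sketch below compares the two; `collapse_blank_lines` is a hypothetical helper name.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Collapse every run of two or more newlines into one, then trim the ends."""
    return re.sub(r"\n{2,}", "\n", text).strip()


sample = "def f():\n\n\n\n    return 1\n\n"

# One replace pass only halves each run: four newlines become two
print(repr(sample.replace("\n\n", "\n").strip()))  # 'def f():\n\n    return 1'
# The regex collapses the whole run at once
print(repr(collapse_blank_lines(sample)))  # 'def f():\n    return 1'
```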
Finally, the extracted code is passed to the `result.html` template through `render_template`. The `code` keyword argument becomes a template variable, so the template can display it to the user.
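Because the extracted text is inserted into the page as a template variable, it helps to know that Flask enables Jinja2 autoescaping for templates ending in `.html`, so characters such as `<` and `>` in OCR output are shown literally instead of being interpreted as markup. The sketch below checks this with `render_template_string`, which Flask also autoescapes; the inline template string is a stand-in for `result.html`.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Stand-in for result.html; the real template lives in templates/result.html
TEMPLATE = "<pre><code>{{ code }}</code></pre>"

with app.app_context():
    # OCR output that happens to look like HTML is escaped, not executed
    html = render_template_string(TEMPLATE, code="<script>alert(1)</script>")

print(html)  # <pre><code>&lt;script&gt;alert(1)&lt;/script&gt;</code></pre>
```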
The application uses two HTML templates, `index.html` and `result.html`, stored in the default `templates` folder. Both link to a stylesheet, `style.css`, and `result.html` also loads a script, `script.js`; Flask serves these files from its `static` folder through `url_for('static', ...)`.
The `index.html` file contains a simple form with a file input field and a submit button, through which the user uploads a screenshot of the code to extract. The full template is:
```html
<!DOCTYPE html>
<html>
<head>
  <title>Screenshot to Code</title>
  <link href="https://fonts.googleapis.com/css2?family=Roboto:wght@400;700&display=swap" rel="stylesheet">
  <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
</head>
<body>
  <header>
    <h1>Screenshot to Code</h1>
  </header>
  <main>
    <h2>Upload a Screenshot</h2>
    <form action="{{ url_for('extract') }}" method="post" enctype="multipart/form-data">
      <input type="file" name="screenshot">
      <button type="submit">Extract Code</button>
    </form>
  </main>
</body>
</html>
```
This is the `index.html` file, the first page the user sees. It is a standard HTML5 document with a `head` and a `body` section; the `head` sets the page title to "Screenshot to Code". The screenshot is uploaded through a `form` tag that submits the file with the `POST` method to the `/extract` URL, which `url_for('extract')` generates from the name of the view function. The `enctype` attribute is set to `multipart/form-data`, which a form needs in order to upload files.
The `form` tag contains an `input` tag of type `file`, which lets the user browse their computer for a file to upload. Its `name` attribute is set to "screenshot", which matches the key the Flask application uses to read the file from `request.files`.
Lastly, a `button` tag submits the form. When the user clicks "Extract Code", the form data is sent to `/extract` with the `POST` method.
This is what the homepage looks like.
This is the uploaded picture.
The `result.html` file displays the extracted code to the user. It wraps the code in a `<pre>` tag, which preserves whitespace and line breaks and is rendered in a monospace font by default. The template is defined as follows:
```html
<!DOCTYPE html>
<html>
<head>
  <title>Extracted Code</title>
  <link href="https://fonts.googleapis.com/css2?family=Roboto:wght@400;700&display=swap" rel="stylesheet">
  <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
</head>
<body>
  <header>
    <h1>Screenshot to Code</h1>
  </header>
  <main>
    <h2>Extracted Code</h2>
    <pre><code>{{ code }}</code></pre>
  </main>
  <footer>
    <p>Powered by Flask and Tesseract.js</p>
  </footer>
  <script src="{{ url_for('static', filename='script.js') }}"></script>
</body>
</html>
```
This is the `result.html` file, which is shown after the code has been extracted from the uploaded screenshot. It is similar to `index.html`, but with a different `title` and `h2` element. The extracted code is displayed in a `pre` tag wrapping a `code` element, and the `{{ code }}` template variable inserts the extracted text into the page. Lastly, a `footer` tag displays the message "Powered by Flask and Tesseract.js". Strictly speaking, the application uses the native Tesseract engine through Pytesseract; Tesseract.js is a separate JavaScript port of Tesseract, so "Tesseract" would be the more accurate label.
This is the output extracted from the picture.
Overall, the Flask application presented here is a simple but effective tool for extracting code from a screenshot. It combines OpenCV preprocessing with Tesseract OCR to recognize the text in the image and presents the result to the user in a readable format. The application could be extended further, for example to support additional languages or to improve the accuracy of text extraction.
More content at PlainEnglish.io.