Extracting LaTeX from Images using GPT-4o and Streamlit

Introduction

Have you ever encountered a mathematical equation in an image and wished you could extract its LaTeX code instantly? With the power of AI, this is now possible! In this project, I built a LaTeX OCR (Optical Character Recognition) tool using GPT-4o and Streamlit. This application takes an image containing a mathematical equation and extracts its LaTeX representation effortlessly.

Video Demonstration

LateX OCR Demo


How It Works

The tool leverages GPT-4o’s vision capabilities to analyze the uploaded image and extract the equation in LaTeX format. The process follows these steps:

  1. The user uploads an image containing a mathematical equation.
  2. The image is converted to a Base64-encoded format.
  3. The encoded image is sent to GPT-4o, requesting LaTeX extraction.
  4. The model returns the LaTeX code, which is displayed both as raw text and rendered output.
  5. The extracted LaTeX code can be copied and used in research papers, reports, or any mathematical documentation.

Technologies Used

  • Streamlit: For creating the user-friendly web interface.
  • OpenAI GPT-4o: To extract the mathematical expression from images.
  • PIL (Python Imaging Library): For image handling and processing.
  • Python: The backbone of the application.

Code Implementation

Here’s a breakdown of the core functionality that extracts LaTeX code from an image:

1. Importing Required Libraries

This section imports the necessary libraries:

Code

  • streamlit is used to create the UI.
  • base64 helps encode the image before sending it to GPT-4o.
  • OpenAI is used to interact with GPT-4o.
  • dotenv loads environment variables (API keys).
  • PIL handles image processing.
  • BytesIO is used for in-memory image handling.
  • os retrieves environment variables.

2. Loading API Key

Here, we load the OpenAI API key from a .env file and initialize the OpenAI client.

Code

3. Defining the Function to Extract LaTeX Code

This function processes the uploaded image:

  • It opens the image using PIL.
  • Converts it to RGB mode if necessary.
  • Saves it to a BytesIO buffer.
  • Encodes the image in Base64 format.

Code

4. Sending the Image to GPT-4o for LaTeX Extraction

This section:

  • Sends a request to GPT-4o with the encoded image.
  • Specifies strict guidelines to ensure the correct extraction of LaTeX code.
  • Retrieves and returns the extracted LaTeX code.

Code

5. Handling Errors

If any error occurs during processing, it is displayed using st.error().

Code


User Interface

The Streamlit UI consists of two sections:

  • Upload Section: Users can upload an image containing a mathematical equation.
  • Output Section: Displays the extracted LaTeX code and renders it using Streamlit’s st.latex() function.

Code


Final Code

Code

You can find the complete source code for this project on GitHub: https://github.com/rayaneB0t/Image-to-LaTeX-OCR-app-using-gpt-4o


Conclusion

This project showcases how AI can simplify mathematical equation extraction from images. With GPT-4o’s vision and Streamlit’s interactivity, converting images to LaTeX has never been easier!

Try it out and let me know your thoughts! 🚀

Extracting LaTeX from Images using GPT-4o and Streamlit
Extracting LaTeX from Images using GPT-4o and Streamlit

Start the conversation