Introduction
Have you ever encountered a mathematical equation in an image and wished you could extract its LaTeX code instantly? With the power of AI, this is now possible! In this project, I built a LaTeX OCR (Optical Character Recognition) tool using GPT-4o and Streamlit. This application takes an image containing a mathematical equation and extracts its LaTeX representation effortlessly.
Video Demonstration
How It Works
The tool leverages GPT-4o’s vision capabilities to analyze the uploaded image and extract the equation in LaTeX format. The process follows these steps:
- The user uploads an image containing a mathematical equation.
- The image is converted to a Base64-encoded format.
- The encoded image is sent to GPT-4o, requesting LaTeX extraction.
- The model returns the LaTeX code, which is displayed both as raw text and rendered output.
- The extracted LaTeX code can be copied and used in research papers, reports, or any mathematical documentation.
Technologies Used
- Streamlit: For creating the user-friendly web interface.
- OpenAI GPT-4o: To extract the mathematical expression from images.
- PIL (Python Imaging Library): For image handling and processing.
- Python: The backbone of the application.
Code Implementation
Here’s a breakdown of the core functionality that extracts LaTeX code from an image:
1. Importing Required Libraries
This section imports the necessary libraries:
- streamlit is used to create the UI.
- base64 helps encode the image before sending it to GPT-4o.
- OpenAI is used to interact with GPT-4o.
- dotenv loads environment variables (API keys).
- PIL handles image processing.
- BytesIO is used for in-memory image handling.
- os retrieves environment variables.
2. Loading API Key
Here, we load the OpenAI API key from a .env file and initialize the OpenAI client.
3. Defining the Function to Extract LaTeX Code
This function processes the uploaded image:
- It opens the image using PIL.
- Converts it to RGB mode if necessary.
- Saves it to a BytesIO buffer.
- Encodes the image in Base64 format.
4. Sending the Image to GPT-4o for LaTeX Extraction
This section:
- Sends a request to GPT-4o with the encoded image.
- Specifies strict guidelines to ensure the correct extraction of LaTeX code.
- Retrieves and returns the extracted LaTeX code.
5. Handling Errors
If any error occurs during processing, it is displayed using st.error().
User Interface
The Streamlit UI consists of two sections:
- Upload Section: Users can upload an image containing a mathematical equation.
- Output Section: Displays the extracted LaTeX code and renders it using Streamlit’s st.latex() function.
Final Code
You can find the complete source code for this project on GitHub: https://github.com/rayaneB0t/Image-to-LaTeX-OCR-app-using-gpt-4o
Conclusion
This project showcases how AI can simplify mathematical equation extraction from images. With GPT-4o’s vision and Streamlit’s interactivity, converting images to LaTeX has never been easier!
Try it out and let me know your thoughts! 🚀
Start the conversation