API Documentation

Overview

The PDF & Image Text Extractor API is an OCR (optical character recognition) service that extracts text from PDF documents and images. It uses AI-based recognition and handles both printed and handwritten text.

Endpoint

POST /ocr

This endpoint accepts a multipart/form-data request with a PDF or image file in the file field, and returns the extracted text, formatted as Markdown, inside a JSON response.

Request Format

Headers

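Content-Type: multipart/form-data

You normally do not set this header by hand. curl (with -F) and the Fetch API (with a FormData body) generate it automatically, together with the boundary parameter the server needs to split the multipart body. In fetch, setting Content-Type manually replaces the generated value and drops the boundary, so the server cannot parse the upload.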
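To see the header a client generates, you can let the Fetch API build a request and inspect it without sending anything. This sketch assumes Node.js 18 or later, where FormData, Blob, and Request are globals; contentTypeFor is a hypothetical helper and the file bytes are placeholders.

```javascript
// Hypothetical helper: build (but do not send) a POST to /ocr and
// return the Content-Type header that Fetch derives from the body.
function contentTypeFor(formData) {
  const request = new Request('https://pdfextractor-offi.onrender.com/ocr', {
    method: 'POST',
    body: formData,
  });
  // Derived from the FormData body, including a generated boundary parameter.
  return request.headers.get('content-type');
}

// One "file" field, as the endpoint expects; the bytes are placeholder content.
const formData = new FormData();
formData.append('file', new Blob(['%PDF-1.4'], { type: 'application/pdf' }), 'document.pdf');

const contentType = contentTypeFor(formData);
console.log(contentType); // multipart/form-data; boundary=<generated value>
```

The boundary value changes on every request, which is why it has to come from the client library rather than from a hard-coded header.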

Body Parameters

Parameter	Type	Description
file	File	The PDF or image file to process (PDF, PNG, JPG, JPEG)

Response Format

Success Response (200 OK)

{
  "success": true,
  "extracted_text": "Extracted content in markdown format"
}

Error Response (4xx/5xx)

{
  "success": false,
  "error": "Error message describing what went wrong"
}
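Because success and failure share the same JSON envelope, a client can branch on the success field and surface the error message otherwise. A minimal sketch, assuming the body has already been parsed from JSON; extractTextOrThrow is a hypothetical helper, and the error strings in the usage lines are placeholders, not messages the API is known to return.

```javascript
// Hypothetical helper: turn an /ocr response into either the extracted
// Markdown text or a thrown Error carrying the API's error message.
// `status` is the HTTP status code; `body` is the parsed JSON object (or null).
function extractTextOrThrow(status, body) {
  if (status >= 200 && status < 300 && body && body.success === true) {
    return body.extracted_text;
  }
  // Prefer the API's own message; fall back to the HTTP status.
  const message = body && typeof body.error === 'string'
    ? body.error
    : `OCR request failed with HTTP ${status}`;
  const err = new Error(message);
  err.status = status;
  throw err;
}

// Usage with the two documented response shapes (placeholder values):
console.log(extractTextOrThrow(200, { success: true, extracted_text: '# Invoice' }));

try {
  extractTextOrThrow(400, { success: false, error: 'placeholder error message' });
} catch (err) {
  console.error(err.status, err.message);
}
```

Attaching the HTTP status to the Error lets callers distinguish client mistakes (4xx) from server failures (5xx) without parsing the message.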

Example Usage

cURL

curl -X POST \
  https://pdfextractor-offi.onrender.com/ocr \
  -F "file=@document.pdf"

JavaScript

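// Assumes an <input type="file"> element with the hypothetical id "fileInput",
// and code running in an ES module (top-level await) or inside an async function.
const fileInput = document.getElementById('fileInput');
const formData = new FormData();
formData.append('file', fileInput.files[0]);

// Do not set Content-Type yourself; the browser adds it with the multipart boundary.
const response = await fetch('https://pdfextractor-offi.onrender.com/ocr', {
  method: 'POST',
  body: formData
});

const data = await response.json();
if (data.success) {
  console.log(data.extracted_text);
} else {
  console.error(data.error);
}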
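The browser example above has a server-side counterpart. The following is a minimal Node.js sketch, assuming Node 18 or later (global fetch, FormData, and Blob); mimeTypeFor, buildOcrForm, and ocrFile are hypothetical helper names. Whether the service inspects each part's Content-Type is not documented, so setting it from the file extension is only a precaution.

```javascript
// Node.js sketch: assumes Node 18+, where fetch, FormData and Blob are globals.
const { readFile } = require('node:fs/promises');
const path = require('node:path');

// MIME types for the documented extensions (a precaution, not a documented requirement).
const MIME_TYPES = {
  '.pdf': 'application/pdf',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
};

// Hypothetical helper: pick a MIME type from the filename's extension.
function mimeTypeFor(filename) {
  return MIME_TYPES[path.extname(filename).toLowerCase()] || 'application/octet-stream';
}

// Hypothetical helper: wrap file bytes in the multipart form /ocr expects,
// using the documented field name "file".
function buildOcrForm(bytes, filename) {
  const form = new FormData();
  form.append('file', new Blob([bytes], { type: mimeTypeFor(filename) }), filename);
  return form;
}

// Hypothetical helper: upload a local file and resolve with the extracted text.
async function ocrFile(filePath) {
  const bytes = await readFile(filePath);
  const response = await fetch('https://pdfextractor-offi.onrender.com/ocr', {
    method: 'POST',
    // No Content-Type header here: fetch derives it, boundary included.
    body: buildOcrForm(bytes, path.basename(filePath)),
  });
  const data = await response.json();
  if (!data.success) {
    throw new Error(data.error || `HTTP ${response.status}`);
  }
  return data.extracted_text;
}

// Usage (sends a real request):
// ocrFile('document.pdf').then(console.log, console.error);
```

Keeping form construction separate from the network call makes the multipart layout easy to check without contacting the service.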

Supported File Types

PDF documents (.pdf)
PNG images (.png)
JPEG images (.jpg, .jpeg)
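Clients can reject unsupported files before uploading them. A minimal sketch, assuming the file extension is an adequate client-side proxy for type; isSupportedFile is a hypothetical helper, and the server remains the final judge of what it accepts.

```javascript
// Extensions listed under "Supported File Types", compared case-insensitively.
const SUPPORTED_EXTENSIONS = new Set(['.pdf', '.png', '.jpg', '.jpeg']);

// Hypothetical pre-check: true if the filename ends in a supported extension.
// An extension says nothing about the file's real contents.
function isSupportedFile(filename) {
  const dot = filename.lastIndexOf('.');
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.has(filename.slice(dot).toLowerCase());
}

console.log(isSupportedFile('scan.JPG'));   // true
console.log(isSupportedFile('notes.docx')); // false
console.log(isSupportedFile('README'));     // false
```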

Features

AI-based OCR text recognition
Recognition of both printed and handwritten text
Multiple input formats: PDF, PNG, and JPEG
Markdown-formatted output
Fast processing times
JSON error responses with a descriptive error message

Rate Limits

The API does not currently enforce strict rate limits. Even so, throttle requests in your application (for example, by limiting how many uploads run at once) so the shared service stays responsive for all users.
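One way to throttle on the client is a small concurrency limiter. The sketch below is a generic technique, not something the API provides; createLimiter is a hypothetical name, and the limit of 2 in the usage comment is an arbitrary example value, not a documented quota.

```javascript
// Hypothetical client-side throttle: at most `maxConcurrent` tasks run at
// once; extra calls wait in a FIFO queue until a slot frees up.
function createLimiter(maxConcurrent) {
  let active = 0;
  const queue = [];

  // Start queued tasks while there is spare capacity.
  function next() {
    while (active < maxConcurrent && queue.length > 0) {
      const { task, resolve, reject } = queue.shift();
      active++;
      Promise.resolve()
        .then(task) // a synchronous throw inside task becomes a rejection
        .then(resolve, reject)
        .finally(() => {
          active--;
          next();
        });
    }
  }

  // Returns a promise that settles with the task's own result or error.
  return function limit(task) {
    return new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
  };
}

// Usage: at most two uploads in flight (uploadOne is a hypothetical function
// that sends one file to /ocr and resolves with the parsed response).
// const limit = createLimiter(2);
// const results = await Promise.all(files.map((f) => limit(() => uploadOne(f))));
```

A concurrency cap adapts to however long each OCR job takes, whereas a fixed delay between requests can still pile up work if the service slows down.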