API Documentation

Overview

The PDF & Image Text Extractor API is an OCR service that extracts text from PDF documents and images. It uses AI-based text recognition to read both printed and handwritten text.

Endpoint

POST /ocr

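This endpoint accepts a multipart/form-data request containing a PDF or image file. It returns the extracted text, formatted as Markdown, inside a JSON response. The examples below use the base URL https://pdfextractor-offi.onrender.com.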

Request Format

Headers

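Content-Type: multipart/form-data

Most HTTP clients generate this header automatically, together with its required boundary parameter, when you send form data (for example, curl's -F option or fetch with a FormData body). If you set it manually in fetch, the boundary is omitted and the server cannot parse the upload.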

Body Parameters

Parameter   Type   Description
file        File   The PDF or image file to process (PDF, PNG, JPG, JPEG)

Response Format

Success Response (200 OK)

{
  "success": true,
  "extracted_text": "Extracted content in markdown format"
}

Error Response (4xx/5xx)

{
  "success": false,
  "error": "Error message describing what went wrong"
}
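Because both outcomes share the same JSON envelope, client code can collapse them into a single return-or-throw step. The JavaScript sketch below does that. The function name `unwrapOcrResponse` and the `status` property on the thrown error are illustrative assumptions, not part of the API.

```javascript
// Hypothetical helper: unwrap the documented JSON envelope.
// Returns the Markdown text on success; otherwise throws an Error carrying
// the server's message (or a generic one) and the HTTP status code.
function unwrapOcrResponse(status, body) {
  const ok = status >= 200 && status < 300;
  if (ok && body && body.success === true && typeof body.extracted_text === 'string') {
    return body.extracted_text;
  }
  // Error envelope: { "success": false, "error": "..." }
  const message = body && typeof body.error === 'string'
    ? body.error
    : `OCR request failed with HTTP ${status}`;
  const err = new Error(message);
  err.status = status; // lets callers treat 4xx and 5xx differently
  throw err;
}
```

For example, after a fetch call: `const text = unwrapOcrResponse(response.status, await response.json());`. Note that `response.json()` itself rejects if a proxy or gateway returns a non-JSON error page, so you may want to catch that case separately.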

Example Usage

cURL

curl -X POST \
  https://pdfextractor-offi.onrender.com/ocr \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf"

JavaScript

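// fileInput is an <input type="file"> element on the page
const formData = new FormData();
formData.append('file', fileInput.files[0]);

// Run inside an async function or an ES module (top-level await)
const response = await fetch('https://pdfextractor-offi.onrender.com/ocr', {
  method: 'POST',
  body: formData
});

const data = await response.json();
if (data.success) {
  console.log(data.extracted_text);
} else {
  console.error(data.error);
}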
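The browser example reads the file from an <input> element. In Node.js 18 or later, where FormData, Blob, and fetch are available as globals, a comparable sketch can read the file from disk instead. The helper names (`buildOcrForm`, `extractText`) and the extension-to-MIME mapping are assumptions chosen to mirror what curl sends for `-F "file=@document.pdf"`; the API does not document a required MIME type.

```javascript
// Node.js 18+ sketch: upload a local file to /ocr using only built-in modules
// and the global FormData, Blob, and fetch.
const { readFile } = require('fs/promises');
const path = require('path');

const OCR_URL = 'https://pdfextractor-offi.onrender.com/ocr';

// Assumed MIME types, matching what curl guesses from the file extension.
const MIME_TYPES = {
  '.pdf': 'application/pdf',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
};

// Build the multipart body, using the documented field name "file".
async function buildOcrForm(filePath) {
  const bytes = await readFile(filePath);
  const type = MIME_TYPES[path.extname(filePath).toLowerCase()] || 'application/octet-stream';
  const form = new FormData();
  // The third argument sets the filename the server sees.
  form.append('file', new Blob([bytes], { type }), path.basename(filePath));
  return form;
}

// Upload the file and resolve with the extracted Markdown, or reject on failure.
async function extractText(filePath) {
  const response = await fetch(OCR_URL, {
    method: 'POST',
    body: await buildOcrForm(filePath), // fetch adds the multipart boundary
  });
  const data = await response.json();
  if (!data.success) {
    throw new Error(data.error);
  }
  return data.extracted_text;
}
```

Usage: `extractText('document.pdf').then(console.log).catch(console.error);`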

Supported File Types

  • PDF documents (.pdf)
  • PNG images (.png)
  • JPEG images (.jpg, .jpeg)
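Since only these extensions are accepted, a client can reject other files before uploading them. This JavaScript sketch checks the file name only, not the file contents, and `isSupportedFile` is a hypothetical helper name; the server's own validation still applies.

```javascript
// Client-side pre-check against the supported extensions listed above.
// It only inspects the name, so a mislabeled file can still be rejected by the server.
const SUPPORTED_EXTENSIONS = ['.pdf', '.png', '.jpg', '.jpeg'];

function isSupportedFile(fileName) {
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return false; // no extension at all
  const ext = fileName.slice(dot).toLowerCase(); // ".JPG" -> ".jpg"
  return SUPPORTED_EXTENSIONS.includes(ext);
}
```

In a browser you can also narrow the file picker with `<input type="file" accept=".pdf,.png,.jpg,.jpeg">`.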

Features

  • OCR with AI-based text recognition
  • Recognition of both printed and handwritten text
  • Support for PDF, PNG, and JPEG files
  • Output formatted as Markdown
  • Fast processing times
  • JSON error responses with a descriptive message

Rate Limits

The API does not currently enforce strict rate limits. We still recommend throttling requests in your application, for example by capping how many files you upload at once, so that the service stays responsive for all users.
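One simple way to throttle is to cap the number of requests in flight. The sketch below is a generic concurrency limiter in JavaScript, not part of the API; `mapWithConcurrency` and the limit you pass are assumptions to tune for your own workload.

```javascript
// Run `task` over `items` with at most `limit` calls in flight at once,
// returning results in the original order.
async function mapWithConcurrency(items, limit, task) {
  if (limit < 1) throw new RangeError('limit must be at least 1');
  const results = new Array(items.length);
  let next = 0; // index of the next item to start

  // Each worker keeps claiming the next unclaimed index until none remain.
  // Claiming is safe because JavaScript runs this code on a single thread.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

For example, `await mapWithConcurrency(files, 2, uploadOne)` sends at most two files at a time, where `uploadOne` is your own function that posts a single file to /ocr and returns its result.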