API Documentation
Overview
The PDF & Image Text Extractor API is an OCR service that extracts text from PDF documents and images. It uses AI-based recognition to read both printed and handwritten text.
Endpoint
POST /ocr
This endpoint accepts multipart form data containing a PDF or image file in the file field and returns a JSON object with the extracted text formatted as markdown.
Request Format
Headers
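Content-Type: multipart/form-data
Most HTTP clients, including curl with -F and fetch with a FormData body, set this header and its required boundary parameter automatically.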
Body Parameters
| Parameter | Type | Description |
|---|---|---|
| file | File | The PDF or image file to process (PDF, PNG, JPG, JPEG) |
Response Format
Success Response (200 OK)
{ "success": true, "extracted_text": "Extracted content in markdown format" }
Error Response (4xx/5xx)
{ "success": false, "error": "Error message describing what went wrong" }
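The two response shapes above can be folded into a single result object on the client side. The following is a minimal sketch, not part of the API itself: the function name `parseOcrResponse` and the returned `{ ok, text, error }` shape are illustrative choices, and the handling of non-JSON bodies assumes that an intermediary such as a proxy might occasionally answer instead of the service.

```javascript
// Normalizes an /ocr response into { ok: true, text } or { ok: false, error }.
// `status` is the HTTP status code; `rawBody` is the response body as a string.
function parseOcrResponse(status, rawBody) {
  let body;
  try {
    body = JSON.parse(rawBody);
  } catch {
    // Assumption: a proxy or gateway might reply with a non-JSON body.
    return { ok: false, error: `HTTP ${status}: response was not valid JSON` };
  }

  if (body !== null && typeof body === 'object') {
    // Documented success shape: { success: true, extracted_text: "..." }
    if (body.success === true && typeof body.extracted_text === 'string') {
      return { ok: true, text: body.extracted_text };
    }
    // Documented error shape: { success: false, error: "..." }
    if (typeof body.error === 'string') {
      return { ok: false, error: body.error };
    }
  }
  return { ok: false, error: `HTTP ${status}: unexpected response shape` };
}

const sample = parseOcrResponse(200, '{"success": true, "extracted_text": "# Title"}');
console.log(sample.ok ? sample.text : sample.error);
```

Returning a result object rather than throwing keeps the caller's control flow the same for both documented shapes; throwing an exception on failure would be an equally reasonable design.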
Example Usage
cURL
curl -X POST \
  https://pdfextractor-offi.onrender.com/ocr \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf"
JavaScript
const formData = new FormData();
formData.append('file', fileInput.files[0]);

const response = await fetch('https://pdfextractor-offi.onrender.com/ocr', {
  method: 'POST',
  body: formData
});

const data = await response.json();
if (data.success) {
  console.log(data.extracted_text);
}
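The JavaScript example above reads from a browser file input. Outside the browser, the same request could be sent from Node.js 18 or later, which provides global `fetch`, `FormData`, and `Blob`. The sketch below is one possible wrapper under that assumption; `extractText`, `OCR_URL`, and the `fetchImpl` parameter are illustrative names, not part of the API.

```javascript
// URL taken from the examples above.
const OCR_URL = 'https://pdfextractor-offi.onrender.com/ocr';

// Sends `bytes` (a Buffer, Uint8Array, or ArrayBuffer) as the `file` form field
// and resolves to the extracted text.
// `fetchImpl` is a hypothetical injection point so the function can be tested offline.
async function extractText(bytes, fileName, fetchImpl = fetch) {
  const form = new FormData();
  form.append('file', new Blob([bytes]), fileName);

  // No Content-Type header is set manually: fetch derives it,
  // including the multipart boundary, from the FormData body.
  const response = await fetchImpl(OCR_URL, { method: 'POST', body: form });
  const data = await response.json();

  if (!data.success) {
    throw new Error(`OCR failed (HTTP ${response.status}): ${data.error}`);
  }
  return data.extracted_text;
}
```

In Node.js the bytes could come from `fs.promises.readFile('document.pdf')`, after which `extractText(bytes, 'document.pdf')` resolves to the extracted text or rejects with the error message returned by the service.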
Supported File Types
- PDF documents (.pdf)
- PNG images (.png)
- JPEG images (.jpg, .jpeg)
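A client could check the file name against this list before uploading, which avoids a round trip for files the service is not documented to accept. This is only a sketch: `isSupportedFile` and `SUPPORTED_EXTENSIONS` are illustrative names, the case-insensitive match is an assumption, and the service may validate uploads differently (for example by inspecting file contents).

```javascript
// Extensions taken from the supported file types listed above.
const SUPPORTED_EXTENSIONS = ['.pdf', '.png', '.jpg', '.jpeg'];

// Returns true when the file name ends in a supported extension.
// Assumption: extensions are compared case-insensitively.
function isSupportedFile(fileName) {
  const lower = fileName.toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

console.log(isSupportedFile('scan.PDF'));   // true
console.log(isSupportedFile('notes.docx')); // false
```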
Features
- Advanced OCR with AI-powered text recognition
- Support for both printed and handwritten text
- Support for PDF, PNG, and JPEG input
- Structured markdown output
- Fast processing times
- Error handling and detailed feedback
Rate Limits
The API currently has no strict rate limits, but we recommend implementing reasonable throttling in your applications to ensure optimal performance for all users.
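One way to throttle on the client is to cap how many requests are in flight at once. The sketch below shows a small concurrency limiter; `runThrottled` is an illustrative helper, not something the API provides, and a suitable limit is left to the caller since the documentation does not specify one.

```javascript
// Runs async tasks with at most `maxConcurrent` in flight at once.
// `tasks` is an array of functions that each return a promise.
// Results are returned in the same order as the tasks.
async function runThrottled(tasks, maxConcurrent) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(maxConcurrent, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage sketch (uploadOne is a hypothetical function that posts one file to /ocr):
// const texts = await runThrottled(files.map((file) => () => uploadOne(file)), 2);
```

If any task rejects, the promise returned by `runThrottled` rejects with that error; callers that want partial results could catch errors inside each task instead.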