API Documentation
Use our PDF analysis API to parse PDF documents and extract their content as Markdown, multimodal Markdown, or structured JSON.
POST https://pdfextractor-offi.onrender.com/api/parse-pdf
Upload and parse a PDF file
Request
POST https://pdfextractor-offi.onrender.com/api/parse-pdf
Content-Type: multipart/form-data
Request Body (multipart form fields, not a JSON object):
{
  "file": [binary PDF file],
  "format": "markdown" | "json" | "md_mm",
  "lang": "auto" | "en" | "fr" | ...,
  "start_page": 0,
  "end_page": 10
}
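The fields above can be assembled with a small helper. This is a minimal sketch rather than part of the API: buildParseForm is a hypothetical name, the default values for format and lang are assumptions, and treating start_page and end_page as optional is also an assumption, since the reference does not say which fields are required.

```javascript
// Hypothetical helper: assembles the multipart fields listed in the request body.
// Field names come from the reference; defaults and validation are assumptions.
const KNOWN_FORMATS = ['markdown', 'json', 'md_mm'];

function buildParseForm(file, { format = 'markdown', lang = 'auto', startPage, endPage } = {}) {
  if (!file) throw new Error('No file provided.');
  if (!KNOWN_FORMATS.includes(format)) throw new Error(`Unknown format: ${format}`);

  const form = new FormData();
  form.append('file', file);
  form.append('format', format);
  form.append('lang', lang);
  // Multipart fields are text, so page bounds are converted to strings.
  if (startPage !== undefined) form.append('start_page', String(startPage));
  if (endPage !== undefined) form.append('end_page', String(endPage));
  return form;
}
```

Rejecting an unknown format before sending avoids a round trip that would presumably end in a 400 response.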
Response
200 OK
Content-Type: application/json | text/markdown
Response Body (JSON format):
{
  "content": "# Document Title\n\nDocument content...",
  "format": "json",
  "document": {
    "title": "Document Title",
    "metadata": { ... }
  },
  "pages": [
    {
      "number": 0,
      "content": "Page content...",
      "elements": [ ... ]
    }
  ]
}
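As an illustration of working with this structure, the sketch below joins the per-page content into a single string. collectPageText is a hypothetical helper, and it assumes every entry in pages carries the number and content fields shown in the sample; the shape of elements is not documented here, so it is left untouched.

```javascript
// Sketch: flatten the "pages" array of a JSON-format response into one string.
// Assumes each page has "number" and "content", as in the sample response.
function collectPageText(result) {
  const pages = Array.isArray(result.pages) ? result.pages : [];
  return [...pages]                      // copy so the caller's array is not reordered
    .sort((a, b) => a.number - b.number) // page order by page number
    .map((page) => page.content)
    .join('\n\n');
}
```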
Example Usage (JavaScript)
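// pdfFile is a File or Blob, for example from an <input type="file"> element
const formData = new FormData();
formData.append('file', pdfFile);
formData.append('format', 'json');
formData.append('lang', 'auto');
formData.append('start_page', '0');
formData.append('end_page', '10');

const response = await fetch('https://pdfextractor-offi.onrender.com/api/parse-pdf', {
  method: 'POST',
  body: formData
});

// fetch does not reject on 400 or 500 responses, so check the status explicitly
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const result = await response.json();
console.log(result);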
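The example above always decodes the body as JSON, which suits format "json". Because the response Content-Type is listed as application/json or text/markdown, a caller that requests other formats may want to branch on the header. The following is a sketch under that assumption; isJsonContentType and readParseResult are hypothetical names, and which format produces which Content-Type is not stated in this reference.

```javascript
// Sketch: decide how to decode the body from the Content-Type header.
function isJsonContentType(contentType) {
  // Drop parameters such as "; charset=utf-8" before comparing.
  return (contentType || '').toLowerCase().split(';')[0].trim() === 'application/json';
}

// Returns a parsed object for JSON responses and a string otherwise.
async function readParseResult(response) {
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return isJsonContentType(response.headers.get('content-type'))
    ? response.json()
    : response.text();
}
```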
Additional Endpoints
Additional utility endpoints for working with the API
GET https://pdfextractor-offi.onrender.com/api/formats
Returns available output formats for PDF parsing
{
  "formats": [
    { "id": "markdown", "name": "Markdown", "description": "Standard Markdown format with headers, paragraphs, and lists" },
    { "id": "json", "name": "JSON", "description": "Structured JSON with full document information" },
    { "id": "md_mm", "name": "Multimodal Markdown", "description": "Markdown with embedded images and formulas" }
  ]
}
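One possible use of this payload is to check a requested format against the server's list instead of hard-coding the ids. The helpers below are hypothetical (indexFormats and requireFormat are not part of the API) and assume the payload has exactly the shape shown above.

```javascript
// Sketch: index the /api/formats payload by id for quick lookups.
function indexFormats(payload) {
  return new Map(payload.formats.map((format) => [format.id, format]));
}

// Returns the matching format entry, or throws with the list of valid ids.
function requireFormat(index, id) {
  const entry = index.get(id);
  if (!entry) {
    const known = [...index.keys()].join(', ');
    throw new Error(`Unsupported format "${id}". Expected one of: ${known}`);
  }
  return entry;
}
```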
GET https://pdfextractor-offi.onrender.com/api/languages
Returns available language options for OCR
{
  "languages": [
    { "code": "auto", "name": "Auto-detect" },
    { "code": "en", "name": "English" },
    { "code": "ch", "name": "Chinese" },
    { "code": "ja", "name": "Japanese" },
    { "code": "korean", "name": "Korean" },
    { "code": "fr", "name": "French" },
    { "code": "german", "name": "German" },
    { "code": "it", "name": "Italian" },
    { "code": "es", "name": "Spanish" },
    { "code": "pt", "name": "Portuguese" },
    { "code": "ru", "name": "Russian" },
    { "code": "ar", "name": "Arabic" }
  ]
}
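The codes in this list do not follow a single scheme: some are two-letter codes ("en", "fr") while others are full words ("korean", "german"), and Chinese is "ch". Looking the code up by name avoids guessing. The sketch below assumes the payload shape shown above; languageCode is a hypothetical helper, and falling back to "auto" for an unknown name is a design choice, not documented behavior.

```javascript
// Sketch: resolve a display name such as "German" to the code the API expects.
function languageCode(payload, name) {
  const wanted = name.trim().toLowerCase();
  const match = payload.languages.find(
    (language) => language.name.toLowerCase() === wanted
  );
  // Assumption: unknown names fall back to auto-detection.
  return match ? match.code : 'auto';
}
```

With the list above, languageCode(payload, 'German') yields "german" rather than the "de" a caller might otherwise try.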
Error Responses
Possible error responses from the API
Status  Error Code    Description
400     Bad Request   Invalid or missing parameters.
400     Bad Request   Only PDF files allowed.
400     Bad Request   No file provided.
500     Server Error  Error processing PDF.
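A client might wrap these responses in a typed error so callers can tell request problems from server failures. The sketch below is one way to do that; PdfApiError and toApiError are hypothetical names, and since the format of the error body is not documented here, the body is treated as opaque text.

```javascript
// Sketch: turn an error status from the table above into a typed error.
class PdfApiError extends Error {
  constructor(status, detail) {
    super(`PDF API error ${status}: ${detail}`);
    this.name = 'PdfApiError';
    this.status = status;
    // 4xx means the request itself should be fixed before trying again.
    this.isClientError = status >= 400 && status < 500;
  }
}

// bodyText is whatever the server sent; its structure is an unknown here.
function toApiError(status, bodyText) {
  const labels = { 400: 'Bad Request', 500: 'Server Error' };
  const text = (bodyText || '').trim();
  return new PdfApiError(status, text || labels[status] || 'Unexpected status');
}
```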
Available Endpoints
Complete list of API endpoints
Endpoint        Description
/api/health     Check API status
/api/parse-pdf  Parse uploaded PDF
/api/parse-url  Parse PDF from URL
/api/formats    Get output formats
/api/languages  Get language options
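The paths above can be kept in one place and combined with the base URL used elsewhere in this reference. In the sketch below, endpointUrl and isApiHealthy are hypothetical helpers. The HTTP method and response body of /api/health are not documented here, so the health check assumes a GET request that returns a 2xx status when the service is up. The request format of /api/parse-url is likewise not documented, so no call to it is shown.

```javascript
// Sketch: central table of the documented endpoint paths.
const BASE_URL = 'https://pdfextractor-offi.onrender.com';
const ENDPOINTS = new Map([
  ['health', '/api/health'],
  ['parsePdf', '/api/parse-pdf'],
  ['parseUrl', '/api/parse-url'],
  ['formats', '/api/formats'],
  ['languages', '/api/languages']
]);

function endpointUrl(name) {
  const path = ENDPOINTS.get(name);
  if (!path) throw new Error(`Unknown endpoint: ${name}`);
  return new URL(path, BASE_URL).toString();
}

// Assumption: /api/health answers GET with a 2xx status when healthy.
async function isApiHealthy() {
  try {
    const response = await fetch(endpointUrl('health'));
    return response.ok;
  } catch {
    return false; // network failure counts as unhealthy
  }
}
```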