API Documentation

Use our PDF analysis API to parse PDF documents and extract their content as Markdown, JSON, or multimodal Markdown, with optional OCR language selection and page ranges.

POST https://pdfextractor-offi.onrender.com/api/parse-pdf
Upload and parse a PDF file

Request

POST https://pdfextractor-offi.onrender.com/api/parse-pdf

Content-Type: multipart/form-data

Request Body:

{
  "file": [binary PDF file],
  "format": "markdown" | "json" | "md_mm",
  "lang": "auto" | "en" | "fr" | ...,
  "start_page": 0,
  "end_page": 10
}

Response

200 OK

Content-Type: application/json | text/markdown

Response Body (JSON format):

{
  "content": "# Document Title\n\nDocument content...",
  "format": "json",
  "document": {
    "title": "Document Title",
    "metadata": { ... }
  },
  "pages": [
    {
      "number": 0,
      "content": "Page content...",
      "elements": [ ... ]
    }
  ]
}
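
Because the response can be either application/json or text/markdown, client code usually has to branch on the Content-Type header before reading the body. The following is a minimal sketch of that idea, not part of the API itself: the helper names readParseResponse and joinPageContent are hypothetical, and it assumes the JSON body has the "pages" array shown above.

```javascript
// Read a parse-pdf response according to its Content-Type header.
// Assumption: anything that is not JSON is treated as Markdown text.
async function readParseResponse(response) {
  const contentType = response.headers.get('Content-Type') || '';
  if (contentType.includes('application/json')) {
    return { kind: 'json', body: await response.json() };
  }
  return { kind: 'text', body: await response.text() };
}

// Join the per-page "content" fields of a JSON result into one string,
// ordered by page number (hypothetical helper for illustration).
function joinPageContent(result) {
  const pages = Array.isArray(result.pages) ? result.pages : [];
  return pages
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.number - b.number)
    .map((page) => page.content)
    .join('\n\n');
}
```

With a JSON result, joinPageContent(parsed.body) gives the page texts in order; with a Markdown result, parsed.body is already the full document text.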

Example Usage (JavaScript)

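// pdfFile is a File or Blob holding the PDF, e.g. taken from a file input element.
// Top-level await requires an ES module or an enclosing async function.
const formData = new FormData();
formData.append('file', pdfFile);
formData.append('format', 'json');
formData.append('lang', 'auto');
formData.append('start_page', '0');
formData.append('end_page', '10');

// Do not set Content-Type manually; fetch adds the multipart boundary itself.
const response = await fetch('https://pdfextractor-offi.onrender.com/api/parse-pdf', {
  method: 'POST',
  body: formData
});

// Error responses (400, 500) are surfaced instead of being parsed as a result.
if (!response.ok) {
  throw new Error(`parse-pdf failed with status ${response.status}`);
}

const result = await response.json();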
console.log(result);

Additional Endpoints
Additional utility endpoints for working with the API

GET https://pdfextractor-offi.onrender.com/api/formats

Returns available output formats for PDF parsing

{
  "formats": [
    {
      "id": "markdown",
      "name": "Markdown",
      "description": "Standard Markdown format with headers, paragraphs, and lists"
    },
    {
      "id": "json",
      "name": "JSON",
      "description": "Structured JSON with full document information"
    },
    {
      "id": "md_mm",
      "name": "Multimodal Markdown",
      "description": "Markdown with embedded images and formulas"
    }
  ]
}
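
One way to use this endpoint is to check a requested format against the advertised list before uploading a file. The sketch below assumes the response shape shown above; fetchFormats, isSupportedFormat, and API_BASE are hypothetical names introduced for illustration.

```javascript
const API_BASE = 'https://pdfextractor-offi.onrender.com';

// Fetch the list of output formats advertised by the API.
async function fetchFormats() {
  const response = await fetch(`${API_BASE}/api/formats`);
  if (!response.ok) {
    throw new Error(`GET /api/formats failed with status ${response.status}`);
  }
  const body = await response.json();
  return body.formats;
}

// Return true when `id` matches one of the advertised format ids.
function isSupportedFormat(formats, id) {
  return formats.some((format) => format.id === id);
}
```

A caller could run isSupportedFormat(await fetchFormats(), 'md_mm') once at startup and fall back to 'markdown' when it returns false.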

GET https://pdfextractor-offi.onrender.com/api/languages

Returns available language options for OCR

{
  "languages": [
    { "code": "auto", "name": "Auto-detect" },
    { "code": "en", "name": "English" },
    { "code": "ch", "name": "Chinese" },
    { "code": "ja", "name": "Japanese" },
    { "code": "korean", "name": "Korean" },
    { "code": "fr", "name": "French" },
    { "code": "german", "name": "German" },
    { "code": "it", "name": "Italian" },
    { "code": "es", "name": "Spanish" },
    { "code": "pt", "name": "Portuguese" },
    { "code": "ru", "name": "Russian" },
    { "code": "ar", "name": "Arabic" }
  ]
}
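
Note that the codes in this list do not follow a single convention ("ch", "korean", and "german" sit alongside two-letter codes such as "fr"), so it is safer to look a code up by language name than to guess it. A minimal sketch, assuming the response shape above; languageCodeFor is a hypothetical helper, and falling back to "auto" is a design choice of this example rather than documented API behavior.

```javascript
// Find the code the API lists for a language name, e.g. "Korean" -> "korean".
// Falls back to "auto" (auto-detect) when the name is not in the list.
function languageCodeFor(languages, name) {
  const wanted = name.trim().toLowerCase();
  const match = languages.find(
    (language) => language.name.toLowerCase() === wanted
  );
  return match ? match.code : 'auto';
}
```

The languages argument is the "languages" array returned by GET /api/languages; the resulting code can be passed as the "lang" field of a parse request.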

Error Responses
Possible error responses from the API

Status  Error Code    Description
400     Bad Request   Invalid or missing parameters.
400     Bad Request   Only PDF files allowed.
400     Bad Request   No file provided.
500     Server Error  Error processing PDF.
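
The statuses above can be handled in one place by converting any non-2xx response into an exception that keeps the HTTP status. This is a sketch under assumptions: PdfApiError and ensureOk are hypothetical names, the error body is read as plain text because its exact shape is not documented here, and treating 5xx as retryable is a guess about the service, not a guarantee.

```javascript
// Error type that carries the HTTP status of a failed API call.
class PdfApiError extends Error {
  constructor(status, detail) {
    super(`PDF API error ${status}: ${detail}`);
    this.name = 'PdfApiError';
    this.status = status;
    // Assumption: 4xx means the request must be fixed, 5xx may succeed later.
    this.retryable = status >= 500;
  }
}

// Pass successful responses through; throw PdfApiError for the rest.
async function ensureOk(response) {
  if (response.ok) return response;
  const detail = (await response.text()) || response.statusText;
  throw new PdfApiError(response.status, detail);
}
```

A call site would then read: const response = await ensureOk(await fetch(url, options)); and catch PdfApiError to inspect error.status.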

Available Endpoints
Complete list of API endpoints

Endpoint        Description
/api/health     Check API status
/api/parse-pdf  Parse uploaded PDF
/api/parse-url  Parse PDF from URL
/api/formats    Get output formats
/api/languages  Get language options
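
As an illustration of the status endpoint, the sketch below checks /api/health before sending a larger upload. The names endpointUrl and isApiHealthy are hypothetical. It assumes the endpoint answers GET requests, and it only looks at the HTTP status because the response body is not documented here.

```javascript
const API_BASE = 'https://pdfextractor-offi.onrender.com';

// Build the absolute URL for one of the endpoint paths listed above.
function endpointUrl(path) {
  return new URL(path, API_BASE).toString();
}

// Return true when /api/health answers with a 2xx status.
// Assumption: GET is the expected method; the body is not inspected.
async function isApiHealthy() {
  try {
    const response = await fetch(endpointUrl('/api/health'));
    return response.ok;
  } catch {
    return false; // a network failure counts as unhealthy
  }
}
```

Calling await isApiHealthy() before a parse request lets a client report an unavailable service without waiting for an upload to fail.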