PDF Processor API Documentation

RESTful endpoints for PDF processing, text extraction, and physical PDF splitting

API Base URL

All API endpoints are accessible at the following base URL:

https://pdf.betadev.biz

Text Extraction API

Endpoint: /api/process

Extract text from a PDF, chunked with either a fixed-size or a semantic strategy.

Method: POST
Request Body:
{
  "pdf_url": "https://arxiv.org/pdf/1706.03762.pdf",  // URL to the PDF file (required)
  "chunk_strategy": "fixed",                          // "fixed" or "semantic" (optional, default: "fixed")
  "chunk_size": 2000                                  // Character count for fixed chunks (optional, default: 2000)
}
Example Request:
curl -X POST -H "Content-Type: application/json" \
    -d '{"pdf_url": "https://arxiv.org/pdf/1706.03762.pdf", "chunk_strategy": "fixed", "chunk_size": 2000}' \
    https://pdf.betadev.biz/api/process
Response:
{
  "success": true,
  "total_pages": 12,
  "chunks": [
    {
      "start_page": 1,
      "end_page": 10,
      "text": "...(extracted text)...",
      "text_chunks": ["...(chunk 1)...", "...(chunk 2)..."]
    },
    {
      "start_page": 9,
      "end_page": 12,
      "text": "...(extracted text)...",
      "text_chunks": ["...(chunk 1)...", "...(chunk 2)..."]
    }
  ]
}
Note: To prevent timeouts, processing of very large PDFs may be limited.
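The request above can also be built from Python. The following is a minimal sketch using only the standard library. The helper names build_process_request and flatten_text_chunks are illustrative and not part of the API, and the response shape is assumed to match the example response shown above.

```python
import json
import urllib.request

BASE_URL = "https://pdf.betadev.biz"


def build_process_request(pdf_url, chunk_strategy="fixed", chunk_size=2000):
    """Build (but do not send) a POST request for /api/process."""
    payload = {
        "pdf_url": pdf_url,                # required
        "chunk_strategy": chunk_strategy,  # "fixed" or "semantic"
        "chunk_size": chunk_size,          # character count for fixed chunks
    }
    return urllib.request.Request(
        BASE_URL + "/api/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def flatten_text_chunks(response_body):
    """Collect every "text_chunks" entry across all page ranges, in order."""
    return [
        piece
        for chunk in response_body.get("chunks", [])
        for piece in chunk.get("text_chunks", [])
    ]
```

To send the request, pass the object returned by build_process_request to urllib.request.urlopen and decode the body with json.load. A generous timeout is probably sensible for long documents, though the service's actual limits are not documented here.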

Physical PDF Splitting API

Endpoint: /api/split-pdf

Split a PDF into multiple smaller PDF files while preserving all content (images, formatting, etc.).

Method: POST
Request Body:
{
  "pdf_url": "https://arxiv.org/pdf/1706.03762.pdf",  // URL to the PDF file (required)
  "pages_per_split": 200                              // Number of pages per split (optional, default: 200)
}
Example Request:
curl -X POST -H "Content-Type: application/json" \
    -d '{"pdf_url": "https://arxiv.org/pdf/1706.03762.pdf", "pages_per_split": 200}' \
    https://pdf.betadev.biz/api/split-pdf
Response:
{
  "success": true,
  "split_id": 123,
  "total_pages": 456,
  "num_splits": 3,
  "split_files": [
    {
      "id": 456,
      "start_page": 1,
      "end_page": 200,
      "file_name": "document_pages_1-200.pdf",
      "download_url": "https://pdf.betadev.biz/split-pdf/456"
    },
    {
      "id": 457,
      "start_page": 201,
      "end_page": 400,
      "file_name": "document_pages_201-400.pdf",
      "download_url": "https://pdf.betadev.biz/split-pdf/457"
    },
    {
      "id": 458,
      "start_page": 401,
      "end_page": 456,
      "file_name": "document_pages_401-456.pdf",
      "download_url": "https://pdf.betadev.biz/split-pdf/458"
    }
  ]
}
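The page ranges in the example response follow from simple arithmetic on total_pages and pages_per_split. The sketch below reproduces them on the client side. It assumes the service splits sequentially from page 1 and puts the remainder in the last file, which matches the example above but is not otherwise confirmed. The function name expected_page_ranges is illustrative.

```python
def expected_page_ranges(total_pages, pages_per_split=200):
    """Return 1-based inclusive (start_page, end_page) pairs, one per split."""
    if total_pages < 1 or pages_per_split < 1:
        raise ValueError("total_pages and pages_per_split must be positive")
    return [
        # The last range is clipped to the end of the document.
        (start, min(start + pages_per_split - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_split)
    ]


print(expected_page_ranges(456, 200))
# [(1, 200), (201, 400), (401, 456)]
```

A check like this can be used to confirm that num_splits and the start_page and end_page values in a response cover the whole document without gaps.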

Previous Splits Lookup API

Endpoint: /api/splits

Get a list of previously split PDFs.

Method: GET
Query Parameters:
  • limit: Maximum number of records to return (optional, default: 50)
Example Request:
curl "https://pdf.betadev.biz/api/splits?limit=10"
Response:
{
  "success": true,
  "splits": [
    {
      "id": 123,
      "original_filename": "document.pdf",
      "original_url": "https://example.com/document.pdf",
      "total_pages": 456,
      "parts_count": 3,
      "created_at": "2025-04-19T12:34:56Z"
    },
    // more splits...
  ]
}
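As a rough illustration of the lookup above, the following sketch builds the request URL and condenses a response into one line per split. The helper names splits_url and summarize_splits are hypothetical, and the field names are taken from the example response.

```python
from urllib.parse import urlencode

BASE_URL = "https://pdf.betadev.biz"


def splits_url(limit=None):
    """Build the /api/splits URL; omit limit to use the server default."""
    url = BASE_URL + "/api/splits"
    if limit is not None:
        url += "?" + urlencode({"limit": limit})
    return url


def summarize_splits(response_body):
    """Return one human-readable line per split record."""
    return [
        f"{s['id']}: {s['original_filename']} "
        f"({s['total_pages']} pages, {s['parts_count']} parts)"
        for s in response_body.get("splits", [])
    ]
```

The id of each record is the value to pass as split_id when requesting split details.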

Split Details API

Endpoint: /api/splits/{split_id}

Get detailed information about a specific split, including all parts.

Method: GET
Path Parameters:
  • split_id: ID of the split to retrieve (required)
Example Request:
curl https://pdf.betadev.biz/api/splits/123
Response:
{
  "success": true,
  "split": {
    "id": 123,
    "original_filename": "document.pdf",
    "original_url": "https://example.com/document.pdf",
    "total_pages": 456,
    "created_at": "2025-04-19T12:34:56Z",
    "parts": [
      {
        "id": 456,
        "start_page": 1,
        "end_page": 200,
        "file_name": "document_pages_1-200.pdf",
        "download_url": "https://pdf.betadev.biz/split-pdf/456"
      },
      // more parts...
    ]
  }
}
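A typical next step is to download every part of a split. The sketch below only extracts the file names and download URLs from a details response, ordered by page. It assumes the response shape shown above, and the name part_downloads is illustrative.

```python
def part_downloads(response_body):
    """Return (file_name, download_url) pairs, ordered by start_page."""
    parts = response_body["split"]["parts"]
    return [
        (part["file_name"], part["download_url"])
        for part in sorted(parts, key=lambda part: part["start_page"])
    ]
```

Each URL can then be fetched with any HTTP client. Whether the download endpoint requires additional headers or authentication is not stated in this document.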

Error Handling

All API endpoints return standard HTTP status codes:

  • 200 OK: Request successful
  • 400 Bad Request: Invalid request parameters
  • 404 Not Found: Resource not found
  • 500 Internal Server Error: Server-side error
Error Response Format:
{
  "error": "Error message describing what went wrong"
}
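One way a client might handle the status codes and error format above is to turn any non-200 response into an exception carrying the "error" message. This is a sketch under that assumption. PdfApiError and parse_response are hypothetical names, not part of the API.

```python
import json


class PdfApiError(Exception):
    """Raised when the API answers with a non-200 status code."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_response(status, raw_body):
    """Return the decoded JSON body on 200, otherwise raise PdfApiError."""
    if status == 200:
        return json.loads(raw_body)
    try:
        body = json.loads(raw_body)
    except ValueError:
        # The error body was not valid JSON; fall back to a generic message.
        body = None
    message = body.get("error") if isinstance(body, dict) else None
    raise PdfApiError(status, message or "no error message in response body")
```

Callers can branch on the status attribute, for example treating 400 and 404 as caller mistakes and retrying only on 500.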