RESTful endpoints for PDF processing, text extraction, and physical PDF splitting
All API endpoints are accessible at the following base URL:
https://pdf.betadev.biz
POST /api/process
Extract text from a PDF using one of several chunking strategies.
Request body:
{
"pdf_url": "https://arxiv.org/pdf/1706.03762.pdf", // URL to the PDF file (required)
"chunk_strategy": "fixed", // "fixed" or "semantic" (optional, default: "fixed")
"chunk_size": 2000 // Character count for fixed chunks (optional, default: 2000)
}
Example request:
curl -X POST -H "Content-Type: application/json" \
-d '{"pdf_url": "https://arxiv.org/pdf/1706.03762.pdf", "chunk_strategy": "fixed", "chunk_size": 2000}' \
https://pdf.betadev.biz/api/process
Example response:
{
"success": true,
"total_pages": 12,
"chunks": [
{
"start_page": 1,
"end_page": 10,
"text": "...(extracted text)...",
"text_chunks": ["...(chunk 1)...", "...(chunk 2)..."]
},
{
"start_page": 9,
"end_page": 12,
"text": "...(extracted text)...",
"text_chunks": ["...(chunk 1)...", "...(chunk 2)..."]
}
]
}
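A minimal Python client sketch for this endpoint follows. It uses only the standard library (`urllib.request`, `json`). The helper names (`build_process_payload`, `process_pdf`, `all_text_chunks`), the client-side validation, and the 120-second timeout are illustrative assumptions and not part of the API. The sketch always sends `chunk_size`; the documentation does not say whether the server uses it when `chunk_strategy` is `"semantic"`. `all_text_chunks` flattens `text_chunks` in response order without deduplicating, because the example response shows page ranges that can overlap (1-10 and 9-12).

```python
import json
import urllib.request

BASE_URL = "https://pdf.betadev.biz"


def build_process_payload(pdf_url, chunk_strategy="fixed", chunk_size=2000):
    """Build the JSON body for POST /api/process using the documented defaults.

    The checks below are client-side assumptions, not documented server rules.
    """
    if chunk_strategy not in ("fixed", "semantic"):
        raise ValueError('chunk_strategy must be "fixed" or "semantic"')
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive character count")
    return {"pdf_url": pdf_url, "chunk_strategy": chunk_strategy, "chunk_size": chunk_size}


def process_pdf(pdf_url, chunk_strategy="fixed", chunk_size=2000, timeout=120):
    """POST the payload and return the decoded JSON response (network call)."""
    body = json.dumps(build_process_payload(pdf_url, chunk_strategy, chunk_size)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/api/process",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def all_text_chunks(result):
    """Flatten text_chunks from every entry in "chunks", keeping response order."""
    return [
        chunk
        for block in result.get("chunks", [])
        for chunk in block.get("text_chunks", [])
    ]
```

For example, `all_text_chunks(process_pdf("https://arxiv.org/pdf/1706.03762.pdf"))` would return every fixed-size chunk as one flat list of strings.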
POST /api/split-pdf
Split a PDF into multiple smaller PDF files while preserving all content (images, formatting, etc.).
Request body:
{
"pdf_url": "https://arxiv.org/pdf/1706.03762.pdf", // URL to the PDF file (required)
"pages_per_split": 200 // Number of pages per split (optional, default: 200)
}
Example request:
curl -X POST -H "Content-Type: application/json" \
-d '{"pdf_url": "https://arxiv.org/pdf/1706.03762.pdf", "pages_per_split": 200}' \
https://pdf.betadev.biz/api/split-pdf
Example response:
{
"success": true,
"split_id": 123,
"total_pages": 456,
"num_splits": 3,
"split_files": [
{
"id": 456,
"start_page": 1,
"end_page": 200,
"file_name": "document_pages_1-200.pdf",
"download_url": "https://pdf.betadev.biz/split-pdf/456"
},
{
"id": 457,
"start_page": 201,
"end_page": 400,
"file_name": "document_pages_201-400.pdf",
"download_url": "https://pdf.betadev.biz/split-pdf/457"
},
{
"id": 458,
"start_page": 401,
"end_page": 456,
"file_name": "document_pages_401-456.pdf",
"download_url": "https://pdf.betadev.biz/split-pdf/458"
}
]
}
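The sketch below shows one way to call this endpoint from Python, check the returned parts, and download them. It assumes the splits are contiguous fixed-size page ranges (which matches the example: 456 pages at 200 per split gives 1-200, 201-400, 401-456) and that a plain GET on each `download_url` returns the PDF bytes. The function names and timeouts are hypothetical; only the endpoint, request fields, and response fields come from this page.

```python
import json
import os
import shutil
import urllib.request

BASE_URL = "https://pdf.betadev.biz"


def expected_ranges(total_pages, pages_per_split=200):
    """1-based inclusive page ranges, assuming contiguous fixed-size splits."""
    if pages_per_split <= 0:
        raise ValueError("pages_per_split must be positive")
    return [
        (start, min(start + pages_per_split - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_split)
    ]


def split_pdf(pdf_url, pages_per_split=200, timeout=300):
    """POST /api/split-pdf and return the decoded JSON response (network call)."""
    body = json.dumps({"pdf_url": pdf_url, "pages_per_split": pages_per_split}).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/api/split-pdf",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def covers_all_pages(result):
    """True if split_files spans pages 1..total_pages with no gaps or overlaps."""
    parts = sorted(result["split_files"], key=lambda part: part["start_page"])
    next_page = 1
    for part in parts:
        if part["start_page"] != next_page or part["end_page"] < part["start_page"]:
            return False
        next_page = part["end_page"] + 1
    return next_page == result["total_pages"] + 1 and len(parts) == result["num_splits"]


def download_parts(result, dest_dir=".", timeout=300):
    """Save every part under dest_dir and return the paths written (network call)."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for part in result["split_files"]:
        # basename() keeps a server-supplied file_name from escaping dest_dir
        path = os.path.join(dest_dir, os.path.basename(part["file_name"]))
        with urllib.request.urlopen(part["download_url"], timeout=timeout) as response, open(path, "wb") as out:
            shutil.copyfileobj(response, out)
        paths.append(path)
    return paths
```

Running `covers_all_pages` before `download_parts` is a cheap way to catch a truncated or inconsistent response before fetching any files.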
GET /api/splits
Get a list of previously split PDFs.
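Query parameters:
limit: Maximum number of records to return (optional, default: 50)
Example request (the URL is quoted so the shell does not treat `?` as a glob character):
curl "https://pdf.betadev.biz/api/splits?limit=10"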
Example response:
{
"success": true,
"splits": [
{
"id": 123,
"original_filename": "document.pdf",
"original_url": "https://example.com/document.pdf",
"total_pages": 456,
"parts_count": 3,
"created_at": "2025-04-19T12:34:56Z"
},
// more splits...
]
}
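A small Python sketch for listing splits is shown below. It builds the query string with `urllib.parse.urlencode` instead of concatenating it by hand, and omits `limit` entirely when the caller does not pass one so that the server default (50) applies. The helper names and the one-line summary format are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pdf.betadev.biz"


def splits_url(limit=None):
    """Build the GET /api/splits URL; without limit the server default (50) applies."""
    url = f"{BASE_URL}/api/splits"
    if limit is not None:
        url += "?" + urllib.parse.urlencode({"limit": int(limit)})
    return url


def list_splits(limit=None, timeout=30):
    """Fetch previously split PDFs and return the "splits" array (network call)."""
    with urllib.request.urlopen(splits_url(limit), timeout=timeout) as response:
        return json.load(response)["splits"]


def summarize(splits):
    """One line per split: id, original file name, page count, and number of parts."""
    return [
        f'{s["id"]}: {s["original_filename"]} ({s["total_pages"]} pages, {s["parts_count"]} parts)'
        for s in splits
    ]
```

With the example response above, `summarize` would produce `123: document.pdf (456 pages, 3 parts)` for the first entry.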
GET /api/splits/{split_id}
Get detailed information about a specific split, including all of its parts.
Path parameters:
split_id: ID of the split to retrieve (required)
Example request:
curl https://pdf.betadev.biz/api/splits/123
Example response:
{
"success": true,
"split": {
"id": 123,
"original_filename": "document.pdf",
"original_url": "https://example.com/document.pdf",
"total_pages": 456,
"created_at": "2025-04-19T12:34:56Z",
"parts": [
{
"id": 456,
"start_page": 1,
"end_page": 200,
"file_name": "document_pages_1-200.pdf",
"download_url": "https://pdf.betadev.biz/split-pdf/456"
},
// more parts...
]
}
}
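As a sketch, the Python helpers below fetch one split and pair each part's page range with its `download_url`. They assume `split_id` is an integer (as in every example on this page) and sort parts by `start_page` rather than relying on the order the server returns them in. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pdf.betadev.biz"


def split_detail_url(split_id):
    """Build GET /api/splits/{split_id}; assumes integer ids as in the examples."""
    return f"{BASE_URL}/api/splits/{int(split_id)}"


def get_split(split_id, timeout=30):
    """Return the "split" object for one split (network call)."""
    with urllib.request.urlopen(split_detail_url(split_id), timeout=timeout) as response:
        return json.load(response)["split"]


def part_downloads(split):
    """List ((start_page, end_page), download_url) pairs ordered by start_page."""
    parts = sorted(split.get("parts", []), key=lambda part: part["start_page"])
    return [((part["start_page"], part["end_page"]), part["download_url"]) for part in parts]
```

The `split_id` returned by `POST /api/split-pdf` (for example `123`) can be passed straight to `get_split`.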
All API endpoints return standard HTTP status codes:
200 OK: Request successful
400 Bad Request: Invalid request parameters
404 Not Found: Resource not found
500 Internal Server Error: Server-side error
Error responses have a JSON body of the following form:
{
"error": "Error message describing what went wrong"
}
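One way to surface these errors in a Python client is sketched below. `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses; the sketch reads the body of that exception and pulls out the `"error"` field. The `PdfApiError` class, the helper names, and the `"Unknown error"` fallback (used when the body is not JSON or has no `"error"` string) are illustrative assumptions, not part of the API.

```python
import json
import urllib.error
import urllib.request


class PdfApiError(Exception):
    """Hypothetical client-side exception carrying the HTTP status and error message."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def error_message(body, fallback="Unknown error"):
    """Extract the "error" field from an error body, tolerating non-JSON bodies."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):  # JSONDecodeError and UnicodeDecodeError are ValueErrors
        return fallback
    if isinstance(payload, dict) and isinstance(payload.get("error"), str):
        return payload["error"]
    return fallback


def get_json(url, timeout=30):
    """GET url and decode JSON; 4xx/5xx responses are re-raised as PdfApiError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        raise PdfApiError(err.code, error_message(err.read())) from err
```

For instance, `get_json("https://pdf.betadev.biz/api/splits/999999")` would raise `PdfApiError` with `status == 404` if that split does not exist, carrying whatever message the server put in `"error"`.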