Migration Guide: Parse Upload Endpoint v1 to v2

This guide will help you migrate from the v1 Parse upload endpoint to the new v2 endpoint, which introduces a structured configuration approach and improved organization of parsing options.

⚠️ Alpha Version Warning: The v2 endpoint is currently in alpha (v2alpha1) and is subject to breaking changes until the stable release. We recommend testing thoroughly and being prepared for potential API changes during development.

The v2 endpoint replaces individual form parameters with a single JSON configuration string, providing:

  • Better organization: Related options are grouped into logical sections
  • Type safety: Structured validation with clear schemas
  • Extensibility: Easier to add new features without endpoint bloat
  • Validation: Better error messages and configuration validation

v1 endpoint:

POST /api/v1/parsing/upload
Content-Type: multipart/form-data
- 70+ individual form parameters
- Flat parameter structure
- All parameters available regardless of parse mode

v2 endpoints:

POST /api/v2alpha1/parse (existing file by ID)
POST /api/v2alpha1/parse/url (fetch from URL)
POST /api/v2alpha1/parse/upload (multipart file upload)
- Specialized endpoints for different input methods
- Single 'configuration' JSON parameter per endpoint
- Hierarchical, structured configuration
- Always-enabled optimizations
- Strict validation with clear error messages

1. Update the Endpoint URL

Before (v1):

POST https://api.cloud.llamaindex.ai/api/v1/parsing/upload

After (v2): Choose the appropriate endpoint based on your input method:

# For parsing existing files by ID (recommended)
POST https://api.cloud.llamaindex.ai/api/v2alpha1/parse
# For parsing files from URLs
POST https://api.cloud.llamaindex.ai/api/v2alpha1/parse/url
# For multipart file uploads
POST https://api.cloud.llamaindex.ai/api/v2alpha1/parse/upload

2. Choose the Appropriate Endpoint and Configuration

v2 provides specialized endpoints for different input methods. Choose the one that matches how you’re providing the document:

For parsing an already uploaded file, use /parse with the file ID in the request body. This is the most efficient method as it reuses existing files.

For parsing documents from web URLs, use /parse/url with the URL and proxy settings in the request body.

For traditional file uploads, use /parse/upload with multipart form data and a configuration parameter.
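
The choice between the three endpoints can be sketched as a small lookup. This is only an illustration: the `InputMethod` enum and `endpoint_for` helper are hypothetical names, and the base URL is assumed from the examples in this guide.

```python
from enum import Enum

# Assumption: base URL taken from the endpoint examples in this guide.
BASE_URL = "https://api.cloud.llamaindex.ai/api/v2alpha1"


class InputMethod(Enum):
    """Hypothetical labels for the three ways of supplying a document."""

    FILE_ID = "file_id"  # file already uploaded, referenced by ID
    URL = "url"          # document fetched from a web URL
    UPLOAD = "upload"    # multipart file upload


# Path suffixes as listed in this guide.
_PATHS = {
    InputMethod.FILE_ID: "/parse",
    InputMethod.URL: "/parse/url",
    InputMethod.UPLOAD: "/parse/upload",
}


def endpoint_for(method: InputMethod) -> str:
    """Return the full v2alpha1 endpoint URL for the given input method."""
    return BASE_URL + _PATHS[method]
```

For example, `endpoint_for(InputMethod.UPLOAD)` yields the multipart upload URL.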

3. Replace Form Parameters with Configuration JSON

The configuration approach depends on your chosen endpoint:

  • /parse: Uses a JSON request body with file_id and configuration fields (recommended)
  • /parse/url: Uses a JSON request body with source_url, http_proxy, and configuration fields
  • /parse/upload: Uses multipart form data with file and configuration parameters
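
The three request shapes might be assembled roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the returned dictionaries are shaped to be passed as keyword arguments to an HTTP client such as `requests.post` (`json=`, `files=`, `data=`), and authentication is left out because this guide does not cover it.

```python
import json
from typing import Any, Optional


def build_parse_request(file_id: str, configuration: dict[str, Any]) -> dict[str, Any]:
    """/parse: JSON body with file_id next to the configuration fields."""
    return {"json": {"file_id": file_id, **configuration}}


def build_url_request(
    source_url: str,
    configuration: dict[str, Any],
    http_proxy: Optional[str] = None,
) -> dict[str, Any]:
    """/parse/url: JSON body with source_url, optional http_proxy, and configuration fields."""
    body: dict[str, Any] = {"source_url": source_url, **configuration}
    if http_proxy is not None:
        body["http_proxy"] = http_proxy
    return {"json": body}


def build_upload_request(
    filename: str, content: bytes, configuration: dict[str, Any]
) -> dict[str, Any]:
    """/parse/upload: multipart form with a file part and a JSON-encoded configuration string."""
    return {
        "files": {"file": (filename, content)},
        "data": {"configuration": json.dumps(configuration)},
    }
```

Only the upload variant serializes the configuration to a string; the two JSON endpoints carry it as ordinary fields of the request body.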

Before migrating, review this checklist:

  • Choose the right endpoint: Select /parse, /parse/url, or /parse/upload based on your input method
  • Update request format: Change from form parameters to endpoint-specific configuration
  • Replace parse modes with tiers: Use tier instead of parse_mode (fast, cost_effective, agentic, agentic_plus)
  • Remove model selection: Models are now automatically selected based on tier
  • Remove prompts: Custom prompts are no longer supported for API simplification
  • Remove external provider configs: Azure OpenAI and external API keys are no longer supported
  • Check for always-enabled parameters: adaptive_long_table, high_res_ocr, merge_tables_across_pages_in_markdown, outlined_table_extraction and others are always enabled in v2
  • Update page indexing: Change target_pages from 0-based to 1-based indexing
  • Move language parameter: Move language to tier-specific ocr_parameters
  • Update cache parameters: Replace invalidate_cache + do_not_cache with single disable_cache
  • Convert webhooks: Change from single webhook_url to webhook_configurations array
  • Remove header/footer customization: Header/footer handling is now automatic
  • Remove URL fields from non-URL endpoints: Only the /parse/url endpoint accepts source_url and http_proxy
  • Test thoroughly: The alpha API may have additional breaking changes

The v2 configuration structure varies by endpoint:

For /parse (JSON request body):

{
  "file_id": "existing-file-id",
  "parse_options": {
    "tier": "fast|cost_effective|agentic|agentic_plus",
    "version": "latest|2025-12-11",
    // Tier-specific options (see examples below)
  },
  "webhook_configurations": [...],
  "input_options": {...},
  "crop_box": {...},
  "page_ranges": {...},
  "disable_cache": "boolean (optional)",
  "output_options": {...},
  "processing_control": {...}
}

For /parse/url (JSON request body):

{
  "source_url": "https://example.com/document.pdf",
  "http_proxy": "https://proxy.example.com (optional)",
  "parse_options": {
    "tier": "fast|cost_effective|agentic|agentic_plus",
    "version": "latest|2025-12-11",
    // Tier-specific options (see examples below)
  },
  "webhook_configurations": [...],
  "input_options": {...},
  "crop_box": {...},
  "page_ranges": {...},
  "disable_cache": "boolean (optional)",
  "output_options": {...},
  "processing_control": {...}
}

For /parse/upload (value of the configuration form parameter):

{
  "parse_options": {
    "tier": "fast|cost_effective|agentic|agentic_plus",
    "version": "latest|2025-12-11",
    // Tier-specific options (see examples below)
  },
  "webhook_configurations": [...],
  "input_options": {...},
  "crop_box": {...},
  "page_ranges": {...},
  "disable_cache": "boolean (optional)",
  "output_options": {...},
  "processing_control": {...}
}

| v1 Parameter | v2 Location | Notes |
| --- | --- | --- |
| input_url | source_url (URL endpoint only) | Moved to URL endpoint, renamed from nested structure |
| http_proxy | http_proxy (URL endpoint only) | Only available in URL endpoint |
| max_pages | page_ranges.max_pages | Same functionality |
| target_pages | page_ranges.target_pages | Breaking change: Now uses 1-based indexing (user inputs “1,2,3” instead of “0,1,2”) |
| invalidate_cache and do_not_cache | disable_cache | Breaking change: Single boolean combines both v1 parameters |
| language | parse_options.{tier}_options.ocr_parameters.languages | Same functionality |

Important: In v1, target_pages used 0-based indexing (e.g., “0,1,2” for pages 1, 2, 3). In v2, it uses 1-based indexing (e.g., “1,2,3” for the same pages) to be consistent with the rest of the platform.
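
The two breaking changes above can be handled with small conversion helpers. This is a minimal sketch with hypothetical function names. It assumes that target_pages may also contain hyphenated ranges such as “0-2” (every number is shifted by one), and that disable_cache should be set whenever either v1 cache flag was set; neither assumption is confirmed by this guide.

```python
import re


def to_one_based(target_pages: str) -> str:
    """Shift every page number in a v1 target_pages string up by one."""
    return re.sub(r"\d+", lambda match: str(int(match.group()) + 1), target_pages)


def merge_cache_flags(invalidate_cache: bool = False, do_not_cache: bool = False) -> bool:
    """Assumption: disable_cache is true if either v1 cache flag was true."""
    return invalidate_cache or do_not_cache
```

With these helpers, the v1 value “0,1,2” becomes “1,2,3”, matching the example in the note above.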

The following parameters are always enabled in v2 across all tiers and cannot be disabled. We’re doing this to simplify calling LlamaParse and because these options give better results:

| v1 Parameter | v2 Behavior | Breaking Change |
| --- | --- | --- |
| adaptive_long_table | Always true | Breaking: Cannot be disabled in v2 |
| high_res_ocr | Always true | Breaking: Cannot be disabled in v2 |
| merge_tables_across_pages_in_markdown | Always true | Breaking: Cannot be disabled in v2 |
| outlined_table_extraction | Always true | Breaking: Cannot be disabled in v2 |
| guess_xlsx_sheet_name | Always true | Breaking: Cannot be disabled in v2 |


| v1 Parameter | v2 Location | Notes |
| --- | --- | --- |
| parse_mode | parse_options.tier | Breaking: Now uses tier-based system |
| model | Automatic selection | Breaking: Model is selected automatically based on tier |
| parsing_instruction | Removed | Breaking: Prompts are no longer supported for simplification |
| formatting_instruction | Removed | Breaking: Prompts are no longer supported for simplification |
| system_prompt | Removed | Breaking: Prompts are no longer supported for simplification |
| user_prompt | Removed | Breaking: Prompts are no longer supported for simplification |
| language | parse_options.{tier}_options.ocr_parameters.languages | Same functionality |
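
A parse_options object following the tier-based layout could be built as sketched below. The `build_parse_options` helper is hypothetical. It assumes the v1 language value is a comma-separated string and that v2 languages is a list of codes, which this guide does not state explicitly.

```python
from typing import Any, Optional

# Tier names as listed in this guide.
VALID_TIERS = ("fast", "cost_effective", "agentic", "agentic_plus")


def build_parse_options(
    tier: str, language: Optional[str] = None, version: str = "latest"
) -> dict[str, Any]:
    """Build parse_options with an optional tier-specific OCR language list."""
    if tier not in VALID_TIERS:
        raise ValueError(f"Unsupported tier: {tier}")
    options: dict[str, Any] = {"tier": tier, "version": version}
    if language:
        # Assumption: v1 'language' is comma-separated, v2 'languages' is a list.
        codes = [code.strip() for code in language.split(",") if code.strip()]
        options[f"{tier}_options"] = {"ocr_parameters": {"languages": codes}}
    return options
```

Validating the tier locally mirrors the server-side check and surfaces typos before a request is sent.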

The following v1 parameters are not supported in v2:

| v1 Parameter | v2 Status | Migration Path |
| --- | --- | --- |
| use_vendor_multimodal_model | Removed (was deprecated) | Use appropriate tier instead |
| gpt4o_mode | Removed | Use tier: "cost_effective" for GPT-4o-mini or tier: "agentic_plus" for premium |
| gpt4o_api_key | Removed | External provider support removed for simplification |
| premium_mode | Removed | Use tier: "agentic_plus" for highest quality |
| fast_mode | Removed | Use tier: "fast" for fastest processing |
| continuous_mode | Removed | No direct equivalent |
| vendor_multimodal_api_key | Removed | Breaking: External providers removed for simplification |
| azure_openai_* | Removed | Breaking: External providers removed for simplification |
| bounding_box | Renamed | Use crop_box object instead |
| disable_image_extraction | Removed | Breaking: Image extraction is now always optimized automatically |
| hide_headers | Removed | Breaking: Header handling is now automatic |
| hide_footers | Removed | Breaking: Footer handling is now automatic |
| page_header_prefix | Removed | Breaking: Header formatting removed for simplification |
| page_footer_prefix | Removed | Breaking: Footer formatting removed for simplification |
| page_prefix | Removed | Breaking: Page prefix formatting removed for simplification |
| page_separator | Removed | Breaking: Custom page separators removed for simplification |
| keep_page_separator_when_merging_tables | Removed | Breaking: Table merging behavior is now optimized automatically |
| input_s3_path and input_s3_region | Removed | Not supported in v2alpha1 |
| output_s3_path_prefix and output_s3_region | Removed | Not supported in v2alpha1 |

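
Before switching endpoints, it may help to scan an existing v1 parameter set for entries that no longer exist. The sketch below uses hypothetical helper names and covers only a subset of the removed parameters listed above; the tier suggestions follow the migration paths given for premium_mode and fast_mode.

```python
from typing import Any, Optional

# Subset of the v1 parameters marked "Removed" in this guide.
REMOVED_EXACT = frozenset({
    "use_vendor_multimodal_model", "gpt4o_mode", "gpt4o_api_key",
    "premium_mode", "fast_mode", "continuous_mode", "vendor_multimodal_api_key",
    "hide_headers", "hide_footers", "page_header_prefix", "page_footer_prefix",
    "page_prefix", "page_separator", "keep_page_separator_when_merging_tables",
    "input_s3_path", "input_s3_region", "output_s3_path_prefix", "output_s3_region",
})
# Wildcard entries such as azure_openai_* are matched by prefix.
REMOVED_PREFIXES = ("azure_openai_",)


def removed_params(v1_params: dict[str, Any]) -> list[str]:
    """Return the sorted names of v1 parameters that v2 no longer accepts."""
    return sorted(
        name for name in v1_params
        if name in REMOVED_EXACT or name.startswith(REMOVED_PREFIXES)
    )


def suggested_tier(v1_params: dict[str, Any]) -> Optional[str]:
    """Map the v1 mode flags with a single documented replacement to a tier."""
    if v1_params.get("premium_mode"):
        return "agentic_plus"
    if v1_params.get("fast_mode"):
        return "fast"
    return None
```

gpt4o_mode is deliberately not mapped, because the guide offers two possible tiers for it.
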

| v1 Parameter | v2 Location | Notes |
| --- | --- | --- |
| webhook_url | webhook_configurations[0].webhook_url | Breaking: Now an array, but only first entry is used at the moment |
| webhook_configurations (string) | webhook_configurations (array) | Breaking: Format changed from JSON string to structured array |
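
The webhook change can be sketched as a single conversion function. The helper name is hypothetical, and the sketch assumes that a v1 webhook_configurations string decodes to objects that v2 accepts unchanged, which this guide does not confirm.

```python
import json
from typing import Any, Optional


def migrate_webhooks(
    webhook_url: Optional[str] = None,
    webhook_configurations: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Convert v1 webhook settings into a v2 webhook_configurations array."""
    if webhook_configurations:
        # v1 passed this value as a JSON string; v2 expects a structured array.
        decoded = json.loads(webhook_configurations)
        return decoded if isinstance(decoded, list) else [decoded]
    if webhook_url:
        return [{"webhook_url": webhook_url}]
    return []
```

Since only the first entry is currently used, a single-element array is enough for most migrations.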

The following options exist in the v2 schema but are not yet implemented:

  • ignore_strikethrough_text (exists in schema but not processed)
  • input_options.pdf.password (placeholder for future implementation)

| v1 Parameter | v2 Location |
| --- | --- |
| bbox_top | crop_box.top |
| bbox_bottom | crop_box.bottom |
| bbox_left | crop_box.left |
| bbox_right | crop_box.right |

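
The bounding box rename amounts to a key-for-key mapping. A minimal sketch, with a hypothetical helper name and values passed through unchanged (this guide does not describe any change in units):

```python
from typing import Any

# v1 flat parameter name -> key inside the v2 crop_box object.
_BBOX_TO_CROP = {
    "bbox_top": "top",
    "bbox_bottom": "bottom",
    "bbox_left": "left",
    "bbox_right": "right",
}


def to_crop_box(v1_params: dict[str, Any]) -> dict[str, Any]:
    """Collect the v1 bbox_* values that are set into a v2 crop_box object."""
    return {
        new_key: v1_params[old_key]
        for old_key, new_key in _BBOX_TO_CROP.items()
        if v1_params.get(old_key) is not None
    }
```

Unset sides are simply left out of the resulting object.
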

| v1 Parameter | v2 Location |
| --- | --- |
| html_make_all_elements_visible | input_options.html.make_all_elements_visible |
| html_remove_fixed_elements | input_options.html.remove_fixed_elements |
| html_remove_navigation_elements | input_options.html.remove_navigation_elements |
| disable_image_extraction | input_options.pdf.disable_image_extraction |
| spreadsheet_extract_sub_tables | input_options.spreadsheet.detect_sub_tables_in_sheets |

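
Moving flat v1 parameters into nested v2 locations can be driven by a table of dotted paths. The sketch below uses hypothetical helper names and includes only the HTML and spreadsheet rows from the mapping above.

```python
from typing import Any

# v1 parameter name -> dotted path inside the v2 configuration.
INPUT_OPTION_PATHS = {
    "html_make_all_elements_visible": "input_options.html.make_all_elements_visible",
    "html_remove_fixed_elements": "input_options.html.remove_fixed_elements",
    "html_remove_navigation_elements": "input_options.html.remove_navigation_elements",
    "spreadsheet_extract_sub_tables": "input_options.spreadsheet.detect_sub_tables_in_sheets",
}


def set_path(config: dict[str, Any], dotted_path: str, value: Any) -> None:
    """Set a value in a nested dict, creating intermediate objects as needed."""
    *parents, leaf = dotted_path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def migrate_input_options(v1_params: dict[str, Any]) -> dict[str, Any]:
    """Copy the v1 input parameters that are present to their v2 locations."""
    config: dict[str, Any] = {}
    for old_name, path in INPUT_OPTION_PATHS.items():
        if old_name in v1_params:
            set_path(config, path, v1_params[old_name])
    return config
```

The same `set_path` approach extends to any other mapping in this guide that is a plain rename.
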

| v1 Parameter | v2 Location |
| --- | --- |
| skip_diagonal_text | parse_options.{tier}_options.ignore.ignore_diagonal_text |
| disable_ocr | parse_options.{tier}_options.ignore.ignore_text_in_image |


| v1 Parameter | v2 Location |
| --- | --- |
| annotate_links | output_options.markdown.annotate_links |
| page_suffix | Removed |
| hide_headers | Removed |
| hide_footers | Removed |
| compact_markdown_table | output_options.markdown.tables.compact_markdown_tables |
| output_tables_as_HTML | output_options.markdown.tables.output_tables_as_markdown (inverted) |
| guess_xlsx_sheet_name | output_options.tables_as_spreadsheet.guess_sheet_name |
| extract_layout | Removed |
| take_screenshot | output_options.screenshots.enable |
| output_pdf_of_document | output_options.export_pdf.enable |

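
The inverted table flag is easy to get wrong, so it is worth handling explicitly. A minimal sketch with a hypothetical helper name, covering only the two table-related rows above:

```python
from typing import Any


def migrate_table_output(v1_params: dict[str, Any]) -> dict[str, Any]:
    """Map the v1 table output flags to output_options.markdown.tables."""
    tables: dict[str, bool] = {}
    if "compact_markdown_table" in v1_params:
        tables["compact_markdown_tables"] = bool(v1_params["compact_markdown_table"])
    if "output_tables_as_HTML" in v1_params:
        # Inverted: HTML output in v1 means "not markdown" in v2.
        tables["output_tables_as_markdown"] = not v1_params["output_tables_as_HTML"]
    if not tables:
        return {}
    return {"output_options": {"markdown": {"tables": tables}}}
```

A v1 request with output_tables_as_HTML set to true therefore produces output_tables_as_markdown set to false.
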

| v1 Parameter | v2 Location |
| --- | --- |
| job_timeout_in_seconds | processing_control.timeouts.base_in_seconds |
| job_timeout_extra_time_per_page_in_seconds | processing_control.timeouts.extra_time_per_page_in_seconds |
| page_error_tolerance | processing_control.job_failure_conditions.allowed_page_failure_ratio |

v2 provides more detailed error messages:

400: Invalid parameter combination
{
"detail": [
{
"type": "value_error",
"loc": ["parse_options", "tier"],
"msg": "Unsupported tier: invalid_tier. Must be one of: fast, cost_effective, agentic, agentic_plus",
"input": {...}
}
]
}
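
A client might flatten such a response into readable messages as sketched below. The helper name is hypothetical, and the response shape is assumed to match the example above (a detail list whose items carry loc and msg).

```python
from typing import Any


def format_validation_errors(body: dict[str, Any]) -> list[str]:
    """Turn a v2 400 response body into 'location: message' strings."""
    messages = []
    for item in body.get("detail", []):
        # 'loc' is a path into the configuration, e.g. ["parse_options", "tier"].
        location = ".".join(str(part) for part in item.get("loc", []))
        messages.append(f"{location}: {item.get('msg', '')}")
    return messages
```

Joining the loc path with dots gives the same notation used in the mapping tables of this guide, which makes it easier to find the offending field.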

The v1 endpoint will remain available for the foreseeable future, so you can migrate at your own pace. However, new features and improvements will be focused on the v2 endpoint structure.