Start Scrape Job
Starts a scrape job for a given URL.
Method: client.scrape.start(params: StartScrapeJobParams): StartScrapeJobResponse
Endpoint: POST /api/scrape
Parameters:
StartScrapeJobParams:
url: string - URL to scrape
Response: StartScrapeJobResponse
Example:
response = client.scrape.start(StartScrapeJobParams(url="https://example.com"))
print(response.job_id)  # "jobId" is only the JSON alias; the Python attribute is job_id
Get Scrape Job
Retrieves details of a specific scrape job.
Method: client.scrape.get(id: str): ScrapeJobResponse
Endpoint: GET /api/scrape/{id}
Parameters:
id: string - Scrape job ID
Example:
response = client.scrape.get(
    "182bd5e5-6e1a-4fe4-a799-aa6d9a6ab26e"
)
print(response.status)
Start Scrape Job and Wait
Starts a scrape job and waits for it to complete.
Method: client.scrape.start_and_wait(params: StartScrapeJobParams): ScrapeJobResponse
Parameters:
StartScrapeJobParams:
url: string - URL to scrape
Example:
response = client.scrape.start_and_wait(StartScrapeJobParams(url="https://example.com"))
print(response.status)
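The source does not show how start_and_wait is implemented. As a rough sketch of the idea, a start call followed by repeated get calls can be wrapped in a polling loop like the one below. Here get_job stands in for client.scrape.get, and poll_interval, max_wait, and sleep are hypothetical knobs introduced for illustration, not SDK parameters.

```python
import time
from typing import Callable, Optional

# Statuses after which a job no longer changes (taken from ScrapeJobStatus).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    get_job: Callable[[str], object],
    job_id: str,
    poll_interval: float = 2.0,
    max_wait: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
):
    """Poll get_job(job_id) until the job is completed or failed.

    get_job is assumed to return an object with a .status attribute,
    as ScrapeJobResponse does. Raises TimeoutError once max_wait
    seconds of sleeping have elapsed without a terminal status.
    """
    waited = 0.0
    while True:
        job = get_job(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        if max_wait is not None and waited >= max_wait:
            raise TimeoutError(
                f"job {job_id} still {job.status} after {waited}s"
            )
        sleep(poll_interval)
        waited += poll_interval
```

With a real client this might be called as wait_for_job(client.scrape.get, response.job_id, max_wait=60). The returned object can still have status "failed", so callers should check it before reading data.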
Types
ScrapeFormat
ScrapeFormat = Literal["markdown", "html", "links", "screenshot"]
ScrapeJobStatus
ScrapeJobStatus = Literal["pending", "running", "completed", "failed"]
ScrapeOptions
class ScrapeOptions(BaseModel):
    formats: Optional[List[ScrapeFormat]] = None
    include_tags: Optional[List[str]] = Field(
        default=None, serialization_alias="includeTags"
    )
    exclude_tags: Optional[List[str]] = Field(
        default=None, serialization_alias="excludeTags"
    )
    only_main_content: Optional[bool] = Field(
        default=None, serialization_alias="onlyMainContent"
    )
    wait_for: Optional[int] = Field(default=None, serialization_alias="waitFor")
    timeout: Optional[int] = Field(default=None, serialization_alias="timeout")
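The serialization_alias values suggest that options are written in snake_case in Python but sent as camelCase keys. The standard-library sketch below mirrors that mapping to show what the serialized options might look like. It assumes unset (None) fields are omitted, which the source does not confirm, and to_wire is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict

# Python field name -> serialized key, copied from the serialization_alias
# values in ScrapeOptions. "formats" declares no alias, so it keeps its name.
ALIASES = {
    "formats": "formats",
    "include_tags": "includeTags",
    "exclude_tags": "excludeTags",
    "only_main_content": "onlyMainContent",
    "wait_for": "waitFor",
    "timeout": "timeout",
}


def to_wire(**options: Any) -> Dict[str, Any]:
    """Rename snake_case options to their aliases, dropping None values.

    Hypothetical helper for illustration; dropping None is an assumption.
    """
    unknown = set(options) - set(ALIASES)
    if unknown:
        raise TypeError(f"unknown scrape options: {sorted(unknown)}")
    return {
        ALIASES[name]: value
        for name, value in options.items()
        if value is not None
    }


payload = to_wire(formats=["markdown", "links"], only_main_content=True)
print(payload)
```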
StartScrapeJobResponse
class StartScrapeJobResponse(BaseModel):
    job_id: str = Field(alias="jobId")
ScrapeJobData
class ScrapeJobData(BaseModel):
    metadata: Optional[dict[str, Union[str, list[str]]]] = None
    html: Optional[str] = None
    markdown: Optional[str] = None
    links: Optional[List[str]] = None
ScrapeJobResponse
class ScrapeJobResponse(BaseModel):
    job_id: str = Field(alias="jobId")
    status: ScrapeJobStatus
    error: Optional[str] = None
    data: Optional[ScrapeJobData] = None
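Because error, data, and every field of ScrapeJobData are optional, code that consumes a ScrapeJobResponse needs to check the status before reading content. The sketch below shows one way to do that. extract_markdown is a hypothetical helper, and SimpleNamespace objects stand in for real response models so the example runs without the SDK.

```python
from types import SimpleNamespace


def extract_markdown(job) -> str:
    """Return job.data.markdown, or raise if the job has no usable result.

    job is assumed to expose status, error, and data like ScrapeJobResponse.
    """
    if job.status == "failed":
        raise RuntimeError(job.error or "scrape job failed without an error message")
    if job.status != "completed":
        raise RuntimeError(f"job is not finished (status={job.status})")
    if job.data is None or job.data.markdown is None:
        raise ValueError("job completed but returned no markdown")
    return job.data.markdown


# Stand-in for a completed ScrapeJobResponse.
done = SimpleNamespace(
    status="completed",
    error=None,
    data=SimpleNamespace(markdown="# Example Domain", html=None, links=None),
)
print(extract_markdown(done))
```

Whether markdown is populated presumably depends on the formats requested in ScrapeOptions, so the final check guards against a completed job that was asked for other formats only.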