AzureAIDocumentIntelligenceLoader
- class langchain_community.document_loaders.doc_intelligence.AzureAIDocumentIntelligenceLoader(
- api_endpoint: str,
- api_key: str | None = None,
- file_path: str | None = None,
- url_path: str | None = None,
- bytes_source: bytes | None = None,
- api_version: str | None = None,
- api_model: str = 'prebuilt-layout',
- mode: str = 'markdown',
- *,
- analysis_features: List[str] | None = None,
- azure_credential: 'TokenCredential' | None = None,
- )
Load a PDF with Azure Document Intelligence.
Initialize the object for file processing with Azure Document Intelligence (formerly Form Recognizer).
This constructor initializes an AzureAIDocumentIntelligenceParser object that parses files using the Azure Document Intelligence API. The load method generates Documents whose content representation is determined by the mode parameter.
Parameters:
- api_endpoint: str
The API endpoint to use for DocumentIntelligenceClient construction.
- api_key: Optional[str]
The API key to use for DocumentIntelligenceClient construction. Default value is None.
- file_path: Optional[str]
The path to the file that needs to be loaded. One of file_path, url_path, or bytes_source must be specified.
- url_path: Optional[str]
The URL to the file that needs to be loaded. One of file_path, url_path, or bytes_source must be specified.
- bytes_source: Optional[bytes]
The bytes array of the file that needs to be loaded. One of file_path, url_path, or bytes_source must be specified.
- api_version: Optional[str]
The API version for DocumentIntelligenceClient. Set to None to use the default value from the azure-ai-documentintelligence package.
- api_model: str
Unique document model name. Default value is “prebuilt-layout”. Note that overriding this default value may result in unsupported behavior.
- mode: str
The type of content representation of the generated Documents. Use either “single”, “page”, or “markdown”. Default value is “markdown”.
- analysis_features: Optional[List[str]]
List of optional analysis features. Each feature should be passed as a str that conforms to the enum DocumentAnalysisFeature in the azure-ai-documentintelligence package. Default value is None.
- azure_credential: Optional[TokenCredential]
The credentials to use for DocumentIntelligenceClient construction when using credentials other than api_key (such as Azure AD).
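As a rough illustration of how the credential and mode parameters combine, the sketch below assembles constructor keyword arguments in a hypothetical helper, `build_loader_kwargs`. The helper name and its pre-flight checks are assumptions of this sketch, not a description of what the library itself enforces.

```python
from typing import Any, Dict, Optional

# The three content representations named in the parameter list above.
DOCUMENTED_MODES = ("single", "page", "markdown")


def build_loader_kwargs(
    api_endpoint: str,
    *,
    api_key: Optional[str] = None,
    azure_credential: Optional[Any] = None,
    mode: str = "markdown",
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for the loader."""
    if mode not in DOCUMENTED_MODES:
        raise ValueError(f"mode must be one of {DOCUMENTED_MODES}, got {mode!r}")
    if api_key is None and azure_credential is None:
        raise ValueError("pass either api_key or azure_credential")

    kwargs: Dict[str, Any] = {"api_endpoint": api_endpoint, "mode": mode}
    # Forward a single credential; preferring api_key when both are given
    # is an arbitrary choice made by this sketch.
    if api_key is not None:
        kwargs["api_key"] = api_key
    else:
        kwargs["azure_credential"] = azure_credential
    return kwargs
```

The resulting dict could be unpacked into the constructor together with one of file_path, url_path, or bytes_source. A token credential object, for instance one created with the azure-identity package, would be passed as azure_credential.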
Examples:
>>> obj = AzureAIDocumentIntelligenceLoader(
...     file_path="path/to/file",
...     api_endpoint="https://endpoint.azure.com",
...     api_key="APIKEY",
...     api_version="2023-10-31-preview",
...     api_model="prebuilt-layout",
...     mode="markdown"
... )
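Because the input can arrive as a local path, a URL, or raw bytes, it may help to route it to the matching constructor argument in one place. The following is a minimal sketch under that assumption; `source_kwargs` and `load_documents` are hypothetical names, and the URL check is a simple prefix test rather than anything the library performs.

```python
from typing import Dict, Union


def source_kwargs(source: Union[str, bytes]) -> Dict[str, Union[str, bytes]]:
    """Hypothetical helper: map one input to exactly one source argument."""
    if isinstance(source, (bytes, bytearray)):
        return {"bytes_source": bytes(source)}
    if source.startswith(("http://", "https://")):
        return {"url_path": source}
    return {"file_path": source}


def load_documents(source: Union[str, bytes], api_endpoint: str, api_key: str):
    """Build the loader and return its Documents (requires Azure access)."""
    # Imported lazily so source_kwargs works without the package installed.
    from langchain_community.document_loaders import (
        AzureAIDocumentIntelligenceLoader,
    )

    loader = AzureAIDocumentIntelligenceLoader(
        api_endpoint=api_endpoint,
        api_key=api_key,
        **source_kwargs(source),
    )
    return loader.load()
```

Calling `load_documents` needs a real endpoint and key, so only `source_kwargs` can be exercised offline.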
Methods
- __init__(api_endpoint[, api_key, file_path, ...]): Initialize the object for file processing with Azure Document Intelligence (formerly Form Recognizer).
- alazy_load(): A lazy loader for Documents.
- aload(): Load data into Document objects.
- lazy_load(): Lazy load the document as pages.
- load(): Load data into Document objects.
- load_and_split([text_splitter]): Load Documents and split into chunks.
- __init__(
- api_endpoint: str,
- api_key: str | None = None,
- file_path: str | None = None,
- url_path: str | None = None,
- bytes_source: bytes | None = None,
- api_version: str | None = None,
- api_model: str = 'prebuilt-layout',
- mode: str = 'markdown',
- *,
- analysis_features: List[str] | None = None,
- azure_credential: 'TokenCredential' | None = None,
- )
Initialize the object for file processing with Azure Document Intelligence (formerly Form Recognizer).
- Parameters:
api_endpoint (str)
api_key (Optional[str])
file_path (Optional[str])
url_path (Optional[str])
bytes_source (Optional[bytes])
api_version (Optional[str])
api_model (str)
mode (str)
analysis_features (Optional[List[str]])
azure_credential (Optional['TokenCredential'])
- Return type:
None
- async alazy_load() → AsyncIterator[Document]
A lazy loader for Documents.
- Return type:
AsyncIterator[Document]
- lazy_load() → Iterator[Document]
Lazy load the document as pages.
- Return type:
Iterator[Document]
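Since the method returns an iterator, results can be processed in small groups rather than collected into one list. The following sketch batches any iterable with `itertools.islice`. `batched` and `fake_lazy_load` are hypothetical names used for illustration; in practice `loader.lazy_load()` would be passed instead of the stand-in generator.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def batched(docs: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield lists of up to `size` items without materializing the input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_lazy_load() -> Iterator[str]:
    """Stand-in for loader.lazy_load(); yields strings, not Documents."""
    for text in ("page 1", "page 2", "page 3"):
        yield text


batches = list(batched(fake_lazy_load(), 2))
print(batches)  # [['page 1', 'page 2'], ['page 3']]
```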
- load_and_split(
- text_splitter: TextSplitter | None = None,
- ) → list[Document]
Load Documents and split into chunks. Chunks are returned as Documents.
Do not override this method; it should be considered deprecated.
- Parameters:
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns:
List of Documents.
- Return type:
list[Document]
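Given the deprecation note, one alternative is to perform the two steps explicitly: call `load()` on the loader, then `split_documents()` on a text splitter. The sketch below assumes only those two methods. `load_then_split` and the stub classes are hypothetical and exist so the example runs without Azure access.

```python
from typing import Any, List


def load_then_split(loader: Any, splitter: Any) -> List[Any]:
    """Load all Documents, then split them into chunk Documents."""
    documents = loader.load()
    return splitter.split_documents(documents)


class _StubLoader:
    """Stand-in loader that returns plain strings instead of Documents."""

    def load(self) -> List[str]:
        return ["alpha beta", "gamma"]


class _StubSplitter:
    """Stand-in splitter that breaks each string on whitespace."""

    def split_documents(self, documents: List[str]) -> List[str]:
        return [word for doc in documents for word in doc.split()]


chunks = load_then_split(_StubLoader(), _StubSplitter())
print(chunks)  # ['alpha', 'beta', 'gamma']
```

With real objects, the loader would be an AzureAIDocumentIntelligenceLoader and the splitter any TextSplitter instance, such as the RecursiveCharacterTextSplitter named above as the default.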