ChatDeepSeek#

class langchain_deepseek.chat_models.ChatDeepSeek[source]#

Bases: BaseChatOpenAI

DeepSeek chat model integration to access models hosted in DeepSeek’s API.

Setup:

Install langchain-deepseek and set environment variable DEEPSEEK_API_KEY.

pip install -U langchain-deepseek
export DEEPSEEK_API_KEY="your-api-key"
Key init args — completion params:
model: str

Name of DeepSeek model to use, e.g. “deepseek-chat”.

temperature: float

Sampling temperature.

max_tokens: Optional[int]

Max number of tokens to generate.

Key init args — client params:
timeout: Optional[float]

Timeout for requests.

max_retries: int

Max number of retries.

api_key: Optional[str]

DeepSeek API key. If not passed in will be read from env var DEEPSEEK_API_KEY.

See full list of supported init args and their descriptions in the params section.

Instantiate:
from langchain_deepseek import ChatDeepSeek

llm = ChatDeepSeek(
    model="...",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...",
    # other params...
)
Invoke:
messages = [
    ("system", "You are a helpful translator. Translate the user sentence to French."),
    ("human", "I love programming."),
]
llm.invoke(messages)
Stream:
for chunk in llm.stream(messages):
    print(chunk.text(), end="")
stream = llm.stream(messages)
full = next(stream)
for chunk in stream:
    full += chunk
full
Async:
await llm.ainvoke(messages)

# stream:
# async for chunk in llm.astream(messages):

# batch:
# await llm.abatch([messages])
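A runnable async sketch expanding the comments above (assuming the llm and messages objects defined earlier); note that astream() returns an async generator, so it is iterated directly rather than awaited:

import asyncio

async def main():
    # Single async invocation
    result = await llm.ainvoke(messages)
    print(result.content)

    # Async streaming: iterate the async generator directly
    async for chunk in llm.astream(messages):
        print(chunk.text(), end="")

    # Async batching over multiple message lists
    await llm.abatch([messages, messages])

asyncio.run(main())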
Tool calling:
from pydantic import BaseModel, Field

class GetWeather(BaseModel):
    '''Get the current weather in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

class GetPopulation(BaseModel):
    '''Get the current population in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

llm_with_tools = llm.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke("Which city is hotter today and which is bigger: LA or NY?")
ai_msg.tool_calls

See ChatDeepSeek.bind_tools() method for more.
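A hedged sketch of completing the tool-calling loop by returning tool results to the model; the tool output string below is a placeholder for a real tool invocation:

from langchain_core.messages import HumanMessage, ToolMessage

history = [HumanMessage("Which city is hotter today and which is bigger: LA or NY?")]
ai_msg = llm_with_tools.invoke(history)
history.append(ai_msg)

for tool_call in ai_msg.tool_calls:
    # Dispatch to the matching tool here; this string is a placeholder result
    history.append(ToolMessage(content="placeholder tool result", tool_call_id=tool_call["id"]))

final_response = llm_with_tools.invoke(history)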

Structured output:
from typing import Optional

from pydantic import BaseModel, Field

class Joke(BaseModel):
    '''Joke to tell user.'''

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(description="How funny the joke is, from 1 to 10")

structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about cats")

See ChatDeepSeek.with_structured_output() for more.
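A sketch of an optional variation: with_structured_output() accepts include_raw=True, which returns a dict containing the raw message, the parsed object, and any parsing error:

structured_llm = llm.with_structured_output(Joke, include_raw=True)
result = structured_llm.invoke("Tell me a joke about cats")
result["parsed"], result["raw"], result["parsing_error"]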

Token usage:
ai_msg = llm.invoke(messages)
ai_msg.usage_metadata
{'input_tokens': 28, 'output_tokens': 5, 'total_tokens': 33}
Response metadata:
ai_msg = llm.invoke(messages)
ai_msg.response_metadata

Note

ChatDeepSeek implements the standard Runnable Interface. 🏃

The Runnable Interface has additional methods that are available on runnables, such as with_config, with_types, with_retry, assign, bind, get_graph, and more.
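For example, a sketch of two inherited Runnable helpers (not DeepSeek-specific behavior):

# Retry failed calls up to three times
resilient_llm = llm.with_retry(stop_after_attempt=3)

# Attach tags and metadata to every run for tracing
tagged_llm = llm.with_config(tags=["deepseek-demo"], metadata={"team": "docs"})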

param api_base: str [Optional]#

DeepSeek API base URL.

param api_key: SecretStr | None [Optional]#

DeepSeek API key.

param cache: BaseCache | bool | None = None#

Whether to cache the response.

  • If True, will use the global cache.

  • If False, will not use a cache.

  • If None, will use the global cache if it's set, otherwise no cache.

  • If instance of BaseCache, will use the provided cache.

Caching is not currently supported for streaming methods of models.
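A brief sketch of the per-instance option, using InMemoryCache from langchain_core as the BaseCache instance:

from langchain_core.caches import InMemoryCache

cached_llm = ChatDeepSeek(model="deepseek-chat", cache=InMemoryCache())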

param callback_manager: BaseCallbackManager | None = None#

Deprecated since version 0.1.7: Use callbacks instead. It will be removed in version 1.0.

Callback manager to add to the run trace.

param callbacks: Callbacks = None#

Callbacks to add to the run trace.

param custom_get_token_ids: Callable[[str], list[int]] | None = None#

Optional encoder to use for counting tokens.

param default_headers: Mapping[str, str] | None = None#
param default_query: Mapping[str, object] | None = None#
param disable_streaming: bool | Literal['tool_calling'] = False#

Whether to disable streaming for this model.

If streaming is bypassed, then stream()/astream()/astream_events() will defer to invoke()/ainvoke().

  • If True, streaming will always be bypassed.

  • If "tool_calling", streaming will be bypassed only when the model is called with a tools keyword argument.

  • If False (default), streaming will be used if available.
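For example, a sketch using the example model name from above:

# Always fall back to invoke()/ainvoke(), even when stream()/astream() is called
non_streaming_llm = ChatDeepSeek(model="deepseek-chat", disable_streaming=True)

# Bypass streaming only when tools are bound to the call
tool_safe_llm = ChatDeepSeek(model="deepseek-chat", disable_streaming="tool_calling")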

param disabled_params: dict[str, Any] | None = None#

Parameters of the OpenAI client or chat.completions endpoint that should be disabled for the given model.

Should be specified as {"param": None | ['val1', 'val2']}, where the key is the parameter name and the value is either None (meaning the parameter should never be used) or a list of disabled values for that parameter.

For example, older models may not support the ‘parallel_tool_calls’ parameter at all, in which case disabled_params={"parallel_tool_calls": None} can be passed in.

If a parameter is disabled, it will not be used by default in any methods, e.g. in with_structured_output(). However, this does not prevent a user from directly passing in the parameter during invocation.
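A sketch of the example described above:

llm = ChatDeepSeek(
    model="deepseek-chat",
    # Never send parallel_tool_calls, e.g. for a model that rejects it outright
    disabled_params={"parallel_tool_calls": None},
)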

param extra_body: Mapping[str, Any] | None = None#

Optional additional JSON properties to include in the request parameters when making requests to OpenAI compatible APIs, such as vLLM.
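For example, a hedged sketch in which the extra key is a hypothetical provider-specific field, not a documented DeepSeek parameter:

llm = ChatDeepSeek(
    model="deepseek-chat",
    # Hypothetical request-body field for an OpenAI-compatible server
    extra_body={"custom_server_option": True},
)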

param frequency_penalty: float | None = None#

Penalizes repeated tokens according to frequency.

param http_async_client: Any | None = None#

Optional httpx.AsyncClient. Only used for async invocations. Must specify http_client as well if you’d like a custom client for sync invocations.

param http_client: Any | None = None#

Optional httpx.Client. Only used for sync invocations. Must specify http_async_client as well if you’d like a custom client for async invocations.
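A sketch using custom httpx clients for both sync and async invocations:

import httpx

llm = ChatDeepSeek(
    model="deepseek-chat",
    # Example: custom timeouts; configure both clients consistently
    http_client=httpx.Client(timeout=httpx.Timeout(30.0)),
    http_async_client=httpx.AsyncClient(timeout=httpx.Timeout(30.0)),
)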

param include_response_headers: bool = False#

Whether to include response headers in the output message response_metadata.

param logit_bias: dict[int, int] | None = None#

Modify the likelihood of specified tokens appearing in the completion.

param logprobs: bool | None = None#

Whether to return logprobs.

param max_retries: int | None = None#

Maximum number of retries to make when generating.

param max_tokens: int | None = None#

Maximum number of tokens to generate.

param metadata: dict[str, Any] | None = None#

Metadata to add to the run trace.

param model_kwargs: dict[str, Any] [Optional]#

Holds any model parameters valid for the create call that are not explicitly specified.
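For example, a hedged sketch; the key shown is hypothetical and is simply forwarded to the underlying create call:

llm = ChatDeepSeek(
    model="deepseek-chat",
    # Hypothetical parameter name, passed through verbatim to create()
    model_kwargs={"some_future_api_param": "value"},
)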

param model_name: str [Required] (alias 'model')#

The name of the model.

param n: int | None = None#

Number of chat completions to generate for each prompt.

param openai_api_base: str | None = None (alias 'base_url')#

Base URL path for API requests; leave blank if not using a proxy or service emulator.

param openai_api_key: SecretStr | None [Optional] (alias 'api_key')#
param openai_organization: str | None = None (alias 'organization')#

Automatically inferred from env var OPENAI_ORG_ID if not provided.

param openai_proxy: str | None [Optional]#
param presence_penalty: float | None = None#

Penalizes repeated tokens.

param rate_limiter: BaseRateLimiter | None = None#

An optional rate limiter to use for limiting the number of requests.

param reasoning_effort: str | None = None#

Constrains effort on reasoning for reasoning models.

Applies to reasoning models only, such as OpenAI o1 and o3-mini.

Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
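For example, a hedged sketch; "deepseek-reasoner" is an assumed reasoning-capable model name, and whether this parameter is honored depends on the provider and model:

# Assumed model name; support for reasoning_effort depends on the provider
llm = ChatDeepSeek(model="deepseek-reasoner", reasoning_effort="low")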