ChatDeepSeek#

class langchain_deepseek.chat_models.ChatDeepSeek[source]#

Bases: BaseChatOpenAI

DeepSeek chat model integration to access models hosted in DeepSeek’s API.

Setup:

Install langchain-deepseek and set environment variable DEEPSEEK_API_KEY.

pip install -U langchain-deepseek
export DEEPSEEK_API_KEY="your-api-key"
Key init args — completion params:
model: str

Name of DeepSeek model to use, e.g. “deepseek-chat”.

temperature: float

Sampling temperature.

max_tokens: Optional[int]

Max number of tokens to generate.

Key init args — client params:
timeout: Optional[float]

Timeout for requests.

max_retries: int

Max number of retries.

api_key: Optional[str]

DeepSeek API key. If not passed in will be read from env var DEEPSEEK_API_KEY.

See full list of supported init args and their descriptions in the params section.

Instantiate:
from langchain_deepseek import ChatDeepSeek

llm = ChatDeepSeek(
    model="...",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # api_key="...",
    # other params...
)
Invoke:
messages = [
    ("system", "You are a helpful translator. Translate the user sentence to French."),
    ("human", "I love programming."),
]
llm.invoke(messages)
Stream:
for chunk in llm.stream(messages):
    print(chunk.text(), end="")
stream = llm.stream(messages)
full = next(stream)
for chunk in stream:
    full += chunk
full
Async:
await llm.ainvoke(messages)

# stream:
# async for chunk in llm.astream(messages):

# batch:
# await llm.abatch([messages])
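A runnable async sketch expanding the comments above (assuming the llm and messages objects defined earlier); note that astream() returns an async generator, so it is iterated directly rather than awaited:

import asyncio

async def main():
    # Single async invocation
    result = await llm.ainvoke(messages)
    print(result.content)

    # Async streaming: iterate the async generator directly
    async for chunk in llm.astream(messages):
        print(chunk.text(), end="")

    # Async batching over multiple message lists
    await llm.abatch([messages, messages])

asyncio.run(main())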
Tool calling:
from pydantic import BaseModel, Field

class GetWeather(BaseModel):
    '''Get the current weather in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

class GetPopulation(BaseModel):
    '''Get the current population in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

llm_with_tools = llm.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke("Which city is hotter today and which is bigger: LA or NY?")
ai_msg.tool_calls

See ChatDeepSeek.bind_tools() method for more.
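A hedged sketch of completing the tool-calling loop by returning tool results to the model; the tool output string below is a placeholder for a real tool invocation:

from langchain_core.messages import HumanMessage, ToolMessage

history = [HumanMessage("Which city is hotter today and which is bigger: LA or NY?")]
ai_msg = llm_with_tools.invoke(history)
history.append(ai_msg)

for tool_call in ai_msg.tool_calls:
    # Dispatch to the matching tool here; this string is a placeholder result
    history.append(ToolMessage(content="placeholder tool result", tool_call_id=tool_call["id"]))

final_response = llm_with_tools.invoke(history)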

Structured output:
from typing import Optional

from pydantic import BaseModel, Field

class Joke(BaseModel):
    '''Joke to tell user.'''

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(description="How funny the joke is, from 1 to 10")

structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about cats")

See ChatDeepSeek.with_structured_output() for more.
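A sketch of an optional variation: with_structured_output() accepts include_raw=True, which returns a dict containing the raw message, the parsed object, and any parsing error:

structured_llm = llm.with_structured_output(Joke, include_raw=True)
result = structured_llm.invoke("Tell me a joke about cats")
result["parsed"], result["raw"], result["parsing_error"]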

Token usage:
ai_msg = llm.invoke(messages)
ai_msg.usage_metadata
{'input_tokens': 28, 'output_tokens': 5, 'total_tokens': 33}
Response metadata:
ai_msg = llm.invoke(messages)
ai_msg.response_metadata

Note

ChatDeepSeek implements the standard Runnable Interface. 🏃

The Runnable Interface has additional methods that are available on runnables, such as with_config, with_types, with_retry, assign, bind, get_graph, and more.
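For example, a sketch of two inherited Runnable helpers (not DeepSeek-specific behavior):

# Retry failed calls up to three times
resilient_llm = llm.with_retry(stop_after_attempt=3)

# Attach tags and metadata to every run for tracing
tagged_llm = llm.with_config(tags=["deepseek-demo"], metadata={"team": "docs"})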

param api_base: str [Optional]#

DeepSeek API base URL.

param api_key: SecretStr | None [Optional]#

DeepSeek API key.

param cache: BaseCache | bool | None = None#

Whether to cache the response.

  • If True, will use the global cache.

  • If False, will not use a cache.

  • If None, will use the global cache if it's set, otherwise no cache.

  • If instance of BaseCache, will use the provided cache.

Caching is not currently supported for streaming methods of models.
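A brief sketch of the per-instance option, using InMemoryCache from langchain_core as the BaseCache instance:

from langchain_core.caches import InMemoryCache

cached_llm = ChatDeepSeek(model="deepseek-chat", cache=InMemoryCache())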

param callback_manager: BaseCallbackManager | None = None#

Deprecated since version 0.1.7: Use callbacks instead. It will be removed in version 1.0.

Callback manager to add to the run trace.

param callbacks: Callbacks = None#

Callbacks to add to the run trace.

param custom_get_token_ids: Callable[[str], list[int]] | None = None#

Optional encoder to use for counting tokens.

param default_headers: Mapping[str, str] | None = None#
param default_query: Mapping[str, object] | None = None#
param disable_streaming: bool | Literal['tool_calling'] = False#

Whether to disable streaming for this model.

If streaming is bypassed, then stream()/astream()/astream_events() will defer to invoke()/ainvoke().

  • If True, streaming will always be bypassed.

  • If "tool_calling", streaming will be bypassed only when the model is called with a tools keyword argument.

  • If False (default), streaming will be used if available.
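For example, a sketch using the example model name from above:

# Always fall back to invoke()/ainvoke(), even when stream()/astream() is called
non_streaming_llm = ChatDeepSeek(model="deepseek-chat", disable_streaming=True)

# Bypass streaming only when tools are bound to the call
tool_safe_llm = ChatDeepSeek(model="deepseek-chat", disable_streaming="tool_calling")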

param disabled_params: dict[str, Any] | None = None#

Parameters of the OpenAI client or chat.completions endpoint that should be disabled for the given model.

Should be specified as {"param": None | ['val1', 'val2']}, where the key is the parameter name and the value is either None (meaning the parameter should never be used) or a list of disabled values for that parameter.

For example, older models may not support the ‘parallel_tool_calls’ parameter at all, in which case disabled_params={"parallel_tool_calls": None} can be passed in.

If a parameter is disabled, it will not be used by default in any methods, e.g. in with_structured_output(). However, this does not prevent a user from directly passing in the parameter during invocation.
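A sketch of the example described above:

llm = ChatDeepSeek(
    model="deepseek-chat",
    # Never send parallel_tool_calls, e.g. for a model that rejects it outright
    disabled_params={"parallel_tool_calls": None},
)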

param extra_body: Mapping[str, Any] | None = None#

Optional additional JSON properties to include in the request parameters when making requests to OpenAI compatible APIs, such as vLLM.
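For example, a hedged sketch in which the extra key is a hypothetical provider-specific field, not a documented DeepSeek parameter:

llm = ChatDeepSeek(
    model="deepseek-chat",
    # Hypothetical request-body field for an OpenAI-compatible server
    extra_body={"custom_server_option": True},
)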

param frequency_penalty: float | None = None#

Penalizes repeated tokens according to frequency.

param http_async_client: Any | None = None#

Optional httpx.AsyncClient. Only used for async invocations. Must specify http_client as well if you’d like a custom client for sync invocations.

param http_client: Any | None = None#

Optional httpx.Client. Only used for sync invocations. Must specify http_async_client as well if you’d like a custom client for async invocations.
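A sketch using custom httpx clients for both sync and async invocations:

import httpx

llm = ChatDeepSeek(
    model="deepseek-chat",
    # Example: custom timeouts; configure both clients consistently
    http_client=httpx.Client(timeout=httpx.Timeout(30.0)),
    http_async_client=httpx.AsyncClient(timeout=httpx.Timeout(30.0)),
)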

param include_response_headers: bool = False#

Whether to include response headers in the output message response_metadata.

param logit_bias: dict[int, int] | None = None#

Modify the likelihood of specified tokens appearing in the completion.

param logprobs: bool | None = None#

Whether to return logprobs.

param max_retries: int | None = None#

Maximum number of retries to make when generating.

param max_tokens: int | None = None#

Maximum number of tokens to generate.

param metadata: dict[str, Any] | None = None#

Metadata to add to the run trace.

param model_kwargs: dict[str, Any] [Optional]#

Holds any model parameters valid for the create call that are not explicitly specified.
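For example, a hedged sketch; the key shown is hypothetical and is simply forwarded to the underlying create call:

llm = ChatDeepSeek(
    model="deepseek-chat",
    # Hypothetical parameter name, passed through verbatim to create()
    model_kwargs={"some_future_api_param": "value"},
)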

param model_name: str [Required] (alias 'model')#

The name of the model.

param n: int | None = None#

Number of chat completions to generate for each prompt.

param openai_api_base: str | None = None (alias 'base_url')#

Base URL path for API requests; leave blank if not using a proxy or service emulator.

param openai_api_key: SecretStr | None [Optional] (alias 'api_key')#
param openai_organization: str | None = None (alias 'organization')#

Automatically inferred from env var OPENAI_ORG_ID if not provided.

param openai_proxy: str | None [Optional]#
param presence_penalty: float | None = None#

Penalizes repeated tokens.

param rate_limiter: BaseRateLimiter | None = None#

An optional rate limiter to use for limiting the number of requests.

param reasoning_effort: str | None = None#

Constrains effort on reasoning for reasoning models.

Applies to reasoning models only, such as OpenAI o1 and o3-mini.

Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
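For example, a hedged sketch; "deepseek-reasoner" is an assumed reasoning-capable model name, and whether this parameter is honored depends on the provider and model:

# Assumed model name; support for reasoning_effort depends on the provider
llm = ChatDeepSeek(model="deepseek-reasoner", reasoning_effort="low")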