How to get the incomplete/truncated response in json_mode, instead of (or in addition to) LengthFinishReasonError, when the max_tokens output limit is reached?
#28932
Checked other resources
I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
Commit to Help
I commit to help with one of those options 👆
Example Code
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", max_tokens=10)

prompt = [
    ("system", "You are a helpful assistant designed to output JSON. Respond to all question a json of the format {{winner: ..., info: ..., description: ...}}"),
    ("user", "{input}"),
]
prompt = ChatPromptTemplate.from_messages(prompt)

llm = llm.bind(response_format={"type": "json_object"})

agent = prompt | llm
agent.invoke({"input": "Who won the world series in 2020?"})
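Running this raises LengthFinishReasonError as soon as the 10-token limit is hit. As a stopgap, here is a minimal sketch of how the partial output might be recovered by catching the error, assuming the installed openai SDK exports LengthFinishReasonError at the top level and attaches the raw (truncated) completion to it as .completion (true for recent 1.x versions; verify against yours):

from openai import LengthFinishReasonError

try:
    result = agent.invoke({"input": "Who won the world series in 2020?"})
except LengthFinishReasonError as e:
    # assumption: the SDK stores the raw truncated ChatCompletion on the error
    partial_text = e.completion.choices[0].message.content
    print("partial output:", partial_text)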
Description
I am trying to retrieve the truncated response from the completion when the output token limit is reached in json_mode.
I expect to receive the part that was generated before the limit was reached; instead I get a LengthFinishReasonError and no response at all.
In older versions of langchain and langchain-openai the behavior was different: instead of an error, the truncated response was returned with finish_reason: 'length' in the response metadata.
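For comparison, this is roughly the check the old behavior allowed; response_metadata is the standard field on the returned AIMessage, and the finish_reason key is what older versions populated, as described above:

result = agent.invoke({"input": "Who won the world series in 2020?"})
if result.response_metadata.get("finish_reason") == "length":
    # output was cut off at max_tokens; result.content holds the truncated JSON
    print("truncated:", result.content)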