Structured output with the Responses API returns tons of \n\n\n\n

Using gpt-4.1 with structured outputs, I occasionally (about 1 in 10 calls) get repeated \n\n\n\n appended to my structured output. The structured output BEFORE the \n starts is valid, but there are a lot of \n added (100k characters!), neatly ordered in groups:

\n\n\n\n\n\n \n\n\n\n\n\n \n\n\n\n\n\n \n\n\n\n\n\n \n\n\n\n\n\n \n\n\n\n\n\n 

This seems to be the same issue as this older one: The GPT-4-1106-preview model keeps generating "\\n\\n\\n\\n\\n\\n\\n\\n" for an hour when using functions - #9 by Diet
but I started a new one because a) it's a different model (4.1) and b) the JSON is technically valid (just with 100k extra \n).

For anyone at OpenAI who wants to check, here’s an example response_id: resp_6857d0cb11388198ac8ed8d5362e40d70ed5adbbe3a34036

3 Likes

In some integrations I’ve seen it output the structured output multiple times, with or without differing data, effectively (and undesirably) producing JSONL instead of the single array that was intended. So the main issue here appears to be that the model is allowed to continue outputting after it has completed its object, and being constrained to valid JSON symbols isn’t enough to prevent these issues.

As a workaround, the Chat Completions API has parameters for presence and frequency penalties. Using either may encourage the model to terminate its output when it’s finished.
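
Something like this, for instance (a minimal sketch with the Python SDK; the penalty value is just illustrative):

    from openai import OpenAI

    client = OpenAI()

    # A small frequency penalty demotes tokens the model has already emitted,
    # which discourages long runs of the same \n / \t token.
    completion = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Summarize the article as JSON."}],
        response_format={"type": "json_object"},
        frequency_penalty=0.2,  # or presence_penalty=0.2
    )
    print(completion.choices[0].message.content)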

I had another one - new pattern.

revolutionize plastic recycling and construction in Southwest Virginia."}]}\n \n\n\n\n\n\n \t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t \t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t \t\t \t\t \t\t \t\t \t\t \t\t [… thousands more \t\t groups truncated …] (id: resp_68626b34d1c0819a84826ba7e61f77f10652f8198a192ebf)

Hey everyone!

We’re aware that GPT‑4.1 (and occasionally GPT‑4o) can add a long string of blank lines or extra array items after the correct JSON when you stream a structured‑output response. Engineering is on it. Until the fix ships, you can avoid most cases by turning off streaming or by parsing only the first complete JSON object and discarding anything that follows.

Let us know if that helps at all!

3 Likes

Thanks for the note - much appreciated.
I’m not (ever) using streaming, so it is not related to streaming.

It also takes a very long time for the OpenAI API to come back with the response in those cases, and the responses are 100k bytes long (filling all the way up to the max 32k tokens I defined) - but I guess I can add some retry code in the exception handler. It’s not that trivial since the sequences are not always the same - I guess I’ll ask Codex :slight_smile:

3 Likes

*** humbled *** Actually it WAS easy to work around, so I will post the problem and the solution here for anyone who runs into this problem (in Python) and needs a workaround.

My original code looked something like this:

data = json.loads(self.response.strip(), strict=False)

Turns out (thank you, o4-mini-high) that there is also json.JSONDecoder.raw_decode, and you do:

dec = json.JSONDecoder()
data, end = dec.raw_decode(self.response)  # 'end' is the index where the valid JSON stops; everything after it is the \n padding

HOWEVER this of course ONLY works when proper JSON is returned in the first place. Which is not always the case. So I am still VERY MUCH waiting on a fix for this.
We do have a repeatable (goes wrong every time) query BTW if you’re interested.
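
A guarded version might look roughly like this (a sketch; names are illustrative). If the JSON itself is broken rather than just padded, raw_decode will still raise, so you retry the request in that case:

    import json

    def parse_first_json(raw: str):
        """Parse the first complete JSON object and ignore anything after it."""
        decoder = json.JSONDecoder()
        # raw_decode does not skip leading whitespace, so strip it first.
        obj, end = decoder.raw_decode(raw.lstrip())
        return obj  # trailing \n / \t padding past 'end' is simply ignored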

5 Likes

🙌 Thank you for sharing your solution, and we would definitely appreciate the repeatably broken query!

2 Likes

Happy to share privately - or via response_id (resp_6862d752b11c819a9fd7ffddb1d8eb860ae883b7f8718bd3)

2 Likes

Thank you for sharing the solution @jlvanhulst!

There is still the issue of being charged for all those redundant \n and \t\t output tokens being generated, correct?

Also, when using the Chat Completions API, would setting the frequency_penalty to some positive number (the default is 0) help address the cost problem by truncating the response early, since in theory every new run of \n or \t\t would be penalized? What do you think?

The alternative for the Responses API could perhaps be setting the max_output_tokens value to a conservative number that’s well below the maximum limit for the model you’re using, depending on the use case. Maybe that could work as well?
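
Roughly like this, for instance (a sketch with the Python SDK; the limit is a placeholder you would size to your largest valid output):

    from openai import OpenAI

    client = OpenAI()

    # Cap the output well below the model maximum so a runaway \n / \t loop
    # is cut off early instead of consuming the full 32k-token budget.
    response = client.responses.create(
        model="gpt-4.1",
        input="Extract the fields as JSON.",
        max_output_tokens=2000,  # placeholder value
    )
    print(response.output_text)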

2 Likes

Yes it is not ‘THE’ solution since even with the extras removed the JSON is not always valid. Also, the bad queries take forever to produce. But this ‘fix’ makes the problem about 50% less problematic by my estimate.

2 Likes

This should be counterable with logit_bias on Chat Completions. You could find all these tab combos that OpenAI trained the AI models on as token numbers (and NOBODY wants tab-indented multi-line JSON…) and harshly demote them - making only single-line JSON possible, without whitespace.

This in-string tuning could be done at the same time OpenAI is enforcing a context-free grammar and then releasing the AI model into a string where it can write these bad characters. Tabs are possible in a JSON string, but highly unlikely to be desired in any use case, as JSON itself is the data structure, not table data in a string.

Then, after coming up with a long list of things the AI tries to write (JSON structure, but within the JSON data) and killing them off in regular interactions and in json_object mode, try it on your over-specified, non-strict (non-enforced) JSON schema…
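
Roughly this idea (a sketch; look the token IDs up with the tokenizer rather than hard-coding them - o200k_base is assumed here for 4.1/4o):

    import tiktoken
    from openai import OpenAI

    client = OpenAI()
    enc = tiktoken.get_encoding("o200k_base")

    # Harshly demote the whitespace tokens the model abuses inside JSON strings.
    bias = {}
    for s in ["\t", "\t\t", "\n\n", " \t\t"]:
        for tok in enc.encode(s):
            bias[tok] = -100

    completion = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Return the result as JSON."}],
        response_format={"type": "json_object"},
        logit_bias=bias,
    )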

Unfortunately, OpenAI also messed up the way that logit_bias is supposed to work. It is completely broken and without effect if you use either temperature or top_p.

Then they messed with logprobs, and with delivering them for examination of the precise production within functions or structured output, leaving you to infer token numbers and token strings yourself.

Even being able to promote special tokens (so the model is more likely to finish instead of going into a loop of multiple outputs) is blocked.

Doesn’t matter, Responses is completely feature-less. You can’t even add a crapload of tabs as a stop sequence.

So: bad models, broken API parameters violating API reference, and then…bad endpoint “Responses” completely blocking any such self-service.

Hi @OpenAI_Support
Is there an expected timeline for the fix?

1 Like

@OpenAI_Support I also observe this very frequently when using structured outputs with both o3 and o4-mini. These tabs and newlines get added for 5-10 minutes before resolving.

Might the fix you are talking about also help resolve things with these reasoning models (rather than just GPT-4.1, as you mentioned)?

1 Like

Hey everyone,

It looks like what you’re seeing is that the API enforces that the JSON itself is valid, but it doesn’t automatically stop generation beyond that.

A couple of ways to address this:

  • Add a stop sequence to your request, for example stop: ["}\n"], so generation halts right after the closing brace.
  • If you’re streaming, try setting stream: false so the API can validate and return the response before extra padding appears.
  • As a fallback, you can trim everything after the final } in your code before parsing.
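
That last trim can be as small as this (a sketch; 'raw' stands in for the raw model output string):

    import json

    # Fallback: drop anything after the last closing brace before parsing.
    end = raw.rfind("}")
    if end != -1:
        raw = raw[: end + 1]
    data = json.loads(raw)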

If you’re using response_format with json_schema, also ensure "strict": true is set in the schema for the strongest enforcement.

Let me know if this helps at all or if you're still blocked anywhere!

1 Like

This topic is about the “Responses” API endpoint.

stop is not a supported parameter on Responses, as I just said above.

What would the API be validating? It is more likely you get a 500 server error from the JSON not finishing when in a context-free grammar enforcement, whereas with a stream, you at least receive something you can deal with - and see that the issue is all the escaped tabs and newlines being written.

With a stream, you can close() instead of letting the model run up a 16k token bill.
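
For example, roughly like this (a sketch, assuming the Python SDK's Responses streaming events; the 200-character threshold is arbitrary):

    from openai import OpenAI

    client = OpenAI()

    stream = client.responses.create(
        model="gpt-4.1",
        input="Return the result as JSON.",
        stream=True,
    )

    parts = []
    for event in stream:
        if event.type == "response.output_text.delta":
            parts.append(event.delta)
            tail = "".join(parts)[-200:]
            # If the last 200 characters are pure whitespace, the loop has started.
            if len(tail) == 200 and not tail.strip():
                stream.close()  # stop paying for \n and \t
                break
    text = "".join(parts)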

The simple fact is: the API should be emitting a special token of ChatML format to end the output, the context-free grammar must never block the stop token, and that should be a built-in stop sequence that is always caught by the API. One of those is not happening and must be fixed.

Provide real logprobs without the special tokens stripped out, plus a logit_bias for Responses that still works when sampling parameters are included (it is now broken on Chat Completions) and that takes special token numbers, and then developers can actually see a misbehaving model or a generation continuing past where termination was predicted - or not.


internal prompt: <|im_start|>assistant (token 200006)
generation: <|im_sep|>Sure, I can write JSON.<|im_end|> (tokens 200008, 200007)
or generation: to=image_gen.text2im<|im_sep|>\n{"prompt"...

stop: [200007]

(token strings are not decoded)

Now you have nothing proprietary left to hide and can turn on a good version of logprobs that starts from token 0 and doesn’t shut off.

Thanks for the response.
Since I’m using the Chat Completions API, I can try adding a stop sequence.
But turning off streaming is not an option for me, and I’m not sure it’s possible to trim the response when I make the API call like this:
async with self.async_openai_client.beta.chat.completions.stream(
    model=self._llm.model,
    temperature=self._llm.temperature,
    response_format=output_model_cls,
    messages=messages,
    stop=["}\n"],
) as stream:
    async for event in stream:
        if (
            event.type == "content.delta"
            and event.delta is not None
        ):
            ...  # accumulate event.delta here

When the issue occurs, it hangs for at least 5 minutes until the exception openai.LengthFinishReasonError is raised, and therefore I cannot even access the (partial?) output at all.
"strict": true is always set when I use response_format with json_schema, by the way.

Hi @OpenAI_Support
Adding the stop sequence ["}\n"] does not work for me.
Would appreciate an actual fix soon, as the issue affects our customers.

You can send the parameter max_tokens so that the termination of sequence generation occurs earlier than simply the model’s maximum output or maximum context window. Budget just beyond the maximum length a valid output would ever produce.

For a stop sequence on Chat Completions, the suggestion by OpenAI staff is also bad, as a stop sequence is stripped from the output besides terminating. Removing ["}\n"] from the tail would break the JSON and leave it unclosed. There is no separate “stop” finish_reason for your own stop sequence vs the internal one.

Instead, you can set "stop" to ["\n\n\n", "\t\t"] to catch the start of a linefeed or tab loop, whitespace that should never appear in AI-written JSON.

You can break any repetitive pattern by increasing frequency_penalty. Every token produced demotes that same token in the future. This also can change the production from a sequence to something else, hopefully something that terminates output.

The SDK’s event-producing beta method hasn’t been maintained. You can code from scratch with httpx, making your own JSON REST requests; then, if something fails, it is because of you and your issue-handling, not because of a library you can do nothing about.
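
A bare-bones version of that, with the max_tokens / stop / frequency_penalty workarounds from above folded in (a sketch; values are illustrative):

    import os
    import httpx

    payload = {
        "model": "gpt-4.1",
        "messages": [{"role": "user", "content": "Return the result as JSON."}],
        "response_format": {"type": "json_object"},
        "max_tokens": 2000,          # budget just beyond the largest valid output
        "stop": ["\n\n\n", "\t\t"],  # catch the start of a whitespace loop
        "frequency_penalty": 0.3,    # demote tokens that keep repeating
    }

    resp = httpx.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json=payload,
        timeout=120,
    )
    resp.raise_for_status()
    content = resp.json()["choices"][0]["message"]["content"]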

1 Like