Zhixidong reported on August 9 that OpenAI announced on August 6 the addition of structured output functionality to its API, meaning OpenAI's models can now reliably generate output that conforms to a JSON schema supplied by developers. OpenAI also announced that, with this feature enabled, the newly launched gpt-4o-2024-08-06 model achieved a 100% score in its evaluation, matching the expected output schema exactly.

JSON (JavaScript Object Notation) is a lightweight data interchange format that is easy for humans to read and write, and easy for machines to parse and generate. At last year's DevDay, OpenAI introduced JSON mode to help developers build applications. The new structured output feature goes further, improving the model's adherence to JSON schemas, which reduces "hallucinations" and lowers output costs.


I. Structured Output Feature Launches in the API, with 100% JSON Accuracy

Obtaining structured output from unstructured input is one of the core use cases for AI in today's applications. Developers have used the OpenAI API to build powerful assistants that retrieve data, answer questions, and extract structured data, and to build multi-step agentic workflows through function calls that let large language models (LLMs) take action. For a long time, developers have worked around the limitations of LLMs in this area with open-source tools, prompt engineering, and repeated requests to ensure that the model's output meets the system's requirements. Structured output solves this problem by training the model to better understand complex schemas and by constraining the model to follow the schema the developer provides.

In OpenAI's evaluation of complex JSON schemas, the new model with structured output, gpt-4o-2024-08-06, scored a perfect 100%; in contrast, gpt-4-0613 scored less than 40%.

OpenAI announced that the structured output feature is now officially available in the API. All models that support function calling can use it, including the latest gpt-4o and gpt-4o-mini models, earlier models such as gpt-4-0613 and gpt-3.5-turbo-0613, and fine-tuned models. The feature can be used in the Chat Completions API, Assistants API, and Batch API, and is compatible with vision inputs.

II. Dual Innovation: Structured Output Joins the Native SDKs

OpenAI has introduced structured output capabilities in two new forms within its API:

1. Function Calls: Structured output can be obtained through tools by setting "strict: true" in the function definition. This works with all models that support tools, including gpt-4-0613 and gpt-3.5-turbo-0613 and later versions. Once structured output is enabled, the model's output will match the provided tool definition.

2. New Option for the "response_format" Parameter: Developers can now supply a JSON schema via "json_schema", a new option for the "response_format" parameter. This is useful when the model should respond to the user in a structured format rather than calling a tool. The option is available on OpenAI's latest GPT-4o models: gpt-4o-2024-08-06, released today, and gpt-4o-mini-2024-07-18. When "response_format" is supplied with "strict: true", the model's output will match the provided schema.
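A minimal sketch of this second form, assuming the Python SDK and a hypothetical event-extraction schema (the schema, field names, and messages below are illustrative, not taken from OpenAI's announcement):

```python
# Illustrative sketch: supplying a JSON schema via the response_format parameter.
# The "event_details" schema below is a made-up example.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event details from the user's message."},
        {"role": "user", "content": "Alice and Bob are meeting for lunch on Friday."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "event_details",
            "strict": True,  # structured output: the reply must match the schema exactly
            "schema": {
                "type": "object",
                "properties": {
                    "event_name": {"type": "string"},
                    "participants": {"type": "array", "items": {"type": "string"}},
                    "day": {"type": "string"},
                },
                "required": ["event_name", "participants", "day"],
                "additionalProperties": False,
            },
        },
    },
)

print(response.choices[0].message.content)  # a JSON string conforming to the schema
```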

OpenAI's Python and Node SDKs have been updated with native support for structured output. By supplying the schema as a tool or as a response format, developers can use Pydantic or Zod objects directly; the SDKs handle converting these data types into supported JSON schemas, automatically deserialize JSON responses into typed data structures, and parse refusals when they occur.

The following example demonstrates the native support for structured output using function calls:
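One way this can look with the Python SDK and a Pydantic model (a hedged sketch; the GetDeliveryDate model and the conversation are illustrative assumptions):

```python
# Illustrative sketch: a Pydantic model converted into a strict function tool.
# GetDeliveryDate and the messages below are made-up examples.
from pydantic import BaseModel
import openai
from openai import OpenAI

class GetDeliveryDate(BaseModel):
    order_id: str

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a customer support assistant. Use the supplied tools to help the user."},
        {"role": "user", "content": "Hi, when will my order #12345 arrive?"},
    ],
    # pydantic_function_tool turns the Pydantic model into a strict function definition
    tools=[openai.pydantic_function_tool(GetDeliveryDate)],
)

tool_call = completion.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # e.g. "GetDeliveryDate"
print(tool_call.function.arguments)  # JSON arguments matching the Pydantic schema
```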

Developers often use OpenAI's models to generate structured data for various use cases.

Some other examples include:

1. Dynamically generate the user interface based on user intent.

2. Separate the final answer from supporting reasoning or additional comments.

3. Extract structured data from unstructured data.

The new structured output feature strictly adheres to OpenAI's safety policies while still allowing the model to refuse unsafe requests. A new refusal string value has been added to API responses, allowing developers to programmatically detect whether the model has produced a refusal.

When the response does not contain a refusal and the model's generation was not cut off prematurely, the model will reliably produce valid JSON that matches the provided schema.
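For example, a response can be checked for a refusal before its structured content is used (a small sketch; the handle_response helper is hypothetical):

```python
# Illustrative refusal check: the message object carries a refusal field when the
# model declines a request instead of producing schema-conforming output.
def handle_response(completion) -> None:
    message = completion.choices[0].message
    if message.refusal:
        # The model refused; surface the refusal rather than parsing the output.
        print("Request refused:", message.refusal)
    else:
        print("Structured output:", message.content)
```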

III. Technical Deep Dive: The Dual Mechanism Behind Structured Output

OpenAI took a two-pronged approach to improving how reliably model output matches JSON schemas. First, OpenAI trained its latest model, gpt-4o-2024-08-06, to understand complex schemas and generate output that best matches them.

Second, OpenAI adopted constrained decoding. Although the model's performance improved significantly, reaching 93% accuracy in benchmark tests, the model's inherent nondeterminism remains. To let developers build robust applications, OpenAI provides a deterministic way to constrain the model's output, achieving 100% reliability.

By default, the model is unconstrained when generating output and may choose any token from the vocabulary. This flexibility can lead the model to produce invalid JSON. To avoid such errors, OpenAI uses dynamic constrained decoding, forcing the model to choose only tokens that are valid under the provided schema. After each token is generated, the inference engine determines which tokens may validly come next, based on the tokens generated so far and the schema's rules. By masking out invalid tokens, this method ensures the generated output always conforms to the provided schema.

Implementing this constraint can be challenging because the set of valid tokens changes dynamically throughout the model's output. For example, the initial valid tokens include {, {", and {\n, but once the model has generated {"val, then { is no longer a valid token. Therefore, OpenAI needs to determine which tokens are valid dynamically after each generated token, rather than once at the beginning of the response.

For this purpose, OpenAI converts the provided JSON schema into a context-free grammar (CFG). A CFG is a set of rules that define the valid syntax of a language, and developers can regard JSON and JSON schemas as a specific language with rules. Just as an English sentence without a verb is invalid, JSON with a trailing comma is also invalid.
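As a rough illustration of what a context-free grammar for a JSON-like language can look like (a toy sketch using the lark parsing library; this is not OpenAI's internal grammar):

```python
# Toy context-free grammar for a tiny JSON subset, expressed with the lark library.
# Inputs that break the rules (e.g. a trailing comma) fail to parse.
from lark import Lark

json_grammar = r"""
    ?value: object | STRING | NUMBER
    object: "{" [pair ("," pair)*] "}"
    pair: STRING ":" value
    STRING: /"[^"]*"/
    NUMBER: /-?\d+(\.\d+)?/
    %import common.WS
    %ignore WS
"""

parser = Lark(json_grammar, start="value")
print(parser.parse('{"name": "widget", "count": 3}').pretty())
```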

Therefore, for each JSON schema, OpenAI computes a grammar that represents that schema and accesses the precomputed components efficiently during sampling. This is why the first use of a new schema may require additional processing time: OpenAI must preprocess the schema to generate components that can be used efficiently during sampling.

During sampling, OpenAI's inference engine determines, based on the previously generated tokens and the grammar's rules, which tokens are valid next. It uses this list of tokens to mask the sampling step, effectively reducing the probability of invalid tokens to 0. Because the schema has been preprocessed, OpenAI can do this efficiently with cached data structures and minimal latency overhead.
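A toy sketch of that masking step (purely illustrative; OpenAI's inference engine is not public): tokens the grammar disallows have their logits pushed to negative infinity, so their sampling probability becomes zero.

```python
import math

def mask_invalid_tokens(logits: list[float], valid_token_ids: set[int]) -> list[float]:
    """Set logits of grammar-invalid tokens to -inf so their softmax probability is 0."""
    return [
        logit if token_id in valid_token_ids else -math.inf
        for token_id, logit in enumerate(logits)
    ]

def softmax(logits: list[float]) -> list[float]:
    exps = [math.exp(x) for x in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of 5 tokens; suppose the grammar only allows tokens 1 and 3 next.
masked = mask_invalid_tokens([2.0, 0.5, 1.0, 0.3, -1.0], valid_token_ids={1, 3})
print(softmax(masked))  # invalid positions come out with probability 0.0
```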

In addition to the CFG method, finite state machines (FSMs) or regular expressions can also be used for constrained decoding. They work similarly, dynamically updating which tokens are valid after each generated token. However, the CFG method can express a broader class of languages than FSMs. For simple schemas the difference may not be apparent, but OpenAI found that the CFG approach performs better for complex schemas involving nested or recursive data structures. For example, FSMs generally cannot express recursive types, so they struggle to match brackets in deeply nested JSON; the OpenAI API supports JSON schemas with recursive structures, which the FSM approach would have difficulty handling.
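As an illustration of such a recursive structure, a hypothetical UI schema (sketched here for the note below; the element types and field names are assumptions) can reference itself through its children:

```python
# Hypothetical recursive UI schema: each element's "children" array refers back to
# the root schema via "$ref": "#", so elements can nest to arbitrary depth.
ui_schema = {
    "name": "ui",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "type": {"type": "string", "enum": ["div", "form", "field", "button"]},
            "label": {"type": "string"},
            "children": {
                "type": "array",
                "items": {"$ref": "#"},  # recursive reference to the root schema
            },
        },
        "required": ["type", "label", "children"],
        "additionalProperties": False,
    },
}
```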

Please note that each UI element can have arbitrary child elements, which recursively reference the root schema. This flexibility is provided by the CFG method.

IV. Exploring the Boundaries: Limitations of Structured Output

While benefiting from the efficiency and accuracy that structured output brings, developers also need to be aware of its limitations so they can plan around them. When using structured output, keep the following limitations in mind:

Supported subset of JSON schemas: Structured output supports only a subset of JSON Schema, to ensure optimal performance. See OpenAI's official documentation for the specific limitations.

Initial request delay: The first request with a new schema may incur additional latency, but subsequent responses will be fast with no latency penalty. During the first request, OpenAI processes and caches the schema for subsequent use. Typically, a schema is processed within 10 seconds on the first request, but more complex schemas may take up to a minute.

Refusing Unsafe Requests: If the model opts to refuse an unsafe request, its response may not follow the schema. When a refusal occurs, the returned message includes the refusal value to indicate this.

Stop Condition Limitations: If the maximum token count or another stop condition is reached during generation, the model may not be able to fully follow the schema.

Model Errors: Structured outputs cannot completely prevent all model errors. For instance, the model may still make mistakes in the values of a JSON object (such as mathematical calculation errors). Developers can reduce errors by providing examples in the system instructions or by breaking the task into simpler subtasks.

Incompatibility with Parallel Function Calls: When the model generates parallel function calls, they may not conform to the provided schemas. Developers can disable parallel function calls by setting "parallel_tool_calls: false".

Not Eligible for Zero Data Retention (ZDR): Structured outputs with JSON schema are not eligible for Zero Data Retention (ZDR).

Conclusion: Structured Output Helps Cut Costs and Improve Efficiency

Compared with the gpt-4o-2024-05-13 version, gpt-4o-2024-08-06 is more cost-effective. Developers save 50% on input costs, at $2.50 (approximately RMB 17.95) per million input tokens, and 33% on output costs, at $10.00 (approximately RMB 71.81) per million output tokens.