Structured Outputs with LLMs: JSON Mode, Function Calling, and When to Use Each

We’ve talked a lot about common techniques for optimizing the performance and cost of AI applications, like response streaming or prompt caching. Today, I want to talk about something a bit different but equally important for building real AI apps: structured, machine-readable outputs.

So far, in most of the examples I’ve shared, we’ve been dealing with free-text responses from an AI model. The user asks a question, the model responds in natural language, and we just display that response to the user in some way. Fairly simple and straightforward. But what happens when we need the model to return data in a specific format (e.g., a JSON object) so that we can process it programmatically later on? What if we need the model to extract specific fields from a text or image, populate a database entry, or trigger a subsequent action based on its response? In these cases, getting back a wall of text won’t be very convenient. 🤔

Luckily, there are several solutions for this issue. There are two main approaches for obtaining structured, machine-readable outputs from an LLM: JSON Mode and Function Calling (also called tool use). These two are often confused with one another (which is to be expected since they both deal with structured outputs, duh), but they serve quite different purposes. On top of this, OpenAI has introduced a stricter variant of Function Calling called Structured Outputs, which takes schema enforcement one step further, as we’ll see. In this post, we’ll take a closer look at all three, understand how each one works under the hood, and figure out when to use each.

So, let’s have a look!

1. What’s JSON Mode?

JSON Mode is the simpler approach for getting machine-readable outputs from an LLM. It’s essentially a parameter you can set in an API request to instruct the model to always return a valid JSON object. And that’s really all there is to it! However, this simplicity comes at a cost, since there are no guarantees on the structure or schema of the JSON (remember, we didn’t define any schema, field names, or types), just that it will be valid, parseable JSON.

For example, using OpenAI’s API in Python, we can enable JSON Mode by adding the parameter response_format={"type": "json_object"} to our call to the model. More specifically, it would look something like this:

from openai import OpenAI

client = OpenAI(api_key="your_api_key")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # enables JSON Mode
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. Always respond in JSON format."
        },
        {
            "role": "user",
            "content": "Extract the name, age, and city from this text: 'Maria is 32 years old and lives in Athens.'"
        }
    ]
)

# message.content is a JSON string; pass it to json.loads() to get a dict
print(response.choices[0].message.content)

And the response would look something like this:

{
  "name": "Maria",
  "age": 32,
  "city": "Athens"
}

And voilà! ✨ With just one simple parameter change, we get valid JSON back every time. No need for string parsing or strange regex hacks.

There’s a catch, though. JSON Mode does guarantee that the output is valid JSON, but it does not guarantee a specific structure. If we run the same example multiple times, we may get slightly different field names or a slightly different structure each time. For example, one run might return "name", and another "full_name". That’s a problem if we’re trying to reliably extract specific fields programmatically.

Another thing is that beyond setting response_format={"type": "json_object"}, it’s good practice to also explicitly instruct the model to respond in JSON in the system prompt. In the example above, notice how we also added “Always respond in JSON format” in the system prompt. Without this, the model’s behaviour may become unpredictable, and OpenAI’s API rejects JSON Mode requests in which the word “JSON” appears nowhere in the messages.
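To make the catch above concrete, here is a minimal sketch that needs no API call. The two strings are hypothetical JSON Mode responses (assumed for illustration, not real model output), and missing_fields is a made-up helper name. Both strings parse without error, but only one of them has the keys our downstream code expects.

```python
import json

# Hypothetical JSON Mode responses to the same prompt (assumed, not real output).
run_a = '{"name": "Maria", "age": 32, "city": "Athens"}'
run_b = '{"full_name": "Maria", "age": 32, "location": "Athens"}'

EXPECTED_FIELDS = {"name", "age", "city"}


def missing_fields(raw: str) -> set:
    """Parse a JSON string and return the expected fields it lacks."""
    data = json.loads(raw)  # succeeds for both runs: both are valid JSON
    return EXPECTED_FIELDS - set(data)


print(missing_fields(run_a))          # nothing missing for the first run
print(sorted(missing_fields(run_b)))  # fields the second run did not provide
```

A check like this is cheap insurance when using JSON Mode: valid JSON tells us nothing about whether the fields we need are there.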

2. What’s Function Calling?

Function Calling (or tool use) is a more advanced approach for getting structured, machine-readable outputs from an LLM. Instead of just asking the model to format its response as JSON, we define a specific schema. That is, we explicitly provide a formal description of the structure we want the output to follow, and in this way, the model is more constrained to return data that matches that schema. In other words, with Function Calling we define upfront what fields we expect, what types those fields should be, which are required and which are not, and so on.

Here’s how the same extraction example would look using Function Calling:

from openai import OpenAI
import json

client = OpenAI(api_key="your_api_key")

# define the schema of the output we expect
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_person_info",
            "description": "Extract personal information from a text",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "The full name of the person"
                    },
                    "age": {
                        "type": "integer",
                        "description": "The age of the person"
                    },
                    "city": {
                        "type": "string",
                        "description": "The city the person lives in"
                    }
                },
                "required": ["name", "age", "city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    # force the model to call this specific function
    tool_choice={"type": "function", "function": {"name": "extract_person_info"}},
    messages=[
        {
            "role": "user",
            "content": "Extract the name, age, and city from this text: 'Maria is 32 years old and lives in Athens.'"
        }
    ]
)

# parse the structured output (the arguments arrive as a JSON string)
tool_call = response.choices[0].message.tool_calls[0]
result = json.loads(tool_call.function.arguments)
print(result)

And the output would look like this:

{
  "name": "Maria",
  "age": 32,
  "city": "Athens"
}

The output for this example with Function Calling is identical to the one we got using JSON Mode. However, the key difference is that, unlike JSON Mode, with Function Calling the output is going to be consistent: the model is steered to follow the defined schema, with consistent field names, types, and any other attributes we define on it. (As we’ll see in section 3, this is strong guidance rather than a hard guarantee.)

Bonus: A bit more on Function Calling

Before moving on to Structured Outputs, it’s worth pausing to elaborate on the original motivation behind Function Calling, which goes well beyond just getting structured outputs. Essentially, Function Calling is the foundation of agentic AI workflows. More specifically, in an agentic setup, the LLM isn’t just responding to a user’s question; rather, it’s deciding which action to take next based on the user’s input.

For example, let’s imagine a customer support assistant that can either look up an order, issue a refund, or escalate to a human agent, depending on what the user is asking. With Function Calling, we can define these candidate actions as “tools” (functions), and the model’s output will indicate which one to call and with what arguments, based on its input.

# two of the candidate actions, defined as tools (client is created as in the earlier examples)
tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order",
            "description": "Look up the status of a customer order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "The order ID"}
                },
                "required": ["order_id"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "issue_refund",
            "description": "Issue a refund for a customer order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "reason": {"type": "string"}
                },
                "required": ["order_id", "reason"]
            }
        }
    }
]

# no tool_choice here: the model decides which tool (if any) to call
response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=[
        {"role": "user", "content": "I want a refund for order #12345, it arrived broken."}
    ]
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # "issue_refund"
print(tool_call.function.arguments)  # '{"order_id": "12345", "reason": "arrived broken"}'

So, the message inside the API response looks something like this:

ChatCompletionMessage(
    content=None,
    role='assistant',
    tool_calls=[
        ChatCompletionMessageToolCall(
            id='call_abc123',
            type='function',
            function=Function(
                name='issue_refund',
                arguments='{"order_id": "12345", "reason": "arrived broken"}'
            )
        )
    ]
)

And the print statements would hypothetically output:

issue_refund
{"order_id": "12345", "reason": "arrived broken"}

So, what is happening here? The model returns a tool_calls object instead of a regular text response (notice how content is None). Inside the tool_calls object, we can see that the model decided to call issue_refund (not lookup_order), and filled in the arguments on its own based on what the user said. We then parse those arguments and execute the actual refund logic in our system.

Notice how the model didn’t just return the requested data, but rather decided which of the candidate actions is the most appropriate to perform, then filled in the appropriate arguments in its response. In this way, we can take those arguments and actually execute the corresponding action in our system. This is the real power of Function Calling, and it’s why it’s such a foundational component in agentic AI applications.
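The “execute the corresponding action” step could be sketched as a simple name-to-function lookup. Everything below is an assumption for illustration: the two Python functions are placeholder implementations, TOOL_REGISTRY and dispatch are made-up names, and fake_call is a stand-in that only mimics the shape of the tool call shown above, so the sketch runs without an API key.

```python
import json
from types import SimpleNamespace


# Hypothetical local implementations of the two tools (stand-ins for real logic).
def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: status lookup requested"


def issue_refund(order_id: str, reason: str) -> str:
    return f"Refund issued for order {order_id} ({reason})"


# Map each tool name from the schema to the Python function that implements it.
TOOL_REGISTRY = {"lookup_order": lookup_order, "issue_refund": issue_refund}


def dispatch(tool_call) -> str:
    """Run the function the model selected, using the arguments it filled in."""
    name = tool_call.function.name
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Model requested an unknown tool: {name}")
    kwargs = json.loads(tool_call.function.arguments)  # arguments are a JSON string
    return TOOL_REGISTRY[name](**kwargs)


# Stand-in mimicking the shape of a tool call (not a real API object).
fake_call = SimpleNamespace(
    function=SimpleNamespace(
        name="issue_refund",
        arguments='{"order_id": "12345", "reason": "arrived broken"}',
    )
)
print(dispatch(fake_call))
```

Keeping an explicit registry means the model can only ever trigger functions we have deliberately exposed, and an unknown name fails loudly instead of silently doing nothing.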

But let’s get back to machine-readable outputs now; we’ll talk more about agentic AI workflows and Function Calling in another post.

3. What about Structured Outputs?

A stricter variation of Function Calling is Structured Outputs. Even though Function Calling guides the model to produce an output following a defined schema, this isn’t really hard-constrained. In practice, this means that some deviations from the defined schema may still occur. Such deviations may be:

A field marked as required is, in fact, omitted if the model struggles to determine its value
Extra fields not defined in our schema are added
A field defined as integer comes back as a string "32" instead of 32

…and so on.

This happens because, in Function Calling, the model is trying to follow the schema, but this is still best-effort generation. Like all LLM output, the output here is still essentially tokens being predicted one at a time, with the schema being just a strong hint. There’s still a real chance for that token-by-token generation to be derailed somewhere along the way and produce outputs that deviate from the defined schema.
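Because these deviations are possible, it can help to check the returned arguments before acting on them. The sketch below is a deliberately small, hand-rolled check that covers only the three deviations listed above for flat schemas; find_deviations and PY_TYPES are hypothetical names, and the bad string is an invented example rather than real model output. A full JSON Schema validator would be the more thorough option.

```python
import json

# JSON Schema type names mapped to the Python types json.loads produces.
PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def find_deviations(arguments: str, schema: dict) -> list:
    """Return human-readable schema deviations (an empty list means none found)."""
    data = json.loads(arguments)
    props = schema["properties"]
    problems = []
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing required field: {field}")
    for field, value in data.items():
        if field not in props:
            problems.append(f"unexpected field: {field}")
            continue
        expected = PY_TYPES.get(props[field].get("type"))
        if expected is None:
            continue  # type not covered by this sketch (e.g. nested objects)
        # bool is a subclass of int in Python, so reject it for non-boolean fields
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if not isinstance(value, expected) or is_stray_bool:
            problems.append(f"wrong type for field: {field}")
    return problems


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"},
    },
    "required": ["name", "age", "city"],
}

good = '{"name": "Maria", "age": 32, "city": "Athens"}'
bad = '{"name": "Maria", "age": "32", "nickname": "Mia"}'  # invented example

print(find_deviations(good, person_schema))
print(find_deviations(bad, person_schema))
```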

Structured Outputs, on the other hand, takes Function Calling one step further by guaranteeing that every field in the defined schema will always appear in the output exactly as defined, with no surprises and no missing or extra fields. The key differentiator is that OpenAI uses constrained decoding behind the scenes. This means that at each token step, the model is only allowed to generate tokens that keep the output valid according to the schema. In other words, the schema is enforced at the generation level, instead of merely being given to the model as a hint.
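To build some intuition for constrained decoding, here is a toy sketch. It is not OpenAI’s implementation: the eight-token vocabulary, the hand-written grammar in allowed_tokens, and the scripted scores in fake_scores are all assumptions made up for illustration. The fake “model” prefers to emit the age as a quoted string; with the constraint switched on, that token is simply never a candidate.

```python
import json

VOCAB = ["{", "}", '"age"', ":", "3", "2", '"32"', "hello"]

# Scripted next-token preferences keyed by the previous token (hand-written, not learned).
NEXT_PREFS = {
    None: {"{": 2.0},
    "{": {'"age"': 2.0},
    '"age"': {":": 2.0},
    ":": {'"32"': 2.0, "3": 1.0},  # the fake model prefers a quoted age
    "3": {"2": 2.0},
    "2": {"}": 2.0},
    '"32"': {"}": 2.0},
}


def fake_scores(prefix: list) -> dict:
    """Stand-in for a model's next-token scores."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores.update(NEXT_PREFS.get(prefix[-1] if prefix else None, {}))
    return scores


def allowed_tokens(prefix: list) -> set:
    """Toy grammar for {"age": <integer>}: tokens that keep the prefix valid."""
    if not prefix:
        return {"{"}
    last = prefix[-1]
    if last == "{":
        return {'"age"'}
    if last == '"age"':
        return {":"}
    if last == ":":
        return {"3", "2"}       # an integer must start with a digit
    if last in {"3", "2"}:
        return {"3", "2", "}"}  # more digits, or close the object
    return set()


def decode(constrained: bool, max_steps: int = 10) -> str:
    """Greedy decoding; when constrained, invalid tokens are masked out first."""
    prefix = []
    for _ in range(max_steps):
        scores = fake_scores(prefix)
        candidates = allowed_tokens(prefix) if constrained else set(VOCAB)
        if not candidates:
            break
        # iterate in VOCAB order so ties are broken deterministically
        token = max((t for t in VOCAB if t in candidates), key=scores.get)
        prefix.append(token)
        if token == "}":
            break
    return "".join(prefix)


print(json.loads(decode(constrained=False)))  # age comes back as a string
print(json.loads(decode(constrained=True)))   # age comes back as an integer
```

The point of the toy is that the mask is applied before a token is chosen, so an invalid output cannot be produced at all, rather than being detected and repaired afterwards.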

OpenAI’s Structured Outputs can be activated by simply setting strict: true in the function definition:

tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_person_info",
            "strict": True,  # enables Structured Outputs
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"}
                },
                "required": ["name", "age", "city"],
                "additionalProperties": False
            }
        }
    }
]

But again, this comes at a cost. Structured Outputs is available on GPT-4o and later models; on older models, JSON Mode remains the fallback. Not every JSON Schema feature is supported, and the first request with a new schema may be a bit slower, since OpenAI has to preprocess the schema before it can enforce it.
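As an illustration of the schema restrictions, the sketch below checks two requirements that the strict example above satisfies: every object sets additionalProperties to false, and every property is listed under required. This is a partial, unofficial check written for this post (strict_mode_issues is a hypothetical helper, not part of any SDK), and OpenAI’s documentation lists further limits that it does not cover.

```python
def strict_mode_issues(schema: dict, path: str = "parameters") -> list:
    """Collect reasons a schema would likely be rejected with strict enabled."""
    issues = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            issues.append(f"{path}: additionalProperties must be false")
        optional = sorted(set(props) - set(schema.get("required", [])))
        if optional:
            issues.append(f"{path}: not listed in required: {', '.join(optional)}")
        for name, sub in props.items():  # recurse into nested objects
            issues.extend(strict_mode_issues(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        issues.extend(strict_mode_issues(schema["items"], f"{path}[]"))
    return issues


strict_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"},
    },
    "required": ["name", "age", "city"],
    "additionalProperties": False,
}

# Same fields, but one is optional and additionalProperties is not set.
loose_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
}

print(strict_mode_issues(strict_schema))
print(strict_mode_issues(loose_schema))
```

Running a check like this locally, before sending the request, surfaces schema problems as readable messages rather than as an API error.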

Still, it’s the strictest way to enforce a specific schema on the model’s outputs, with no room for deviation. For production systems where reliability and consistency really matter, this is often the safest option.

But aren’t all these the same thing?

JSON Mode, Function Calling, and Structured Outputs might seem to do the same thing, since all of them essentially get you JSON back from the model. However, as we’ve already seen, they’re meaningfully different in what they guarantee and what they’re designed for. In particular:

Schema enforcement: JSON Mode returns valid JSON, but with no structural guarantees. Function Calling returns valid JSON that matches a defined schema, following specific field names, types, and required fields, but deviations are still possible. Structured Outputs goes one step further, enforcing that schema at the generation level, rendering deviations impossible.
Use case: JSON Mode is for cases where we need a machine-readable response but can live with a variable format. Function Calling was primarily designed for cases where the model needs to trigger an action or pass arguments to an external tool, and is thus essentially the general case of machine-readable outputs. Structured Outputs is Function Calling with a reliability guarantee, making it ideal for production pipelines where we need consistency in outputs.
Ease of setup: JSON Mode is the lightest option to set up: just a single parameter change with no schema definition. On the flip side, for Function Calling and Structured Outputs, we also need to think through and set up the JSON schema.

Having said that, OpenAI itself recommends always using Structured Outputs instead of JSON Mode whenever possible, as a general rule of thumb.

On my mind

Obtaining machine-readable outputs from LLMs, and choosing the right approach for doing so, can make a huge difference in the reliability and maintainability of any AI application. Free-text responses are great for conversational interfaces, but the moment our LLM is a component in a larger system (feeding data downstream, triggering actions, populating databases, etc.), structured responses are essential. JSON Mode, Function Calling, and Structured Outputs can all provide such outputs, each at a different level of strictness. Like many decisions in AI engineering, the right choice depends on what you’re building and how much variability you can tolerate.

All images by the author, unless mentioned otherwise.
