Tool Calling, Explained: How AI Agents Decide What to Do Next

In my latest post, we learned how to get structured, machine-readable outputs from an LLM, using JSON mode, function calling, and structured outputs. In that post, we briefly touched on the idea of function calling, approaching it as a technique for obtaining structured responses. However, function calling goes well beyond just getting structured data back from a model, since it is essentially the backbone of agentic AI workflows. So, in today’s post, we’re going to take a closer look at exactly this topic.

In all the examples we have covered so far, the LLM is used only as a passive responder, meaning it receives a question and then generates an answer, and that’s it. But what if we want the LLM not just to respond with something but instead to do something? Or, to put it more precisely, what if we want an action to be triggered based on the model’s response? This action may be anything: looking up live data, sending a message, querying a database, calling an external API, and so on.

This is made possible with tool calling. Tool calling is what transforms an LLM from a very smart text generator into something that can actually trigger actions and interact with the world around it.

So, let’s have a look!

What is Tool Calling?

Tool calling (also called function calling) is the mechanism by which an LLM can request the execution of external functions or APIs as part of generating its response. In other words, instead of just returning text, the model can ask for a specific function to be called with specific arguments, in response to the user’s request.

The key thing to understand here is that the model itself doesn’t execute the tool. It only decides which tool to call and with what arguments. The actual execution of the selected tool happens in our own code, the same code that sends the request to the AI model. We then feed the tool’s result back to the AI model, which uses it to generate a final response to the user.

This is the tool calling loop, which includes the following steps:

The user submits a message.
The AI model takes the message as input and produces an output, which is essentially a decision on which tool to use and with which arguments.
The model’s response, containing the tool selection and the respective arguments, is passed back to the code. The code, with no involvement of the AI model, executes the selected tool with the selected arguments. This execution produces some kind of result (e.g., a calculation, information obtained from an API, etc.), and this result is then passed back to the AI model.
The AI model takes the tool’s result as input and produces a final response to the user based on it.

Again, the model generates a tool call, not a tool execution. The two are very different things, and conflating them is one of the most common sources of confusion.

But what exactly is a tool call? In practice, it means that the model returns a structured, machine-readable response using function calling, as we saw in the previous post. In this response, content is None; there is no natural language answer, just a structured instruction indicating which tool to call and with what arguments. It is only after we execute the tool and pass the result back that the model generates an actual text response for the user.
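To make the distinction concrete, here is a minimal sketch that treats a tool call as plain data. The dictionary is a hand-written stand-in that mimics the shape of a tool call response (it is not produced by a real API call), and fake_weather is a hypothetical placeholder tool that returns a canned value.

```python
import json

# Hand-written stand-in for a model response that contains a tool call.
# Assumption: the shape mirrors a Chat Completions assistant message.
assistant_message = {
    "role": "assistant",
    "content": None,  # no natural-language answer yet
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_current_weather",
                # arguments arrive as a JSON string, not as a dict
                "arguments": '{"city": "Athens", "unit": "celsius"}',
            },
        }
    ],
}

def fake_weather(city, unit="celsius"):
    # Hypothetical placeholder tool: canned value, no network access
    return {"city": city, "temperature": 29, "unit": unit}

call = assistant_message["tool_calls"][0]
name = call["function"]["name"]
arguments = json.loads(call["function"]["arguments"])

# Up to this point nothing has run: the tool call is only a description.
# Execution happens here, in our own code.
result = fake_weather(**arguments)
print(name, arguments, result)
```

Everything before the last two lines is just reading data; only the explicit call to fake_weather does any work.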

But let’s see this in practice!

We’ll start with a simple example using just one tool and one call, and then gradually build up to some more interesting scenarios.

1. A single tool: weather API

I think the most common example of tool use with AI that comes to mind is a weather API (the cornerstone of custom, live data), so let’s imagine we’re building a weather assistant. In particular, we want to create a mechanism in which the user asks about the weather, and instead of just letting the AI model make something up (which the model would very happily do 🙃), we want it to call a real weather function and get actual weather data from elsewhere, outside the LLM. To get the weather data, I will be using Open-Meteo, a free, open-source weather API that thankfully requires no API key.

To use a tool, we first have to declare it in tools.

from openai import OpenAI
import json

client = OpenAI(api_key="your_api_key")

# Step 1: define the tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city, e.g. Athens"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

Notice how the actual tool to be used (the weather API) is mentioned nowhere up to this point. Instead, the model decides which tool to call based on three things: the function description (“Get the current weather for a given city”), the parameter descriptions (“The name of the city, e.g. Athens”), and the parameter schema (types, allowed values, and required fields). It is purely from this information that the model figures out whether this is the right tool to call for a given user message, and with what arguments. Thus, writing clear and accurate descriptions when defining our tools is of key importance for the model to successfully identify and call the right tool based on the user’s input.
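Since the arguments are text generated by the model, it can be worth checking them against the schema before executing anything. Below is a minimal sketch of such a check. The validate_arguments helper is hypothetical and hand-rolled, and it only covers required fields, basic types, and enum values; a real project would more likely rely on a proper JSON Schema library.

```python
import json

# Parameter schema copied from the get_current_weather tool definition
WEATHER_PARAMETERS = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

# Maps JSON Schema type names to Python types (only the ones used here)
JSON_TYPES = {"string": str, "number": (int, float), "object": dict}

def validate_arguments(arguments_json, schema):
    """Hypothetical helper: return a list of problems, empty if valid."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    problems = []
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
            continue
        expected = JSON_TYPES.get(spec["type"])
        if expected and not isinstance(value, expected):
            problems.append(f"{key} should be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{key} must be one of {spec['enum']}")
    return problems

print(validate_arguments('{"city": "Athens", "unit": "celsius"}', WEATHER_PARAMETERS))
print(validate_arguments('{"unit": "kelvin"}', WEATHER_PARAMETERS))
```

The first call reports no problems, while the second one flags both the missing city and the unsupported unit.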

So, after we have defined the tools variable, we can then make a request to the AI model:

# Step 2: send the user message along with the tool definition
messages = [
    {"role": "user", "content": "What's the weather like in Athens right now?"}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=messages
)

print(response.choices[0].message)

Here’s what happens when we make this request. The model reads the user’s message, “What’s the weather like in Athens right now?”, and understands that the available tool get_current_weather can help answer this query with real, live data. So, rather than generating a text response directly, it decides to call the tool first. More specifically, the model’s response at this point looks like this:

ChatCompletionMessage(
    content=None,
    role='assistant',
    tool_calls=[
        ChatCompletionMessageToolCall(
            id='call_abc123',
            type='function',
            function=Function(
                name='get_current_weather',
                arguments='{"city": "Athens", "unit": "celsius"}'
            )
        )
    ]
)

Notice how content is None, because the model isn’t returning a text response, but a tool call. Now it’s our job to actually execute the tool the model selected and return the result to it. In our case, this means making the request to the weather API, using the arguments (that is, the city and the unit of measurement) provided in the AI model’s response:

# Step 3: execute the tool using the Open-Meteo API
import requests

def get_current_weather(city: str, unit: str = "celsius"):
    # geocode the city name to coordinates
    geo = requests.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": city, "count": 1}
    ).json()
    lat = geo["results"][0]["latitude"]
    lon = geo["results"][0]["longitude"]

    # fetch current weather
    weather = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": lat,
            "longitude": lon,
            "current": "temperature_2m,weather_code",
            "temperature_unit": unit
        }
    ).json()

    temp = weather["current"]["temperature_2m"]
    return {"city": city, "temperature": temp, "unit": unit}

# extract the tool call from the response
tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)

# call the actual function
weather_result = get_current_weather(**arguments)

We can then append the tool’s result to the message history and send everything back to the model:

# Step 4: add the assistant's tool call AND the tool result to the message history
messages.append(response.choices[0].message)  # important: append the tool call first
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,  # links the result back to the specific tool call
    "content": json.dumps(weather_result)
})

# Step 5: send everything back to the model for a final response
final_response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=messages
)

print(final_response.choices[0].message.content)

And now, we finally get a proper text response:

It's currently 29°C in Athens. Looks like a great day to be outside!

2. Letting the model choose from multiple tools

Now let’s take a look at a more realistic example. In a real-world agentic application, the model typically has access to not one, but several tools, and consequently, it needs to figure out which one (or ones) should be used based on what the user is asking.

Let’s extend our initial weather API example by adding an extra tool for currencies. For this, we’ll use Frankfurter, a currency API providing European Central Bank daily rates, again with no API key requirement. So, let’s update our tools variable by adding a second tool for converting currencies:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "The name of the city"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "convert_currency",
            "description": "Convert an amount from one currency to another",
            "parameters": {
                "type": "object",
                "properties": {
                    "amount": {"type": "number", "description": "The amount to convert"},
                    "from_currency": {"type": "string", "description": "The source currency code, e.g. USD"},
                    "to_currency": {"type": "string", "description": "The target currency code, e.g. EUR"}
                },
                "required": ["amount", "from_currency", "to_currency"]
            }
        }
    }
]

We also set up the actual convert_currency function using the Frankfurter API:

def convert_currency(amount: float, from_currency: str, to_currency: str):
    # fetch the latest exchange rate for the currency pair
    response = requests.get(
        f"https://api.frankfurter.dev/v2/rate/{from_currency}/{to_currency}"
    ).json()

    rate = response["rate"]
    converted = round(amount * rate, 2)
    return {
        "amount": amount,
        "from_currency": from_currency,
        "to_currency": to_currency,
        "converted_amount": converted,
        "rate": rate
    }

This way, the model can handle a much wider range of user requests; it can now also answer questions about currencies, on top of the weather 😋. Now, if the user asks “What’s the weather in Athens?”, the model should call get_current_weather. If they ask “How much is 100 USD in EUR?”, it should call convert_currency. And if we ask something unrelated to both weather and currencies, for which neither of the available tools can help, the model will simply respond in text without calling any tool at all.
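Because the model may answer in plain text instead of requesting a tool, our code should check for both cases before touching tool_calls. Here is a minimal sketch of that check. The route_message helper is hypothetical, and the two messages are hand-written stand-ins for the SDK’s message object, where tool_calls is None when no tool is requested.

```python
from types import SimpleNamespace

def route_message(message):
    """Hypothetical helper: classify an assistant message."""
    if message.tool_calls:
        # The model asked for one or more tools; our code has to run them
        return "tool_calls", [tc.function.name for tc in message.tool_calls]
    # No tool requested: content holds the plain-text answer
    return "text", message.content

# Hand-written stand-ins for the two kinds of assistant message
plain = SimpleNamespace(
    content="I can only help with weather and currencies.",
    tool_calls=None,
)
with_tool = SimpleNamespace(
    content=None,
    tool_calls=[SimpleNamespace(function=SimpleNamespace(name="convert_currency"))],
)

print(route_message(plain))
print(route_message(with_tool))
```

Indexing tool_calls[0] directly, as the short examples in this post do, would fail with a TypeError on the first kind of message, so a guard like this is a sensible default.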

But let’s see this in action:

messages = [
    {"role": "user", "content": "How much is 200 USD in EUR?"}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=messages
)

tool_call = response.choices[0].message.tool_calls[0]

Let’s take a look at the response:

print(tool_call.function.name)

from which we get convert_currency. So, the model understood that the question “How much is 200 USD in EUR?” is relevant to the convert_currency tool. Let’s also take a look at the arguments:

print(tool_call.function.arguments)

from which we get

{"amount": 200, "from_currency": "USD", "to_currency": "EUR"}

So, the model correctly identifies convert_currency as the right tool and fills in the appropriate arguments, without us doing anything other than providing appropriate tool descriptions, and the user providing an appropriate message. This exact decision-making mechanism is what makes tool calling the foundation of agentic systems.
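Once there is more than one tool, our code needs some way to turn the tool name chosen by the model into an actual function call. One common approach is a small lookup table, sketched below. TOOL_REGISTRY and execute_tool_call are hypothetical names, and the two tool functions are stubs with canned values (including a made-up exchange rate) so that the example runs without network access.

```python
import json

def get_current_weather(city, unit="celsius"):
    # Stub standing in for the real weather function (canned value, no network)
    return {"city": city, "temperature": 29, "unit": unit}

def convert_currency(amount, from_currency, to_currency):
    # Stub with a made-up fixed rate, for illustration only
    rate = 0.5
    return {"converted_amount": round(amount * rate, 2), "rate": rate}

# Maps the tool names declared in `tools` to the Python callables behind them
TOOL_REGISTRY = {
    "get_current_weather": get_current_weather,
    "convert_currency": convert_currency,
}

def execute_tool_call(name, arguments_json):
    """Hypothetical helper: run the named tool and return a JSON string."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = func(**json.loads(arguments_json))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that don't match the function signature
        return json.dumps({"error": str(exc)})
    return json.dumps(result)

print(execute_tool_call(
    "convert_currency",
    '{"amount": 200, "from_currency": "USD", "to_currency": "EUR"}',
))
```

Returning errors as JSON rather than raising is a design choice in this sketch: the error text can be passed back to the model as the tool result, giving it a chance to correct its arguments.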

3. Calling multiple tools at once

Another interesting tool calling scenario is that many models, like gpt-4o, can call multiple tools in a single response when the user’s request requires it. This is known as parallel tool calling.

For example, let’s imagine a scenario where the user asks, in a single request, something that requires the use of both the get_current_weather and convert_currency tools to obtain the required data:

messages = [
    {"role": "user", "content": "What's the weather in Athens and how much is 100 USD in EUR?"}
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=messages
)

for tool_call in response.choices[0].message.tool_calls:
    print(tool_call.function.name)
    print(tool_call.function.arguments)

In this case, the response we get is the following:

get_current_weather
{"city": "Athens"}

convert_currency
{"amount": 100, "from_currency": "USD", "to_currency": "EUR"}

Notice how both tools are called in a single model response. We can then execute the respective tools with the provided arguments and pass the tool results back to the model together. This saves a round trip to the model compared with requesting the tools one at a time, and it’s how more advanced agents handle multi-part requests.
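Passing the results back together means adding one tool message per tool call, each carrying the id of the call it answers. A minimal sketch of that step follows. The build_tool_messages helper, the fake_execute executor, and the make_call factory are all hypothetical, and the tool call objects are hand-written stand-ins rather than real SDK objects.

```python
import json
from types import SimpleNamespace

def build_tool_messages(tool_calls, execute):
    """Hypothetical helper: one 'tool' message per tool call, in order."""
    messages = []
    for tool_call in tool_calls:
        arguments = json.loads(tool_call.function.arguments)
        result = execute(tool_call.function.name, arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,  # ties each result to its own call
            "content": json.dumps(result),
        })
    return messages

def fake_execute(name, arguments):
    # Placeholder executor: echoes its input instead of calling real APIs
    return {"tool": name, "received": arguments}

def make_call(call_id, name, arguments):
    # Hand-written stand-in for the SDK's tool call object
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name=name, arguments=json.dumps(arguments)),
    )

tool_calls = [
    make_call("call_1", "get_current_weather", {"city": "Athens"}),
    make_call("call_2", "convert_currency",
              {"amount": 100, "from_currency": "USD", "to_currency": "EUR"}),
]

tool_messages = build_tool_messages(tool_calls, fake_execute)
for message in tool_messages:
    print(message["tool_call_id"], message["content"])
```

As in the single-tool example, the assistant message that contains the tool calls has to be appended to the history first, followed by these tool messages.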

On my mind: So, what makes this agentic?

One thing that has always gotten on my nerves is the term “agentic” being slapped on everything. Agents, agentic workflows, anything originating from the word agent is very sexy these days, but as you may have already discovered yourself, not everything presented as agentic really is.

So let’s take a step back and think about what an agent actually is in the first place. At its core, an agent is something that perceives its environment, processes that information in some way, has a goal, and then decides what action to take in order to achieve it. Think about what our tool calling mechanism is doing: it perceives the tools available, decides which one is appropriate to handle the user’s request (if any), and passes that decision on to the rest of the code for execution. That, in its simplest form, is agency.

In real-world agentic applications, the tool calling loop runs not once but multiple times, with the model using the results of one tool call to decide whether to call another tool, and which one. This is often called a ReAct loop (Reason + Act), and it’s what allows agents to handle complex, multi-step tasks that can’t be solved in a single call.
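Here is a rough sketch of what such a loop can look like. Everything in it is an assumption made for illustration: run_agent is a hypothetical function, scripted_model is a stand-in that imitates an LLM with hard-coded behavior, and get_rate is a made-up tool returning a made-up rate. In a real application, call_model would wrap an actual API request.

```python
import json

def run_agent(call_model, tools_by_name, user_message, max_steps=5):
    """Hypothetical loop: keep calling tools until the model answers in text."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            return reply["content"]  # final text answer
        for call in reply["tool_calls"]:
            func = tools_by_name[call["function"]["name"]]
            result = func(**json.loads(call["function"]["arguments"]))
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("no final answer within max_steps")

def scripted_model(messages):
    # Stand-in for an LLM: asks for one tool, then answers from its result
    if messages[-1]["role"] == "tool":
        rate = json.loads(messages[-1]["content"])["rate"]
        return {"role": "assistant", "content": f"The rate is {rate}."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_rate",
                "arguments": '{"base": "USD", "quote": "EUR"}',
            },
        }],
    }

def get_rate(base, quote):
    # Hypothetical tool returning a made-up rate
    return {"base": base, "quote": quote, "rate": 0.5}

answer = run_agent(scripted_model, {"get_rate": get_rate}, "What is the USD to EUR rate?")
print(answer)
```

The max_steps cap is there so that a model which keeps requesting tools cannot keep the loop running forever.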

Ultimately, what I find most fascinating about tool calling is how it changes the nature of what an LLM is. Up to this point, a language model was essentially a very sophisticated input-output function, which takes text as input and generates text as output. But with tool calling, we gain access to an endless collection of additional functionalities, which we can combine with the reasoning power of the LLM to create systems that are far more capable than either alone.

✨ Thanks for reading! ✨

All images by the author, unless mentioned otherwise.
