LLMs have entered the world of computer science at a record pace. LLMs are powerful models capable of effectively performing a wide variety of tasks. However, LLM outputs are stochastic, which makes them unreliable. In this article, I discuss how you can ensure reliability in your LLM applications by properly prompting the model and handling the output.

You can also read my articles on Attending NVIDIA GTC Paris 2025 and Creating Powerful Embeddings for Machine Learning.
Motivation
My motivation for this article is that I am constantly developing new applications using LLMs. LLMs are generalized tools that can be applied to most text-dependent tasks, such as classification, summarization, information extraction, and much more. Furthermore, the rise of vision language models also enables us to handle images much like we handle text.
I often encounter the problem that my LLM applications are inconsistent. Sometimes the LLM does not respond in the desired format, or I am unable to properly parse the LLM response. This is a huge problem when you are working in a production setting and are fully dependent on consistency in your application. I will thus discuss the techniques I use to ensure reliability for my applications in a production setting.
Ensuring output consistency
Markup tags
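To ensure output consistency, I use a technique where my LLM answers inside markup tags. I use a system prompt like:
```python
# The tag name "answer" is illustrative; any name works as long as it is used consistently.
prompt = """
Classify the text into "Cat" or "Dog"
Provide your response in <answer></answer> tags
"""
```
And the model will almost always respond with:
```
<answer>Cat</answer>
```
or
```
<answer>Dog</answer>
```
You can now easily parse out the response using the following code:
```python
def _parse_response(response: str) -> str:
    # Keep only the text between the opening <answer> and the closing </answer> tag
    return response.split("<answer>")[1].split("</answer>")[0].strip()
```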
The reason markup tags work so well is that this is how the models are trained to behave. When OpenAI, Qwen, Google, and others train these models, they use markup tags. The models are thus super effective at utilizing these tags and will, in almost all cases, adhere to the expected response format.
For example, reasoning models, which have been on the rise lately, first do their thinking enclosed in `<think></think>` tags before giving their final answer.
Furthermore, I try to use markup tags as much as possible elsewhere in my prompts. For example, if I am providing few-shot examples to my model, I will do something like:
```python
prompt = """
Classify the text into "Cat" or "Dog"
Provide your response in <answer></answer> tags
<examples>
This is an image showing a cat -> <answer>Cat</answer>
This is an image showing a dog -> <answer>Dog</answer>
</examples>
"""
```
I do two things that help the model perform here:
- I provide the examples inside `<examples></examples>` tags.
- In my examples, I make sure to adhere to my own expected response format, using the `<answer></answer>` tags.

Using markup tags, you can thus achieve a high level of output consistency from your LLM.
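A split-based parser fails with an exception whenever the model forgets the tags, and it accepts any label. A slightly more defensive sketch, assuming the model was asked to wrap its answer in `<answer></answer>` tags and that only the labels "Cat" and "Dog" are valid, uses a regular expression and returns `None` when the response is unusable. The names `parse_answer`, `ANSWER_PATTERN`, and `ALLOWED_LABELS` are hypothetical:

```python
import re
from typing import Optional

# Assumption: the prompt asks for <answer>...</answer>; change the tag to match your prompt.
ANSWER_PATTERN = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)
ALLOWED_LABELS = {"Cat", "Dog"}


def parse_answer(response: str) -> Optional[str]:
    """Return the label in the last <answer> block, or None if missing or not allowed."""
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return None  # no tags at all: let the caller retry instead of crashing
    label = matches[-1]  # if several blocks appear, take the last one
    return label if label in ALLOWED_LABELS else None


print(parse_answer("<answer>Cat</answer>"))                        # Cat
print(parse_answer("Thinking it over... <answer> Dog </answer>"))  # Dog
print(parse_answer("It is probably a cat."))                       # None
print(parse_answer("<answer>Hamster</answer>"))                    # None
```

Returning `None` instead of raising keeps parsing separate from the decision of what to do next (retry, fall back, or log), and checking against the allowed labels also catches answers that are well-tagged but still invalid.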
Output validation
Pydantic is a tool you can use to validate the output of your LLMs. You can define types and check that the model's output adheres to the type you expect. For example, you can follow the example below, based on this article:
```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Profile(BaseModel):
    name: str
    email: str
    phone: str

# Placeholder for the user information the model should extract from
user = "..."

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": f"Return the `name`, `email`, and `phone` of user {user} in a json object.",
        },
    ],
)
# Parses the JSON string and checks every field against the Profile schema
profile = Profile.model_validate_json(resp.choices[0].message.content)
```
As you can see, we prompt GPT to respond with a JSON object, and we then run Pydantic to ensure the response is as we expect. If it is not, `model_validate_json` raises a `ValidationError`, which you can catch and handle.
I would also like to note that sometimes it is easier to simply create your own output validation function. In the last example, the only real requirements are that the response object contains the keys name, email, and phone, and that all of them are strings. You can validate this in Python with a function:
```python
import json

def validate_output(output: str) -> dict:
    data = json.loads(output)  # the raw response is a string, so parse it first
    for key in ("name", "email", "phone"):
        assert key in data and isinstance(data[key], str), f"invalid or missing {key!r}"
    return data
```
With this, you do not need to install any packages, and in many cases, it is easier to set up.
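One caveat with `assert`: Python skips assert statements when run with the `-O` flag, so checks written that way can silently disappear in production. A dependency-free sketch that raises `ValueError` instead might look like this; the function name `validate_profile` and the error messages are my own, not part of the original example:

```python
import json
from typing import Any

REQUIRED_STRING_KEYS = ("name", "email", "phone")


def validate_profile(output: str) -> dict[str, Any]:
    """Parse an LLM response and check that it has the required string fields.

    Raises ValueError explicitly, since `python -O` strips assert statements.
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for key in REQUIRED_STRING_KEYS:
        # .get() returns None for a missing key, which also fails the str check
        if not isinstance(data.get(key), str):
            raise ValueError(f"missing or non-string field: {key!r}")
    return data


print(validate_profile('{"name": "Ada", "email": "ada@example.com", "phone": "123"}')["name"])  # Ada
try:
    validate_profile('{"name": "Ada", "email": null}')
except ValueError as err:
    print(err)  # missing or non-string field: 'email'
```

Because every failure surfaces as a single exception type, the caller can treat "bad output" exactly like any other retryable error.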
Tweaking the system prompt
You can also make several other tweaks to your system prompt to get more reliable output. I always recommend making your prompt as structured as possible, using:
- Markup tags, as mentioned earlier
- Lists, such as the one I am writing here
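As a rough illustration of combining both ideas, the hypothetical helper below assembles a system prompt from a task description, a bulleted list of rules, and few-shot examples, each section wrapped in its own markup tags. The tag names (`<task>`, `<rules>`, `<examples>`, `<answer>`) are arbitrary choices, not a required format:

```python
def build_system_prompt(task: str, rules: list[str], examples: list[tuple[str, str]]) -> str:
    """Assemble a structured prompt: tagged sections, a bulleted rule list, tagged examples."""
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    # Each example already shows the exact response format we expect back
    example_lines = "\n".join(
        f"{text} -> <answer>{label}</answer>" for text, label in examples
    )
    return (
        f"<task>\n{task}\n</task>\n"
        f"<rules>\n{rule_lines}\n</rules>\n"
        f"<examples>\n{example_lines}\n</examples>"
    )


prompt = build_system_prompt(
    task='Classify the text into "Cat" or "Dog".',
    rules=[
        "Provide your response in <answer></answer> tags.",
        "Answer with exactly one of the two labels.",
    ],
    examples=[
        ("This is an image showing a cat", "Cat"),
        ("This is an image showing a dog", "Dog"),
    ],
)
print(prompt)
```

Generating the prompt from data rather than hand-editing one long string makes it easier to keep the examples and the stated response format in sync.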
In general, you should also always make sure your instructions are clear. You can use the following test to check the quality of your prompt:
If you gave the prompt to another human who had never seen the task before and had no prior knowledge of it, would that human be able to perform the task effectively?
If a human cannot do the task from the prompt alone, you usually cannot expect an AI to do it either (at least for now).
Handling errors
Errors are inevitable when dealing with LLMs. If you make enough API calls, it is almost certain that some responses will not be in your desired format, or that some other issue will occur.
In these scenarios, it is important to have a robust application equipped to handle such errors. I use the following techniques to handle errors:
- Retry mechanism
- Increase the temperature
- Have backup LLMs
Now, let me elaborate on each point.
Exponential backoff retry mechanism
It is important to have a retry mechanism in place, since many issues can occur when making an API call. You might encounter issues such as rate limiting, an incorrect output format, or a slow response. In these scenarios, you must make sure to wrap the LLM call in a try/except block and retry. It is usually also a good idea to use exponential backoff, especially for rate-limiting errors, so that you wait long enough between attempts to avoid being rate limited again.
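A minimal, dependency-free sketch of such a retry loop follows (libraries such as tenacity offer the same pattern ready-made). Here `fn` stands for whatever function performs your LLM call, and it can also parse and validate the result so that a format error triggers a retry too. The function name, parameters, and defaults are my own assumptions; in practice you would narrow `retry_on` to your client's rate-limit and timeout exceptions rather than catching every `Exception`:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay,
            # and is scaled by random jitter so many clients don't retry in lockstep.
            delay = min(max_delay, base_delay * 2**attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("max_attempts must be at least 1")


calls = {"n": 0}


def flaky_llm_call() -> str:
    # Hypothetical stand-in for an API call that fails twice, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated rate limit")
    return "<answer>Cat</answer>"


print(retry_with_backoff(flaky_llm_call, sleep=lambda s: None))  # <answer>Cat</answer>
```

The `sleep` parameter exists only so the waiting can be skipped in tests, as in the usage line above.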
Temperature increase
I also sometimes recommend increasing the temperature a bit. If you set the temperature to 0, you tell the model to behave (nearly) deterministically. However, this can sometimes have a negative effect.
For example, suppose the model failed to respond in the proper output format for a given input. If you retry that input with a temperature of 0, you are likely to just run into the same issue. I thus recommend setting the temperature a bit higher, for example to 0.1, to introduce some stochasticity while keeping the outputs relatively deterministic.
Many agents use the same logic: a higher temperature helps them avoid getting stuck in a loop of repetitive errors.
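A sketch of this idea, assuming a hypothetical `call_llm(prompt, temperature)` function and a `parse` function that returns `None` for a badly formatted answer. The first attempt runs at temperature 0 and each retry after a format failure raises it slightly; the schedule `(0.0, 0.1, 0.3)` is an illustrative assumption, not a tuned value:

```python
from typing import Callable, Optional, Sequence


def generate_with_temperature_bump(
    prompt: str,
    call_llm: Callable[[str, float], str],
    parse: Callable[[str], Optional[str]],
    temperatures: Sequence[float] = (0.0, 0.1, 0.3),
) -> Optional[str]:
    """Try each temperature in order until parse() accepts the response; None if all fail."""
    # Retrying the same input at temperature 0 tends to reproduce the same mistake,
    # so each later attempt samples with slightly more randomness.
    for temperature in temperatures:
        parsed = parse(call_llm(prompt, temperature))
        if parsed is not None:
            return parsed
    return None


def fake_llm(prompt: str, temperature: float) -> str:
    # Hypothetical stand-in for a model that ignores the format only at temperature 0
    return "It's a cat." if temperature == 0.0 else "<answer>Cat</answer>"


def parse_tagged(response: str) -> Optional[str]:
    # Return None (instead of raising) when the <answer> tags are missing
    if "<answer>" not in response or "</answer>" not in response:
        return None
    return response.split("<answer>")[1].split("</answer>")[0].strip()


print(generate_with_temperature_bump('Classify the text into "Cat" or "Dog"', fake_llm, parse_tagged))  # Cat
```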
Backup LLMs
Another powerful way to deal with errors is to have backup LLMs. I recommend using a chain of LLM providers for all your API calls. For example, you first try OpenAI; if that fails, you use Gemini; and if that fails, you use Claude.
This ensures reliability in the event of provider-specific issues, such as:
- The server is down (for example, if OpenAI's API is unavailable for a period of time)
- Filtering (sometimes an LLM provider will refuse to answer your request if it believes the request violates its jailbreak or content moderation policies)
In general, it is simply good practice not to be fully dependent on one provider.
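The fallback chain can be sketched as a list of provider functions tried in order. The provider functions below are hypothetical stand-ins; in a real application each would wrap one SDK (for example, the OpenAI, Gemini, or Anthropic client) and raise an exception on outages, refusals, or invalid output:

```python
from typing import Callable, Sequence

Provider = tuple[str, Callable[[str], str]]  # (provider name, function taking a prompt)


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def call_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Return (provider_name, response) from the first provider that succeeds."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # outage, refusal, timeout, validation error, ...
            errors.append(f"{name}: {exc!r}")
    # Keep every provider's error so the final failure is debuggable
    raise AllProvidersFailed("; ".join(errors))


def primary(prompt: str) -> str:
    raise ConnectionError("service unavailable")  # simulated outage


def secondary(prompt: str) -> str:
    return "<answer>Dog</answer>"


print(call_with_fallback("Classify ...", [("primary", primary), ("secondary", secondary)]))
# ('secondary', '<answer>Dog</answer>')
```

Catching every `Exception` is deliberately broad here; a production chain might only fall through on errors another provider could plausibly fix, and give each provider a few retries of its own before moving on.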
Conclusion
In this article, I have discussed how you can ensure reliability in your LLM application. LLM applications are inherently stochastic because you cannot directly control the output of an LLM. It is thus important to have proper policies in place, both to minimize the errors that occur and to handle them when they do occur.
I have discussed the following approaches to minimize and handle errors:
- Markup tags
- Output validation
- Tweaking the system prompt
- Retry mechanism
- Increase the temperature
- Have backup LLMs
If you combine these techniques in your application, you can build an LLM application that is both powerful and robust.