How To Significantly Improve LLMs by Leveraging Context Engineering

by admin
July 22, 2025
in Artificial Intelligence


Context engineering is the science of providing LLMs with the right context to maximize performance. When you work with LLMs, you typically create a system prompt asking the LLM to perform a certain task. However, when working with LLMs from a programmer's perspective, there are more factors to consider. You have to decide what other data you can feed your LLM to improve its ability to perform the task you asked it to do.

In this article, I'll discuss the science of context engineering and how you can apply context engineering techniques to improve your LLM's performance.

In this article, I discuss context engineering: the science of providing the right context for your LLMs. Correctly utilizing context engineering can significantly improve the performance of your LLM. Image by ChatGPT.

You can also read my articles on Reliability for LLM Applications and Document QA using Multimodal LLMs.


Definition

Before I start, it's important to define the term context engineering. Context engineering is essentially the science of deciding what to feed into your LLM. This could, for example, be:

  • The system prompt, which tells the LLM how to act
  • Document data fetched using RAG vector search
  • Few-shot examples
  • Tools

The closest earlier description of this has been the term prompt engineering. However, prompt engineering is a less descriptive term, since it implies only altering the system prompt you are feeding to the LLM. To get maximum performance out of your LLM, you have to consider all of the context you are feeding into it, not only the system prompt.

Motivation

My initial motivation for this article came from reading this tweet by Andrej Karpathy.

+1 for “context engineering” over “prompt engineering”.

People associate prompts with short task descriptions you’d give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window… https://t.co/Ne65F6vFcf

— Andrej Karpathy (@karpathy) June 25, 2025

I really agreed with the point Andrej made in this tweet. Prompt engineering is indeed an important science when working with LLMs. However, prompt engineering doesn't cover everything we input into LLMs. In addition to the system prompt you write, you also have to consider factors such as:

  • Which data you should insert into your prompt
  • How you fetch that data
  • How to only provide relevant information to the LLM
  • Etc.

I'll discuss all of these points throughout this article.

API vs console usage

One important distinction to clarify is whether you are using the LLMs from an API (calling them with code) or via the console (for example, via the ChatGPT website or application). Context engineering is certainly important when working with LLMs via the console; however, my focus in this article will be on API usage. The reason for this is that when using an API, you have more options for dynamically altering the context you are feeding the LLM. For example, you can do RAG, where you first perform a vector search and only feed the LLM the most important bits of information, rather than the entire database.

These dynamic adjustments aren't available in the same way when interacting with LLMs via the console; thus, I'll focus on using LLMs via an API.

Context engineering techniques

Zero-shot prompting

Zero-shot prompting is the baseline for context engineering. Performing a task zero-shot means the LLM is performing a task it hasn't seen before. You are essentially only providing a task description as context for the LLM. For example, you might provide an LLM with a long text and ask it to classify the text into class A or B, according to some definition of the classes. The context (prompt) you are feeding the LLM could look something like this:

You are an expert text classifier, tasked with classifying texts into
class A or class B.
- Class A: The text contains a positive sentiment
- Class B: The text contains a negative sentiment

Classify the text: {text}

Depending on the task, this could work very well. LLMs are generalists and are able to perform most simple text-based tasks. Classifying a text into one of two classes will usually be a simple task, and zero-shot prompting will thus usually work quite well.
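
To make this concrete, a zero-shot classification call through an API can be as small as the sketch below. This is a minimal sketch using the OpenAI Python SDK; the model name is an assumption, and any chat-style LLM API works the same way.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

ZERO_SHOT_PROMPT = """You are an expert text classifier, tasked with classifying texts into
class A or class B.
- Class A: The text contains a positive sentiment
- Class B: The text contains a negative sentiment

Classify the text: {text}"""

def classify(text):
    # The task description above is the only context the model receives
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat model works here
        messages=[{"role": "user", "content": ZERO_SHOT_PROMPT.format(text=text)}],
    )
    return response.choices[0].message.content

print(classify("I loved this product, it exceeded my expectations!"))  # expect something like "Class A"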

Few-shot prompting

This infographic highlights how to perform few-shot prompting:

This infographic highlights how you can perform few-shot prompting to enhance LLM performance. Image by ChatGPT.

The follow-up from zero-shot prompting is few-shot prompting. With few-shot prompting, you provide the LLM with a prompt similar to the one above, but you also provide it with examples of the task it is going to perform. This added context will help the LLM improve at performing the task. Following up on the prompt above, a few-shot prompt could look like:

You are an expert text classifier, tasked with classifying texts into
class A or class B.
- Class A: The text contains a positive sentiment
- Class B: The text contains a negative sentiment

<example>
{text 1} -> class A
</example>

<example>
{text 2} -> class B
</example>

Classify the text: {text}

You can see I've provided the model with some examples wrapped in <example> tags. I've discussed the topic of creating robust LLM prompts in my article on LLM reliability.

Few-shot prompting works well because you are providing the model with examples of the task you are asking it to perform. This usually increases performance.

You can imagine this works well for humans too. If you ask a human to perform a task they've never done before, just by describing the task, they might perform decently (depending, of course, on the difficulty of the task). However, if you also provide the human with examples, their performance will usually improve.

Overall, I find it useful to think about LLM prompts as if I'm asking a human to perform a task. Imagine that instead of prompting an LLM, you simply provide the text to a human, and you ask yourself the question:

Given this prompt, and no other context, will the human be able to perform the task?

If the answer is no, you should work on clarifying and improving your prompt.


I also want to mention dynamic few-shot prompting, considering it's a technique I've had a lot of success with. Traditionally, with few-shot prompting, you have a fixed list of examples you feed into every prompt. However, you can often achieve higher performance using dynamic few-shot prompting.

Dynamic few-shot prompting means selecting the few-shot examples dynamically when creating the prompt for a task. For example, suppose you are asked to classify a text into classes A and B, and you already have a list of 200 texts and their corresponding labels. You can then perform a similarity search between the new text you are classifying and the example texts you already have. Specifically, you can measure the vector similarity between the texts and only choose the most similar texts (out of the 200 texts) to feed into your prompt as context. This way, you're providing the model with more relevant examples of how to perform the task.
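
Here is a minimal sketch of that selection step, assuming you use sentence-transformers for the embeddings (the model name and the shape of labeled_examples are my own assumptions):

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any embedding model works

def select_few_shot_examples(new_text, labeled_examples, k=5):
    # labeled_examples: list of (text, label) pairs you already have
    example_vecs = embedder.encode([t for t, _ in labeled_examples], normalize_embeddings=True)
    query_vec = embedder.encode([new_text], normalize_embeddings=True)[0]
    # Vectors are normalized, so a dot product gives the cosine similarity
    top_k = np.argsort(example_vecs @ query_vec)[::-1][:k]
    return [labeled_examples[i] for i in top_k]

The selected (text, label) pairs can then be formatted into the <example> tags shown in the few-shot prompt above.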

RAG

Retrieval-augmented generation (RAG) is a well-known technique for increasing the knowledge of LLMs. Assume you already have a database consisting of thousands of documents. You now receive a question from a user and want to answer it, given the knowledge inside your database.

Unfortunately, you can't feed the entire database into the LLM. Even though we now have LLMs such as Llama 4 Scout with a 10-million-token context window, databases are usually much larger. You therefore have to find the most relevant information in the database to feed into your LLM. RAG does this similarly to dynamic few-shot prompting:

  1. Perform a vector search
  2. Find the documents most similar to the user question (the most similar documents are assumed to be the most relevant)
  3. Ask the LLM to answer the question, given the most relevant documents

By performing RAG, you are doing context engineering by only providing the LLM with the most relevant data for performing its task. To improve the performance of the LLM, you can work on the context engineering by improving your RAG search. This could, for example, be done by improving the search so that it finds only the most relevant documents.
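
The sketch below makes the three steps concrete. It embeds the documents on the fly to keep the example short; in practice, you would precompute and index the document vectors (for example, in a vector database). The embedding model and LLM model names are assumptions:

import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumption

def rag_answer(question, documents, k=3):
    # Step 1: perform a vector search over the documents
    doc_vecs = embedder.encode(documents, normalize_embeddings=True)
    query_vec = embedder.encode([question], normalize_embeddings=True)[0]
    # Step 2: find the documents most similar to the user question
    top_k = np.argsort(doc_vecs @ query_vec)[::-1][:k]
    context = "\n\n".join(documents[i] for i in top_k)
    # Step 3: ask the LLM to answer, given only the most relevant documents
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content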

You can read more about RAG in my article about creating a RAG system for your personal data.

Tools (MCP)

You can also provide the LLM with tools to call, which is an important part of context engineering, especially now that we see the rise of AI agents. Tool calling today is often done using the Model Context Protocol (MCP), a concept introduced by Anthropic.

AI agents are LLMs capable of calling tools and thus performing actions. An example of this could be a weather agent. If you ask an LLM without access to tools about the weather in New York, it will be unable to provide an accurate response. The reason for this is naturally that information about the weather has to be fetched in real time. To do this, you can, for example, give the LLM a tool such as:

@tool
def get_weather(city):
    # code to retrieve the current weather for a city
    return weather

If you give the LLM access to this tool and ask it about the weather, it can then look up the weather for a city and provide you with an accurate response.

Providing tools for LLMs is incredibly important, since it significantly enhances the abilities of the LLM. Other examples of tools are:

  • Searching the internet
  • A calculator
  • Searching via the Twitter API
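
What "giving the LLM a tool" looks like in practice depends on your stack. As one concrete (non-MCP) illustration, here is roughly how the get_weather tool above could be exposed through the OpenAI chat completions API as a function definition; the description strings are my own wording:

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Retrieve the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Name of the city"},
            },
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption
    messages=[{"role": "user", "content": "What's the weather in New York?"}],
    tools=tools,  # the model can now decide to call get_weather
)

If the model decides the question requires the tool, the response contains a tool call with its arguments; your code then runs get_weather and feeds the result back to the model as additional context for the final answer.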

Topics to consider

In this section, I make a few notes on what you should consider when creating the context to feed into your LLM.

Utilization of context length

The context length of an LLM is an important consideration. As of July 2025, you can feed most frontier LLMs over 100,000 input tokens. This provides you with a lot of options for how to utilize this context. You have to consider the tradeoff between:

  • Including a lot of information in a prompt, thus risking some of the information getting lost in the context
  • Missing some important information in the prompt, thus risking the LLM not having the required context to perform a specific task

Usually, the only way to figure out the balance is to test your LLM's performance. For example, with a classification task, you can check the accuracy given different prompts.
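
A minimal sketch of such a test, assuming you have a labeled evaluation set and comparing hypothetical prompt variants (SHORT_PROMPT, LONG_PROMPT, and eval_set are illustrative names, not a specific library's API):

from openai import OpenAI

client = OpenAI()

def ask_llm(prompt):
    # Thin wrapper around a chat completion call
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def accuracy(prompt_template, eval_set):
    # eval_set: list of (text, true_label) pairs held out for testing
    predictions = [ask_llm(prompt_template.format(text=text)) for text, _ in eval_set]
    return sum(pred.strip() == label for pred, (_, label) in zip(predictions, eval_set)) / len(eval_set)

# Usage: compare hypothetical prompt variants on the same eval_set
# print(accuracy(SHORT_PROMPT, eval_set), accuracy(LONG_PROMPT, eval_set))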

If I find the context to be too long for the LLM to work effectively, I sometimes split a task into multiple prompts. For example, having one prompt summarize a text, and a second prompt classify the text summary. This can help the LLM utilize its context effectively and thus improve performance.
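
As a sketch of that summarize-then-classify split, reusing the ask_llm helper from the previous sketch:

def classify_long_text(long_text):
    # Prompt 1: compress the long text so the second prompt gets a short context
    summary = ask_llm(f"Summarize the following text in a few sentences:\n\n{long_text}")
    # Prompt 2: classify the much shorter summary
    return ask_llm(
        "Classify the following summary as class A (positive sentiment) "
        f"or class B (negative sentiment). Answer with only the class name.\n\n{summary}"
    )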

Furthermore, providing too much context to the model can have a significant downside, as I describe in the next section.

Context rot

Last week, I read an interesting article about context rot. The article was about the fact that increasing the context length lowers LLM performance, even though the task difficulty doesn't increase. This implies that:

Providing an LLM with irrelevant information will decrease its ability to perform tasks successfully, even when task difficulty doesn't increase.

The point here is essentially that you should only provide relevant information to your LLM. Providing other information decreases LLM performance (i.e., performance is not neutral to input length).

Conclusion

In this article, I've discussed the topic of context engineering, which is the process of providing an LLM with the right context to perform its task effectively. There are a lot of techniques you can utilize to fill up the context, such as few-shot prompting, RAG, and tools. These are all powerful techniques you can use to significantly improve an LLM's ability to perform a task effectively. Furthermore, you also have to consider the fact that providing an LLM with too much context has downsides. Increasing the number of input tokens reduces performance, as you can read about in the article about context rot.

👉 Follow me on socials:

🧑‍💻 Get in touch
🔗 LinkedIn
🐦 X / Twitter
✍️ Medium
🧵 Threads


