Prompting for the best price-performance

In the drive to remain competitive, businesses today are turning to AI to help them reduce cost and maximize efficiency. It’s incumbent on them to find the most suitable AI model: the one that will help them achieve more while spending less. For many businesses, the migration from OpenAI’s model family to Amazon Nova represents not only a shift in model but a strategic move toward scalability, efficiency, and broader multimodal capabilities.

In this blog, we discuss how to optimize prompting in Amazon Nova for the best price-performance.

Why migrate from OpenAI to Amazon Nova?

OpenAI’s models remain powerful, but their operational costs can be prohibitive when scaled. Consider these figures from Artificial Analysis:

Model	Input Token Cost (per Million Tokens)	Output Token Cost (per Million Tokens)	Context Window	Output Speed (Tokens per Second)	Latency (Seconds to First Token)
GPT-4o	~$2.50	~$10.00	Up to 128K tokens	~63	~0.49
GPT-4o Mini	~$0.15	~$0.60	Up to 128K tokens	~90	~0.43
Nova Micro	~$0.035	~$0.14	Up to 128K tokens	~195	~0.29
Nova Lite	~$0.06	~$0.24	Up to 300K tokens	~146	~0.29
Nova Pro	~$0.80	~$3.20	Up to 300K tokens	~90	~0.34

For high-volume applications, such as global customer support or large-scale document analysis, these cost differences are disruptive. Not only does Amazon Nova Pro offer over three times the cost-efficiency of GPT-4o, its longer context window also enables it to handle more extensive and complex inputs.
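To make the cost gap concrete, the following sketch recomputes costs from the table above. The prices are the approximate figures quoted in the table; the workload size and the helper name `monthly_cost` are hypothetical and chosen only for illustration.

```python
# Approximate prices in USD per million tokens, copied from the table above.
PRICES = {
    "GPT-4o": {"input": 2.50, "output": 10.00},
    "GPT-4o Mini": {"input": 0.15, "output": 0.60},
    "Nova Micro": {"input": 0.035, "output": 0.14},
    "Nova Lite": {"input": 0.06, "output": 0.24},
    "Nova Pro": {"input": 0.80, "output": 3.20},
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a given token volume."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 500M input tokens and 100M output tokens per month.
    for name in PRICES:
        cost = monthly_cost(name, 500_000_000, 100_000_000)
        print(f"{name:12s} ${cost:,.2f}")
```

Because each model in the table prices output tokens at four times its input tokens, the ratio between any two models stays the same regardless of the input/output mix.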

Breaking down the Amazon Nova suite

Amazon Nova isn’t a single model; it’s a suite designed for various needs:

Amazon Nova Pro – A robust multimodal model that can process text, images, and video. It excels at tasks such as document analysis and deep data visualization. Benchmark comparisons show Amazon Nova Pro matching or even surpassing GPT-4o on complex reasoning tasks, according to section 2.1.1 of the Nova technical report and model card.
Amazon Nova Lite – Offers a balanced mix of multimodal processing and speed. Amazon Nova Lite is ideal for applications such as document summarization, translation, and even basic visual search, delivering quality outputs at lower latency and cost compared to GPT-4o Mini. You can find these benchmark results in section 2.1.2 of the Nova technical report and model card.
Amazon Nova Micro – A text-only model engineered for ultra-low latency. With output speed of up to 195 tokens per second, Amazon Nova Micro is ideal for real-time applications such as chat-based assistants and automated FAQs. Its token costs are dramatically lower than those of GPT-4o Mini, roughly 4.3 times cheaper on a per-token basis.

The lower per-token costs and higher output per second of Amazon Nova give you the flexibility to simplify prompts for real-time applications so you can balance quality, speed, and cost for your use case.

Understanding the foundations

To make the best decision about which model family fits your needs, it’s important to understand the differences in prompt engineering best practices in both OpenAI and Amazon Nova. Each model family has its own set of strengths, but there are some things that apply to both families. Across both model families, quality and accuracy are achieved through clarity of instructions, structured prompts, and iterative refinement. Whether you’re using strong output directives or clearly defined use cases, the goal is to reduce ambiguity and improve response quality.

The OpenAI method

OpenAI uses a layered messaging system for prompt engineering, where system, developer, and user prompts work in harmony to control tone, safety, and output format. Their approach emphasizes:

Hierarchical message roles – Setting the model’s role and behavior using system messages makes sure that the overarching safety and style guidelines (set in system prompts) are preserved
Instruction placement and delimiters – Directives are placed at the beginning, with clear separation between context, examples, and queries
Selective chain-of-thought – Detailed, step-by-step reasoning is used when it benefits complex tasks
Formatting and structure – Using strong directives such as DO, MUST, and DO NOT to provide consistent outputs (for example, in JSON)

The Amazon Nova method

Define the prompt use case
- Task – What exactly the model should do
- Role – Which role the model should assume
- Response style – The structure or tone of the output
- Instructions – Guidelines the model must follow
Chain-of-thought
- Explicit state response – Provide clear and strong instructions to limit the model’s response
- Structural thinking – Thinking step-by-step encourages structural thinking
Formatting and structure
- Use delimiters to section your prompts, for example, ##Task##, ##Context##, or ##Example##
- Specify the output format, for example, JSON, YAML, or Markdown
- Use strong instructions and caps, such as DO, DO NOT, or MUST
- Prefill the responses to guide the model, for example, start with “{” or “json…”
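As a rough sketch of how these practices might be combined, the following helper assembles a system prompt with ##-style delimiters and a message list whose final assistant turn prefills the response. The function name `build_nova_prompt` and its parameters are hypothetical; the message shape is assumed to follow the Amazon Bedrock Converse API used later in this post.

```python
def build_nova_prompt(task, role, response_style, instructions, context, prefill=None):
    """Assemble a system prompt and message list following the checklist above."""
    # Number the instructions so each guideline is easy for the model to follow.
    numbered = "\n".join(
        f"{i}. {step}" for i, step in enumerate(instructions, start=1)
    )
    system_text = (
        f"{role}\n"
        f"##Task##\n{task}\n"
        f"##Response style##\n{response_style}\n"
        f"##Instructions##\n{numbered}"
    )
    messages = [
        {"role": "user", "content": [{"text": f"##Context##\n{context}"}]}
    ]
    if prefill is not None:
        # A trailing assistant turn asks the model to continue from this text.
        messages.append({"role": "assistant", "content": [{"text": prefill}]})
    return [{"text": system_text}], messages


# Example: request JSON output and prefill the opening brace.
system, messages = build_nova_prompt(
    task="Summarize the meeting notes.",
    role="You are an experienced executive assistant.",
    response_style="Respond in JSON only.",
    instructions=["Read the notes.", "DO NOT add commentary."],
    context="Meeting Date: March 5th, 2050",
    prefill="{",
)
```

The two return values are shaped to be passed as the `system` and `messages` arguments of a Converse request.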

Evolving your prompt strategy: Migrating from OpenAI to Amazon Nova

Transitioning to Amazon Nova isn’t merely a change in API endpoints; it requires retooling your prompt engineering to align with the strengths of Amazon Nova. You need to reframe your use case definition. Begin by breaking down your current GPT-4o or GPT-4o Mini prompt into its core elements: task, role, response style, instructions, and success criteria. Make sure to structure these elements clearly to provide a blueprint for the model.

To understand how to migrate an existing OpenAI prompt to work optimally for Amazon Nova Pro, consider the following example using the meeting notes summarizer. Here is the GPT-4o system prompt:

You will be provided with meeting notes, and your task is to summarize the meeting 
as follows: 
-Overall summary of discussion 
-Action items (what needs to be done and who is doing it) 
-If applicable, a list of topics that need to be discussed more fully in the next meeting.

The user prompt is the meeting notes that need to be summarized:

Meeting Date: March 5th, 2050
Meeting Time: 2:00 PM
Location: Conference Room 3B, Intergalactic Headquarters...

GPT produces this helpful response:

Overall Summary of Discussion: The team discussed the recent mission to Planet Zog, 
the space pirate issue in Sector 7, the annual Intergalactic Bake-Off, planning for 
the upcoming charity fundraiser, and a proposed team-building retreat. The mission to 
Planet Zog was successful, but communication with the Zogians was challenging. The team 
also discussed the need for a better strategy to deal with space pirates. The team placed 
second in the Intergalactic Bake-Off and is planning for the upcoming charity fundraiser. 
A team-building retreat to the Moon Resort and Spa was proposed.

Action Items:
1. Dr. Quasar to start working on a Zogian-English dictionary.
2. Captain Stardust to speak with Admiral Starbeam about increasing patrols in Sector 7.
3. Dr. Quasar to finalize the prototype of the new cloaking technology.
4. All team members to start gathering supplies and preparing games for the Intergalactic 
Charity Bazaar.
5. Captain Stardust to check the budget for the proposed team-building retreat to the Moon 
Resort and Spa.

Topics for Next Meeting:
1. Update on the Zogian-English dictionary.
2. Progress report on the cloaking technology.
3. Results of increased patrols in Sector 7.
4. Final preparations for the Intergalactic Charity Bazaar.

To meet or exceed the quality of the response from GPT-4o, here is what an Amazon Nova Pro prompt might look like. The prompt uses the same best practices discussed in this post, starting with the system prompt. We used a temperature of 0.2 and a topP of 0.9 here:

You are an experienced executive assistant skilled in meeting note analysis and 
summarization. Your primary responsibilities include distilling complex discussions
into clear, actionable summaries.
Follow these instructions:

##INSTRUCTIONS##
1. Read and understand the meeting notes found in ##NOTES##
2. Put all of your outputs in a section called ##OUTPUTS## in markdown formatting
3. Summarize the meeting notes in 5 sentences or less. Put this in a section called 
"Overall Summary".
4. Numerically list any action items for specific people and what needs to be completed. 
Put this list in a section called "Action Items".
5. If applicable, list the topics that need to be discussed more fully in the next meeting. 
Put this in a section called "Topics for Next Meeting".

Here’s the user prompt, using prefilled responses:

##NOTES##
Meeting Date: March 5th, 2050
Meeting Time: 2:00 PM
Location: Conference Room 3B, Intergalactic Headquarters
Attendees:
- Captain Stardust
- Dr. Quasar
- Lady Nebula
- Sir Supernova
- Ms. Comet
Meeting called to order by Captain Stardust at 2:05 PM
1. Introductions and welcome to our newest team member, Ms. Comet
2. Discussion of our recent mission to Planet Zog
- Captain Stardust: "Overall, a success, but communication with the Zogians was difficult. 
We need to improve our language skills."
- Dr. Quasar: "Agreed. I'll start working on a Zogian-English dictionary right away."
- Lady Nebula: "The Zogian food was out of this world, literally! We should consider having 
a Zogian food night on the ship."
3. Addressing the space pirate issue in Sector 7
- Sir Supernova: "We need a better strategy for dealing with these pirates. They've already 
plundered three cargo ships this month."
- Captain Stardust: "I'll speak with Admiral Starbeam about increasing patrols in that area."
- Dr. Quasar: "I've been working on a new cloaking technology that could help our ships avoid 
detection by the pirates. I'll need a few more weeks to finalize the prototype."
4. Review of the annual Intergalactic Bake-Off
- Lady Nebula: "I'm happy to report that our team placed second in the competition! Our Martian Mud 
Pie was a big hit!"
- Ms. Comet: "Let's aim for first place next year. I have a secret recipe for Jupiter Jello that I 
think could be a winner."
5. Planning for the upcoming charity fundraiser
- Captain Stardust: "We need some creative ideas for our booth at the Intergalactic Charity Bazaar."
- Sir Supernova: "How about a 'Dunk the Alien' game? We can have people throw water balloons at a 
volunteer dressed as an alien."
- Dr. Quasar: "I can set up a 'Name That Star' trivia game with prizes for the winners."
- Lady Nebula: "Great ideas, everyone. Let's start gathering the supplies and preparing the games."
6. Upcoming team-building retreat
- Ms. Comet: "I would like to propose a team-building retreat to the Moon Resort and Spa. It's a great 
opportunity to bond and relax after our recent missions."
- Captain Stardust: "Sounds like a fantastic idea. I'll check the budget and see if we can make it happen."
7. Next meeting agenda items
- Update on the Zogian-English dictionary (Dr. Quasar)
- Progress report on the cloaking technology (Dr. Quasar)
- Results of increased patrols in Sector 7 (Captain Stardust)
- Final preparations for the Intergalactic Charity Bazaar (All)
Meeting adjourned at 3:15 PM. Next meeting scheduled for March 19th, 2050 at 2:00 PM in Conference Room 3B, 
Intergalactic Headquarters.

##OUTPUTS##

The following example shows that the Amazon Nova response meets and exceeds the accuracy of the OpenAI example, formats the output in Markdown, and has found clear owners for each action item:

### Overall Summary
The meeting covered the successful mission to Planet Zog, the ongoing issue with 
space pirates in Sector 7, the results of the annual Intergalactic Bake-Off, plans 
for the upcoming charity fundraiser, and a proposed team-building retreat. Action items 
were assigned, and the next meeting's agenda was set.

### Action Items
1. **Dr. Quasar**: Begin working on a Zogian-English dictionary.
2. **Captain Stardust**: Speak with Admiral Starbeam about increasing patrols in Sector 7.
3. **Dr. Quasar**: Finalize the prototype for the new cloaking technology within a few weeks.
4. **Team**: Start gathering supplies and preparing games for the Intergalactic Charity Bazaar.
5. **Captain Stardust**: Check the budget for the proposed team-building retreat to the Moon Resort and Spa.

### Topics for Next Meeting
1. Update on the Zogian-English dictionary (Dr. Quasar)
2. Progress report on the cloaking technology (Dr. Quasar)
3. Results of increased patrols in Sector 7 (Captain Stardust)
4. Final preparations for the Intergalactic Charity Bazaar (All)

A few updates to the prompt can achieve comparable or better results from Amazon Nova Pro while enjoying a much lower cost of inference.
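For readers who want to reproduce the summarizer, the system prompt, the ##NOTES## user prompt, and the inference settings quoted above could be assembled into a Converse request roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the model ID `us.amazon.nova-pro-v1:0` is assumed by analogy with the Nova Lite ID used later in this post.

```python
def build_summary_request(system_prompt, meeting_notes,
                          model_id="us.amazon.nova-pro-v1:0"):
    """Return keyword arguments for the Bedrock Runtime converse() call."""
    return {
        "modelId": model_id,  # assumed ID; confirm it for your account and Region
        "system": [{"text": system_prompt}],
        "messages": [{
            "role": "user",
            # The trailing ##OUTPUTS## marker mirrors the user prompt shown above.
            "content": [{"text": f"##NOTES##\n{meeting_notes}\n\n##OUTPUTS##"}],
        }],
        # Settings quoted in this post: temperature 0.2 and topP 0.9.
        "inferenceConfig": {"temperature": 0.2, "topP": 0.9},
    }


def summarize(client, system_prompt, meeting_notes):
    """Send the request and return the text of the model's first content block."""
    response = client.converse(**build_summary_request(system_prompt, meeting_notes))
    return response["output"]["message"]["content"][0]["text"]


# Usage (requires AWS credentials and model access):
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   print(summarize(client, system_prompt, meeting_notes))
```

Keeping request construction separate from the network call makes it straightforward to inspect or log the exact prompt before sending it.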

Make use of Amazon Nova extended context

Amazon Nova Lite and Amazon Nova Pro can support up to 300,000 input tokens, which means that you can include more context in your prompt if needed. Expand your background data and detailed instructions accordingly: if your original OpenAI prompt was optimized for 128,000 tokens, adjust it to use the Amazon Nova extended window.
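One way to reason about the larger window is a rough budget check before sending a prompt. The sketch below is illustrative only: the four-characters-per-token ratio is a crude assumption rather than a real tokenizer, and `estimate_tokens`, `fits_context`, and the output reserve are hypothetical. The window sizes are the ones quoted in this post.

```python
# Context windows (in tokens) as quoted in the comparison table in this post.
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "Nova Micro": 128_000,
    "Nova Lite": 300_000,
    "Nova Pro": 300_000,
}

CHARS_PER_TOKEN = 4  # crude assumption; real token counts vary by model and text


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def fits_context(model: str, prompt: str, reserve_for_output: int = 4_000) -> bool:
    """Check whether the prompt plus an output reserve fits the model's window."""
    return estimate_tokens(prompt) + reserve_for_output <= CONTEXT_WINDOWS[model]


# A prompt of about 200K estimated tokens exceeds a 128K window but fits 300K.
long_prompt = "x" * 800_000
print(fits_context("GPT-4o", long_prompt))    # False
print(fits_context("Nova Pro", long_prompt))  # True
```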

Tailor output constraints

If your GPT prompt required strict formatting (for example, “Respond in JSON only”), make sure that your Amazon Nova prompt includes these directives. Additionally, if your task involves multimodal inputs, specify when to include images or video references.
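When a prompt demands JSON-only output, it can help to verify the reply before passing it downstream. The following is a minimal sketch, assuming the reply arrives as a plain string and that any prefilled text (such as an opening brace) is not echoed back by the model and must be re-attached. `parse_json_reply` is a hypothetical helper, not part of any SDK.

```python
import json


def parse_json_reply(reply_text: str, prefill: str = "") -> dict:
    """Re-attach the prefill, then parse the reply as a JSON object."""
    candidate = (prefill + reply_text).strip()
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError as err:
        raise ValueError(f"Model reply is not valid JSON: {err}") from err
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level")
    return parsed


# Example: the assistant turn was prefilled with "{", so the reply continues it.
result = parse_json_reply('"summary": "Mission succeeded", "owners": 2}', prefill="{")
print(result["summary"])
```

A failed parse is a useful signal to retry with stronger directives (for example, adding DO NOT include any text outside the JSON object).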

Function calling

The rise of generative AI agents has made function calling, or tool calling, one of the most important abilities of a given large language model (LLM). A model’s ability to correctly pick the right tool for the job, in a low-latency manner, is often the difference between success and failure of an agentic system.

Both OpenAI and Amazon Nova models share similarities in function calling, specifically their support for structured API calls. Both model families support tool selection through defined tool schemas, which we discuss later in this post. They also both provide a mechanism to decide when to invoke these tools or not.

OpenAI’s function calling uses flexible JSON schemas to define and structure API interactions. The models support a wide range of schema configurations, which give developers the ability to quickly implement external function calls through straightforward JSON definitions tied to their API endpoints.

Here is an example of a function:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Tool definition: one function with a single required string parameter
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Montevideo, Uruguay"
                }
            },
            "required": ["location"],
            "additionalProperties": False
        },
        "strict": True
    }
}]

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the weather like in Punta del Este today?"}],
    tools=tools
)

Similar to OpenAI’s approach, Amazon Nova can call tools when passed a configuration schema as shown in the following code example. Amazon Nova has made heavy use of greedy decoding when calling tools, and it’s advised to set temperature, topP, and topK to 1. This makes sure that the model has the highest accuracy in tool selection. These greedy decoding parameters and other examples of tool use are covered in detail in Tool use (function calling) with Amazon Nova.

The following is an example of function calling without using additionalModelRequestFields:

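import boto3

# Bedrock Runtime client used by the converse calls in this post
client = boto3.client("bedrock-runtime")

tool_config = {
    "tools": [{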
        "toolSpec": {
            "name": "get_recipe",
            "description": "Structured recipe generation system",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "recipe": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "ingredients": {
                                    "type": "array",
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "item": {"type": "string"},
                                            "amount": {"type": "number"},
                                            "unit": {"type": "string"}
                                        }
                                    }
                                },
                                "instructions": {
                                    "type": "array",
                                    "items": {"type": "string"}
                                }
                            },
                            "required": ["name", "ingredients", "instructions"]
                        }
                    }
                }
            }
        }
    }]
}

# Base configuration without topK=1
input_text = "I need a recipe for chocolate lava cake"
messages = [{
    "role": "user",
    "content": [{"text": input_text}]
}]

# Inference parameters
inf_params = {"topP": 1, "temperature": 1}

response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params
)
# Often produces less structured or incomplete output
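When migrating, existing OpenAI tool definitions can often be reshaped rather than rewritten. The sketch below maps the OpenAI-style `tools` entry shown earlier into the `toolSpec` shape used in `tool_config`. The helper names are hypothetical, and whether every JSON Schema keyword (such as `additionalProperties`) is honored unchanged by Amazon Nova is an assumption not confirmed here. OpenAI-specific flags such as `strict` are dropped.

```python
def openai_tool_to_nova(tool: dict) -> dict:
    """Map one OpenAI function-tool definition to a Converse toolSpec entry."""
    fn = tool["function"]
    spec = {
        "name": fn["name"],
        # The OpenAI "parameters" JSON Schema becomes inputSchema.json.
        "inputSchema": {"json": fn["parameters"]},
    }
    if fn.get("description"):
        spec["description"] = fn["description"]
    return {"toolSpec": spec}


def to_tool_config(openai_tools: list) -> dict:
    """Build a toolConfig dict from a list of OpenAI tool definitions."""
    return {"tools": [openai_tool_to_nova(t) for t in openai_tools]}


# Example using the get_weather definition from the OpenAI snippet.
openai_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]
nova_tool_config = to_tool_config(openai_tools)
```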

The following example shows how function calling accuracy can be improved by using additionalModelRequestFields:

# Enhanced configuration with topK=1
response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params,
    additionalModelRequestFields={"inferenceConfig": {"topK": 1}}
)
# Produces a more accurate and structured function call

To maximize Amazon Nova function calling potential and improve accuracy, always use additionalModelRequestFields with topK=1. This forces the model to select the single most probable token and prevents random token selection. This increases deterministic output generation and improves function call precision by about 30–40%.

The following code examples further explain how to conduct tool calling successfully. The first scenario shows recipe generation without an explicit tool. The example doesn’t use topK, which typically results in responses that are less structured:

input_text = """
I'm looking for a decadent chocolate dessert that's quick to prepare. 
Something that looks fancy but isn't complicated to make.
"""

messages = [{
    "role": "user",
    "content": [{"text": input_text}]
}]

response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    inferenceConfig={"topP": 1, "temperature": 1}
)
# Generates a conversational recipe description
# Less structured, more narrative-driven response

In this example, the scenario shows recipe generation with a structured tool. We add topK set to 1, which produces a more structured output:

response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig={"topP": 1, "temperature": 1},
    additionalModelRequestFields={"inferenceConfig": {"topK": 1}}
)
# Generates a highly structured, JSON-compliant recipe
# Includes precise ingredient measurements
# Provides step-by-step instructions
Overall, OpenAI offers more flexible, broader schema support. Amazon Nova provides more precise, controlled output generation and is the best choice when working with high-stakes, structured data scenarios, as demonstrated in Amazon Nova’s performance on the IFEval benchmark discussed in section 2.1.1 of the technical report and model card. We recommend using Amazon Nova for applications requiring predictable, structured responses because its function calling methodology provides superior control and accuracy.

Conclusion

The evolution from OpenAI’s models to Amazon Nova represents a significant shift in using AI. It shows a transition toward models that deliver similar or superior performance at a fraction of the cost, with expanded capabilities in multimodal processing and extended context handling.

Whether you’re using the robust, enterprise-ready Amazon Nova Pro, the agile and economical Amazon Nova Lite, or the versatile Amazon Nova Micro, the benefits are clear:

Cost savings – With token costs up to four times lower, businesses can scale applications more economically
Enhanced response performance – Faster response times (up to about 195 tokens per second) make real-time applications more viable
Expanded capabilities – A larger context window and multimodal support unlock new applications, from detailed document analysis to integrated visual content

By evolving your prompt strategy (redefining use cases, exploiting the extended context, and iteratively refining instructions), you can smoothly migrate your existing workflows from OpenAI’s GPT-4o and GPT-4o Mini models to Amazon Nova.

About the Authors

Claudio Mazzoni is a Sr. Specialist Solutions Architect on the Amazon Bedrock GTM team. Claudio excels at guiding customers through their Gen AI journey. Outside of work, Claudio enjoys spending time with family, working in his garden, and cooking Uruguayan food.

Pat Reilly is a Sr. Specialist Solutions Architect on the Amazon Bedrock Go-to-Market team. Pat has spent the last 15 years in analytics and machine learning as a consultant. When he’s not building on AWS, you can find him fumbling around with wood projects.