In the drive to remain competitive, businesses today are turning to AI to help them reduce cost and maximize efficiency. It’s incumbent on them to find the most suitable AI model, the one that will help them achieve more while spending less. For many businesses, the migration from OpenAI’s model family to Amazon Nova represents not only a shift in model but a strategic move toward scalability, efficiency, and broader multimodal capabilities.
In this blog post, we discuss how to optimize prompting in Amazon Nova for the best price-performance.
Why migrate from OpenAI to Amazon Nova?
OpenAI’s models remain powerful, but their operational costs can be prohibitive when scaled. Consider these figures from Artificial Analysis:
Model | Input Token Cost (per Million Tokens) | Output Token Cost (per Million Tokens) | Context Window | Output Speed (Tokens per Second) | Latency (Seconds to First Token) |
GPT-4o | ~$2.50 | ~$10.00 | Up to 128K tokens | ~63 | ~0.49 |
GPT-4o Mini | ~$0.15 | ~$0.60 | Up to 128K tokens | ~90 | ~0.43 |
Nova Micro | ~$0.035 | ~$0.14 | Up to 128K tokens | ~195 | ~0.29 |
Nova Lite | ~$0.06 | ~$0.24 | Up to 300K tokens | ~146 | ~0.29 |
Nova Pro | ~$0.80 | ~$3.20 | Up to 300K tokens | ~90 | ~0.34 |
For high-volume applications, such as global customer support or large-scale document analysis, these cost differences are disruptive. Not only does Amazon Nova Pro offer over three times the cost-efficiency of GPT-4o, its longer context window also enables it to handle more extensive and complex inputs.
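To make the arithmetic behind these figures concrete, the following minimal sketch estimates spend from the approximate per-million-token prices in the table above. The workload size and the helper name `monthly_cost` are illustrative assumptions, not measurements.

```python
# Approximate prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o Mini": (0.15, 0.60),
    "Nova Micro": (0.035, 0.14),
    "Nova Lite": (0.06, 0.24),
    "Nova Pro": (0.80, 3.20),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token volumes."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 500M input tokens and 100M output tokens per month.
workload = (500_000_000, 100_000_000)
for model in PRICES:
    print(f"{model:12s} ${monthly_cost(model, *workload):>10,.2f}")
```

Because the input and output prices of GPT-4o are each about 3.1 times those of Amazon Nova Pro, the ratio between the two totals stays the same regardless of the input/output mix.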
Breaking down the Amazon Nova suite
Amazon Nova isn’t a single model; it’s a suite designed for various needs:
- Amazon Nova Pro – A robust multimodal model that can process text, images, and video. It excels at tasks such as document analysis and deep data visualization. Benchmark comparisons show Amazon Nova Pro matching or even surpassing GPT-4o on complex reasoning tasks, according to section 2.1.1 of the Nova technical report and model card.
- Amazon Nova Lite – Offers a balanced mix of multimodal processing and speed. Amazon Nova Lite is ideal for applications such as document summarization, translation, and even basic visual search, delivering quality outputs at lower latency and cost compared to GPT-4o Mini. You can find these benchmark results in section 2.1.2 of the Nova technical report and model card.
- Amazon Nova Micro – A text-only model engineered for ultra-low latency. With output speed of up to 195 tokens per second, Amazon Nova Micro is ideal for real-time applications such as chat-based assistants and automated FAQs. Its token costs are dramatically lower than those of GPT-4o Mini, roughly 4.3 times cheaper on a per-token basis.
The lower per-token costs and higher output per second of Amazon Nova give you the flexibility to simplify prompts for real-time applications so you can balance quality, speed, and cost for your use case.
Understanding the foundations
To make the best decision about which model family fits your needs, it’s important to understand the differences in prompt engineering best practices between OpenAI and Amazon Nova. Each model family has its own set of strengths, but some principles apply to both. Across both model families, output quality and accuracy are achieved through clarity of instructions, structured prompts, and iterative refinement. Whether you’re using strong output directives or clearly defined use cases, the goal is to reduce ambiguity and improve response quality.
The OpenAI approach
OpenAI uses a layered messaging system for prompt engineering, where system, developer, and user prompts work in harmony to control tone, safety, and output format. Their approach emphasizes:
- Hierarchical message roles – Setting the model’s role and behavior using system messages makes sure that the overarching safety and style guidelines (set in system prompts) are preserved
- Instruction placement and delimiters – Directives are placed at the beginning, with clear separation between context, examples, and queries
- Selective chain-of-thought – Detailed, step-by-step reasoning is used when it benefits complex tasks
- Formatting and structure – Using strong directives such as DO, MUST, and DO NOT to provide consistent outputs (for example, in JSON)
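The practices listed above can be illustrated with a minimal sketch of a layered message list. The prompt wording, the `### ... ###` delimiters, and the helper name `build_openai_messages` are illustrative assumptions; the sketch only builds the role-tagged list and does not call any API.

```python
def build_openai_messages(context: str, question: str) -> list[dict]:
    """Build a role-tagged message list with the directives placed first."""
    # System message: role, behavior, and strong formatting directives.
    system_prompt = (
        "You are a concise technical assistant.\n"
        "You MUST answer in JSON with the keys 'answer' and 'confidence'.\n"
        "DO NOT include any text outside the JSON object."
    )
    # User message: delimiters separate the context from the query.
    user_prompt = (
        f"### Context ###\n{context}\n\n"
        f"### Question ###\n{question}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_openai_messages(
    context="Nova Micro is a text-only model.",
    question="Which Nova model is text-only?",
)
for message in messages:
    print(message["role"], "->", message["content"][:40])
```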
The Amazon Nova approach
- Define the prompt use case
  - Task – What exactly the model should do
  - Role – Which role the model should assume
  - Response style – The structure or tone of the output
  - Instructions – Guidelines the model must follow
- Chain-of-thought
  - Explicit state response – Provide clear and strong instructions to limit the model’s response
  - Structural thinking – Thinking step-by-step encourages structural thinking
- Formatting and structure
  - Use delimiters to section your prompts, for example, ##Task##, ##Context##, or ##Example##
  - Specify the output format, for example, JSON, YAML, or Markdown
  - Use strong instructions and caps, such as DO, DO NOT, or MUST
  - Prefill the responses to guide the model, for example, start with “{” or “json…”
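As a rough illustration of the use case definition and delimiter practices above, the following sketch assembles a sectioned system prompt. The helper name `build_nova_system_prompt`, the section order, and the sample wording are assumptions chosen for this example.

```python
def build_nova_system_prompt(task: str, role: str, style: str, instructions: list[str]) -> str:
    """Assemble a sectioned system prompt using ##Section## delimiters."""
    rules = "\n".join(f"- {rule}" for rule in instructions)
    return (
        f"##Role##\n{role}\n\n"
        f"##Task##\n{task}\n\n"
        f"##Response style##\n{style}\n\n"
        f"##Instructions##\n{rules}"
    )


prompt = build_nova_system_prompt(
    task="Summarize the meeting notes provided by the user.",
    role="You are an executive assistant.",
    style="Markdown with a short summary followed by action items.",
    instructions=[
        "You MUST list an owner for every action item.",
        "DO NOT invent details that are not in the notes.",
        "Think step-by-step before writing the final summary.",
    ],
)
print(prompt)
```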
Evolving your prompt strategy: Migrating from OpenAI to Amazon Nova
Transitioning to Amazon Nova isn’t merely a change in API endpoints; it requires retooling your prompt engineering to align with the strengths of Amazon Nova. You need to reframe your use case definition. Begin by breaking down your current GPT-4o or GPT-4o Mini prompt into its core elements: task, role, response style, instructions, and success criteria. Make sure to structure these elements clearly to provide a blueprint for the model.
To understand how to migrate an existing OpenAI prompt to work optimally for Amazon Nova Pro, consider the following example using a meeting notes summarizer. Here is the GPT-4o system prompt:
The user prompt is the meeting notes that need to be summarized:
GPT produces this helpful response:
To meet or exceed the quality of the response from GPT-4o, here is what an Amazon Nova Pro prompt might look like. The prompt uses the same best practices discussed in this post, starting with the system prompt. We used a temperature of 0.2 and a topP of 0.9 here:
Here’s the user prompt, using prefilled responses:
The following example shows that the Amazon Nova response meets and exceeds the accuracy of the OpenAI example, formats the output in Markdown, and has found clear owners for each action item:
A few updates to the prompt can achieve comparable or better results from Amazon Nova Pro while enjoying a much lower cost of inference.
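The request shape described in this section can be sketched as follows. This is not the prompt used by the authors: the prompt text, the meeting notes, the prefill string, and the helper name `build_converse_request` are placeholders, and the model identifier is an assumption that follows the ID pattern used elsewhere in this post. Only the temperature of 0.2 and topP of 0.9 come from the text above.

```python
def build_converse_request(system_prompt: str, notes: str, prefill: str) -> dict:
    """Return keyword arguments for a Bedrock Converse call."""
    return {
        # Assumed model identifier; adjust to the ID available in your account.
        "modelId": "us.amazon.nova-pro-v1:0",
        "system": [{"text": system_prompt}],
        "messages": [
            {"role": "user", "content": [{"text": notes}]},
            # A trailing assistant turn acts as the prefilled start of the response.
            {"role": "assistant", "content": [{"text": prefill}]},
        ],
        "inferenceConfig": {"temperature": 0.2, "topP": 0.9},
    }


request = build_converse_request(
    system_prompt="##Task##\nSummarize the meeting notes in Markdown.",
    notes="Placeholder meeting notes.",
    prefill="## Meeting summary",
)
print(request["inferenceConfig"])
# With a boto3 "bedrock-runtime" client, the call would be: client.converse(**request)
```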
Use the Amazon Nova extended context
Amazon Nova Lite and Amazon Nova Pro support up to 300,000 input tokens, which means that you can include more context in your prompt if needed. Expand your background data and detailed instructions accordingly. If your original OpenAI prompt was optimized for 128,000 tokens, adjust it to use the Amazon Nova extended window.
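One way to reason about the larger window is to budget context before sending it. The sketch below keeps documents in order until an estimated token budget is exhausted. The four-characters-per-token ratio is a rough assumption (real tokenizers vary), and the helper names and the reserved-token default are hypothetical.

```python
NOVA_CONTEXT_TOKENS = 300_000  # input window cited above for Nova Lite and Nova Pro
CHARS_PER_TOKEN = 4            # rough assumption; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def select_documents(docs: list[str], reserved_tokens: int = 10_000) -> list[str]:
    """Keep documents in order until the estimated budget is used up."""
    # Reserve room for the instructions and the user question.
    budget = NOVA_CONTEXT_TOKENS - reserved_tokens
    selected = []
    for doc in docs:
        cost = estimate_tokens(doc)
        if cost > budget:
            break
        selected.append(doc)
        budget -= cost
    return selected


docs = ["a" * 400_000, "b" * 400_000, "c" * 400_000]
print(len(select_documents(docs)))
```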
Tailor output constraints
If your GPT prompt required strict formatting (for example, “Respond in JSON only”), make sure that your Amazon Nova prompt includes these directives. Additionally, if your task involves multimodal inputs, specify when to include images or video references.
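When a prompt demands JSON-only output, it can help to verify that constraint in code during migration testing. The following is a minimal sketch that assumes the model reply is available as a plain string; the helper name `parse_json_only` is hypothetical.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def parse_json_only(response_text: str) -> dict:
    """Parse a reply that should contain a single JSON object and nothing else."""
    text = response_text.strip()
    # Tolerate an optional Markdown code fence around the JSON payload.
    if text.startswith(FENCE):
        lines = text.splitlines()
        if lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines[1:]).strip()
    parsed = json.loads(text)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed


print(parse_json_only('{"summary": "ok", "action_items": []}'))
```

A reply that fails this check is a signal to tighten the formatting directives in the prompt rather than to post-process the output.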
Function calling
The rise of generative AI agents has made function calling, or tool calling, one of the most important abilities of a given large language model (LLM). A model’s ability to correctly pick the right tool for the job, in a low-latency manner, is often the difference between success and failure of an agentic system.
Both OpenAI and Amazon Nova models share similarities in function calling, specifically their support for structured API calls. Both model families support tool selection through defined tool schemas, which we discuss later in this post. They also both provide a mechanism to decide when to invoke these tools or not.
OpenAI’s function calling uses flexible JSON schemas to define and structure API interactions. The models support a wide range of schema configurations, which give developers the ability to quickly implement external function calls through simple JSON definitions tied to their API endpoints.
Here is an example of a function:
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set
client = OpenAI()

# Tool definition: a JSON schema describing the function the model may call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Montevideo, Uruguay"
                }
            },
            "required": [
                "location"
            ],
            "additionalProperties": False
        },
        "strict": True
    }
}]

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the weather like in Punta del Este today?"}],
    tools=tools
)
Similar to OpenAI’s approach, Amazon Nova can call tools when passed a configuration schema, as shown in the following code example. Amazon Nova relies heavily on greedy decoding when calling tools, and it’s advised to set temperature, topP, and topK to 1. This makes sure that the model has the highest accuracy in tool selection. These greedy decoding parameters and other examples of tool use are covered in detail in Tool use (function calling) with Amazon Nova.
The following is an example of function calling without using additionalModelRequestFields:
import boto3

# Assumes AWS credentials and a default Region are already configured
client = boto3.client("bedrock-runtime")

# Tool definition: the JSON schema the model must follow when calling get_recipe
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_recipe",
            "description": "Structured recipe generation system",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "recipe": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "ingredients": {
                                    "type": "array",
                                    "items": {
                                        "type": "object",
                                        "properties": {
                                            "item": {"type": "string"},
                                            "amount": {"type": "number"},
                                            "unit": {"type": "string"}
                                        }
                                    }
                                },
                                "instructions": {
                                    "type": "array",
                                    "items": {"type": "string"}
                                }
                            },
                            "required": ["name", "ingredients", "instructions"]
                        }
                    }
                }
            }
        }
    }]
}

# Base configuration without topK=1
input_text = "I need a recipe for chocolate lava cake"
messages = [{
    "role": "user",
    "content": [{"text": input_text}]
}]

# Inference parameters
inf_params = {"topP": 1, "temperature": 1}

response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params
)
# Typically produces less structured or incomplete output
The following example shows how function calling accuracy can be improved by using additionalModelRequestFields:
# Enhanced configuration with topK=1
# Reuses client, messages, tool_config, and inf_params from the previous example
response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params,
    additionalModelRequestFields={"inferenceConfig": {"topK": 1}}
)
# Produces a more accurate and structured function call
To maximize Amazon Nova function calling potential and improve accuracy, always use additionalModelRequestFields with topK=1. This forces the model to select the single most probable token and prevents random token selection. This makes output generation more deterministic and improves function call precision by about 30–40%.
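To see why topK=1 removes randomness, consider the following toy sketch of top-k sampling. It is a simplified illustration, not how Amazon Nova is implemented: the probability table and the helper name `sample_top_k` are made up for this example.

```python
import random


def sample_top_k(probabilities: dict[str, float], k: int, rng: random.Random) -> str:
    """Sample one token from the k most probable candidates."""
    top = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [token for token, _ in top]
    weights = [weight for _, weight in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution when the model picks a tool name.
next_token = {"get_recipe": 0.6, "get_weather": 0.3, "search": 0.1}
rng = random.Random(0)

# With k=1 only the single most probable token survives the filter.
greedy = {sample_top_k(next_token, k=1, rng=rng) for _ in range(100)}
print(greedy)  # {'get_recipe'}

# With a larger k, less probable tokens can also be drawn.
sampled = {sample_top_k(next_token, k=3, rng=rng) for _ in range(100)}
print(sampled)
```

With k=1 the candidate list has a single entry, so every draw returns the most probable token no matter how the random generator is seeded.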
The following code examples further explain how to conduct tool calling successfully. The first scenario shows recipe generation without an explicit tool. The example doesn’t use topK, which typically results in responses that are less structured:
input_text = """
I'm looking for a decadent chocolate dessert that's quick to prepare.
Something that looks fancy but isn't complicated to make.
"""
messages = [{
    "role": "user",
    "content": [{"text": input_text}]
}]
response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    inferenceConfig={"topP": 1, "temperature": 1}
)
# Generates a conversational recipe description
# Less structured, more narrative-driven response
In this example, the scenario shows recipe generation with a structured tool. We add topK set to 1, which produces a more structured output:
response = client.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig={"topP": 1, "temperature": 1},
    additionalModelRequestFields={"inferenceConfig": {"topK": 1}}
)
# Generates a highly structured, JSON-compliant recipe
# Includes precise ingredient measurements
# Provides step-by-step instructions
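After a tool-enabled call, the structured arguments arrive as toolUse blocks inside the response message. The following sketch shows one way to pull them out. The helper name `extract_tool_calls` is hypothetical, and `mock_response` is a hand-written stand-in for a Converse-style response, so the example runs without calling the service.

```python
def extract_tool_calls(response: dict) -> list[dict]:
    """Collect toolUse blocks from a Converse-style response dictionary."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    return [block["toolUse"] for block in content if "toolUse" in block]


# Hand-written stand-in for a response; the values are illustrative only.
mock_response = {
    "stopReason": "tool_use",
    "output": {"message": {"role": "assistant", "content": [
        {"text": "Here is a recipe."},
        {"toolUse": {
            "toolUseId": "example-id",
            "name": "get_recipe",
            "input": {"recipe": {
                "name": "Chocolate lava cake",
                "ingredients": [{"item": "dark chocolate", "amount": 200, "unit": "g"}],
                "instructions": ["Melt the chocolate.", "Bake until set at the edges."],
            }},
        }},
    ]}},
}

for call in extract_tool_calls(mock_response):
    print(call["name"], "->", call["input"]["recipe"]["name"])
```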
Overall, OpenAI offers more flexible, broader schema support. Amazon Nova provides more precise, controlled output generation and is the best choice when working with high-stakes, structured data scenarios, as demonstrated by Amazon Nova’s performance on the IFEval benchmark discussed in section 2.1.1 of the technical report and model card. We recommend using Amazon Nova for applications requiring predictable, structured responses because its function calling methodology provides superior control and accuracy.
Conclusion
The evolution from OpenAI’s models to Amazon Nova represents a significant shift in using AI. It shows a transition toward models that deliver similar or superior performance at a fraction of the cost, with expanded capabilities in multimodal processing and extended context handling.
Whether you’re using the robust, enterprise-ready Amazon Nova Pro, the agile and economical Amazon Nova Lite, or the versatile Amazon Nova Micro, the benefits are clear:
- Cost savings – With token costs up to four times lower, businesses can scale applications more economically
- Enhanced response performance – Faster output speeds (up to about 195 tokens per second) make real-time applications more viable
- Expanded capabilities – A larger context window and multimodal support unlock new applications, from detailed document analysis to integrated visual content
By evolving your prompt strategy (redefining use cases, exploiting the extended context, and iteratively refining instructions), you can smoothly migrate your existing workflows from OpenAI’s GPT-4o and GPT-4o Mini models to Amazon Nova.
About the Authors
Claudio Mazzoni is a Sr. Specialist Solutions Architect on the Amazon Bedrock GTM team. Claudio excels at guiding customers through their Gen AI journey. Outside of work, Claudio enjoys spending time with family, working in his garden, and cooking Uruguayan food.
Pat Reilly is a Sr. Specialist Solutions Architect on the Amazon Bedrock Go-to-Market team. Pat has spent the last 15 years in analytics and machine learning as a consultant. When he’s not building on AWS, you can find him fumbling around with wood projects.