Text-to-SQL generation remains a persistent challenge in enterprise AI applications, particularly when working with custom SQL dialects or domain-specific database schemas. While foundation models (FMs) demonstrate strong performance on standard SQL, achieving production-grade accuracy for specialized dialects requires fine-tuning. However, fine-tuning introduces an operational trade-off: hosting custom models on persistent infrastructure incurs continuous costs, even during periods of zero usage.
On-demand inference in Amazon Bedrock with fine-tuned Amazon Nova Micro models offers an alternative. By combining the efficiency of LoRA (Low-Rank Adaptation) fine-tuning with serverless, pay-per-token inference, organizations can achieve custom text-to-SQL capabilities without the overhead cost of persistent model hosting. Despite the additional inference-time overhead of applying LoRA adapters, testing demonstrated latency suitable for interactive text-to-SQL applications, with costs scaling with usage rather than provisioned capacity.
In this post, we demonstrate two approaches to fine-tune Amazon Nova Micro for custom SQL dialect generation that deliver both cost efficiency and production-ready performance. Our example workload cost $0.80 per month with sample traffic of 22,000 queries per month, resulting in cost savings compared to persistently hosted model infrastructure.
Prerequisites
To deploy these solutions, you will need the following:
- An AWS account with billing enabled
- Standard IAM permissions and a role configured to access:
- Quota for an ml.g5.48xlarge instance for Amazon SageMaker AI training
Solution overview
The solution consists of the following high-level steps:
- Prepare your custom SQL training dataset with input/output pairs specific to your organization's SQL dialect and business requirements.
- Start the fine-tuning process on the Amazon Nova Micro model using your prepared dataset and chosen fine-tuning method:
- Amazon Bedrock model customization for streamlined deployment
- Amazon SageMaker AI for fine-grained training customization and control
- Deploy the custom model on Amazon Bedrock to use on-demand inference, removing infrastructure management while paying only for token usage.
- Validate model performance with test queries specific to your custom SQL dialect and business use cases.
To demonstrate this approach in practice, we provide two complete implementation paths that address different organizational needs. The first uses Amazon Bedrock managed model customization for teams prioritizing simplicity and rapid deployment. The second uses Amazon SageMaker AI training jobs for organizations requiring more granular control over hyperparameters and training infrastructure. Both implementations share the same data preparation pipeline and deploy to Amazon Bedrock for on-demand inference. The following are links to each GitHub code sample:
The following architecture diagram illustrates the end-to-end workflow, which encompasses data preparation, both fine-tuning approaches, and the Bedrock deployment path that enables serverless inference.

1. Dataset preparation
Our demonstration uses the sql-create-context dataset. This dataset is a curated mixture of the WikiSQL and Spider datasets containing over 78,000 examples of natural language questions paired with SQL queries across diverse database schemas. It provides an ideal foundation for text-to-SQL fine-tuning due to its variety in query complexity, from simple SELECT statements to complex multi-table joins with aggregations.
Data formatting and structure
The training data is structured as outlined in the documentation. This involves creating JSONL files that contain system prompt instructions paired with user queries and corresponding SQL responses of varying complexity. The formatted training dataset is then split into training and validation sets, saved as JSONL files, and uploaded to Amazon Simple Storage Service (Amazon S3) for the fine-tuning process.
Sample converted file
{
"schemaVersion": "bedrock-conversation-2024",
"system": [
{
"text": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You can use the following table schema for context: CREATE TABLE head (age INTEGER)"
}
],
"messages": [
{
"role": "user",
"content": [
{
"text": "Return the SQL query that answers the following question: How many heads of the departments are older than 56 ?"
}
]
},
{
"role": "assistant",
"content": [
{
"text": "SELECT COUNT(*) FROM head WHERE age > 56"
}
]
}
]
}
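As an illustration of this conversion, the following is a minimal sketch that maps raw sql-create-context records to the schema above. The field names `question`, `context`, and `answer` are the dataset's published columns; the prompt strings mirror the sample record shown here.

```python
import json

# Map one sql-create-context record (fields: "question", "context",
# "answer") to the bedrock-conversation-2024 schema shown above.
def to_bedrock_record(example):
    return {
        "schemaVersion": "bedrock-conversation-2024",
        "system": [{
            "text": (
                "You are a powerful text-to-SQL model. Your job is to answer "
                "questions about a database. You can use the following table "
                f"schema for context: {example['context']}"
            )
        }],
        "messages": [
            {"role": "user", "content": [{
                "text": "Return the SQL query that answers the following "
                        f"question: {example['question']}"
            }]},
            {"role": "assistant", "content": [{"text": example["answer"]}]},
        ],
    }

# Write a list of converted records as JSONL, one record per line.
def write_jsonl(examples, path):
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(to_bedrock_record(ex)) + "\n")
```

The resulting JSONL files can then be split into training and validation sets and uploaded to S3 as described above.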
Amazon Bedrock fine-tuning approach
Amazon Bedrock model customization provides a streamlined, fully managed approach to fine-tuning Amazon Nova models without the need to provision or manage training infrastructure. This method is ideal for teams seeking rapid iteration and minimal operational overhead while achieving custom model performance tailored to their text-to-SQL use case.
Using the customization capabilities of Amazon Bedrock, training data is uploaded to Amazon S3, and fine-tuning jobs are configured through the AWS console or API. AWS then handles the underlying training infrastructure. The resulting custom model can be deployed using on-demand inference, maintaining the same token-based pricing as the base Nova Micro model with no additional markup, making it a cost-effective solution for variable workloads. This approach is well suited when you need to quickly customize a model for custom SQL dialects without managing ML infrastructure, want minimal operational complexity, or need serverless inference with automatic scaling.
2a. Creating a fine-tuning job using Amazon Bedrock
Amazon Bedrock supports fine-tuning using both the AWS Console and the AWS SDK for Python (Boto3). The AWS documentation contains general guidance on how to submit a training job with both approaches. In our implementation, we used the AWS SDK for Python (Boto3). Refer to the sample notebook in our GitHub samples repository to view our step-by-step implementation.
Configure hyperparameters
After selecting the model to fine-tune, we configure the hyperparameters for our use case. For Amazon Nova Micro fine-tuning on Amazon Bedrock, the following hyperparameters can be customized to optimize our text-to-SQL model:
| Parameter | Range/Constraints | Purpose | What we used |
| --- | --- | --- | --- |
| Epochs | 1–5 | Number of full passes through the training dataset | 5 epochs |
| Batch size | Fixed at 1 | Number of samples processed before updating model weights | 1 (fixed for Nova Micro) |
| Learning rate | 0.000001–0.0001 | Step size for gradient descent optimization | 0.00001 for stable convergence |
| Learning rate warmup steps | 0–100 | Number of steps to gradually increase the learning rate | 10 |
Note: These hyperparameters were optimized for our specific dataset and use case. Optimal values may vary based on dataset size and complexity. With the sample dataset, this configuration provided a good balance between model accuracy and training time, completing in approximately 2-3 hours.
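With the Boto3 path described above, job submission can be sketched as follows. The job name, model names, role ARN, and S3 URIs are placeholders, and the exact hyperparameter keys should be verified against the Bedrock model customization API (values are passed as strings).

```python
# Sketch: assemble a Bedrock model customization (fine-tuning) request
# using the hyperparameters from the table above. All names, ARNs, and
# URIs are placeholders for illustration.
def build_customization_job(role_arn, train_s3_uri, output_s3_uri):
    return {
        "jobName": "nova-micro-text-to-sql",
        "customModelName": "nova-micro-sql-dialect",
        "roleArn": role_arn,
        "baseModelIdentifier": "amazon.nova-micro-v1:0:128k",
        "trainingDataConfig": {"s3Uri": train_s3_uri},
        "outputDataConfig": {"s3Uri": output_s3_uri},
        "hyperParameters": {
            "epochCount": "5",          # 1-5 full passes
            "batchSize": "1",           # fixed for Nova Micro
            "learningRate": "0.00001",  # chosen for stable convergence
            "learningRateWarmupSteps": "10",
        },
    }

# import boto3
# bedrock = boto3.client("bedrock")
# bedrock.create_model_customization_job(
#     **build_customization_job(role_arn, train_s3_path, output_s3_path)
# )
```

The call is left commented out because it submits a billable training job; the builder separates the request shape from the API call for easier review.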
Analyzing training metrics
Amazon Bedrock automatically generates training and validation metrics, which are saved to your specified S3 output location. These metrics include:
- Training loss: Measures how well the model fits the training data
- Validation loss: Indicates generalization performance on unseen data

The training and validation loss curves show successful training: both decrease consistently, follow similar patterns, and converge to comparable final values.
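As a rough illustration, the downloaded metrics files can be inspected with a few lines of Python. The file name and column names below are assumptions; check the artifacts your job actually wrote to the S3 output location.

```python
import csv

# Load one loss column from a step-wise metrics CSV downloaded from S3.
# (File and column names are assumptions; adjust to your job's output.)
def load_loss(path, loss_column="training_loss"):
    with open(path, newline="") as f:
        return [float(row[loss_column]) for row in csv.DictReader(f)]

# Crude convergence check: the mean loss over the last few steps should
# be lower than over the first few.
def is_converging(losses, window=10):
    n = min(window, len(losses))
    head = sum(losses[:n]) / n
    tail = sum(losses[-n:]) / n
    return tail < head
```

Plotting both the training and validation series with the same helper makes it easy to confirm the two curves track each other, as described above.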
3a. Deploy with on-demand inference
After your fine-tuning job completes successfully, you can deploy your custom Nova Micro model using on-demand inference. This deployment option provides automatic scaling and pay-per-token pricing, making it ideal for variable workloads without the need to provision dedicated compute resources.
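A minimal sketch of creating the on-demand deployment follows. The model ARN and deployment name are placeholders, and the request and response field names should be verified against the current Bedrock CreateCustomModelDeployment API.

```python
# Sketch: build the request for an on-demand deployment of the
# fine-tuned custom model. Names and ARNs are placeholders.
def build_deployment_request(custom_model_arn):
    return {
        "modelDeploymentName": "nova-micro-sql-on-demand",
        "modelArn": custom_model_arn,
        "description": "On-demand deployment of the fine-tuned text-to-SQL model",
    }

# import boto3
# bedrock = boto3.client("bedrock")
# response = bedrock.create_custom_model_deployment(
#     **build_deployment_request(custom_model_arn)
# )
# deployment_arn = response["customModelDeploymentArn"]
```

The returned deployment ARN is what you pass as the model ID when invoking the model, as shown next.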
Invoking the custom Nova Micro model
After deployment, you can invoke your custom text-to-SQL model by using the deployment ARN as the model ID in the Amazon Bedrock Converse API.
# Use the deployment ARN as the model ID
deployment_arn = "arn:aws:bedrock:us-east-1::deployment/"

# Prepare the inference request
response = bedrock_runtime.converse(
    modelId=deployment_arn,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "text": """Database schema:
CREATE TABLE sales (
    id INT,
    product_name VARCHAR(100),
    category VARCHAR(50),
    revenue DECIMAL(10,2),
    sale_date DATE
);
Question: What are the top 5 products by revenue in the Electronics category?"""
                }
            ]
        }
    ],
    inferenceConfig={
        "maxTokens": 512,
        "temperature": 0.1,  # Low temperature for deterministic SQL generation
        "topP": 0.9
    }
)

# Extract the generated SQL query (content is a list of content blocks)
sql_query = response['output']['message']['content'][0]['text']
print(f"Generated SQL:\n{sql_query}")
Amazon SageMaker AI fine-tuning approach
While the Amazon Bedrock approach streamlines model customization through a managed training experience, organizations seeking deeper optimization control might benefit from the SageMaker AI approach. SageMaker AI provides extensive control over training parameters that can significantly impact efficiency and model performance. You can adjust batch size for speed and memory optimization, fine-tune dropout settings across layers to prevent overfitting, and configure learning rate schedules for training stability. For LoRA fine-tuning specifically, you can use SageMaker AI to customize scaling factors and regularization parameters, with different settings optimized for multimodal versus text-only datasets. Additionally, you can adjust the context window size and optimizer settings to match your specific use case requirements. See the following notebook for the complete code sample.
1b. Data preparation and upload
The data preparation and upload process for the SageMaker AI fine-tuning approach is identical to the Amazon Bedrock implementation. Both approaches convert the SQL dataset to the bedrock-conversation-2024 schema format, split the data into training and test sets, and upload the JSONL files directly to S3.
# S3 prefix for training data
training_input_path = f's3://{sess.default_bucket()}/datasets/nova-sql-context'

# Upload datasets to S3
train_s3_path = sess.upload_data(
    path="data/train_dataset.jsonl",
    bucket=bucket_name,
    key_prefix=training_input_path
)
test_s3_path = sess.upload_data(
    path="data/test_dataset.jsonl",
    bucket=bucket_name,
    key_prefix=training_input_path
)

print(f'Training data uploaded to: {train_s3_path}')
print(f'Test data uploaded to: {test_s3_path}')
2b. Creating a fine-tuning job using Amazon SageMaker AI
Select the model ID, recipe, and image URI:
# Nova configuration
model_id = "nova-micro/prod"
recipe = "https://raw.githubusercontent.com/aws/sagemaker-hyperpod-recipes/refs/heads/main/recipes_collection/recipes/fine-tuning/nova/nova_1_0/nova_micro/SFT/nova_micro_1_0_g5_g6_48x_gpu_lora_sft.yaml"
instance_type = "ml.g5.48xlarge"
instance_count = 1

# Nova-specific image URI
image_uri = f"708977205387.dkr.ecr.{sess.boto_region_name}.amazonaws.com/nova-fine-tune-repo:SM-TJ-SFT-latest"

print(f'Model ID: {model_id}')
print(f'Recipe: {recipe}')
print(f'Instance type: {instance_type}')
print(f'Instance count: {instance_count}')
print(f'Image URI: {image_uri}')
Configuring custom training recipes
A key differentiator when using Amazon SageMaker AI for Nova model fine-tuning is the ability to customize the training recipe. Recipes are pre-configured training stacks provided by AWS to help you quickly start training and fine-tuning. While maintaining compatibility with the standard Amazon Bedrock hyperparameter set (epochs, batch size, learning rate, and warmup steps), the recipes extend the hyperparameter options with:
- Regularization parameters: hidden_dropout, attention_dropout, and ffn_dropout to prevent overfitting.
- Optimizer settings: Customizable beta coefficients and weight decay settings.
- Architecture controls: Adapter rank and scaling factors for LoRA training.
- Advanced scheduling: Custom learning rate schedules and warmup strategies.
The recommended approach is to start with the default settings to establish a baseline, then optimize based on your specific needs. Here is a list of some of the additional parameters that you can tune.
| Parameter | Range/Constraints | Purpose |
| --- | --- | --- |
| max_length | 1024–8192 | Controls the maximum context window size for input sequences |
| global_batch_size | 16, 32, 64 | Number of samples processed before updating model weights |
| hidden_dropout | 0.0–1.0 | Regularization for hidden layer states to prevent overfitting |
| attention_dropout | 0.0–1.0 | Regularization for attention mechanism weights |
| ffn_dropout | 0.0–1.0 | Regularization for feed-forward network layers |
| weight_decay | 0.0–1.0 | L2 regularization strength for model weights |
| adapter_dropout | 0.0–1.0 | Regularization for LoRA adapter parameters |
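For illustration, a `recipe_overrides` mapping for these parameters might look like the following. The nesting shown here is an assumption about the recipe YAML layout and the values are starting points, not tuned recommendations; verify both against the recipe file you downloaded.

```python
# Illustrative recipe overrides for the LoRA SFT recipe above.
# The key structure mirrors an assumed recipe YAML layout; check the
# actual recipe before using these names.
recipe_overrides = {
    "training_config": {
        "max_length": 4096,          # context window for input sequences
        "global_batch_size": 64,     # samples per weight update
        "model": {
            "hidden_dropout": 0.0,
            "attention_dropout": 0.0,
            "ffn_dropout": 0.0,
            "optim": {
                "weight_decay": 0.01,   # L2 regularization strength
                "adam_beta1": 0.9,
                "adam_beta2": 0.999,
            },
            "peft": {
                "peft_scheme": "lora",
                "lora_tuning": {"adapter_dropout": 0.01},
            },
        },
    },
}
```

This dictionary is what gets passed as `recipe_overrides` when constructing the trainer in the next step.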
The complete recipe that we used can be found here.
Creating and executing a SageMaker AI training job
After configuring your model and recipe, initialize the ModelTrainer object and begin training:
from sagemaker.train import ModelTrainer

trainer = ModelTrainer.from_recipe(
    training_recipe=recipe,
    recipe_overrides=recipe_overrides,
    compute=compute_config,
    stopping_condition=stopping_condition,
    output_data_config=output_config,
    role=role,
    base_job_name=job_name,
    sagemaker_session=sess,
    training_image=image_uri
)

# Configure data channels
from sagemaker.train.configs import InputData, S3DataSource

train_input = InputData(
    channel_name="train",
    data_source=S3DataSource(
        s3_uri=train_s3_path,
        s3_data_type="Converse",
        s3_data_distribution_type="FullyReplicated"
    )
)
val_input = InputData(
    channel_name="val",
    data_source=S3DataSource(
        s3_uri=test_s3_path,
        s3_data_type="Converse",
        s3_data_distribution_type="FullyReplicated"
    )
)

# Begin training
training_job = trainer.train(
    input_data_config=[train_input, val_input],
    wait=False
)
After training, we register the model with Amazon Bedrock through the create_custom_model_deployment Amazon Bedrock API, enabling on-demand inference through the Converse API using the deployed model ARN, system prompts, and user messages.
In our SageMaker AI training job, we used the default recipe parameters, including 2 epochs and a batch size of 64. Our data contained 20,000 lines, so the total training job lasted about 4 hours. With our ml.g5.48xlarge instance, the total cost for fine-tuning our Nova Micro model was $65.
4. Testing and evaluation
For evaluating our model, we performed both operational and accuracy testing. To evaluate accuracy, we implemented an LLM-as-a-judge approach where we collected questions and SQL responses from our fine-tuned model and used a judge model to score them against the ground truth responses.
def get_score(system, user, assistant, generated):
    formatted_prompt = (
        "You are a data science teacher that is introducing students to SQL. "
        "Consider the following question and schema: "
        f"{user} "
        f"{system} "
        "Here is the correct answer: "
        f"{assistant} "
        "Here is the student's answer: "
        f"{generated} "
        "Please provide a numeric score from 0 to 100 on how well the student's "
        "answer matches the correct answer. Put the score in <score> XML tags."
    )
    _, result = ask_claude(formatted_prompt)
    pattern = r'<score>(.*?)</score>'
    match = re.search(pattern, result)
    return match.group(1) if match else "0"
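The judge can then be applied across a held-out test set. The loop below is a hypothetical sketch: the `generate_sql` callable (which invokes the fine-tuned model) and the example structure are assumptions, and `score_fn` would be the `get_score` function above.

```python
# Hypothetical evaluation loop: average the judge's 0-100 scores over a
# held-out set of examples in the bedrock-conversation-2024 format.
# generate_sql(system, user) calls the fine-tuned model; score_fn is the
# judge (for example, the get_score function above).
def evaluate(examples, generate_sql, score_fn):
    scores = []
    for ex in examples:
        system = ex["system"][0]["text"]
        user = ex["messages"][0]["content"][0]["text"]
        ground_truth = ex["messages"][1]["content"][0]["text"]
        generated = generate_sql(system, user)
        scores.append(float(score_fn(system, user, ground_truth, generated)))
    return sum(scores) / len(scores)
```

Keeping the model call and the judge as injected callables makes it easy to swap in a different judge model or to stub both out in tests.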
For operational testing, we gathered metrics including TTFT (time to first token) and OTPS (output tokens per second). Compared to the base Nova Micro model, we experienced a cold-start time to first token averaging 639 ms across 5 runs (a 34% increase). This latency increase stems from applying LoRA adapters at inference time rather than baking them into the model weights. However, this architectural choice delivers substantial cost benefits, because the fine-tuned Nova Micro model costs the same as the base model, enabling on-demand pricing with pay-per-use flexibility and no minimum commitments. During normal operation, our time to first token averages 380 ms across 50 calls (a 7% increase). End-to-end latency totals roughly 477 ms for complete response generation. Token generation maintains a rate of approximately 183 tokens per second, representing only a 27% decrease from the base model while remaining highly suitable for interactive applications.

Cost summary
One-time costs:
- Amazon Bedrock model training cost: $0.001 per 1,000 tokens × number of epochs
- For 2,000 examples, 5 epochs, and roughly 800 tokens each = $8.00
- SageMaker AI model training cost: We used the ml.g5.48xlarge instance, which costs $16.288/hour
- Training lasted 4 hours with a 20,000-line dataset = $65.15
Ongoing costs:
- Storage: $1.95 per month per custom model
- On-demand inference: same per-token pricing as base Nova Micro
- Input tokens: $0.000035 per 1,000 tokens (Amazon Nova Micro)
- Output tokens: $0.00014 per 1,000 tokens (Amazon Nova Micro)
Example calculation for a production workload:
For 22,000 queries per month (100 users × 10 queries/day × 22 business days):
- Average 800 input tokens + 60 output tokens per query
- Input cost: (22,000 × 800 / 1,000) × $0.000035 = $0.616
- Output cost: (22,000 × 60 / 1,000) × $0.00014 = $0.185
- Total monthly inference cost: $0.80
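The arithmetic above can be reproduced in a few lines, using the Amazon Nova Micro on-demand prices per 1,000 tokens stated earlier:

```python
# Reproduce the monthly inference cost estimate from the bullets above.
queries = 100 * 10 * 22            # users x queries/day x business days = 22,000
in_tokens, out_tokens = 800, 60    # average tokens per query
in_price, out_price = 0.000035, 0.00014  # USD per 1,000 tokens

input_cost = queries * in_tokens / 1000 * in_price     # ≈ $0.62
output_cost = queries * out_tokens / 1000 * out_price  # ≈ $0.18
total = input_cost + output_cost                       # ≈ $0.80 per month
```

Scaling `queries` up or down shows how the cost tracks usage linearly, with no floor from provisioned capacity.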
This analysis validates that for custom-dialect text-to-SQL use cases, fine-tuning a Nova model using PEFT LoRA on Amazon Bedrock is significantly cheaper than self-hosting custom models on persistent infrastructure. Self-hosted approaches might suit use cases requiring maximum control over infrastructure, security configurations, or integration requirements, but the Amazon Bedrock on-demand cost model offers significant cost savings for most production text-to-SQL workloads.
Conclusion
These implementation options demonstrate how Amazon Nova fine-tuning can be tailored to organizational needs and technical requirements. We explored two distinct approaches that serve different audiences and use cases. Whether you choose the managed simplicity of Amazon Bedrock or more control through SageMaker AI training, the serverless deployment model and on-demand pricing mean that you only pay for what you use, while removing infrastructure management.
The Amazon Bedrock model customization approach provides a streamlined, managed solution that eliminates infrastructure complexity. Data scientists can focus on data preparation and model evaluation without managing training infrastructure, making it ideal for fast experimentation and development.
The SageMaker AI training approach offers increased control over every aspect of the fine-tuning process. Machine learning (ML) engineers gain granular control over training parameters, infrastructure selection, and integration with existing MLOps workflows, which enables optimization for performance, cost, and operational requirements. For example, you can adjust batch sizes and instance types to optimize training speed, or modify learning rates and LoRA parameters to balance model quality with training time based on your specific operational needs.
Choose Amazon Bedrock model customization when: You need rapid iteration, have limited ML infrastructure expertise, or want to minimize operational overhead while still achieving custom model performance.
Choose SageMaker AI training when: You require fine-grained parameter control, have specific infrastructure or compliance requirements, need integration with existing MLOps pipelines, or want to optimize every aspect of the training process.
Get started
Ready to build your own cost-effective text-to-SQL solution? Access our complete implementations:
Both approaches use the same cost-efficient deployment model, so you can choose based on your team's expertise and requirements rather than cost constraints.
About the authors

