Accelerate your ML lifecycle using the new and improved Amazon SageMaker Python SDK – Part 2: ModelBuilder

In Part 1 of this series, we introduced the newly launched ModelTrainer class in the Amazon SageMaker Python SDK and its benefits, and showed you how to fine-tune a Meta Llama 3.1 8B model on a custom dataset. In this post, we look at the enhancements to the ModelBuilder class, which lets you seamlessly deploy a model from ModelTrainer to a SageMaker endpoint and provides a single interface for multiple deployment configurations.

In November 2023, we launched the ModelBuilder class (see Package and deploy models faster with new tools and guided workflows in Amazon SageMaker and Package and deploy classical ML and LLMs easily with Amazon SageMaker, part 1: PySDK Improvements). ModelBuilder reduced the complexity of the initial setup for creating a SageMaker endpoint, such as creating an endpoint configuration, choosing the container, and handling serialization and deserialization, and helps you create a deployable model in a single step. The recent update enhances the usability of the ModelBuilder class for a wide range of use cases, particularly in the rapidly evolving field of generative AI. In this post, we deep dive into the enhancements made to the ModelBuilder class, and show you how to seamlessly deploy the fine-tuned model from Part 1 to a SageMaker endpoint.

Enhancements to the ModelBuilder class

We’ve made the following usability improvements to the ModelBuilder class:

Seamless transition from training to inference – ModelBuilder now integrates directly with SageMaker training interfaces so that the correct file path to the latest trained model artifact is computed automatically, simplifying the workflow from model training to deployment.
Unified inference interface – Previously, the SageMaker SDK offered separate interfaces and workflows for different types of inference, such as real-time, batch, serverless, and asynchronous inference. To simplify the model deployment process and provide a consistent experience, we have enhanced ModelBuilder to serve as a unified interface that supports multiple inference types.
Ease of development, testing, and production handoff – We’re adding support for local mode testing with ModelBuilder so that you can debug and test your processing and inference scripts faster locally, without needing a container. We’re also adding a new function that outputs the latest container image for a given framework, so you don’t have to update the code each time a new LMI release comes out.
Customizable inference preprocessing and postprocessing – ModelBuilder now lets you customize preprocessing and postprocessing steps for inference. Scripts can, for example, filter content and remove personally identifiable information (PII). Because these steps are encapsulated within the model configuration, models with specific inference requirements are easier to manage and deploy.
Benchmarking support – The new benchmarking support in ModelBuilder lets you evaluate deployment options, such as endpoints and containers, based on key performance metrics such as latency and cost. With the introduction of a Benchmarking API, you can test scenarios and make informed decisions, optimizing your models for performance and cost before production.
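
To make the latency and cost comparison concrete, the following is a minimal sketch of the kind of summary a benchmark run might produce. It does not use the Benchmarking API itself. The function names (percentile, summarize_benchmark), the latency samples, the hourly prices, and the request rates are all hypothetical values chosen for illustration.

```python
import math
import statistics


def percentile(values, pct):
    """Return the nearest-rank percentile (pct in 0-100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_benchmark(latencies_ms, hourly_price_usd, requests_per_hour):
    """Summarize latency samples and estimate the cost per 1,000 requests.

    All inputs are hypothetical measurements supplied by the caller.
    """
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "mean_ms": statistics.mean(latencies_ms),
        "usd_per_1k_requests": hourly_price_usd / requests_per_hour * 1000,
    }


# Hypothetical latency samples (milliseconds) for two candidate configurations.
candidate_a = summarize_benchmark([110, 120, 125, 130, 400], 1.50, 6000)
candidate_b = summarize_benchmark([90, 95, 100, 105, 110], 3.00, 9000)
print(candidate_a)
print(candidate_b)
```

Comparing a tail percentile such as p95 alongside the median matters here: in these made-up numbers, candidate A is cheaper per request but has a much worse tail latency than candidate B.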

In the following sections, we discuss these improvements in more detail and demonstrate how to customize, test, and deploy your model.

Seamless deployment from ModelTrainer class

ModelBuilder integrates seamlessly with the ModelTrainer class; you can simply pass the ModelTrainer object that was used for training the model directly to ModelBuilder in the model parameter. In addition to ModelTrainer, ModelBuilder also supports the Estimator class and the result of the SageMaker Core TrainingJob.create() function, and automatically parses the model artifacts to create a SageMaker Model object. With resource chaining, you can build and deploy the model as shown in the following example. If you followed Part 1 of this series to fine-tune a Meta Llama 3.1 8B model, you can pass the model_trainer object as follows:

from sagemaker.serve.builder.model_builder import ModelBuilder

# Set the container image URI
image_uri = "763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"

model_builder = ModelBuilder(
    model=model_trainer,  # ModelTrainer object passed directly to ModelBuilder
    role_arn=role,
    image_uri=image_uri,
    inference_spec=inf_spec,
    instance_type="ml.g5.2xlarge"
)
# Build and deploy the model
model_builder.build().deploy()
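
To illustrate what resolving the artifact path involves, the following sketch builds the S3 location where a SageMaker training job conventionally writes its compressed model artifact (output path, then job name, then output/model.tar.gz). ModelBuilder obtains this location from the training object for you, so you don't need such a helper in practice. The function name model_artifact_uri, the bucket, and the job name are hypothetical, and the layout can differ if the training job is configured to write uncompressed output.

```python
def model_artifact_uri(output_path: str, training_job_name: str) -> str:
    """Return the conventional S3 URI of a training job's model.tar.gz."""
    if not output_path.startswith("s3://"):
        raise ValueError("output_path must be an S3 URI, e.g. s3://bucket/prefix")
    # Strip any trailing slash so the joined URI has no empty path segment.
    return f"{output_path.rstrip('/')}/{training_job_name}/output/model.tar.gz"


# Hypothetical bucket, prefix, and training job name.
uri = model_artifact_uri("s3://example-bucket/llama-finetune/", "llama-3-1-8b-2024-12-01")
print(uri)
```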

Customize the model using InferenceSpec

The InferenceSpec class lets you customize the model by providing custom logic to load and invoke the model, and to specify any preprocessing or postprocessing logic as needed. For SageMaker endpoints, preprocessing and postprocessing scripts are often used as part of the inference pipeline to handle tasks that are required before and after the data is sent to the model for predictions, especially in the case of complex workflows or non-standard models. The following example shows how you can specify the custom logic using InferenceSpec:

import json

from sagemaker.serve.spec.inference_spec import InferenceSpec

class CustomerInferenceSpec(InferenceSpec):
    def load(self, model_dir):
        # HF_TEI_MODEL is expected to be defined elsewhere (the model ID to load)
        from transformers import AutoModel
        return AutoModel.from_pretrained(HF_TEI_MODEL, trust_remote_code=True)

    def invoke(self, x, model):
        return model.encode(x)

    def preprocess(self, input_data):
        # Extract the "inputs" field from the JSON request body
        return json.loads(input_data)["inputs"]

    def postprocess(self, predictions):
        assert predictions is not None
        return predictions
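
As an illustration of the PII use case mentioned earlier, the following is a minimal sketch of preprocessing and postprocessing logic that could serve as the bodies of the preprocess and postprocess methods of an InferenceSpec subclass. It is written as plain functions so it runs without the SDK. The regular expressions and placeholder tokens are assumptions made for this example and are far too simple for production PII detection.

```python
import json
import re

# Simple illustrative patterns; real PII detection needs a more thorough approach.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def preprocess(input_data: str) -> str:
    """Parse the JSON request body and return the text under "inputs"."""
    return json.loads(input_data)["inputs"]


def postprocess(prediction: str) -> str:
    """Mask email addresses and US-style phone numbers in the model output."""
    redacted = EMAIL_PATTERN.sub("[EMAIL]", prediction)
    return PHONE_PATTERN.sub("[PHONE]", redacted)


request_body = json.dumps({"inputs": "Who should I contact?"})
prompt = preprocess(request_body)

# Stand-in for a model response (hypothetical text, not real model output).
raw_output = "Contact jane.doe@example.com or call 555-123-4567."
print(postprocess(raw_output))
```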

Test using local and in-process mode

Deploying a trained model to a SageMaker endpoint involves creating a SageMaker model and configuring the endpoint. This includes the inference script, any serialization or deserialization required, the model artifact location in Amazon Simple Storage Service (Amazon S3), the container image URI, the right instance type and count, and more. Machine learning (ML) practitioners need to iterate over these settings before finally deploying the endpoint to SageMaker for inference. ModelBuilder offers two modes for quick prototyping:

In-process mode – In this case, the inferences are made directly within the same process. This is highly useful for quickly testing the inference logic provided through InferenceSpec and provides immediate feedback during experimentation.
Local mode – The model is deployed and run as a local container. This is achieved by setting the mode to LOCAL_CONTAINER when you build the model. This is helpful to mimic the same environment as the SageMaker endpoint. Refer to the following notebook for an example.

The following code is an example of running inference in in-process mode, with a custom InferenceSpec:

from sagemaker.serve.spec.inference_spec import InferenceSpec
from transformers import pipeline
from sagemaker.serve import Mode
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.builder.model_builder import ModelBuilder

value: str = "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:"
schema = SchemaBuilder(value,
            {"generated_text": "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron: Hi, Daniel. I was just thinking about how magnificent giraffes are and how they should be worshiped by all.\nDaniel: You and I think alike, Girafatron. I think all animals should be worshipped! But I guess that would be a bit impractical...\nGirafatron: That's true. But the giraffe is just such an amazing creature and should always be respected!\nDaniel: Yes! And the way you go on about giraffes, I can tell you really love them.\nGirafatron: I'm obsessed with them, and I'm glad to hear you noticed!\nDaniel: I'"})

# Custom inference spec with a Hugging Face pipeline
class MyInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        ...
    def invoke(self, input_object, model):
        ...
    def preprocess(self, input_data):
        ...
    def postprocess(self, predictions):
        ...
        
inf_spec = MyInferenceSpec()

# Create the ModelBuilder object in IN_PROCESS mode
builder = ModelBuilder(inference_spec=inf_spec,
                       mode=Mode.IN_PROCESS,
                       schema_builder=schema
                      )

# Build and deploy the model
model = builder.build()
predictor = model.deploy()

# Make predictions
predictor.predict("How are you today?")
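
To show why in-process testing gives fast feedback, the following sketch runs the four methods of an InferenceSpec-style object in a single Python process, with no container or endpoint. It is a conceptual stand-in, not the SDK's implementation: the EchoSpec class, the run_in_process helper, and the assumed call order (load once, then preprocess, invoke, and postprocess per request) are illustrative assumptions.

```python
class EchoSpec:
    """Stand-in with the same four methods as the InferenceSpec example."""

    def load(self, model_dir):
        # A trivial "model": a function that upper-cases its input.
        return str.upper

    def invoke(self, input_object, model):
        return model(input_object)

    def preprocess(self, input_data):
        return input_data.strip()

    def postprocess(self, predictions):
        return {"generated_text": predictions}


def run_in_process(spec, requests, model_dir="."):
    """Load the model once, then run each request through the three steps."""
    model = spec.load(model_dir)
    results = []
    for request in requests:
        prepared = spec.preprocess(request)
        prediction = spec.invoke(prepared, model)
        results.append(spec.postprocess(prediction))
    return results


outputs = run_in_process(EchoSpec(), ["  hello  ", "giraffes"])
print(outputs)
```

A harness like this can be handy for unit-testing custom preprocessing and postprocessing logic with a stub model before switching to Mode.IN_PROCESS with the real one.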

As the next step, you can test it in local container mode as shown in the following code, by adding the image_uri. You need to include the model_server argument when you include the image_uri.

from sagemaker.serve.utils.types import ModelServer

image_uri = '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04'

builder = ModelBuilder(inference_spec=inf_spec,
                       mode=Mode.LOCAL_CONTAINER,  # you can change it to Mode.SAGEMAKER_ENDPOINT for endpoint deployment
                       schema_builder=schema,
                       image_uri=image_uri,
                       model_server=ModelServer.TORCHSERVE
                      )

model = builder.build()
predictor = model.deploy()

predictor.predict("How are you today?")

Deploy the model

When testing is complete, you can deploy the model to a real-time endpoint for predictions by updating the mode to Mode.SAGEMAKER_ENDPOINT and providing an instance type and instance count:

sm_predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    mode=Mode.SAGEMAKER_ENDPOINT,
    role=execution_role,
)

sm_predictor.predict("How is the weather?")

In addition to real-time inference, SageMaker supports serverless inference, asynchronous inference, and batch inference modes for deployment. You can also use InferenceComponents to abstract your models and assign CPU, GPU, accelerators, and scaling policies per model. To learn more, see Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker.

After you have the ModelBuilder object, you can deploy to any of these options simply by adding the corresponding inference configuration when deploying the model. By default, if the mode is not provided, the model is deployed to a real-time endpoint. The following are examples of other configurations:

from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig
predictor = model_builder.deploy(
    endpoint_name="serverless-endpoint",
    inference_config=ServerlessInferenceConfig(memory_size_in_mb=2048))

from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.s3_utils import s3_path_join

predictor = model_builder.deploy(
    endpoint_name="async-endpoint",
    inference_config=AsyncInferenceConfig(
        output_path=s3_path_join("s3://", bucket, "async_inference/output")))

from sagemaker.batch_inference.batch_transform_inference_config import BatchTransformInferenceConfig

transformer = model_builder.deploy(
    endpoint_name="batch-transform-job",
    inference_config=BatchTransformInferenceConfig(
        instance_count=1,
        instance_type="ml.m5.large",
        output_path=s3_path_join("s3://", bucket, "batch_inference/output"),
        test_data_s3_path = s3_test_path
    ))
print(transformer)

Deploy a multi-model endpoint using InferenceComponent:

from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

predictor = model_builder.deploy(
    endpoint_name="multi-model-endpoint",
    inference_config=ResourceRequirements(
        requests={
            "num_cpus": 0.5,
            "memory": 512,
            "copies": 2,
        },
        limits={},
))

Clean up

If you created any endpoints while following this post, you will incur charges while they are up and running. As a best practice, delete any endpoints that are no longer required, either using the AWS Management Console or using the following code:

predictor.delete_model() 
predictor.delete_endpoint()
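
If you created several predictors while experimenting, a small helper can make the cleanup repeatable. The following is a minimal sketch that calls the same two methods shown above and keeps going if one of them fails. The clean_up function and the FakePredictor class are hypothetical; FakePredictor exists only so the example runs without AWS access.

```python
def clean_up(predictor):
    """Delete the model, then the endpoint, collecting any step that fails."""
    failures = []
    for step_name in ("delete_model", "delete_endpoint"):
        try:
            getattr(predictor, step_name)()
        except Exception as error:
            # Keep going so a failure in one step does not block the other.
            failures.append((step_name, str(error)))
    return failures


class FakePredictor:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def delete_model(self):
        self.calls.append("delete_model")

    def delete_endpoint(self):
        self.calls.append("delete_endpoint")


fake = FakePredictor()
failures = clean_up(fake)
print(fake.calls, failures)
```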

Conclusion

In this two-part series, we introduced the ModelTrainer and the ModelBuilder enhancements in the SageMaker Python SDK. Both classes aim to reduce the complexity and cognitive overhead for data scientists, providing you with a straightforward and intuitive interface to train and deploy models, both locally in your SageMaker notebooks and to remote SageMaker endpoints.

We encourage you to try out the SageMaker SDK enhancements (SageMaker Core, ModelTrainer, and ModelBuilder) by referring to the SDK documentation and sample notebooks on the GitHub repo, and let us know your feedback in the comments!

About the Authors

Durga Sury is a Senior Solutions Architect on the Amazon SageMaker team. Over the past 5 years, she has worked with multiple enterprise customers to set up a secure, scalable AI/ML platform built on SageMaker.

Shweta Singh is a Senior Product Manager in the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.
