
Using Strands Agents to create a multi-agent solution with Meta’s Llama 4 and Amazon Bedrock

by admin
February 1, 2026
in Artificial Intelligence


Multi-agent solutions, in which networks of agents collaborate, coordinate, and reason together, are changing how we approach real-world challenges. Enterprises manage environments with multiple data sources, changing goals, and diverse constraints. This is where multi-agent architectures shine. By empowering multiple agents that each have specialized tools, memory, or perspectives to interact and reason as a collective, organizations unlock powerful new capabilities:

  • Scalability – Multi-agent frameworks handle tasks of growing complexity, distributing workload intelligently and adapting to scale in real time.
  • Resilience – When agents work together, failure in one can be compensated for or mitigated by others, creating robust, fault-tolerant systems.
  • Specialization – Individual agents excel in specific domains (such as finance, data transformation, and user support) yet can seamlessly cooperate to solve cross-disciplinary problems.
  • Dynamic problem solving – Multi-agent systems can rapidly reconfigure, pivot, and respond to change, which is essential in volatile business, security, and operations environments.

Recent launches in agentic AI frameworks, such as Strands Agents, make it easier for developers to participate in the creation and deployment of model-driven, multi-agent solutions. You can define prompts and integrate toolsets, allowing powerful language models to reason, plan, and invoke tools autonomously rather than relying on handcrafted, brittle workflows.
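
To make this concrete, the following is a minimal sketch of a single Strands agent with one tool. The import paths, the Region, and the simple word_count tool are illustrative assumptions and are not part of this solution; the model ID is the Llama 4 Maverick ID used later in this post.

# Minimal sketch of a single Strands agent; import paths and Region are assumptions
from strands import Agent, tool
from strands.models import BedrockModel

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

model = BedrockModel(
    model_id="us.meta.llama4-maverick-17b-instruct-v1:0",  # model ID used later in this post
    region_name="us-west-2",  # assumed Region
    temperature=0,
)

agent = Agent(
    system_prompt="You are a concise assistant. Use tools when they help.",
    model=model,
    tools=[word_count],
)

response = agent("How many words are in this sentence?")
print(response)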

In production, services such as Amazon Bedrock AgentCore support secure, scalable deployment with features like persistent memory, identity integration, and enterprise-grade observability. This shift toward collaborative, multi-agent AI solutions is reshaping software architectures by making them more autonomous, resilient, and adaptable. From real-time troubleshooting in cloud infrastructure to cross-team automation in financial services and chat-based assistants coordinating complex multistep business processes, organizations adopting multi-agent solutions are positioning themselves for greater agility and innovation. Now, with open frameworks such as Strands, anyone can start building intelligent systems that think, interact, and evolve together.

In this post, we explore how to build a multi-agent video processing workflow using Strands Agents, Meta’s Llama 4 models, and Amazon Bedrock to automatically analyze and understand video content through specialized AI agents working in coordination. To showcase the solution, we use Amazon SageMaker AI to walk you through the code.

Meta’s Llama 4: Unlocking the value of 1M+ context windows

Llama 4 is Meta’s latest family of large language models (LLMs), and it stands out for its context window capabilities and multimodal intelligence. Both models in the family use a mixture-of-experts (MoE) architecture for efficiency, are designed for multimodal inputs, and are optimized to power agentic systems and complex workflows. The flagship variant, Meta’s Llama 4 Scout, supports a 10 million token context window, an industry first, enabling the model to process and reason over large amounts of data in a single prompt.

This supports applications such as summarizing entire libraries of books, analyzing massive codebases, conducting holistic research across thousands of documents, and maintaining deep, persistent conversation context across long interactions. The Llama 4 Maverick variant also offers a 1 million token window, making it suitable for demanding language, vision, and cross-document tasks. These ultralong context windows open new possibilities for advanced summarization, memory retention, and complex, multistep workflows, positioning Meta’s Llama 4 as a versatile solution for both research and enterprise-grade AI applications.

Model name | Context window | Key capabilities and use cases
Meta’s Llama 4 Scout | 10M tokens (up to 3.5M on Amazon Bedrock) | Ultralong document processing, full book or codebase ingestion, large-scale summarization, extensive dialogue memory, advanced research
Meta’s Llama 4 Maverick | 1M tokens | Large-context multimodal tasks, advanced document and image understanding, code analysis, comprehensive Q&A, robust summarization
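
In practice, switching between the two variants on Amazon Bedrock is a one-line change in the model configuration. The following sketch shows both configurations; the Maverick model ID is the one used later in this post, while the Scout model ID and the import path follow Bedrock’s and Strands’ naming conventions and should be treated as assumptions.

from strands.models import BedrockModel  # import path assumed per the Strands SDK

# Maverick: 1M-token context, used throughout this post
maverick = BedrockModel(
    model_id="us.meta.llama4-maverick-17b-instruct-v1:0",
    region_name="us-west-2",  # assumed Region
    streaming=False,
    temperature=0,
)

# Scout: up to 3.5M tokens on Amazon Bedrock (model ID assumed, not verified here)
scout = BedrockModel(
    model_id="us.meta.llama4-scout-17b-instruct-v1:0",
    region_name="us-west-2",
    streaming=False,
    temperature=0,
)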

Solution overview

This post demonstrates how to build a multi-agent video processing workflow using the Strands Agents SDK, Meta’s Llama 4 with its multimodal capabilities and long context window, and the scalable infrastructure of Amazon Bedrock. Although this post focuses primarily on building the specialized agents for this video analysis solution, the same practices for creating a multi-agent workflow can be used to build your own adaptable, automated solution at the enterprise level.

For scaling, this approach extends naturally to larger and more diverse workloads, such as processing video streams from millions of connected devices in smart cities, industrial automation for predictive maintenance through continuous video and sensor data analysis, real-time surveillance systems across multiple locations, or media companies managing vast libraries for indexing and content retrieval. Using the built-in integration of Strands Agents with Amazon Web Services (AWS) services and the managed AI infrastructure of Amazon Bedrock means that your multi-agent workflows can elastically scale, distribute tasks efficiently, and maintain high availability and fault tolerance. You can build complex, multistep workflows across heterogeneous data sources and use cases, from live video analytics to personalized media experiences, while maintaining the agility to adapt and expand as business needs evolve.

Introduction to agentic workflows using Strands Agents

This post demonstrates a video processing solution that implements an agent workflow with six specialized agents. Each agent performs a specific role, passing its output to the next agent to complete multistep tasks. This follows the same approach as the deep research architecture, in which an orchestrator agent coordinates the work of the other agents operating in tandem. In Strands Agents, this concept is known as Agents as Tools.

This architectural pattern in AI systems allows specialized AI agents to be wrapped as callable functions (tools) that other agents can use. The agentic workflow has the following specialized agents:

  1. llama4_coordinator_agent – Has access to the other agents and kicks off the process, from frame extraction through summary generation
  2. s3_frame_extraction_agent – Uses the OpenCV library to extract meaningful frames from videos, handling the complexity of video file operations
  3. s3_visual_analysis_agent – Has the necessary tools to process the frames by analyzing each image and storing the results as a JSON file in the provided Amazon Simple Storage Service (Amazon S3) bucket
  4. retrieve_json_agent – Retrieves the frame analysis in the form of a JSON file
  5. c_temporal_analysis_agent – Focuses on temporal sequences in video frames by analyzing images chronologically
  6. summary_generation_agent – Specializes in creating a summary of the temporal analysis of the images

Modularizing the video analysis solution with Agents as Tools

The process begins with the orchestrator agent, implemented using Meta’s Llama 4, which coordinates communication and task delegation among the specialized agents. This central agent initiates and monitors each step of the video processing pipeline. Using the Agents as Tools pattern in Strands Agents, each specialized agent is wrapped as a callable function (tool), enabling seamless inter-agent communication and modular orchestration; a minimal sketch of this wrapping is shown after the list below. This hierarchical delegation pattern allows the coordinator agent to dynamically invoke domain-specific agents, reflecting how collaborative human teams function. The pattern offers the following benefits:

  • Customizability – Each agent’s system prompt can be independently tuned for optimal performance in its specialized task.
  • Separation of concerns – Agents handle what they do best, making the system more straightforward to develop and maintain.
  • Workflow flexibility – The coordinator agent can orchestrate components in different sequences for different use cases.
  • Scalability – Components can be optimized individually based on their specific performance requirements.
  • Extensibility – New capabilities can be added by introducing new specialized agents without disrupting existing ones.
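
As referenced above, each specialized agent is exposed to the coordinator through a thin @tool wrapper. The following is a minimal sketch of that pattern for the frame extraction agent; the wrapper name matches the coordinator’s tool list shown below, but the body and prompt are illustrative, and bedrock_model refers to the BedrockModel instance defined later in this post.

from strands import Agent, tool

# Specialized agent with its own system prompt (its frame-extraction tools are omitted here)
s3_frame_extraction_agent = Agent(
    system_prompt="You extract key frames from a video stored in S3 and report the S3 folder that holds them.",
    model=bedrock_model,
    callback_handler=None,
    tools=[],  # frame-extraction tools would be registered here
)

@tool
def run_frame_extraction(s3_video_path: str) -> str:
    """Extract key frames from the video at the given S3 path and return the S3 folder with the frames."""
    result = s3_frame_extraction_agent(
        f"Extract key frames from the video at {s3_video_path} and return the S3 folder containing them."
    )
    return str(result)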

By turning agents into tools, we create building blocks that can be combined to solve complex video understanding tasks, demonstrating how you can use Strands Agents to support multi-agent systems with specialized LLM-based reasoning. Let’s examine the coordinator agent:

def new_llama4_coordinator_agent() -> Agent:
    """
    Factory constructor: creates a NEW agent instance with a fresh conversation history.
    Use this per video request for clean isolation.
    """
    return Agent(
        system_prompt="""You are a video processing coordinator. Your job is to process videos step by step.
##When asked to process a video:
1. Extract frames from the S3 video using run_frame_extraction
2. Use the frame location from step 1 to run_visual_analysis
3. WAIT for visual analysis to finish sending the JSON to S3
4. Use the retrieve_json agent to extract the JSON from step 3
5. Use the text result of retrieve_json_from_s3 by passing it to run_temporal_reasoning
6. Pass the result from temporal reasoning to run_summary_generation
7. Upload the analysis generated in run_summary_generation and return the S3 location
##IMPORTANT:
- Call ONE tool at a time and wait for the result
- Use the EXACT result from the previous step as input
- Do NOT call multiple tools concurrently
- Do NOT return raw JSON or function call syntax
""",
        model=bedrock_model,
        tools=[
            run_frame_extraction,
            run_visual_analysis,
            run_temporal_reasoning,
            run_summary_generation,
            upload_analysis_results,
            retrieve_json_from_s3,
        ],
    )

Calling the coordinator agent triggers the agent workflow, which calls the s3_frame_extraction_agent. This specialized agent has the necessary tools to extract key frames from the input video using OpenCV, upload the frames to Amazon S3, and identify the folder path to pass on to the run_visual_analysis agent. The following diagram shows this flow.
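
The frame extraction tools themselves are not reproduced in this post; the following is a minimal sketch of what such a tool might look like, assuming frames are sampled at a fixed interval with OpenCV and uploaded alongside the source video. The function name, interval, and key layout are illustrative, not the repository’s actual implementation.

import os

import boto3
import cv2  # OpenCV
from strands import tool

@tool
def extract_frames_to_s3(s3_video_path: str, frame_interval: int = 30) -> str:
    """Download a video from S3, sample every Nth frame with OpenCV, and upload the frames back to S3.
    Returns the S3 folder that contains the extracted frames."""
    bucket, key = s3_video_path.replace("s3://", "").split("/", 1)
    local_video = os.path.basename(key)

    s3 = boto3.client("s3")
    s3.download_file(bucket, key, local_video)

    frames_prefix = f"{os.path.splitext(key)[0]}_frames"
    cap = cv2.VideoCapture(local_video)
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % frame_interval == 0:
            frame_file = f"frame_{saved:04d}.jpg"
            cv2.imwrite(frame_file, frame)
            s3.upload_file(frame_file, bucket, f"{frames_prefix}/{frame_file}")
            os.remove(frame_file)
            saved += 1
        index += 1
    cap.release()

    return f"s3://{bucket}/{frames_prefix}/"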

After the frames are saved in Amazon S3, the visual_analysis_agent has access to tools that list the frames from the S3 folder, use Meta’s Llama in Amazon Bedrock to process the images, and upload the analysis as a JSON file to Amazon S3.

The following code walks you through the key parts of the different agents. This example shows the visual_analysis_agent:

# Imports needed by this excerpt (the Strands import path is assumed)
import boto3
from random import randint
from strands import Agent, tool

@tool
def upload_local_json_to_s3(s3_video_path: str, local_filename: str = "visual_analysis_results.json") -> str:
    """Upload a local JSON file to the S3 bucket in the video folder"""
    try:
        s3_parts = [part for part in s3_video_path.replace('s3://', '').split('/') if part]
        bucket = s3_parts[0]
        video_folder = s3_parts[-1]

        if '_' in video_folder:
            base_video_name = video_folder.split('_')[0]
        else:
            base_video_name = video_folder
        random_num = randint(1000, 9999)

        s3_key = f"videos/{base_video_name}/{random_num}_{local_filename}"

        s3_client = boto3.client('s3')
        s3_client.upload_file(local_filename, bucket, s3_key)

        return f"s3://{bucket}/{s3_key}"
    except Exception as e:
        return f"Error uploading file: {str(e)}"

s3_visual_analysis_agent = Agent(
    system_prompt="""You are an image analysis agent that processes frames from S3 buckets.

Your workflow:
1. Use the available tools to analyze images
2. Use the video path folder to place the analysis results

IMPORTANT:
- Do NOT generate, write, or return any code
- Focus on describing what you see in the images
- Images are automatically resized if too large
- Put numbered labels in front of each image description (e.g., "1. ", "2. ", etc.)
- Always save analysis results locally first, then upload to S3

Return Format:
The URI from the upload_local_json_to_s3 tool""",
    model=bedrock_model,
    callback_handler=None,
    tools=[list_s3_frames, analyze_image, analyze_all_frames, analyze_frames_batch, upload_local_json_to_s3],
)

After uploading the JSON to Amazon S3, a specialized agent retrieves the JSON file from Amazon S3 and analyzes the text:

# Imports needed by this excerpt (the Strands import paths are assumed)
import json
import os

import boto3
from strands import Agent, tool
from strands.models import BedrockModel

@tool
def process_s3_analysis_json(s3_uri: str) -> str:
    """Retrieve JSON from S3 and extract only the analysis text"""
    try:
        # Parse S3 URI and download JSON
        s3_parts = s3_uri.replace('s3://', '').split('/', 1)
        bucket = s3_parts[0]
        key = s3_parts[1]

        s3_client = boto3.client('s3')
        response = s3_client.get_object(Bucket=bucket, Key=key)
        json_content = response['Body'].read().decode('utf-8')

        # Parse and extract text
        data = json.loads(json_content)

        # Handle both formats
        if 'analyses' in data:
            analyses = data['analyses']
        elif 'sessions' in data:
            analyses = [session['data'] for session in data['sessions'] if 'data' in session]
        else:
            return "Error: No 'analyses' or 'sessions' field found"

        # Extract text only
        text_only = []
        for analysis in analyses:
            if 'analysis' in analysis:
                text = analysis['analysis']
                if not text.startswith("Failed:"):
                    text_only.append(text)

        # Clean up local file
        local_file = "visual_analysis_results.json"
        if os.path.exists(local_file):
            os.remove(local_file)

        return "\n".join(text_only)
    except Exception as e:
        return f"Error processing {s3_uri}: {str(e)}"


bedrock_model = BedrockModel(
    model_id='us.meta.llama4-maverick-17b-instruct-v1:0',
    region_name=region,
    streaming=False,
    temperature=0
)

retrieve_json_agent = Agent(
    system_prompt="Call process_s3_analysis_json with the S3 URI. Your response must be the exact text output from the tool, nothing else.",
    model=bedrock_model,
    callback_handler=None,
    tools=[process_s3_analysis_json],
)

This output is then fed to the temporal_analysis_agent to gain temporal awareness of the sequences in the video frames and provide a detailed description of the visual content.

After the temporal analysis output has been generated, the summary_generation_agent is kicked off to produce the final summary.
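
Neither of these last two agents is shown in the post. The following is a minimal sketch of how they might be defined, mirroring the structure of the agents above; the system prompts are illustrative, and bedrock_model is the instance defined earlier.

from strands import Agent

c_temporal_analysis_agent = Agent(
    system_prompt="""You analyze video frame descriptions in chronological order.
Identify how the scene evolves from frame to frame and describe the sequence of events,
key visual elements, and the overall narrative.""",
    model=bedrock_model,
    callback_handler=None,
)

summary_generation_agent = Agent(
    system_prompt="""You receive a temporal analysis of a video and produce a concise final summary:
what happens in the video, the chronological sequence of events, key visual elements, and the overall narrative.""",
    model=bedrock_model,
    callback_handler=None,
)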

Prerequisites and setup steps

To run the solution in either the notebook or the Gradio UI, you need the following:

  • An AWS account with access to Amazon Bedrock.

To copy over the project:

  1. Clone the Meta-Llama-on-AWS GitHub repository:
git clone https://github.com/aws-samples/Meta-Llama-on-AWS.git
cd agents/strands/Bedrock/multi-agent-video-processing/

  2. In your terminal, install the required dependencies:
pip install -r requirements.txt

Deploy the video processing app on Gradio

To deploy the video processing app on Gradio, follow these application launch instructions:

  1. To launch the Python terminal, open your Python3 command line interface
  2. To install dependencies, run the pip install commands for the required libraries (refer to the preceding installation section)
  3. To run the application, execute python3 gradio_app.py
  4. To access the interface, choose the generated hosted link displayed in the terminal
  5. To initiate video processing, upload your video file through the interface and then choose Run
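
The repository ships a ready-made gradio_app.py; the sketch below is only an illustration of how such an interface could wire the coordinator agent to a video upload, and it is not the repository’s actual implementation. It assumes the new_llama4_coordinator_agent and upload_to_sagemaker_bucket helpers shown elsewhere in this post are importable.

import gradio as gr

def process_video(video_path: str) -> str:
    """Upload the video to S3, run the coordinator agent, and return its summary."""
    s3_video_uri = upload_to_sagemaker_bucket(video_path)  # helper shown later in this post
    agent = new_llama4_coordinator_agent()
    response = agent(f"Process a video from {s3_video_uri}.")
    return str(response)

demo = gr.Interface(
    fn=process_video,
    inputs=gr.Video(label="Upload a video"),
    outputs=gr.Textbox(label="Video analysis"),
    title="Llama 4 video analysis assistant",
)

if __name__ == "__main__":
    demo.launch(share=True)  # prints a hosted link in the terminal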

The Llama video analysis assistant provides the following output for the video buglifeflik.mp4 provided in the GitHub repository:

Llama Video Analysis Log
Flik is shown determined in front of a tree.
He interacts with other bugs.
Flik gathers items and constructs a device.
He presents the invention to a group of bugs.
The group reacts with skepticism.
Flik is chased by a group of birds.

Key visual elements:
The key visual elements include Flik’s determined expression, his interaction with other bugs, the items he gathers, the complex machine he constructs, the group’s skeptical response, and the chaotic scene of Flik being chased by birds.
Overall narrative:
The narrative follows Flik’s journey as he prepares and presents an invention, faces rejection, and experiences a dramatic outcome. The story is character-driven, showcasing Flik’s actions and their consequences, and builds up to a climactic event.

The following screenshot shows the Gradio UI with this output.

Running in the Jupyter notebook

After the necessary libraries are imported, you need to manually upload your video to your S3 bucket:

import os

import boto3

def upload_to_sagemaker_bucket(local_video_path, base_folder="videos/"):
    s3 = boto3.client('s3')

    # Get the default SageMaker bucket
    account_id = boto3.client('sts').get_caller_identity()['Account']
    region = boto3.Session().region_name
    bucket_name = f"sagemaker-{region}-{account_id}"
    # Get the filename and create the subfolder name
    filename = os.path.basename(local_video_path)
    filename_without_ext = os.path.splitext(filename)[0]
    # Create the full S3 path: videos/filename_without_ext/filename
    s3_key = os.path.join(base_folder, filename_without_ext, filename)
    # Upload the file
    s3.upload_file(local_video_path, bucket_name, s3_key)
    s3_uri = f"s3://{bucket_name}/{s3_key}"
    print(f"Uploaded to {s3_uri}")

    s3_folder_path = os.path.join(base_folder, filename_without_ext)
    s3_folder_uri = f"s3://{bucket_name}/{s3_folder_path}"

    return s3_folder_uri

# Example usage: supply your local video path here
s3_video_uri = upload_to_sagemaker_bucket(local_video_path)

After the video is uploaded, you can start the agent workflow by instantiating a new agent with a fresh conversation history:

# Start the workflow
agent = new_llama4_coordinator_agent()
video_instruction = f"Process a video from {s3_video_uri}. Use tools in this order: run_frame_extraction, run_visual_analysis, retrieve_json_from_s3, run_temporal_reasoning, run_summary_generation, upload_analysis_results"
response = agent(video_instruction)
print(response)

Tool #1: run_frame_extraction

Tool #2: run_visual_analysis

Tool #3: retrieve_json_from_s3

Tool #4: run_temporal_reasoning

Tool #5: run_summary_generation

Tool #6: run_summary_generation
**What happens in the video:**
The video follows Flik as he navigates through a series of events, starting from being cautious in a natural setting, seeking help or communicating with other bugs, participating in a crucial discussion or planning, and finally taking action with the group.

**Chronological Sequence of Events:**
The sequence begins with Flik being cautious near a tree, followed by him approaching a group of bugs, then being part of a significant gathering or discussion, and concludes with Flik and the bugs taking action together.

**Sequence of events:**
1. Flik is initially seen being cautious in a natural environment.
2. He then approaches a group of bugs, likely to communicate or seek help.
3. A gathering of bugs is shown with Flik at the center, indicating a crucial discussion or planning.
4. The final scene shows Flik and the bugs in action, possibly executing a plan or facing a challenge.

**Key visual elements:**
The key visual elements include Flik's cautious initial stance, his interaction with other bugs, the gathering or discussion, and the final action scene, highlighting the progression from solitude to collective action.

**Overall Narrative:**
The narrative follows Flik's journey from caution and seeking help to participating in a crucial discussion and finally to taking action with a group of bugs, suggesting a story arc that involves progression, planning, and collective action.
Tool #7: upload_analysis_results
The video processing is complete. The final analysis results are saved to s3://sagemaker-us-west-2-333633606362/videos/buglifeflik/analysis_results_20250818_190012.json.
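
To inspect the final results outside the workflow, you can download the JSON that the last tool reports. The following is a minimal sketch under the assumption that the file at the printed S3 URI is plain JSON; the helper name and placeholder URI are illustrative.

import json

import boto3

def fetch_analysis_results(s3_uri: str) -> dict:
    """Download and parse the final analysis JSON written by upload_analysis_results."""
    bucket, key = s3_uri.replace("s3://", "").split("/", 1)
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    return json.loads(body)

# Example usage with the URI returned by the coordinator agent
# results = fetch_analysis_results("s3://<your-sagemaker-bucket>/videos/<video-name>/analysis_results_<timestamp>.json")
# print(json.dumps(results, indent=2))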

Cleanup

To avoid incurring unnecessary future charges, clean up the resources you created as part of this solution.

To delete the Amazon S3 data:

  1. Open the AWS Management Console
  2. Navigate to Amazon S3
  3. Find and select your Amazon SageMaker bucket
  4. Select the video files you uploaded
  5. Choose Delete and confirm

To stop and remove the SageMaker notebook:

  1. Go to Amazon SageMaker AI in the AWS Management Console
  2. Choose Notebook instances
  3. Select your notebook
  4. Choose Stop if it’s running
  5. After it has stopped, choose Delete

Conclusion

This post highlights how combining the Strands Agents SDK with Meta’s Llama 4 models and Amazon Bedrock infrastructure enables building advanced, multi-agent video processing workflows. By using highly specialized agents that communicate and collaborate through the Agents as Tools pattern, developers can modularize complex tasks such as frame extraction, visual analysis, temporal reasoning, and summarization. This separation of concerns improves maintainability, customization, and scalability while allowing seamless integration across AWS services.

We encourage developers to explore and extend this architecture by adding new specialized agents and adapting workflows to different use cases, from smart cities and industrial automation to media content management. The combination of Strands Agents, Meta’s Llama 4, and Amazon Bedrock lays a strong foundation for creating autonomous, resilient AI solutions that tackle the complexity of modern enterprise environments.

To get started, visit the official GitHub repository for the Meta-Llama-on-AWS agents project for code examples and deployment instructions. For further insights on building with Strands Agents, explore the Strands Agents documentation, which offers a code-first approach to integrating modular AI agents. For broader context on multi-agent AI architectures and orchestration, AWS blog posts on agent interoperability and autonomous agent frameworks provide helpful guidance shaping the future of intelligent systems.


About the authors

Sebastian Bustillo is an Enterprise Solutions Architect at Amazon Web Services (AWS), where he works with airlines and is an active member of the AI/ML Technical Field Community. At AWS, he helps customers unlock business value through AI. Outside of work, he enjoys spending time with his family and exploring the outdoors. He is also passionate about brewing specialty coffees.
