Many firms have massive volumes of paper or digital paperwork that comprise untapped enterprise intelligence. With the development of generative AI, varied massive language fashions can be utilized to precisely extract related information from these paperwork. This submit demonstrates an clever doc processing pipeline that consists of each on-demand inference and batch inference choices on Amazon Bedrock to allow the pliability on the doc processing time and value. For time-sensitive requests, one can use the on-demand inference possibility, whereas the batch inference possibility is most price optimized. It additionally explains the way to dynamically specify the big language mannequin and prompts on the doc stage, enabling you to extract information from a number of forms of paperwork utilizing the identical pipelines.
Answer overview
When you, like one in every of our clients, have tons of of tens of millions land lease paperwork in scanned PDF format (PDF that accommodates solely photos with out editable textual content, e.g. on this case, scanned land lease saved as PDF) within the backlog, and new paperwork are nonetheless piling up each day, this can be a answer you should use to successfully extract information from these paperwork. As proven within the following diagram, this answer builds two inference pipelines, on-demand and batch, with a mechanism to invoke them dynamically. Through the use of successfully designed prompts managed in Amazon Bedrock Immediate Administration, the info could be extracted and standardized from scan PDFs, which frequently have various codecs and conventions, or from textual content information.

The pipeline on the left is the on-demand pipeline that extracts information from paperwork one-by-one, returning outcomes inside seconds. This makes it appropriate for time-sensitive requests.
The pipeline on the appropriate is the batch inference pipeline that processes a number of doc requests in a single Amazon Bedrock batch inference job, the place your mannequin invocation will likely be processed asynchronously. Customers can specify the immediate ID and model within the request in each pipelines, and the corresponding immediate textual content will likely be retrieved from Amazon Bedrock Immediate Administration.
The next sections present detailed descriptions of each pipelines.
1. On-demand inference pipeline
An AWS SQS First-In, First-Out (FIFO) queue is created within the on-demand inference pipeline. When a queue message containing the doc ID, LLM mannequin ID, immediate ID/model, and system immediate ID/model arrives, it triggers an AWS Lambda operate. This operate retrieves the PDF doc from the required Amazon S3 bucket, converts the PDF pages to PNG photos, retrieves the related prompts from Amazon Bedrock Immediate Administration, composes the message to name the LLM, and saves the outcome into an Amazon DynamoDB desk.
1.1. AWS SQS FIFO queue
An AWS SQS FIFO queue is used to set off Amazon Bedrock inference when a single doc arrives. The important thing causes for utilizing a FIFO queue are:
- Dependable Message Supply – Makes certain that every message is delivered precisely as soon as.
- First-In, First-Out (FIFO) Processing – Maintains a strict ordering, offering higher predictability for processing.
- Message Grouping – The Message Group ID attribute makes certain the messages are processed so as inside every group. Every producer can use a singular Message Group ID to keep up order for associated messages.
How is a queue message created?
The queue messages could be created externally with AWS CLI or AWS SDK API. The next is an AWS CLI command instance:
The file message_txt.txt on this instance is a JSON file containing the message attributes wanted for the applying. See particulars within the Testing the pipelines part under.
The Lambda operate will delete the queue message after Amazon Bedrock has returned the extracted information.
1.2. Lambda operate – queue message processing and inferencing
1.2.1 Retrieving the paperwork, changing to pictures, and splitting massive information
The Lambda operate downloads the doc utilizing the s3_location attribute within the queue message. If the doc is scanned PDF, it’s then transformed to pictures for the multimodal mannequin to know.
As of this writing, the Claude 4 Sonnet mannequin solely permits a most of 20 photos per multimodal invocation. Due to this fact, if a doc accommodates greater than 20 pages of photos, it should be cut up into chunks of 20 pages. The doc_id, chunk_count and chunk_id are saved in an Amazon DynamoDB desk, together with the extracted outcomes and the mannequin efficiency metrics.
doc_id: the identifier of the docchunk_count: the full variety of chunks for that docchunk_id: the identifier of every chunk of the doc
1.2.2. Retrieving prompts from Amazon Bedrock Immediate Administration
Land lease paperwork range in format – some current land tract attributes in numbered listing, others in tables, and a few even in land drawings. Therefore, utilizing completely different prompts tailor-made to every doc format enhances extraction accuracy.
The prompts used within the LLM name are saved in Amazon Bedrock Immediate Administration. Every immediate has a singular ID and is versioned. The SQS messages should specify the related immediate ID and model, that are then used to retrieve the immediate physique throughout Lambda execution.
Word: There’s a service restrict of fifty prompts per area and 10 variations per immediate.
1.2.3 Composing message for LLM calls and processing the response
The Lambda operate continues with the next steps:
- Compose the messages for LLM by concatenating the immediate physique and pictures.
- Ship request(s) to Amazon Bedrock utilizing the Converse API.
The LLM will return the extract information in a JSON string, you’ll be able to look at the end in your DynamoDB desk as illustrated within the following testing the pipelines part.
1.2.4 Saving the outcomes
Lastly, the Lambda operate completes the method by:
- Parsing the JSON and storing the land tract attributes to the DynamoDB desk.
- If the doc has been efficiently processed and the outcomes are saved, the SQS message is deleted from the queue.
2. Batch inference pipeline
A normal AWS SQS queue is used for the batch inference pipeline due to its excessive throughput. The queue messages are created in the same method as within the on-demand pipeline, besides the message-group-id attribute will not be required.
The principle elements within the batch inference pipeline consists of:
- Amazon EventBridge Scheduler.
- Batch Inference AWS Lambda operate to pre-process the scanned PDFs, create JSONL information and submit the batch inference job.
- Amazon EventBridge rule.
- Submit-processing AWS Lambda operate.
The next sections describe the main points of the batch inference pipeline.
2.1. Amazon EventBridge scheduler
An Amazon EventBridge Scheduler begins the batch inference Lambda operate on a schedule.
2.2. Batch inference Lambda operate
The operate first checks if there are sufficient messages within the queue earlier than continuing. On the time of writing, there’s a minimal variety of data of 100 for Amazon Bedrock batch inference job.
2.2.1 Receiving queue messages
The Lambda operate loops by means of the messages within the queue and extracts the doc ID, LLM mannequin ID, immediate ID/model, and system promptID/model.
2.2.2 Retrieving the paperwork with out duplicates, changing to picture, and splitting massive information
The Lambda operate then retrieves the paperwork, converts them to pictures if they’re scanned PDF, and splits the big information if essential – simply as within the on-demand pipeline. As a result of the usual SQS queues don’t assure exactly-once message supply, the operate additionally makes certain that duplicate messages are ignored.
2.2.3 Permitting completely different prompts in a batch inference job
Just like the on-demand pipeline, completely different doc codecs require completely different consumer prompts for simpler information extraction.
The meant immediate ID and model for every doc are specified within the SQS messages. Throughout Lambda execution, the operate retrieves the immediate physique from Amazon Bedrock Immediate Administration.
2.2.4 Creating JSONL artifacts for batch inference job
The Lambda operate then handles the next duties:
- Making a
metadata.jsonwithin the Batch Inference Information S3 bucket to retailer the message attributes, together with the SQS message ID,doc_id, immediate ID/model, system immediate ID/model, and different project-related attributes. This file is later utilized by the Submit-Processing Lambda to populate the DynamoDB desk. - Processing the paperwork to create the JSONL information required for the Amazon Bedrock batch inference job. This course of is parallelized utilizing Python’s multiprocessing module for effectivity. The JSONL information are uploaded to the Batch Inference Information S3 bucket.
- Deleting the SQS messages after the paperwork have been ready and uploaded to the S3 bucket. This requires setting a big Visibility Timeout for the queue.
2.2.5 Composing messages and submits batch inference job
Lastly, the batch inference Lambda operate creates the Amazon Bedrock batch inference job utilizing the JSONL artifacts from the earlier step. Word that every batch job can solely course of paperwork utilizing one mannequin, which means the SQS messages inside the identical batch job should specify the identical mannequin ID. If there are a couple of mannequin ID specified within the incoming messages, the Lambda operate makes use of a polling mechanism that selects probably the most incessantly specified mannequin ID to make use of.
2.3. The Amazon Bedrock batch inference job
When Amazon Bedrock receives the batch inference job, it locations it in a queue. As soon as the job begins, it proceeds with the next steps.
2.3.1 Retrieving JSONL artifacts for batch inference job
Amazon Bedrock retrieves the JSONL artifacts specified throughout job creation.
2.3.2 Storing batch inference outputs
Upon completion, Amazon Bedrock shops the outputs to the Batch Inference Information S3 bucket, which can be specified within the job creation.
2.3.3 Notifying Amazon EventBridge
After job completion, Amazon Bedrock sends a job standing change occasion to Amazon EventBridge, which is captured by an EventBridge rule.
2.4. Amazon EventBridge rule triggers the post-inference Lambda operate
The EventBridge rule triggers the post-processing Lambda operate to deal with additional mannequin output processing.
2.5. Submit-processing Lambda operate
2.5.1 Retrieving the output JSONL
The Lambda operate fetches the inference output JSONL from the batch inference information S3 bucket.
2.5.2 Saving the inference output
The operate parses the JSONL information and saves the extracted land tract attributes to a DynamoDB desk.
Stipulations
If you wish to do this instance your self, be sure you meet these stipulations:
- An AWS account with entry to the AWS Administration Console
- Acceptable IAM permissions to create and handle CloudFormation stacks, which generally embrace:
- cloudformation:CreateStack
- cloudformation:DescribeStacks
- cloudformation:UpdateStack
- cloudformation:DeleteStack
Deploying the CloudFormation stacks
Deploy the on-demand pipeline:
While you select the Launch Stack hyperlink, you’ll be taken to AWS CloudFormation to launch the CloudFormation stack:
- On the Create stack web page, select Subsequent
- On the Specify stack particulars web page, select Subsequent
- On the Configure stack choices web page, select Subsequent
- On the Assessment and create web page, choose I acknowledge that AWS CloudFormation would possibly create IAM sources
- Select Submit
After it’s submitted, you’ll be able to observe some particulars in regards to the stack akin to Stack information, Occasions, Useful resource, and extra. The next screenshot is the Occasions in your reference:

You can too deploy the batch pipeline following the identical steps.
Testing the pipelines
The next steps information you to check the on-demand pipeline. The batch pipeline can be examined in the same steps in case you have at lease 100 paperwork.
- Obtain the info to your native setting. There are three land paperwork from Winkler County, Andrews County, and Sutton County which might be bought from the Texas Land Data and County Data web site.
- Add downloaded PDF file(s) to the S3 artifact bucket ondemand-data-pipeline-bucket-${account_id} that’s created in CloudFormation stack.
- Create a textual content file message_txt.json utilizing the next instance by changing the immediate ID, system immediate ID and S3 bucket which might be created out of your CloudFormation stack.
- Create a shell script send2queue.sh through the use of the above AWS CLI instance by changing the queue identify in and execute it. You will notice a message to your SQS queue ondemand-data-pipeline-queue.fifo.
- The queue message will set off the Lambda operate ondemand-data-pipeline-queue-processor.
- Look at the Lambda log in Amazon CloudWatch, the log group is /aws/lambda/ondemand-data-pipeline-queue-processor.
- Look at the Amazon Bedrock inference output within the DynamoDB ondemand-data-pipeline-table desk. The JSON outcome within the
model_responsecolumn for the Winkler County instance ought to seem like the next:
Cleanup
To scrub up the sources:
- Register to the AWS Administration Console
- Navigate to the CloudFormation service
- Within the CloudFormation dashboard, discover and choose the stack you wish to delete
- Select the “Delete” button on the high of the web page
- Affirm the deletion when prompted
CloudFormation will mechanically delete the sources that had been created as a part of the stack within the appropriate order, dealing with dependencies appropriately.
Deleting the CloudFormation stacks doesn’t delete the S3 buckets and the DynamoDB as a result of their deletion coverage is ready to retain to assist stop information loss. To delete these sources, go to every service’s web page within the AWS Administration Console and delete them.
Conclusion
The on-demand and batch Amazon Bedrock inference pipelines offered on this submit clarify how one can dynamically course of paperwork based mostly on the time sensitivity and information quantity. You also needs to contemplate the associated fee info when deciding which pipeline to make use of. With the batch pipeline, as present in our checks, the price of Amazon Bedrock is 50% decrease in comparison with on-demand pipeline.
One other key characteristic on this answer is the power to specify the big language mannequin (for on-demand pipeline) and immediate on the particular person doc stage, enabling these pipelines to assist varied forms of clever doc processing.
With parallelism enabled utilizing the Python’s multiprocessing module, each Lambda features of the batch inference pipeline can course of 1,000 paperwork inside quarter-hour.
Name to motion
Amazon Bedrock can allow you to construct many generative AI functions. We advocate following the fast begin within the following GitHub repo and familiarizing your self with constructing generative AI functions. For superior readers, you’ll be able to look into the way to scale the answer additional. One concept is to run the Lambda code in AWS Batch as a substitute, permitting tens of 1000’s of paperwork to be processed in a single Amazon Bedrock batch inference job.
In regards to the authors

