Generative AI is revolutionizing enterprise automation, enabling AI systems to understand context, make decisions, and act independently. Generative AI foundation models (FMs), with their ability to understand context and make decisions, are becoming powerful partners in solving sophisticated business problems. At AWS, we're using the power of models in Amazon Bedrock to drive automation of complex processes that have traditionally been challenging to streamline.
In this post, we focus on one such complex workflow: document processing. This serves as an example of how generative AI can streamline operations that involve diverse data types and formats.
Challenges with document processing
Document processing often involves handling three main categories of documents:
- Structured – For example, forms with fixed fields
- Semi-structured – Documents that have a predictable set of information but might vary in layout or presentation
- Unstructured – For example, paragraphs of text or notes
Traditionally, processing these varied document types has been a pain point for many organizations. Rule-based systems or specialized machine learning (ML) models often struggle with the variability of real-world documents, especially when dealing with semi-structured and unstructured data.
We demonstrate how generative AI combined with external tool use offers a more flexible and adaptable solution to this challenge. Through a practical use case of processing a patient health package at a doctor's office, you will see how this technology can extract and synthesize information from all three document types, potentially improving data accuracy and operational efficiency.
Solution overview
This intelligent document processing solution uses Amazon Bedrock FMs to orchestrate a sophisticated workflow for handling multi-page healthcare documents with mixed content types. The solution uses the FM's tool use capabilities, accessed through the Amazon Bedrock Converse API. This enables the FMs to not just process text, but to actively engage with various external tools and APIs to perform complex document analysis tasks.
The solution employs a strategic multi-model approach, optimizing for both performance and cost by selecting the most appropriate model for each task:

- Anthropic's Claude 3 Haiku – Serves as the workflow orchestrator due to its low latency and cost-effectiveness. This model's strong reasoning and tool use abilities make it ideal for the following:
  - Coordinating the overall document processing pipeline
  - Making routing decisions for different document types
  - Invoking appropriate processing functions
  - Managing the workflow state
- Anthropic's Claude 3.5 Sonnet (v2) – Used for its advanced reasoning capabilities and particularly strong visual processing abilities, notably excelling at interpreting charts and graphs. Its key strengths include:
  - Interpreting complex document layouts and structure
  - Extracting text from tables and forms
  - Processing medical charts and handwritten notes
  - Converting unstructured visual information into structured data
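As a concrete sketch, this division of labor can be captured in a small routing table. The constant and function names below are illustrative, and the model IDs reflect identifiers available at the time of writing; check the Amazon Bedrock model catalog for the IDs available in your Region.

```python
# Hypothetical task-to-model routing table (names are illustrative;
# verify model IDs against the Bedrock model catalog for your Region).
MODEL_FOR_TASK = {
    # Fast, inexpensive model for orchestration and routing decisions
    "orchestration": "anthropic.claude-3-haiku-20240307-v1:0",
    # Stronger vision model for layout, table, and handwriting extraction
    "visual_extraction": "anthropic.claude-3-5-sonnet-20241022-v2:0",
}

def model_for(task: str) -> str:
    """Return the model ID for a task, defaulting to the orchestrator."""
    return MODEL_FOR_TASK.get(task, MODEL_FOR_TASK["orchestration"])
```

Keeping the mapping in one place makes it straightforward to swap models later without touching the processing code.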
Through the Amazon Bedrock Converse API's standardized tool use (function calling) interface, these models can work together seamlessly to invoke document processing functions, call external APIs for data validation, trigger storage operations, and execute content transformation tasks. The API serves as the foundation for this intelligent workflow, providing a unified interface for model communication while maintaining conversation state throughout the processing pipeline. The API's standardized approach to tool definition and function calling provides consistent interaction patterns across different processing stages. For more details on how tool use works, refer to The complete tool use workflow.
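To make the tool definition side of this concrete, the following is a minimal sketch of wrapping tool specifications in the toolConfig shape the Converse API expects. The helper name and the example tool are illustrative, not part of the repo.

```python
# Hypothetical assembly of the toolConfig passed to bedrock.converse().
def make_tool_config(toolspecs):
    """Wrap a list of toolSpec dicts in the Converse API's toolConfig shape."""
    return {
        "tools": toolspecs,
        # Let the model decide which tool (if any) to call on each turn
        "toolChoice": {"auto": {}},
    }

# Illustrative tool definition, mirroring the style used in this solution
pipeline_tool = {"toolSpec": {
    "name": "document_processing_pipeline",
    "description": "Classify pages of an uploaded PDF package",
    "inputSchema": {"json": {
        "type": "object",
        "properties": {"document_path": {"type": "string"}},
        "required": ["document_path"],
    }},
}}
```

The resulting dict would be passed as the `toolConfig` parameter of a `converse` call.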
The solution incorporates Amazon Bedrock Guardrails to implement robust content filtering policies and sensitive information detection, making sure that personal health information (PHI) and personally identifiable information (PII) data is appropriately protected through automated detection and masking capabilities while maintaining industry standard compliance throughout the document processing workflow.
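At the API level, attaching a guardrail to a Converse call comes down to adding a guardrailConfig block to the request. The helper below is a hedged sketch; the function name and the guardrail ID are placeholders you would replace with values from your own guardrail.

```python
def build_converse_request(model_id, messages,
                           guardrail_id=None, guardrail_version="DRAFT"):
    """Assemble keyword arguments for bedrock.converse(), optionally
    attaching a guardrail for PII/PHI masking."""
    request = {
        "modelId": model_id,
        "messages": messages,
        "inferenceConfig": {"maxTokens": 2048, "temperature": 0},
    }
    if guardrail_id:
        # Guardrail is applied to both model input and output
        request["guardrailConfig"] = {
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
        }
    return request
```

The resulting dict would be passed as `bedrock.converse(**request)`, so the same code path works with or without a guardrail attached.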
Prerequisites
You need the following prerequisites before you can proceed with this solution. For this post, we use the us-west-2 AWS Region. For details on available Regions, see Amazon Bedrock endpoints and quotas.
Use case and dataset
For our example use case, we examine a patient intake process at a healthcare institution. The workflow processes a patient health information package containing three distinct document types:
- Structured document – A new patient intake form with standardized fields for personal information, medical history, and current symptoms. This form follows a consistent format with clearly defined fields and check boxes, making it an ideal example of a structured document.
- Semi-structured document – A health insurance card that contains essential coverage information. Although insurance cards generally contain similar information (policy number, group ID, coverage dates), they come from different providers with varying layouts and formats, demonstrating the semi-structured nature of these documents.
- Unstructured document – A handwritten doctor's note from an initial consultation, containing free-form observations, preliminary diagnoses, and treatment recommendations. This represents the most challenging category of unstructured documents, where information isn't confined to any predetermined format or structure.
The example document can be downloaded from the following GitHub repo.
This healthcare use case is particularly relevant because it encompasses common challenges in document processing: the need for high accuracy, compliance with healthcare data privacy requirements, and the ability to handle multiple document formats within a single workflow. The variety of documents in this patient package demonstrates how a modern intelligent document processing solution must be flexible enough to handle different levels of document structure while maintaining consistency and accuracy in data extraction.
The following diagram illustrates the solution workflow.
This self-orchestrated workflow demonstrates how modern generative AI solutions can balance capability, performance, and cost-effectiveness in transforming traditional document processing workflows in healthcare settings.
Deploy the solution
- Create an Amazon SageMaker domain. For instructions, see Use quick setup for Amazon SageMaker AI.
- Launch SageMaker Studio, then create and launch a JupyterLab space. For instructions, see Create a space.
- Create a guardrail. Focus on adding sensitive information filters that would mask PII or PHI.
- Clone the code from the GitHub repository:
git clone https://github.com/aws-samples/anthropic-on-aws.git
- Change the directory to the root of the cloned repository:
cd medical-idp
- Install dependencies:
pip install -r requirements.txt
- Update setup.sh with the guardrail ID you created in Step 3. Then set the ENV variable:
source setup.sh
- Finally, start the Streamlit application:
streamlit run streamlit_app.py
Now you're ready to explore the intelligent document processing workflow using Amazon Bedrock.
Technical implementation
The solution is built around the Amazon Bedrock Converse API and tool use framework, with Anthropic's Claude 3 Haiku serving as the primary orchestrator. When a document is uploaded through the Streamlit interface, Haiku analyzes the request and determines the sequence of tools needed by consulting the tool definitions in ToolConfig. These definitions include tools for the following:
- Document processing pipeline – Handles initial PDF processing and classification
- Document notes processing – Extracts information from medical notes
- New patient information processing – Processes patient intake forms
- Insurance form processing – Handles insurance card information
The following code is an example tool definition for extracting consultation notes. Here, extract_consultation_notes represents the name of the function that the orchestration workflow will call, and document_paths defines the schema of the input parameter that will be passed to the function. The FM will contextually extract the information from the document and pass it to the method. A similar toolspec will be defined for each step. Refer to the GitHub repo for the full toolspec definition.
{
    "toolSpec": {
        "name": "extract_consultation_notes",
        "description": "Extract diagnostics information from a doctor's consultation notes. Along with the extraction include the full transcript in a node",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "document_paths": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Paths to the files that were classified as DOC_NOTES"
                    }
                },
                "required": ["document_paths"]
            }
        }
    }
}
When a PDF document is uploaded through the Streamlit interface, it's temporarily saved and passed to the FileProcessor class along with the tool specification and a user prompt:
prompt = ("1. Extract 2. save and 3. summarize the information from the patient information package located at " + tmp_file + ". " +
          "The package might contain various types of documents including insurance cards. Extract and save information from all documents provided. " +
          "Perform any preprocessing or classification of the file provided prior to the extraction. " +
          "Set the enable_guardrails parameter to " + str(enable_guardrails) + ". " +
          "At the end, list all the tools that you had access to. Give an explanation on why each tool was used and if you are not using a tool, explain why it was not used as well. " +
          "Think step by step.")
processor.process_file(prompt=prompt,
                       toolspecs=toolspecs,
...
The BedrockUtils class manages the conversation with Anthropic's Claude 3 Haiku through the Amazon Bedrock Converse API. It maintains the conversation state and handles the tool use workflow:
# From bedrockutility.py
def invoke_bedrock(self, message_list, system_message=[], tool_list=[],
                   temperature=0, maxTokens=2048, guardrail_config=None):
    response = self.bedrock.converse(
        modelId=self.model_id,
        messages=message_list,
        system=system_message,
        inferenceConfig={
            "maxTokens": maxTokens,
            "temperature": temperature
        },
        # Only include a toolConfig when tools were supplied
        **({"toolConfig": {"tools": tool_list}} if tool_list else {})
    )
When the processor receives a document, it initiates a conversation loop with Anthropic's Claude 3 Haiku, which analyzes the document and determines which tools to use based on the content. The model acts as an intelligent orchestrator, making decisions about the following:
- Which document processing tools to invoke
- The sequence of processing steps
- How to handle different document types within the same package
- When to summarize and complete the processing
This orchestration is managed through a continuous conversation loop that processes tool requests and their results until the entire document package has been processed.
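The loop described above can be sketched as follows. This is a simplified illustration under the assumption that invoke_bedrock returns the raw Converse response; the run_until_complete name and the tool_registry parameter are hypothetical, not from the repo.

```python
def run_until_complete(bedrock_utils, messages, toolspecs, tool_registry,
                       max_turns=20):
    """Keep conversing until the model stops asking for tools."""
    for _ in range(max_turns):
        response = bedrock_utils.invoke_bedrock(
            message_list=messages, tool_list=toolspecs)
        assistant_message = response["output"]["message"]
        messages.append(assistant_message)
        if response.get("stopReason") != "tool_use":
            return response  # model produced its final summary
        # Execute each requested tool and feed the results back
        tool_results = []
        for block in assistant_message["content"]:
            if "toolUse" in block:
                tool_use = block["toolUse"]
                output = tool_registry[tool_use["name"]](**tool_use["input"])
                tool_results.append({"toolResult": {
                    "toolUseId": tool_use["toolUseId"],
                    "content": [{"json": output}]}})
        # Tool results go back to the model as a user message
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError("document package not processed within max_turns")
```

The max_turns guard prevents a runaway loop if the model keeps requesting tools; twenty turns is an arbitrary illustrative ceiling.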
The first key decision in the workflow is initiating the document classification process. Through the DocumentClassifier class, the solution uses Anthropic's Claude 3.5 Sonnet to analyze and categorize each page of the uploaded document into three main types: intake forms, insurance cards, and doctor's notes:
# from document_classifier.py
class DocumentClassifier:
    def __init__(self, file_handler):
        self.sonnet_3_5_bedrock_utils = BedrockUtils(
            model_id=ModelIDs.anthropic_claude_3_5_sonnet
        )

    def categorize_document(self, file_paths):
        # Convert documents to binary format for model processing
        binary_data_array = []
        for file_path in file_paths:
            binary_data, media_type = self.file_handler.get_binary_for_file(file_path)
            binary_data_array.append((binary_data[0], media_type))

        # Prepare message content for classification
        message_content = [
            {"image": {"format": media_type, "source": {"bytes": data}}}
            for data, media_type in binary_data_array
        ]

        # Create the classification request
        message_list = [{
            "role": 'user',
            "content": [
                *message_content,
                {"text": "What type of document is in this image?"}
            ]
        }]

        # Define the system message for classification
        system_message = [{
            "text": '''You are a medical document processing agent.
                       Categorize images as: INTAKE_FORM, INSURANCE_CARD, or DOC_NOTES'''
        }]

        # Get the classification from the model
        response = self.sonnet_3_5_bedrock_utils.invoke_bedrock(
            message_list=message_list,
            system_message=system_message
        )
        return [response['output']['message']]
Based on the classification results, the FM determines the next tool to be invoked. The tool's description and input schema define exactly what information needs to be extracted. Following the previous example, let's assume the next page to be processed is a consultation note. The workflow will invoke the extract_consultation_notes function. This function processes documents to extract detailed medical information. Similar to the classification process discussed earlier, it first converts the documents to binary format suitable for model processing. The key to accurate extraction lies in how the images and system message are combined:
def extract_info(self, file_paths):
    # Convert documents to binary data
    # This follows the same pattern as in the classification function
    message_content = [
        {"image": {"format": media_type, "source": {"bytes": data}}}
        for data, media_type in binary_data_array
    ]
    message_list = [{
        "role": 'user',
        "content": [
            *message_content,  # Include the processed document images
            {"text": '''Extract all information from this file.
                        If you find a visualization:
                        - Provide a detailed description in natural language
                        - Use domain specific language for the description
                     '''}
        ]
    }]
    system_message = [{
        "text": '''You are a medical consultation agent with expertise in diagnosing and treating various health conditions.
                   You have a deep understanding of human anatomy, physiology, and medical knowledge across different specialties.
                   During the consultation, you review the patient's medical records, test results, and documentation provided.
                   You analyze this information objectively and make associations between the data and potential diagnoses.
                   Associate a confidence score to each extracted piece of information. This should reflect how confident the model is that the extracted value matched the requested entity.
                '''}
    ]
    response = self.bedrock_utils.invoke_bedrock(
        message_list=message_list,
        system_message=system_message
    )
    return [response['output']['message']]
The system message serves three crucial purposes:
- Establish medical domain expertise for accurate interpretation.
- Provide guidelines for handling different types of information (text and visualizations).
- Provide a self-scored confidence. Although this isn't an independent grading mechanism, the score is directionally indicative of how confident the model is in its own extraction.
Following the same pattern, the FM will use the other tools in the toolspec definition to save and summarize the results.
A unique advantage of using a multimodal FM for the extraction task is its ability to develop a deep understanding of the text it's extracting. For example, the following code is an abstract of the data schema we're requesting as input to the save_consultation_notes function. Refer to the code in constants.py for the full definition. The model needs to not only extract a transcript, but also understand it, in order to derive such structured data from an unstructured document. This significantly reduces the postprocessing effort required for the data to be consumed by a downstream application.
"session": {
"sort": "object",
"properties": {
"date": {"sort": "string"},
"concern": {
"sort": "object",
"properties": {
"primaryComplaint": {
"sort": "string",
"description": "Main medical criticism of the affected person. Solely seize the medical situation. no timelines"
},
"period": {"sort": "quantity"},
"durationUnit": {"sort": "string", "enum": ["days", "weeks", "months", "years"]},
"associatedSymptoms": {
"sort": "object",
"additionalProperties": {
"sort": "boolean"
},
"description": "Key-value pairs of signs and their presence (true) or absence (false)"
},
"absentSymptoms": {
"sort": "array",
"gadgets": {"sort": "string"}
}
},
"required": ["primaryComplaint", "duration", "durationUnit"]
}
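Because a model, not a deterministic parser, produces this structure, a lightweight post-check on the extracted object is still worthwhile. The following sketch verifies the required fields of a schema fragment; the check_required helper is illustrative and not part of the repo.

```python
def check_required(extracted: dict, schema: dict) -> list:
    """Return the names of required properties missing from an extracted
    object, given a JSON-Schema-style fragment with a "required" list."""
    return [key for key in schema.get("required", [])
            if key not in extracted]
```

A downstream application could route any object with missing required fields back for re-extraction or human review instead of failing silently.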
The documents contain a treasure trove of personally identifiable information (PII) and personal health information (PHI). To redact this information, you can pass enable_guardrails as true. This uses the guardrail you set up earlier as part of the information extraction process and masks information identified as PII or PHI:
processor.process_file(prompt=prompt,
                       enable_guardrails=True,
                       toolspecs=toolspecs,
                       …
)
Finally, cross-document validation is crucial for maintaining data accuracy and compliance in healthcare settings. Although the current implementation performs basic consistency checks through the summary prompt, organizations can extend the framework by implementing a dedicated validation tool that integrates with their specific business rules and compliance requirements. Such a tool could perform sophisticated validation logic like insurance policy verification, appointment date consistency checks, or any other domain-specific validation requirements, providing complete data integrity across the document package.
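Such a validation tool would plug into the same toolspec mechanism as the extraction tools. The following is a hypothetical tool definition; the name, fields, and wording are illustrative, not part of the repo.

```python
# Hypothetical toolspec for a cross-document validation step
validation_toolspec = {
    "toolSpec": {
        "name": "validate_patient_package",
        "description": "Cross-check extracted data for consistency, "
                       "e.g. that the insured name matches the intake form "
                       "and that visit dates are coherent across documents",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "intake_data": {"type": "object"},
                    "insurance_data": {"type": "object"},
                    "consultation_data": {"type": "object"},
                },
                "required": ["intake_data", "insurance_data"],
            }
        },
    }
}
```

Adding this spec to the orchestrator's tool list would let the model decide when all extractions are complete and a validation pass should run.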
Future considerations
As Amazon Bedrock continues to evolve, several powerful features could be integrated into this document processing workflow to enhance its enterprise readiness, performance, and cost-efficiency. Let's explore how these advanced capabilities can take this solution to the next level:
- Inference profiles in Amazon Bedrock define a model and its associated Regions for routing invocation requests, enabling various tasks such as usage tracking, cost monitoring, and cross-Region inference. These profiles help users track metrics through Amazon CloudWatch logs, monitor costs with cost allocation tags, and increase throughput by distributing requests across multiple Regions.
- Prompt caching can help when you have workloads with long and repeated contexts that are frequently reused across multiple queries. Instead of reprocessing the entire context for each document, the workflow can reuse cached prompts, which is particularly beneficial when using the same image across different tooling workflows. With support for multiple cache checkpoints, this feature can significantly reduce processing time and inference costs while maintaining the workflow's intelligent orchestration capabilities.
- Intelligent prompt routing can dynamically select the most appropriate model for each task based on performance and cost requirements. Rather than explicitly assigning Anthropic's Claude 3 Haiku for orchestration and Anthropic's Claude 3.5 Sonnet for document analysis, the workflow can use intelligent routing to automatically choose the optimal model within the Anthropic family for each request. This approach simplifies model management while providing cost-effective processing of diverse document types, from simple structured forms to complex handwritten notes, all through a single endpoint.
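To illustrate the prompt caching item above: a cache checkpoint is expressed as an extra content block in the request. The fragment below is a hedged sketch of that shape, assuming the model in use supports caching; check the Amazon Bedrock prompt caching documentation for supported models, minimum token counts, and limits.

```python
# A system prompt with a cache checkpoint after the long, reusable part.
# The wording is illustrative; the cachePoint block marks where Bedrock
# may cache everything that precedes it for reuse on subsequent calls.
system_message = [
    {"text": "You are a medical document processing agent. "
             "<long, stable instructions and few-shot examples here>"},
    {"cachePoint": {"type": "default"}},  # content above may be cached
]
```

On repeated calls with an identical prefix, only the content after the checkpoint needs to be reprocessed.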
Conclusion
This intelligent document processing solution demonstrates the power of combining Amazon Bedrock FMs with tool use capabilities to create sophisticated, self-orchestrating workflows. By using Anthropic's Claude 3 Haiku for orchestration and Anthropic's Claude 3.5 Sonnet for complex visual tasks, the solution effectively handles structured, semi-structured, and unstructured documents while maintaining high accuracy and compliance standards.
Key benefits of this approach include:
- Reduced manual processing through intelligent automation
- Improved accuracy through specialized model selection
- Built-in compliance with guardrails for sensitive data
- Flexible architecture that adapts to various document types
- Cost-effective processing through strategic model usage
As organizations continue to digitize their operations, solutions like this showcase how generative AI can transform traditional document processing workflows. The combination of powerful FMs in Amazon Bedrock and the tool use framework provides a robust foundation for building intelligent, scalable document processing solutions across industries.
For more information about Amazon Bedrock and its capabilities, visit the Amazon Bedrock User Guide.
About the Author
Raju Rangan is a Senior Solutions Architect at AWS. He works with government-sponsored entities, helping them build AI/ML solutions using AWS. When not tinkering with cloud solutions, you'll catch him hanging out with family or smashing birdies in a lively game of badminton with friends.