This submit was co-written with Yash Munsadwala, Adam Hood, Justin Guse, and Hector Hernandez from PwC.
Contract evaluation usually consumes vital time for authorized, compliance, and procurement groups, particularly when vital insights are buried in prolonged, unstructured agreements. As contract volumes develop, discovering particular clauses and assessing extracted phrases can change into more and more tough to scale.
Immediately, many groups rely totally on key phrase and pattern-based extraction or contract administration methods to research contracts. Whereas these strategies can work, they usually fall wanting offering constant insights at a scale. Because of this, many groups are exploring AI-based approaches that may mix giant language fashions (LLMs) with automated extraction workflows.
PwC’s AI-driven annotation (AIDA) resolution, constructed on AWS, can extract structured insights from contracts by means of rule-based extraction and pure language queries. Utilizing LLMs, AIDA can interpret complicated authorized language and extracts insights based mostly on outlined guidelines. Customers can ask pure language questions on particular person contracts or throughout a number of paperwork inside a undertaking and obtain context-specific solutions supported by linked citations. By decreasing the necessity to manually search and interpret contract language, these capabilities assist streamline overview workflows. In buyer implementations, AIDA has helped scale back handbook contract overview time by as much as 90%, serving to groups to retrieve key info extra rapidly and shorten overview cycles. On this submit, you will note how AIDA addresses these challenges. We stroll by means of the structure behind AIDA and exhibit three core capabilities: template-based extraction, document-level chat, and world chat throughout paperwork.
Answer overview
AIDA is designed to transform unstructured paperwork into structured, searchable insights, streamlining the method to entry and reuse crucial contract info throughout methods. AIDA makes use of LLMs and a mix of AWS cloud-native and built-in companies to assist extract insights from contracts extra successfully. The answer supplies capabilities that may assist organizational safety, compliance, and danger administration necessities, although clients stay accountable for configuring and working the answer to fulfill their particular compliance obligations. As AIDA processes probably delicate contractual information, acceptable safeguards and human overview workflows ought to be utilized previous to enterprise or authorized reliance on AI-generated outputs. AIDA supplies a holistic suite of capabilities designed to deal with present challenges. The next key options spotlight core performance, which we discover intimately within the subsequent sections:
- Custom-made Information Extraction: Extract scalable information enabled by user-defined guidelines and customized templates. Use the customized extraction subject and logic per doc and extract insights from 1000’s of contracts in parallel with constant accuracy.
- Pure Language Q&A Throughout Paperwork: Ask pure language questions and obtain context-specific responses with linked citations to the supply paperwork.
- Integration with Mannequin Techniques: Combine with mannequin methods (for instance, contract administration methods and doc repositories) that you should utilize to retrieve supply information and ship extracted insights.
AIDA can assist scalable contract evaluation throughout a variety of industries, together with Media & Leisure (M&E) and Actual Property—and competencies like Procurement, Authorized, and Compliance. For example, within the M&E sector, AIDA helps content material producers and distributors unlock the general worth of their IP by extracting and analyzing rights info from license agreements. It summarizes rights reminiscent of broadcast, streaming, theatrical, and by-product enabling quicker, knowledgeable choices on spin-offs, sequels, and world distribution. One main movie and TV studio diminished rights analysis time by 90%.
AIDA’s structure overview
The structure illustrates how AIDA’s parts work collectively to securely course of, analyze, and ship insights from complicated contracts utilizing the scalable, cloud-native companies of AWS. Every element is designed to assist course of contracts at scale whereas sustaining safety, traceability, and efficiency.
1. Edge safety and entry
AIDA’s edge layer permits authenticated entry and managed routing for person site visitors. Requests go by means of AWS WAF for risk filtering, then by means of a Community Load Balancer to the reverse proxy server (NGINX), which manages SSL termination, routing, and coverage enforcement earlier than forwarding to Amazon Elastic Container Service (Amazon ECS). Information in transit is encrypted utilizing TLS 1.2 or increased, together with person connections by means of HTTPS, and inner service-to-service communication between Amazon ECS, Amazon Relational Database Service (Amazon RDS), Amazon Easy Storage Service (Amazon S3), Amazon Bedrock, and different AWS companies.
Authentication is dealt with by means of Amazon Cognito, built-in with enterprise id suppliers (for instance, Microsoft Entra ID, Okta) to safe entry at scale. AIDA applies fine-grained entry management by means of each application-level and project-level roles, so directors can handle person entry and permissions centrally. Mission-level roles assist directors to regulate person permissions and outline what actions every person can carry out inside a undertaking, offering safe and ruled entry to information and performance.
2. Information storage
After authentication, AIDA shops uploaded paperwork, Optical Character Recognition (OCR) outputs, and related metadata in Amazon S3 offering a sturdy and cost-effective strategy to handle giant volumes of contract information. Structured information, configurations, and extracted insights persist in Amazon RDS, so customers can question and retrieve insights successfully for analytics and integration.
Amazon S3 buckets are encrypted at relaxation utilizing Amazon S3-managed encryption keys (SSE-S3), and Amazon RDS situations are encrypted at relaxation utilizing AWS KMS-managed keys. Moreover, S3 bucket setup follows Amazon S3 finest practices together with: Block Public Entry enabled on the bucket stage and enabling entry logging for safety evaluation and audit functions.
3. OCR and prediction processing
OCR and extraction workflows run asynchronously on Amazon ECS utilizing AWS Fargate, with duties coordinated by means of Amazon Easy Queue Service (Amazon SQS). With this strategy, customers can course of giant volumes of contracts in parallel with out blocking person interactions.
Extraction guidelines information how related content material is recognized and despatched to basis fashions (FMs) hosted on Amazon Bedrock, the place LLMs can interpret the contract textual content and extract structured values. Outcomes are written again to Amazon RDS, the place they’re accessible for overview, dashboards, and integrations.
4. Retrieval Augmented Technology (RAG)
When analyzing contracts, it’s crucial that solutions are correct and traceable again to the unique supply textual content. RAG assist tackle this by grounding mannequin responses within the underlying contract content material, quite than relying solely on the mannequin’s data. AIDA makes use of RAG to assist confirm that responses are grounded within the underlying contract textual content. Paperwork saved in Amazon S3 are embedded utilizing Amazon Bedrock Embeddings Fashions, with vectors listed in Amazon OpenSearch Serverless for semantic search. Throughout inference, related information is retrieved from Amazon Bedrock Information Bases and mixed with person enter, producing correct, context-aware, and explainable outcomes.
As well as, AIDA makes use of Amazon Bedrock Guardrails to use content material filtering, delicate info (PII) safety, and immediate security controls, additional confirming that responses stay safe and aligned with enterprise and authorized requirements.
5. Visualization
To point out how contracts are being processed, AIDA integrates with Amazon Fast Sight to visualise metrics reminiscent of doc volumes, OCR accuracy, extraction throughput, and processing standing.
This dashboard may give visibility into system efficiency and helps establish bottlenecks or alternatives to enhance effectivity over time.
6. System integrations throughout inner, vendor, and third-party methods
AIDA integrates with downstream methods utilizing AWS Lambda, Amazon EventBridge, and Amazon SQS. These integrations ship extracted insights to contract lifecycle administration instruments, information methods, or different operational methods. A configurable human-in-the-loop overview queue can validate and approve extracted outputs earlier than they’re forwarded downstream.
By pushing structured contract information into instruments in use, organizations can scale back handbook information dealing with and reuse contract insights throughout compliance, reporting, and analytics workflows.
7. Ancillary and system companies
A variety of ancillary AWS companies assist AIDA’s core system offering safety, observability, and automation. AWS Identification and Entry Administration (AWS IAM) and AWS Key Administration Service (AWS KMS) handle entry and encryption, with IAM insurance policies applied following the precept of least privilege; Amazon CloudWatch and AWS X-Ray present monitoring; whereas AWS CodeBuild, AWS CodePipeline, and AWS CloudTrail allow steady deployment and auditability by enabling entry logging for information operations.
Let’s discover how Amazon Bedrock particularly permits the clever options that drive these effectivity positive aspects.
How Amazon Bedrock permits AIDA’s clever options
Amazon Bedrock permits AIDA’s clever insights, extraction and conversational capabilities. By integrating superior FMs into AIDA’s processing pipeline, Amazon Bedrock permits context-aware information extraction, semantic retrieval, and interactive chat functionalities. AIDA orchestrates doc processing, OCR, semantic retrieval, and LLM reasoning in a unified workflow retrieving related sections based mostly on queries or predefined guidelines and utilizing Amazon Bedrock to assist RAG and supply responses with clear citations to the supply paperwork.
To showcase the important thing options, we uploaded pattern contracts to AIDA from the Contract Understanding Atticus Dataset (CUAD), an open authorized contract overview dataset created with dozens of authorized specialists from The Atticus Mission. The CUAD dataset is publicly accessible beneath the Inventive Commons Attribution 4.0 (CC BY 4.0) license, allowing use and distribution for analysis and analysis functions.
1. Smarter, quicker insights extraction by means of reusable templates
Reusable templates can extract constant contract attributes at scale by serving to customers to outline extraction logic as soon as and apply it throughout a number of paperwork. Every template teams collectively labels that characterize key contract components reminiscent of termination discover durations, renewal phrases, or rights clauses that authorized and compliance groups continuously overview.
When a template is utilized to a set of contracts, the identical extraction guidelines are used constantly throughout paperwork. This helps scale back handbook overview effort whereas bettering accuracy and consistency, particularly when working with giant contract volumes. Behind the scenes, AIDA processes every contract utilizing a structured illustration that preserves web page and part context. Extraction guidelines information how related content material is recognized, and LLMs interpret that context to extract the right values. Outcomes are returned with citations that hyperlink again to the unique contract textual content, enabling you to confirm the place every perception got here from.
For instance, the Termination Discover Interval label extracts timelines instantly from the contract proven within the following screenshot, whereas the fitting panel shows the extracted reply (highlighted in inexperienced) with clickable references to the precise supply textual content inside the contract.

2. Doc-level chat
You need to use document-level chat to ask pure language questions on a single contract and obtain solutions grounded instantly in that doc. This functionality is especially helpful when fast clarification on particular phrases, dates, or obligations is required, stopping you from manually scanning prolonged and complicated agreements.
When questions are submitted, AIDA can establish essentially the most related sections of the contract by evaluating queries towards a semantic illustration of the doc’s content material. These sections are then offered as context to an LLM that’s hosted on Amazon Bedrock, which generates a response based mostly on the contract textual content.

3. International chat
International chat extends the document-level chat function to assist questions throughout a number of contracts inside a undertaking. This function is helpful when a broader view is required, reminiscent of figuring out frequent clauses, evaluating obligations, or summarizing phrases throughout a group of associated agreements.
International chat can be utilized in two methods. In a single situation, questions are evaluated throughout the contracts in a undertaking to offer a consolidated, project-wide view. In one other situation, questions may be scoped to a particular set of contracts, so customers can concentrate on particular agreements whereas utilizing the identical conversational interface.

AIDA helps construct a semantic data base utilizing Amazon Bedrock from the underlying contracts by extracting and embedding doc content material for search. These embeddings are listed in Amazon OpenSearch Serverless, making a scalable semantic layer that may assist queries throughout giant and numerous contract collections.
When submitting a query, AIDA can retrieve related passages utilizing a mix of implicit and specific filtering. Implicit filtering depends on semantic similarity between queries and the contract content material to floor contextually related sections. Express filtering applies metadata constraints reminiscent of contract sort, creation date, enterprise unit, or jurisdiction to slim outcomes to essentially the most related subset. The chosen context is then offered to an LLM hosted on Amazon Bedrock, which generates a consolidated response with citations linking again to the unique supply paperwork.
Supporting capabilities constructed on AIDA’s system
The next part describes the supporting capabilities which can be constructed on AIDA’s system: operational dashboard and exterior system integrations.
Operational dashboard
The operational dashboard supplies a consolidated view of contract overview efficiency on the undertaking stage monitoring file volumes, OCR and perception extraction completion charges, errors, and extraction accuracy. It helps groups rapidly spot bottlenecks and monitor reviewer’s productiveness.

Exterior System Integrations
The structured extracted insights generated by AIDA may be rapidly pushed to downstream methods reminiscent of Contract Lifecycle Administration (CLM) instruments, ERP methods, CRMs, or information warehouses. This integration helps enrich inner or exterior methods with high-quality, machine-readable contract information, decreasing handbook information re-entry and reconciliation throughout methods. By embedding these insights instantly into these methods, organizations can enhance compliance monitoring and assist quicker, data-driven choices.
PwC’s AI-driven annotation (AIDA) resolution, enabled by AWS, helps transfer organizations past handbook contract overview to a quicker, extra dependable, and scalable strategy. By bringing collectively OCR, user-defined extraction guidelines, and Retrieval Augmented Technology by means of Amazon Bedrock, AIDA helps rapidly establish key phrases, obligations, and insights buried inside complicated contracts.
The answer helps streamline authorized and operational workflows, scale back overview time, and enhance consistency throughout giant volumes of paperwork. This resolution was constructed on the cloud-native companies of AWS and designed to be safe like Amazon ECS, Amazon S3, Amazon RDS, and Amazon OpenSearch Serverless. AIDA can present the flexibleness and resilience wanted for enterprise deployment. Collectively, PwC and AWS can flip contract information into actionable intelligence, enabling smarter choices and higher effectivity throughout their operations.
Concerning the authors

