Automationscribe.com

Huntington Bank: Redacting sensitive data from 400M+ documents with AWS

admin by admin
June 25, 2026
in Artificial Intelligence


When your document repository contains hundreds of millions of files accumulated over nearly a decade, how do you systematically find and redact sensitive customer data without taking years to complete? This was the challenge facing The Huntington National Bank (Huntington), a top 10 bank in the United States.

Redacting sensitive information at scale

Since 2015, Huntington’s document management system has securely stored hundreds of millions of documents on-premises. In 2025, as part of a proactive compliance initiative, Huntington set out to process the documents in this system and redact sensitive data. These documents come in different formats, so the solution needed the flexibility to handle diverse file types while delivering the throughput required to process millions of documents quickly.

Original estimates indicated this effort would take years. However, by designing a scalable redaction workflow using Amazon Textract, Amazon SageMaker, AWS Step Functions, and AWS Lambda, Huntington reduced this timeline to months.

Solution overview

Before examining the technical implementation, let’s look at the core requirements Huntington established for this project. If you’re facing a similar large-scale document processing challenge, these requirements can serve as a starting point for your own solution design:

  • Data must be encrypted at rest and in transit.
  • Locations where data is stored or accessed must meet strict access requirements.
  • Services used must be in-scope for PCI DSS compliance.
  • Outputs must be replicated back to on-premises data stores.
  • Redaction accuracy must meet or exceed 95% to satisfy compliance requirements.

The following diagram illustrates the high-level solution architecture.

High-level architecture diagram showing the document redaction solution with on-premises file share, AWS DataSync, Amazon S3, Amazon Textract, and AWS Step Functions

Moving data securely, with confidence

Huntington’s first objective was to move documents from an on-premises file share to an Amazon Simple Storage Service (Amazon S3) bucket. Moving documents is straightforward, but this effort required transferring over 400 million documents, encrypted in transit and at rest. To accomplish this, Huntington used AWS DataSync, AWS Direct Connect, Amazon S3, and AWS Key Management Service (AWS KMS).
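One common way to enforce encryption in transit and at rest on the destination bucket is a bucket policy. The following is a minimal sketch, not Huntington's actual policy: the bucket name and the `build_bucket_policy` helper are hypothetical, and the two statements use the standard `aws:SecureTransport` and `s3:x-amz-server-side-encryption` condition keys.

```python
import json


def build_bucket_policy(bucket: str) -> dict:
    """Build a policy that denies non-TLS requests and non-KMS uploads.

    Illustrative only: the bucket name is supplied by the caller and the
    statement IDs are made up for this example.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that does not arrive over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that do not request SSE-KMS encryption.
                "Sid": "DenyNonKmsUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    # "example-redaction-input" is a placeholder bucket name.
    print(json.dumps(build_bucket_policy("example-redaction-input"), indent=2))
```

With the second statement in place, uploads that omit the encryption header are also denied, so every writer would need to request SSE-KMS explicitly.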

AWS DataSync can be deployed as an agent in your on-premises data center to monitor a configured source, such as an SMB file share. While getting documents to AWS was essential for processing, AWS DataSync also supports syncing data back to on-premises, which was another key requirement for this project.

Data transfer architecture showing AWS DataSync moving documents from on-premises file share to Amazon S3 over AWS Direct Connect

Amazon Textract is an AWS machine learning service that extracts text, tables, and forms from scanned documents. Financial institutions use it to automatically process documents like account statements or loan applications, then identify sensitive data such as Social Security numbers, account numbers, and personal addresses. The following sample invoice demonstrates this capability.

Sample invoice with detected sensitive fields

Amazon Textract output highlighting detected fields with bounding boxes on the invoice

Amazon Textract detects various fields in a document and returns the coordinates of each detected field, along with other metadata, in a JSON output.
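To make the JSON output concrete, here is a small sketch that walks a response and converts each detected line's bounding box into pixel coordinates. The sample response is hand-written in the shape of a Textract result (`Blocks`, `BlockType`, `Geometry.BoundingBox`, `Confidence`), not real output, and the `line_boxes` helper and page dimensions are assumptions for illustration. Textract reports bounding boxes as ratios of the page width and height, so they are scaled here.

```python
from typing import Iterator

# Hand-written sample in the shape of a Textract response (not real output).
SAMPLE_RESPONSE = {
    "Blocks": [
        {
            "BlockType": "PAGE",
            "Page": 1,
            "Geometry": {
                "BoundingBox": {"Left": 0.0, "Top": 0.0, "Width": 1.0, "Height": 1.0}
            },
        },
        {
            "BlockType": "LINE",
            "Page": 1,
            "Text": "Account 12345678",
            "Confidence": 99.1,
            "Geometry": {
                "BoundingBox": {
                    "Left": 0.125, "Top": 0.25, "Width": 0.5, "Height": 0.0625
                }
            },
        },
    ]
}


def line_boxes(response: dict, page_width: int, page_height: int) -> Iterator[dict]:
    """Yield text, confidence, and a pixel box for each LINE block."""
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue
        box = block["Geometry"]["BoundingBox"]
        # Bounding box values are ratios of the page size; scale to pixels.
        left = round(box["Left"] * page_width)
        top = round(box["Top"] * page_height)
        right = round((box["Left"] + box["Width"]) * page_width)
        bottom = round((box["Top"] + box["Height"]) * page_height)
        yield {
            "text": block.get("Text", ""),
            "confidence": block.get("Confidence", 0.0),
            "pixel_box": (left, top, right, bottom),
        }


if __name__ == "__main__":
    for item in line_boxes(SAMPLE_RESPONSE, page_width=800, page_height=1600):
        print(item)
```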

Huntington used Amazon Textract in an orchestrated process with AWS Step Functions. This approach reduced manual review time while improving accuracy in detecting sensitive information across large document volumes.

Scaling detection throughput

Automated pipelines for document processing are valuable, but processing documents sequentially would have extended the project timeline to years. To meet their goal, Huntington needed to process millions of documents daily.

Scaling to this level required addressing two main considerations: maximizing concurrent Amazon Textract jobs within service quotas, and controlling request rates to avoid throttling.

AWS services have quotas, some of which are soft limits that can be raised on request and some of which are hard limits. The Amazon Textract jobs-per-second quota can be increased by submitting a request through the AWS Service Quotas console.

To maximize throughput against the service quota, Huntington used the AWS Step Functions built-in Map state, which processes collections of inputs in JSON, CSV, or other formats. The team organized documents in Amazon S3 into a JSON collection and ran the Map state in distributed mode for higher concurrency. To track pipeline progress, they used AWS Step Functions Map Run execution summaries alongside Amazon CloudWatch dashboards to monitor response times, throttle counts, successes, and error rates.
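A distributed Map state of this kind can be sketched as an Amazon States Language definition. The snippet below builds one as a Python dictionary. It is an illustration under assumptions rather than Huntington's state machine: the manifest location, the state names, the item shape (`{"bucket": ..., "key": ...}`), and the choice of the asynchronous `startDocumentTextDetection` call are all hypothetical.

```python
import json


def build_distributed_map(manifest_bucket: str, manifest_key: str,
                          max_concurrency: int) -> dict:
    """Return a state machine definition with one distributed Map state.

    Each item in the JSON manifest is assumed to look like
    {"bucket": "...", "key": "..."} (a made-up shape for this sketch).
    """
    return {
        "Comment": "Illustrative distributed Map over a JSON manifest in S3",
        "StartAt": "DetectPerDocument",
        "States": {
            "DetectPerDocument": {
                "Type": "Map",
                # Read the collection of documents from a JSON file in S3.
                "ItemReader": {
                    "Resource": "arn:aws:states:::s3:getObject",
                    "ReaderConfig": {"InputType": "JSON"},
                    "Parameters": {"Bucket": manifest_bucket, "Key": manifest_key},
                },
                # Cap on parallel child workflow executions.
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {
                        "Mode": "DISTRIBUTED",
                        "ExecutionType": "STANDARD",
                    },
                    "StartAt": "StartTextDetection",
                    "States": {
                        "StartTextDetection": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::aws-sdk:textract:startDocumentTextDetection",
                            "Parameters": {
                                "DocumentLocation": {
                                    "S3Object": {
                                        "Bucket.$": "$.bucket",
                                        "Name.$": "$.key",
                                    }
                                }
                            },
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder bucket, key, and concurrency values.
    definition = build_distributed_map("example-manifests", "batch-0001.json", 100)
    print(json.dumps(definition, indent=2))
```

`MaxConcurrency` is the knob that keeps the number of child executions, and therefore the request rate against Amazon Textract, under the quota.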

To handle potential throttling, Huntington monitored their CloudWatch dashboard to verify Amazon Textract successful request counts and throttled counts. As needed, they adjusted concurrency limits for child workflow executions to confirm they remained below the Amazon Textract service quota while maintaining high throughput. When jobs completed successfully, detected fields and metadata were written to a bucket for later review. The following diagram depicts this approach:

AWS Step Functions workflow diagram showing distributed map state processing documents through Amazon Textract with CloudWatch monitoring
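The article describes adjusting concurrency by watching the dashboard. As a purely illustrative rule of thumb (not Huntington's procedure), the same decision can be written as a small function: halve concurrency when the throttled share of requests exceeds a threshold, otherwise raise it by a fixed step. The function name, thresholds, and step size are all hypothetical.

```python
def next_concurrency(current: int, succeeded: int, throttled: int, *,
                     floor: int = 1, ceiling: int = 1000,
                     max_throttle_ratio: float = 0.01, step: int = 10) -> int:
    """Suggest the next concurrency limit from one monitoring window.

    All defaults are made-up values for illustration.
    """
    total = succeeded + throttled
    if total == 0:
        # No traffic observed in this window; leave the limit unchanged.
        return current
    if throttled / total > max_throttle_ratio:
        # Too many throttled requests: back off sharply.
        return max(floor, current // 2)
    # Healthy window: probe for more throughput in small steps.
    return min(ceiling, current + step)


if __name__ == "__main__":
    print(next_concurrency(200, succeeded=900, throttled=100))
    print(next_concurrency(200, succeeded=1000, throttled=0))
```

Backing off multiplicatively and recovering additively is a conservative choice: it drops below the quota quickly after throttling and approaches it again gradually.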

The wait block within the state machine verified that the process was ready to proceed with writing job metadata and continuing with the next Amazon Textract invocation. When there are no failures, the state machine finishes with a pass state. When failures occur, AWS Step Functions writes to a log for human review and reprocessing.
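The wait-then-check pattern can be sketched outside of Step Functions as a plain polling loop. This is a minimal sketch, assuming an asynchronous Textract job whose status is one of `IN_PROGRESS`, `SUCCEEDED`, `FAILED`, or `PARTIAL_SUCCESS`. The `wait_for_job` helper, its parameters, and the `TIMED_OUT` sentinel are hypothetical, and `get_status` stands in for whatever call returns the job status.

```python
import time
from typing import Callable

# Job statuses after which polling should stop.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "PARTIAL_SUCCESS"}


def wait_for_job(get_status: Callable[[], str], *,
                 wait_seconds: float = 5.0, max_checks: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reaches a terminal status or checks run out.

    Returns the terminal status, or "TIMED_OUT" (a sentinel invented for
    this sketch) if the job is still running after max_checks polls.
    """
    for _ in range(max_checks):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Equivalent of the Wait state: pause before checking again.
        sleep(wait_seconds)
    return "TIMED_OUT"


if __name__ == "__main__":
    # Simulated status sequence instead of a real service call.
    fake_statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
    result = wait_for_job(lambda: next(fake_statuses), sleep=lambda s: None)
    print(result)
```

Anything other than `SUCCEEDED` would be routed to the failure log for review and reprocessing, mirroring the failure path described above.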

Redacting detected sensitive information

Up to this point, the process focused on detecting sensitive data and cataloging it within metadata files written to Amazon S3. The final steps are to redact the documents and transmit them back to on-premises storage.

Image and PDF redaction is supported by several open-source and proprietary tools. Common open-source Python options include PyMuPDF and image drawing libraries like PIL. The following figure shows a sample redaction of the invoice shown earlier. Amazon Textract supports detection of various fields, and you can also create custom classifications using regex patterns. Combined with redaction software, you can confidently redact detected fields. If you want to set a threshold for human intervention, Amazon Textract provides confidence scores that can trigger validation workflows.

Sample invoice with sensitive fields redacted using black boxes
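The combination of a regex classification and a confidence threshold can be sketched as a small routing function. The pattern, the threshold of 90, and the `route_detection` helper below are assumptions for illustration, not values from Huntington's pipeline. Textract reports confidence on a 0 to 100 scale.

```python
import re

# Hypothetical custom pattern: U.S. Social Security numbers written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical cut-off below which a person reviews the detection.
REVIEW_THRESHOLD = 90.0


def route_detection(text: str, confidence: float) -> str:
    """Decide what to do with one detected line of text.

    Returns "keep" when nothing sensitive matches, "human_review" when a
    match has low OCR confidence, and "redact" otherwise.
    """
    if not SSN_PATTERN.search(text):
        return "keep"
    if confidence < REVIEW_THRESHOLD:
        return "human_review"
    return "redact"


if __name__ == "__main__":
    print(route_detection("SSN: 123-45-6789", 99.2))
    print(route_detection("SSN: 123-45-6789", 62.0))
    print(route_detection("Invoice total: 42.00", 99.0))
```

Fields routed to "redact" would then be covered using their bounding boxes with a library such as PyMuPDF or PIL.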

Once again, Huntington faced the same architectural challenge: how would this scale? AWS Step Functions provided the solution for processing millions of documents while offering hooks for error handling and retry logic. As the document processing pipeline cataloged items requiring redaction, Huntington ran a simple flow against them:

AWS Step Functions workflow for redaction processing with error handling and retry logic

To verify accuracy and thoroughness, Huntington double-checked that detected fields matched expected patterns prior to redaction, followed by a metadata update for each file. Redacted files were placed in an Amazon S3 location monitored by AWS DataSync for transmission back to on-premises file storage.

Conclusion

Using AWS, Huntington processed documents at a rate of approximately 10 million per day, reducing estimated processing time from years to just a few months. The cost of processing the entire document repository was approximately 5% of the original estimate. Redaction accuracy exceeded 95%, meeting compliance requirements and supporting data protection objectives.
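As a quick sanity check on these figures, the steady-state processing time follows from the document count and the daily rate. The calculation below uses only the two numbers stated in the article. Because the repository holds more than 400 million documents, the result is a lower bound, and it covers processing throughput only, not data transfer, tuning, or review.

```python
# Figures taken from the article; the document count is a lower bound.
TOTAL_DOCUMENTS = 400_000_000
DOCS_PER_DAY = 10_000_000

# Days of processing at the stated steady-state rate.
processing_days = TOTAL_DOCUMENTS / DOCS_PER_DAY

print(f"At least {processing_days:.0f} days of processing")
# prints "At least 40 days of processing"
```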

This project demonstrates how AWS services can support large-scale data processing and compliance initiatives. Huntington plans to continue using this framework for high-volume redaction needs such as mergers and acquisitions.

To learn more about the services used in this solution, visit the Amazon Textract detail page or explore the AWS Step Functions documentation.

Acknowledgements

Special thanks to the following individuals and teams for their contributions: Xuelei Yuan, Robert Carnell, Jeanne Keith, Debbie Montgomery, Bill Gross, Jodi Pettiford, Jon Glazer, Marshall Doss, Bob Wojasinski, Tami Wolf, Marijane Eldridge, Pradeep Kumar Tata, Michael Burkhardt, Nirmal Antony, Trevor Pease, Bryan Griffith, Angus Ferguson (AWS), Randy Patrick (AWS), Stephanie Brenneman (AWS), Art Steele, Kevin Owen.


About the authors

Rob Carnell

Rob is the Enterprise Data and Analytics Director at Huntington, overseeing cross-functional teams across AI, modeling, campaign testing and design, insights, and digital to drive integrated solutions and business impact.

Timothy Gorman

Timothy is a Lead AI Engineer at Huntington National Bank specializing in automation and unstructured data processing. He holds a doctorate in physics from The Ohio State University and has worked across disciplines including atomic physics, laser engineering, and AI-driven automation in finance.

Bobby Lumpkin

Bobby is an AI/ML Engineer at Huntington National Bank, specializing in artificial intelligence, machine learning, and advanced statistical methods in financial services. He holds a bachelor’s degree in mathematics and three master’s degrees in mathematics, mathematical sciences, and applied statistics, respectively.

Xuelei Yuan

Xuelei is a Data Science Director at Huntington, where she leads AI and machine learning initiatives, specializing in scalable, production-ready solutions powered by cloud technologies.

Ryan Doty

Ryan is a Solutions Architect Manager at Amazon Web Services (AWS), based out of New York. He helps financial services customers accelerate their adoption of the AWS Cloud by providing architectural guidelines to design innovative and scalable solutions. Coming from a software development and sales engineering background, he is excited by the possibilities that the cloud can bring to the world.

Angus Ferguson

Angus has been a Senior Solutions Architect with the North American Financial Services Industry team at AWS since 2022. In his role, Angus helps his customers translate business objectives into a technical vision, enabling them to grow and innovate in the cloud. Outside of AWS, Angus also has a passion for cultivating students’ interests through large events, such as hackathons, where he gets to mentor America’s next generation of computer engineers.

Randy Patrick

Randy is a Senior Technical Account Manager with the North American Financial Services Industry team at AWS. With 21 years of IT experience and a focus on cybersecurity, Randy helps enterprise customers build secure, resilient architectures that meet rigorous compliance and data protection requirements.
