Introducing document-level sync reports: Enhanced data sync visibility in Amazon Q Business

Amazon Q Business is a fully managed, generative artificial intelligence (AI)-powered assistant that helps enterprises unlock the value of their data and knowledge. With Amazon Q, you can quickly find answers to questions, generate summaries and content, and complete tasks by using the information and expertise stored across your company’s various data sources and enterprise systems. At the core of this capability are native data source connectors that integrate and index content from multiple repositories into a unified index. This enables the Amazon Q large language model (LLM) to provide accurate, well-written answers by drawing from the consolidated data and information. The data source connectors act as a bridge, synchronizing content from disparate systems like Salesforce, Jira, and SharePoint into a centralized index that powers the natural language understanding and generative abilities of Amazon Q.

Customers appreciate that Amazon Q Business securely connects to over 40 data sources. While using their data source, they want better visibility into the document processing lifecycle during data source sync jobs. They want to know the status of each document they attempted to crawl and index, as well as the ability to troubleshoot why certain documents were not returned with the expected answers. Additionally, they want access to metadata, timestamps, and access control lists (ACLs) for the indexed documents.

We’re pleased to announce a new feature now available in Amazon Q Business that significantly improves visibility into data source sync operations. The latest release introduces a comprehensive document-level report incorporated into the sync history, providing administrators with granular indexing status, metadata, and ACL details for every document processed during a data source sync job. This enhancement to sync job observability enables administrators to quickly investigate and resolve ingestion or access issues encountered while setting up an Amazon Q Business application. The detailed document reports are persisted in the new SYNC_RUN_HISTORY_REPORT log stream under the Amazon Q Business application log group, so critical sync job details are available on demand when troubleshooting.

Lifecycle of a document in a data source sync run job

In this section, we examine the lifecycle of a document within a data source sync in Amazon Q Business. This provides valuable insight into the sync process. The data source sync comprises three key stages: crawling, syncing, and indexing. Crawling involves the connector connecting to the data source and extracting documents that meet the defined sync scope according to the data source configuration. These documents are then synced to Amazon Q Business during the syncing stage. Finally, indexing makes the synced documents searchable within the Amazon Q Business environment.

The following diagram shows a flowchart of a sync run job.

Crawling stage

The first stage is the crawling stage, where the connector crawls all documents and their metadata from the data source. During this stage, the connector also compares the checksum of the document against the Amazon Q index to determine whether a particular document needs to be added, modified, or deleted from the index. This operation corresponds to the CrawlAction field in the sync run history report.

If the document is unmodified, it’s marked as UNMODIFIED and skipped in the rest of the stages. If any document fails in the crawling stage, for example due to throttling errors, broken content, or because the document size is too big, that document is marked as failed in the sync run history report with the CrawlStatus as FAILED. If the document was skipped due to any validation errors, its CrawlStatus is marked as SKIPPED. These documents are not sent forward to the next stage. All successful documents are marked as SUCCESS and are sent forward.

We also capture the ACLs and metadata on each document in this stage so that they can be added to the sync run history report.

Syncing stage

During the syncing stage, the document is sent to Amazon Q Business ingestion service APIs like BatchPutDocument and BatchDeleteDocument. After a document is submitted to these APIs, Amazon Q Business runs validation checks on the submitted documents. If any document fails these checks, its SyncStatus is marked as FAILED. If there is an irrecoverable error for a particular document, it’s marked as SKIPPED and other documents are sent forward.

Indexing stage

In this step, Amazon Q Business parses the document, processes it according to its content type, and persists it in the index. If the document fails to be persisted, its IndexStatus is marked as FAILED; otherwise, it’s marked as SUCCESS.

After the statuses of all the stages have been captured, we emit these statuses as an Amazon CloudWatch event to the customer’s AWS account.
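
The stage-by-stage flow described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the service's implementation: it assumes each report record is a dictionary carrying CrawlStatus, SyncStatus, and IndexStatus values for the stages the document reached, and the helper names are hypothetical.

```python
# Stage order as described in this section. Treating each record as a dict
# keyed by these status fields is an assumption made for illustration.
STAGES = ("CrawlStatus", "SyncStatus", "IndexStatus")


def first_blocking_stage(record):
    """Return (stage, status) for the first stage that did not succeed.

    Returns None when every stage reports SUCCESS. Stages after the
    blocking one are never inspected, mirroring how a failed or skipped
    document is not sent forward.
    """
    for stage in STAGES:
        status = record.get(stage)
        if status is None:
            raise ValueError(f"record has no {stage}")
        if status != "SUCCESS":
            return stage, status
    return None


def final_status(record):
    """Collapse the per-stage statuses into a single value."""
    blocking = first_blocking_stage(record)
    return "SUCCESS" if blocking is None else blocking[1]


# Hypothetical records: one fully indexed, one rejected during syncing.
indexed = {"CrawlStatus": "SUCCESS", "SyncStatus": "SUCCESS", "IndexStatus": "SUCCESS"}
rejected = {"CrawlStatus": "SUCCESS", "SyncStatus": "FAILED"}

print(final_status(indexed))
print(first_blocking_stage(rejected))
```

Knowing which stage blocked a document tells you where to look first: the data source and connector configuration for crawl problems, or the document content itself for sync and index problems.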

Key features and benefits of document-level reports

The following are the key features and benefits of the new document-level report in Amazon Q Business applications:

Enhanced sync run history page – A new Actions column has been added to the sync run history page, providing access to the document-level report for each sync run.
Dedicated log stream – A new log stream named SYNC_RUN_HISTORY_REPORT has been created in the Amazon Q Business CloudWatch log group, containing the document-level report.
Comprehensive document information – The document-level report includes the following information for each document:
Document ID – This is the document ID that is inherited directly from the data source or mapped by the customer in the data source field mappings.
Document title – The title of the document is taken from the data source or mapped by the customer in the data source field mappings.
Consolidated document status (SUCCESS, FAILED, or SKIPPED) – This is the final consolidated status of the document. If the document was successfully processed in all stages, then the value is SUCCESS. If the document failed or was skipped in any of the stages, then the value of this field is FAILED or SKIPPED.
Error message (if the document failed) – This field contains the error message with which a document failed. If a document was skipped due to throttling errors or any internal errors, this is shown in the error message field.
Crawl status – This field denotes whether the document was crawled successfully from the data source. This status correlates to the syncing-crawling state in the data source sync.
Sync status – This field denotes whether the document was sent for syncing successfully. This correlates to the syncing-indexing state in the data source sync.
Index status – This field denotes whether the document was successfully persisted in the index.
ACLs – This field contains a list of document-level permissions that were crawled from the data source. The details of each element in the list are:
- Global name: This is the email/username of the user. This field is mapped across multiple data sources. For example, if a user has three data sources (Confluence, SharePoint, and Gmail) with the local user IDs confluence_user, sharepoint_user, and gmail_user respectively, and their email address user@email.com is the globalName in the ACL for all of them, then Amazon Q Business understands that all of these local user IDs map to the same global name.
- Name: This is the local unique ID of the user, which is assigned by the data source.
- Type: This field indicates the principal type. This can be either USER or GROUP.
- Is Federated: This is a boolean flag that indicates whether the group is of INDEX level (true) or DATASOURCE level (false).
- Access: This field indicates whether the user has access allowed or denied explicitly. Values can be either ALLOWED or DENIED.
- Data source ID: This is the data source ID. For federated groups (INDEX level), this field will be null.
Metadata – This field contains the metadata fields (other than ACL) that were pulled from the data source. This list also includes the metadata fields mapped by the customer in the data source field mappings as well as extra metadata fields added by the connector.
Hashed document ID (for troubleshooting assistance) – To safeguard your data privacy, we present a secure, one-way hash of the document identifier. This hashed value enables the Amazon Q Business team to efficiently locate and analyze the specific document within our logs, should you encounter any issue that requires further investigation and resolution.
Timestamp – The timestamp indicates when the document status was logged in CloudWatch.
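
To make the global name mapping in the ACL description concrete, the following sketch groups local user IDs by the global name they share. The key names globalName, name, type, and access, and the three sample entries, are assumptions based on the field descriptions above rather than a documented schema.

```python
from collections import defaultdict


def local_ids_by_global_name(acl_entries):
    """Group local principal IDs under the global name they map to."""
    grouped = defaultdict(set)
    for entry in acl_entries:
        grouped[entry["globalName"]].add(entry["name"])
    # Sort the IDs so the output is stable and easy to read.
    return {global_name: sorted(ids) for global_name, ids in grouped.items()}


# Hypothetical ACL entries for one person across three data sources.
sample_acl = [
    {"globalName": "user@email.com", "name": "confluence_user", "type": "USER", "access": "ALLOWED"},
    {"globalName": "user@email.com", "name": "sharepoint_user", "type": "USER", "access": "ALLOWED"},
    {"globalName": "user@email.com", "name": "gmail_user", "type": "USER", "access": "ALLOWED"},
]

print(local_ids_by_global_name(sample_acl))
```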

In the following sections, we explore different use cases for the logging feature.

Troubleshoot “Sorry, I could not find relevant information” with the new logging feature

The new document-level logging feature in Amazon Q Business can help troubleshoot common issues related to the “Sorry, I could not find relevant information to complete your request” response.

Let’s explore an example scenario. A mutual funds manager uses Amazon Q Business chat for knowledge retrieval and insights extraction across their enterprise data stores. When the fund manager asks, “What is the CAGR of the multi-asset fund?” in the Amazon Q chat, they receive the “Sorry, I could not find relevant information to complete your request” response.

As the administrator managing their Amazon Q Business application, you can troubleshoot the issue using the following approach with the new logging feature. First, you want to determine whether the multi-asset fund document was successfully indexed in the Amazon Q Business application. Next, you need to verify that the fund manager’s user account has the required permission to read the information from the multi-asset fund document. Amazon Q Business enforces the document permissions configured in its data source, and you can use this new feature to verify that the document ACL settings are synced in the Amazon Q Business application index.

You can use the following CloudWatch Logs Insights query string to check the document ACL settings:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/' 
and DocumentTitle = "your-document-title"
| fields DocumentTitle, ConnectorDocumentStatus.Status, Acl
| sort @timestamp desc
| limit 1

This query filters on the per-document-level log stream SYNC_RUN_HISTORY_REPORT and displays the document title, its consolidated status, and its associated ACL settings. By verifying the document indexing and permissions, you can identify and resolve potential issues that may be causing the “Sorry, I could not find relevant information” response.

The following screenshot shows an example result.
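
If you prefer to run this check from a script rather than the console, one option is the CloudWatch Logs start_query and get_query_results API operations. The following sketch takes the client as a parameter, so it can be exercised with a stand-in object; in practice you would pass boto3.client("logs"). The helper name, the polling defaults, and whatever log group name you supply are assumptions, not values from this post.

```python
import time


def run_insights_query(logs_client, log_group, query, start_time, end_time,
                       poll_seconds=1.0, max_polls=60):
    """Run a CloudWatch Logs Insights query and return rows as dicts.

    start_time and end_time are epoch seconds. logs_client is expected to
    expose start_query and get_query_results, as the boto3 "logs" client does.
    """
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [
                {cell["field"]: cell["value"] for cell in row}
                for row in response["results"]
            ]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)

    raise TimeoutError("query did not complete within the polling budget")
```

Passing the query text from this section as the query argument returns at most one row, whose Acl value you can then inspect for the fund manager's user name.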

Determine the optimal boosting duration for recent documents using document-level reporting

When it comes to generating accurate answers, you may want to fine-tune the way Amazon Q prioritizes its content. For instance, you may prefer to boost recent documents over older ones to make sure the most up-to-date passages are used to generate an answer. To achieve this, you can use the relevance tuning feature in Amazon Q Business to boost documents based on the last update date attribute, with a specified boosting duration. However, determining the optimal boosting period can be challenging when dealing with a large number of frequently changing documents.

You can now use the per-document-level report to obtain the _last_updated_at metadata field information for your documents, which can help you determine the appropriate boosting period. For this, you use the following CloudWatch Logs Insights query to retrieve the _last_updated_at metadata attribute for machine learning documents from the SYNC_RUN_HISTORY_REPORT log stream:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/' 
and Metadata like 'Machine Learning'
| parse Metadata '{"key":"_last_updated_at","value":{"dateValue":"*"}}' as @last_updated_at
| sort @last_updated_at desc, @timestamp desc
| dedup DocumentTitle

With the preceding query, you can gain insights into the last updated timestamps of your documents, enabling you to make informed decisions about the optimal boosting period. This approach helps make sure your chat responses are generated using the latest and most relevant information, improving the overall accuracy and effectiveness of your Amazon Q Business implementation.

The following screenshot shows an example result.
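
Once you have the _last_updated_at values, a rough way to choose a boosting duration is to look at how old the documents are. The sketch below assumes the values are ISO 8601 strings with an explicit UTC offset (the actual format in your report may differ) and uses hypothetical helper names and sample values. It reports the age, in days, that covers a chosen fraction of documents.

```python
import math
from datetime import datetime, timezone


def ages_in_days(last_updated_values, now):
    """Return document ages in whole days, sorted from newest to oldest."""
    return sorted(
        (now - datetime.fromisoformat(value)).days for value in last_updated_values
    )


def age_covering(ages, fraction):
    """Smallest age (in days) that covers the given fraction of documents."""
    if not ages or not 0 < fraction <= 1:
        raise ValueError("need at least one age and a fraction in (0, 1]")
    index = math.ceil(fraction * len(ages)) - 1
    return sorted(ages)[index]


# Hypothetical values, as they might be parsed from the query results.
sample = [
    "2024-08-01T00:00:00+00:00",
    "2024-07-01T00:00:00+00:00",
    "2024-01-01T00:00:00+00:00",
]
now = datetime(2024, 8, 11, tzinfo=timezone.utc)

ages = ages_in_days(sample, now)
print(ages)
print(age_covering(ages, 0.5))
```

In this sample, half of the documents were updated within the last 41 days, so a boosting duration in that range would favor the recently changed half without boosting everything.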

Common document indexing observability and troubleshooting methods

In this section, we explore some common admin tasks for observing and troubleshooting document indexing using the new document-level reporting feature.

List all successfully indexed documents from a data source

To retrieve a list of all documents that have been successfully indexed from a specific data source, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort @timestamp desc | dedup DocumentTitle, DocumentId

The following screenshot shows an example result.

List all successfully indexed documents from a data source sync job

To retrieve a list of all documents that have been successfully indexed during a specific sync job, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort DocumentTitle

The following screenshot shows an example result.

List all documents that failed to index from a data source sync job

To retrieve a list of all documents that failed to index during a specific sync job, along with the error messages, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, ErrorMsg, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "FAILED"
| sort @timestamp desc

The following screenshot shows an example result.
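
When a sync job produces many failures, it can help to group them by error message before reading individual rows. The following sketch assumes the query results have been loaded as a list of dictionaries with an ErrorMsg key; the helper name and the sample messages are made up for illustration and are not actual service error strings.

```python
from collections import Counter


def count_errors(rows):
    """Count failed documents per error message, most frequent first."""
    counts = Counter(row.get("ErrorMsg") or "(no message)" for row in rows)
    return counts.most_common()


# Hypothetical rows; real error messages will differ.
failed_rows = [
    {"DocumentTitle": "a.pdf", "ErrorMsg": "Throttled"},
    {"DocumentTitle": "b.pdf", "ErrorMsg": "Document too large"},
    {"DocumentTitle": "c.pdf", "ErrorMsg": "Throttled"},
]

print(count_errors(failed_rows))
```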

List all documents that contain a particular user name ACL permission from an Amazon Q Business application

To retrieve a list of documents that have a specific user’s ACL permission, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/' 
and Acl like 'aneesh@mydemoaws.onmicrosoft.com'
| display DocumentTitle, SourceUri

The following screenshot shows an example result.
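
Note that a substring match on Acl returns every document whose ACL mentions the user, whether the entry allows or denies access. If you export the Acl value and it parses as a JSON list (an assumption of this sketch, along with the globalName and access key names), you could separate the two cases as follows. Treating an explicit DENIED as overriding ALLOWED is also an assumption made for illustration, not a statement of how Amazon Q Business evaluates permissions.

```python
import json


def explicit_access(acl_json, global_name):
    """Return 'ALLOWED', 'DENIED', or None for one user on one document."""
    entries = json.loads(acl_json)
    decisions = {
        entry.get("access")
        for entry in entries
        if entry.get("globalName") == global_name
    }
    if "DENIED" in decisions:
        return "DENIED"
    if "ALLOWED" in decisions:
        return "ALLOWED"
    return None  # the user is not mentioned in this ACL


# Hypothetical Acl value for a single document.
sample_acl_json = json.dumps([
    {"globalName": "user@email.com", "name": "sharepoint_user", "type": "USER", "access": "ALLOWED"},
    {"globalName": "other@email.com", "name": "other_user", "type": "USER", "access": "DENIED"},
])

print(explicit_access(sample_acl_json, "user@email.com"))
print(explicit_access(sample_acl_json, "other@email.com"))
```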

List the ACL of an indexed document from a data source sync job

To retrieve the ACL information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/data-source-id/run-id' 
and DocumentTitle = "your-document-title"
| display DocumentTitle, Acl

The following screenshot shows an example result.

List metadata of an indexed document from a data source sync job

To retrieve the metadata information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/data-source-id/run-id' 
and DocumentTitle = "your-document-title"
| display DocumentTitle, Metadata

The following screenshot shows an example result.

Conclusion

The newly introduced document-level report in Amazon Q Business provides enhanced visibility and observability into the document processing lifecycle during data source sync jobs. This feature addresses a critical need expressed by customers for better troubleshooting capabilities and access to detailed information about the indexing status, metadata, and ACLs of individual documents.

The document-level report is stored in a dedicated log stream named SYNC_RUN_HISTORY_REPORT within the Amazon Q Business application CloudWatch log group. This report contains comprehensive information for each document, including the document ID, title, overall document sync status, error messages (if any), ACLs, and metadata retrieved from the data sources. The data source sync run history page now includes an Actions column, providing access to the document-level report for each sync run. This feature significantly improves the ability to troubleshoot issues related to document ingestion, access control, and metadata relevance, and provides better visibility into the documents synced with an Amazon Q index.

To get started with Amazon Q Business, explore the Getting started guide. To learn more about data source connectors and best practices, see Configuring Amazon Q Business data source connectors.

About the authors

Aneesh Mohan is a Senior Solutions Architect at Amazon Web Services (AWS), bringing two decades of experience in creating impactful solutions for business-critical workloads. He is passionate about technology and loves working with customers to build well-architected solutions, focusing on the financial services industry, AI/ML, security, and data technologies.

Ashwin Shukla is a Software Development Engineer II on the Amazon Q for Business and Amazon Kendra engineering team, with 6 years of experience in developing enterprise software. In this role, he works on designing and developing foundational features for Amazon Q for Business.
