As Kubernetes clusters grow in complexity, managing them effectively becomes increasingly difficult. Troubleshooting modern Kubernetes environments requires deep expertise across multiple domains: networking, storage, security, and the expanding ecosystem of CNCF plugins. With Kubernetes now hosting mission-critical workloads, rapid issue resolution has become paramount to maintaining business continuity.
Integrating advanced generative AI tools like K8sGPT and Amazon Bedrock can transform Kubernetes cluster operations and maintenance. These solutions go far beyond simple AI-powered troubleshooting, offering enterprise-grade operational intelligence that changes how teams manage their infrastructure. Through pre-trained knowledge and both built-in and custom analyzers, these tools enable rapid debugging, continuous monitoring, and proactive issue identification, allowing teams to resolve problems before they impact critical workloads.
K8sGPT, a CNCF sandbox project, simplifies Kubernetes management by scanning clusters and providing actionable insights in plain English through state-of-the-art AI models, including Anthropic's Claude, OpenAI, and Amazon SageMaker custom and open source models. Beyond basic troubleshooting, K8sGPT offers sophisticated auto-remediation capabilities that function like an experienced Site Reliability Engineer (SRE): monitoring change deltas against current cluster state, enforcing configurable risk thresholds, and providing rollback mechanisms through Mutation custom resources. Its Model Context Protocol (MCP) server support enables structured, real-time interaction with AI assistants for persistent cluster analysis and natural language operations. Amazon Bedrock complements this ecosystem by providing fully managed access to foundation models with seamless AWS integration. This approach represents a shift from reactive troubleshooting to proactive operational intelligence, where AI assists in resolving problems with enterprise-grade controls and full audit trails.
This post demonstrates best practices for running K8sGPT in AWS with Amazon Bedrock in two modes: the K8sGPT CLI and the K8sGPT Operator. It showcases how the solution can help SREs simplify Kubernetes cluster management through continuous monitoring and operational intelligence.
Solution overview
K8sGPT operates in two modes: the K8sGPT CLI for local, on-demand analysis, and the K8sGPT Operator for continuous in-cluster monitoring. The CLI offers flexibility through command-line interaction, whereas the Operator integrates with Kubernetes workflows, storing results as custom resources and enabling automated remediation. Both modes can invoke Amazon Bedrock models to provide detailed analysis and recommendations.
K8sGPT CLI architecture
The following architecture diagram shows that after a user's role is authenticated through AWS IAM Identity Center, the user runs the K8sGPT CLI to scan Amazon Elastic Kubernetes Service (Amazon EKS) resources and invoke an Amazon Bedrock model for analysis. The K8sGPT CLI provides an interactive interface for retrieving scan results, and model invocation logs are sent to Amazon CloudWatch for further monitoring. This setup facilitates troubleshooting and analysis of Kubernetes resources in the CLI, with Amazon Bedrock models offering insights and recommendations on the Amazon EKS environment.
The K8sGPT CLI comes with rich features, including a custom analyzer, filters, anonymization, remote caching, and integration options. See the Getting Started guide for more details.
K8sGPT Operator architecture
The following architecture diagram shows a solution where the K8sGPT Operator installed in the EKS cluster uses Amazon Bedrock models to analyze and explain findings from the EKS cluster in real time, helping users understand issues and optimize workloads. The user collects these event insights from the K8sGPT Operator by simply querying through a standard Kubernetes method such as kubectl. Model invocation logs, including detailed findings from the K8sGPT Operator, are logged in CloudWatch for further analysis.
In this mode, no additional CLI tools need to be installed apart from the kubectl CLI. In addition, the single sign-on (SSO) role that the user assumes doesn't need Amazon Bedrock access, because the K8sGPT Operator assumes an AWS Identity and Access Management (IAM) machine role to invoke the Amazon Bedrock large language model (LLM).
When to use which mode
The following table compares the two modes across access management, features, and common use cases.
| | K8sGPT CLI | K8sGPT Operator |
| --- | --- | --- |
| Access management | Human role (IAM Identity Center) | Machine role (IAM) |
| Features | Rich features: custom analyzers, filters, anonymization, remote caching, and integrations | Continuous in-cluster scanning, with results stored as custom resources |
| Common use cases | On-demand troubleshooting and interactive analysis | Continuous monitoring and automated remediation |
In the following sections, we walk you through the two installation modes of K8sGPT.
Install the K8sGPT CLI
Complete the following steps to install the K8sGPT CLI:
- Enable Amazon Bedrock in the US West (Oregon) AWS Region. Make sure to include the following role-attached policies to request or modify access to Amazon Bedrock FMs:
aws-marketplace:Subscribe
aws-marketplace:Unsubscribe
aws-marketplace:ViewSubscriptions
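For example, these actions can be granted through an identity-based policy along the following lines (a minimal sketch; scope the resource down if your organization requires it):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "aws-marketplace:Subscribe",
        "aws-marketplace:Unsubscribe",
        "aws-marketplace:ViewSubscriptions"
      ],
      "Resource": "*"
    }
  ]
}
```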
- Request access to Amazon Bedrock FMs in the US West (Oregon) Region:
- On the Amazon Bedrock console, in the navigation pane, under Bedrock configurations, choose Model access.
- On the Model access page, choose Enable specific models.
- Select the models, then choose Next and Submit to request access.
- Install K8sGPT following the official instructions.
- Add Amazon Bedrock and the FM as an AI backend provider to the K8sGPT configuration:
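A command along these lines registers the backend (the model ID is illustrative, and flag names may vary by K8sGPT version; check `k8sgpt auth add --help`):

```shell
k8sgpt auth add \
  --backend amazonbedrock \
  --model anthropic.claude-3-5-sonnet-20240620-v1:0 \
  --providerRegion us-west-2
```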
Note: At the time of writing, K8sGPT includes support for Anthropic's Claude 4 Sonnet and Claude 3.7 Sonnet models.
- Make the Amazon Bedrock backend the default:
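For example (flag spelling may differ slightly between K8sGPT versions):

```shell
k8sgpt auth default --provider amazonbedrock
```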
- Update kubeconfig to connect to an EKS cluster:
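For example, for the cluster named eks in us-west-2 used later in this post:

```shell
aws eks update-kubeconfig --region us-west-2 --name eks
```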
- Analyze issues within the cluster using Amazon Bedrock:
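The scan plus AI explanation is a single command:

```shell
k8sgpt analyze --explain --backend amazonbedrock
```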
Install the K8sGPT Operator
To install the K8sGPT Operator, first complete the following prerequisites:
- Install the latest version of Helm. To check your version, run helm version.
- Install the latest version of eksctl. To check your version, run eksctl version.
Create the EKS cluster
Create an EKS cluster with eksctl using a pre-defined eksctl config file:
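A minimal config file along these lines works for this walkthrough (instance type and node count are assumptions; adjust to your needs):

```shell
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks
  region: us-west-2
managedNodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
EOF

eksctl create cluster -f cluster.yaml
```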
You should get the following expected output:
EKS cluster "eks" in "us-west-2" region is ready
Create an Amazon Bedrock and CloudWatch VPC private endpoint (optional)
To facilitate private communication between Amazon EKS and Amazon Bedrock, as well as CloudWatch, it is recommended to use virtual private cloud (VPC) private endpoints. This makes sure the communication stays within the VPC, providing a secure and private channel.
Refer to Create a VPC endpoint to set up the Amazon Bedrock and CloudWatch VPC endpoints.
Create an IAM policy, trust policy, and role
Complete the following steps to create an IAM policy, trust policy, and role that allow only the K8sGPT Operator to interact with Amazon Bedrock, following least privilege:
- Create a role policy with Amazon Bedrock permissions:
- Create a permission policy:
- Create a trust policy:
- Create a role and attach the trust policy:
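The steps above can be sketched as follows. The role name k8sgpt-bedrock matches the cleanup section later in this post; the policy name is an assumption, and the trust principal follows the EKS Pod Identity model used in the next sections:

```shell
# Permission policy: allow only Bedrock model invocation (least privilege)
cat > bedrock-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
EOF

# Trust policy: let the EKS Pod Identity service assume the role
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "pods.eks.amazonaws.com" },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}
EOF

aws iam create-role --role-name k8sgpt-bedrock \
  --assume-role-policy-document file://trust-policy.json
aws iam put-role-policy --role-name k8sgpt-bedrock \
  --policy-name k8sgpt-bedrock-invoke \
  --policy-document file://bedrock-policy.json
```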
Install Prometheus
Prometheus will be used for monitoring. Use the following command to install Prometheus using Helm in the k8sgpt-operator-system namespace:
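A typical install uses the community kube-prometheus-stack chart (the release name prometheus here is an assumption):

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace k8sgpt-operator-system --create-namespace
```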
Install the K8sGPT Operator via Helm
Install the K8sGPT Operator via Helm with Prometheus and Grafana enabled:
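For example, using the official chart repository (the release name and value keys below are assumptions; check the chart's values for your version):

```shell
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install release k8sgpt/k8sgpt-operator \
  --namespace k8sgpt-operator-system --create-namespace \
  --set serviceMonitor.enabled=true \
  --set grafanaDashboard.enabled=true
```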
Patch the K8sGPT controller manager so it is recognized by the Prometheus operator:
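With kube-prometheus-stack, the Prometheus operator only scrapes ServiceMonitors carrying its release label, so a label patch along these lines is typically needed (the ServiceMonitor name depends on your Helm release name and is an assumption here):

```shell
kubectl -n k8sgpt-operator-system patch servicemonitor \
  release-k8sgpt-operator-controller-manager-metrics-monitor \
  --type merge -p '{"metadata": {"labels": {"release": "prometheus"}}}'
```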
Associate EKS Pod Identity
EKS Pod Identity is an AWS feature that simplifies how Kubernetes applications obtain IAM permissions by letting cluster administrators associate least-privilege IAM roles with Kubernetes service accounts directly through Amazon EKS. It provides a simple way to allow EKS pods to call AWS services such as Amazon Simple Storage Service (Amazon S3). Refer to Learn how EKS Pod Identity grants pods access to AWS services for more details.
Use the following command to perform the association:
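The association binds the k8sgpt-bedrock role created earlier to the service account the K8sGPT workload runs under (the service account name is an assumption; verify it with kubectl get sa -n k8sgpt-operator-system, and replace the account ID placeholder):

```shell
aws eks create-pod-identity-association \
  --cluster-name eks \
  --region us-west-2 \
  --namespace k8sgpt-operator-system \
  --service-account k8sgpt \
  --role-arn arn:aws:iam::<ACCOUNT_ID>:role/k8sgpt-bedrock
```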
Scan the cluster with Amazon Bedrock as the backend
Complete the following steps:
- Deploy a K8sGPT resource using the following YAML, using Anthropic's Claude 3.5 model on Amazon Bedrock as the backend:
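A K8sGPT custom resource along these lines configures the Operator to use the Bedrock backend (the model ID and version tag are assumptions; check the K8sGPT Operator docs for the fields supported by your version):

```yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-bedrock
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    backend: amazonbedrock
    model: anthropic.claude-3-5-sonnet-20240620-v1:0
    region: us-west-2
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.4.1
```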
- When the k8sgpt-bedrock pod is running, use the following command to check the list of scan results:
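The Operator stores findings as Result custom resources:

```shell
kubectl get results -n k8sgpt-operator-system
```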
- Use the following command to check the details of each scan result:
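Substitute the name of a result from the previous listing:

```shell
kubectl get result <result-name> -n k8sgpt-operator-system -o yaml
```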
Set up Amazon Bedrock invocation logging
Complete the following steps to enable Amazon Bedrock invocation logging, forwarding to CloudWatch or Amazon S3 as log destinations:
- Create a CloudWatch log group:
- On the CloudWatch console, choose Log groups under Logs in the navigation pane.
- Choose Create log group.
- Provide details for the log group, then choose Create.
- Enable model invocation logging:
- On the Amazon Bedrock console, under Bedrock configurations in the navigation pane, choose Settings.
- Enable Model invocation logging.
- Select which data requests and responses you want to publish to the logs.
- Select CloudWatch Logs only under Select the logging destinations and enter the invocation logs group name.
- For Choose a method to authorize Bedrock, select Create and use a new role.
- Choose Save settings.
Use case: Continuously scan the EKS cluster with the K8sGPT Operator
This section demonstrates how to use the K8sGPT Operator for continuous monitoring of your Amazon EKS cluster. By integrating with popular observability tools, the solution provides comprehensive cluster health visibility through two key interfaces: a Grafana dashboard that visualizes scan results and cluster health metrics, and CloudWatch logs that capture detailed AI-powered analysis and recommendations from Amazon Bedrock. This automated approach eliminates the need for manual kubectl commands while enabling proactive identification and resolution of potential issues. The integration with existing monitoring tools streamlines operations and helps maintain optimal cluster health through continuous analysis and intelligent insights.
Track the health status of your EKS cluster through Grafana
Log in to the Grafana dashboard at localhost:3000 with the following embedded credentials:
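With the Grafana subchart, access typically works along these lines (the service and secret names depend on your Helm release name and are assumptions here):

```shell
# Forward the Grafana service to localhost:3000
kubectl -n k8sgpt-operator-system port-forward svc/release-grafana 3000:80

# Retrieve the chart-managed admin password (user: admin)
kubectl -n k8sgpt-operator-system get secret release-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d
```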
The following screenshot showcases the K8sGPT Overview dashboard.
The dashboard features the following:
- The Result Kind types section represents the breakdown of the different Kubernetes resource types, such as services, pods, or deployments, that experienced issues based on the K8sGPT scan results
- The Analysis Results section represents the number of scan results based on the K8sGPT scan
- The Results over time section represents how the count of scan results changes over time
- The rest of the metrics showcase the performance of the K8sGPT controller over time, which helps in monitoring the operational efficiency of the K8sGPT Operator
Use a CloudWatch dashboard to check identified issues and get recommendations
Amazon Bedrock model invocation logs are stored in CloudWatch, which we set up previously. You can use a CloudWatch Logs Insights query to filter model invocation input and output for cluster scan recommendations and surface them on a dashboard for quick access. Complete the following steps:
- On the CloudWatch console, create a dashboard.
- On the CloudWatch console, choose the CloudWatch log group and run the following query to filter the scan results produced by the K8sGPT Operator:
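A Logs Insights query along these lines works as a starting point (the nested field paths follow the Bedrock invocation-log schema for the Messages API and may need adjusting for your model):

```
fields @timestamp, modelId,
       input.inputBodyJson.messages.0.content.0.text as request,
       output.outputBodyJson.content.0.text as response
| filter operation = "InvokeModel"
| sort @timestamp desc
```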
- Choose Create widget to save the dashboard.
It will automatically show the model invocation log with input and output from the K8sGPT Operator. You can expand the log to check the model input for errors and the output for recommendations given by the Amazon Bedrock backend.
Extend K8sGPT with custom analyzers
K8sGPT's custom analyzers feature allows teams to create specialized checks for their Kubernetes environments, extending beyond the built-in analysis capabilities. This extension mechanism lets organizations codify their specific operational requirements and best practices into K8sGPT's scanning process, making it possible to monitor aspects of cluster health that aren't covered by the default analyzers.
You can create custom analyzers to monitor various aspects of your cluster health. For example, you might want to monitor Linux disk utilization on nodes, a common operational concern that could impact cluster stability. The following steps demonstrate how to implement and deploy such an analyzer.
First, create the analyzer code:
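A custom analyzer is a small gRPC service implementing K8sGPT's custom-analyzer schema. The sketch below uses the buf-generated Go bindings for the k8sgpt schema and a third-party disk-usage helper; the package paths, service names, and the 90% threshold are assumptions to verify against the K8sGPT custom analyzer documentation:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net"

	rpc "buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go/schema/v1/schemav1grpc"
	v1 "buf.build/gen/go/k8sgpt-ai/k8sgpt/protocolbuffers/go/schema/v1"
	"github.com/ricochet2200/go-disk-usage/du"
	"google.golang.org/grpc"
)

// Handler implements the custom analyzer service that K8sGPT calls.
type Handler struct {
	rpc.UnimplementedCustomAnalyzerServiceServer
}

// Run checks root-volume usage and reports an error detail above a threshold.
func (h *Handler) Run(ctx context.Context, req *v1.RunRequest) (*v1.RunResponse, error) {
	usage := du.NewDiskUsage("/")
	pct := 100 * float64(usage.Used()) / float64(usage.Size())
	resp := &v1.RunResponse{Result: &v1.Result{
		Name:    "diskuse",
		Details: fmt.Sprintf("Disk usage is %.1f%%", pct),
	}}
	if pct > 90 { // assumed threshold; tune for your environment
		resp.Result.Error = []*v1.ErrorDetail{
			{Text: fmt.Sprintf("node root volume is %.1f%% full", pct)},
		}
	}
	return resp, nil
}

func main() {
	lis, err := net.Listen("tcp", ":8085")
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer()
	rpc.RegisterCustomAnalyzerServiceServer(s, &Handler{})
	log.Fatal(s.Serve(lis))
}
```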
Build your analyzer into a Docker image and deploy the analyzer to your cluster:
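The build-and-deploy steps can be sketched as follows; the registry path is a placeholder, and the names are illustrative:

```shell
# Build and push the analyzer image
docker build -t <your-registry>/diskuse-analyzer:latest .
docker push <your-registry>/diskuse-analyzer:latest

# Run it in the cluster and expose the gRPC port
kubectl -n k8sgpt-operator-system create deployment diskuse-analyzer \
  --image <your-registry>/diskuse-analyzer:latest
kubectl -n k8sgpt-operator-system expose deployment diskuse-analyzer \
  --port 8085 --target-port 8085
```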
Finally, configure K8sGPT to use your custom analyzer:
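For the Operator, the custom analyzer is referenced from the K8sGPT custom resource spec, roughly as follows (field names are assumptions; check the CRD for your Operator version):

```yaml
# Snippet to merge into the K8sGPT custom resource spec
spec:
  customAnalyzers:
    - name: diskuse
      connection:
        url: diskuse-analyzer.k8sgpt-operator-system
        port: 8085
```

With the CLI, a similar registration is available through the k8sgpt custom-analyzer subcommand.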
This approach lets you extend K8sGPT's capabilities while maintaining its integration within the Kubernetes ecosystem. Custom analyzers can be used to implement specialized health checks, security scans, or any other cluster analysis logic specific to your organization's needs. When combined with K8sGPT's AI-powered analysis through Amazon Bedrock, these custom checks provide detailed, actionable insights in plain English, helping teams quickly understand and resolve potential issues.
K8sGPT privacy considerations
K8sGPT collects data through its analyzers, including container status messages and pod details, which can be displayed to users or sent to an AI backend when the --explain flag is used. Data sharing with the AI backend happens only if the user opts in by using this flag and authenticates with the backend. To enhance privacy, you can anonymize sensitive data such as deployment names and namespaces with the --anonymize flag before sharing. K8sGPT doesn't collect logs or API server data beyond what is necessary for its analysis capabilities. These practices make sure users keep control over their data and that it's handled securely and transparently. For more information, refer to Privacy in the K8sGPT documentation.
Clean up
Complete the following steps to clean up your resources:
- Run the following command to delete the EKS cluster:
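For the cluster created earlier in this post:

```shell
eksctl delete cluster --name eks --region us-west-2
```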
- Delete the IAM role (k8sgpt-bedrock).
- Delete the CloudWatch logs and dashboard.
Conclusion
The K8sGPT and Amazon Bedrock integration can transform Kubernetes maintenance by using AI for cluster scanning, issue diagnosis, and actionable insights. This post discussed best practices for K8sGPT on Amazon Bedrock in CLI and Operator modes and highlighted use cases for simplified cluster management. The solution combines K8sGPT's SRE expertise with Amazon Bedrock FMs to automate tasks, predict issues, and optimize resources, reducing operational overhead and improving performance.
You can use these best practices to identify and implement the most suitable use cases for your specific operational and management needs. By doing so, you can effectively improve Kubernetes management efficiency and achieve greater productivity in your DevOps and SRE workflows.
To learn more about K8sGPT and Amazon Bedrock, refer to the following resources:
About the authors
Angela Wang is a Technical Account Manager based in Australia with over 10 years of IT experience, specializing in cloud-native technologies and Kubernetes. She works closely with customers to troubleshoot complex issues, optimize platform performance, and implement best practices for cost-optimized, reliable, and scalable cloud-native environments. Her hands-on expertise and strategic guidance make her a trusted partner in navigating modern infrastructure challenges.
Haofei Feng is a Senior Cloud Architect at AWS with over 18 years of expertise in DevOps, IT infrastructure, data analytics, and AI. He specializes in guiding organizations through cloud transformation and generative AI initiatives, designing scalable and secure GenAI solutions on AWS. Based in Sydney, Australia, when not architecting solutions for customers, he cherishes time with his family and Border Collies.
Eva Li is a Technical Account Manager at AWS located in Australia with over 10 years of experience in the IT industry. Specializing in IT infrastructure, cloud architecture, and Kubernetes, she guides enterprise customers through their cloud transformation journeys and helps optimize their AWS environments. Her expertise in cloud architecture, containerization, and infrastructure automation helps organizations bridge the gap between business objectives and technical implementation. Outside of work, she enjoys yoga and exploring Australia's bushwalking trails with friends.
Alex Jones is a Principal Engineer at AWS. His career has focused largely on highly constrained environments for physical and digital infrastructure. Working at companies such as Microsoft, Canonical, and American Express, he has been both an engineering leader and an individual contributor. Outside of work, he has founded several popular projects such as OpenFeature and, more recently, the GenAI accelerator for Kubernetes, K8sGPT. Based in London, Alex has a partner and two children.