How Not to Write an MCP Server

I recently had the opportunity to create an MCP server for an observability application, with the purpose of providing the AI agent with dynamic code analysis capabilities. Because of its potential to transform applications, MCP is a technology I'm even more excited about than I initially was about genAI in general. I wrote more about that, along with a short intro to MCPs in general, in a previous post.

While the initial POCs demonstrated that there was immense potential for this to be a force multiplier for our product's value, it took several iterations and quite a few stumbles to deliver on that promise. In this post, I'll try to capture some of the lessons learned, as I think they can benefit other MCP server developers.

My Stack

I was using Cursor and VS Code intermittently as the main MCP client
To develop the MCP server itself, I used the .NET MCP SDK, as I decided to host the server on another service written in .NET

Lesson 1: Don't dump all of your data on the agent

In my application, one tool returns aggregated information on errors and exceptions. The API is very detailed, as it serves a complex UI view, and spews out large amounts of deeply linked data:

Error frames
Affected endpoints
Stack traces
Priority and trends
Histograms

My first instinct was to simply expose the API as is as an MCP tool. After all, the agent should be able to make more sense of it than any UI view, and catch on to interesting details or connections between events. I had several scenarios in mind for how I expected this data to be useful. The agent could automatically offer fixes for recent exceptions recorded in production or in the testing environment, let me know about errors that stand out, or help me address systematic problems that are the underlying root cause of the issues.

The basic premise was therefore to let the agent work its 'magic', with more data potentially meaning more hooks for the agent to latch onto in its investigation. I quickly coded a wrapper around our API on the MCP endpoint and decided to start with a basic prompt to see whether everything was working:

We can see the agent was smart enough to know that it needed to call another tool to grab the environment ID for the 'test' environment I mentioned. With that at hand, after discovering that there were actually no recent exceptions in the last 24 hours, it then took the liberty of scanning a longer time period, and that is when things got a bit weird:

What a strange response. The agent queries for exceptions from the last seven days, gets back some tangible results this time, and yet proceeds to ramble on as if ignoring the data altogether. It keeps trying to use the tool in different ways and with different parameter combinations, clearly fumbling, until I notice that it flat out states the data is completely invisible to it. While errors are being sent back in the response, the agent actually claims there are no errors. What's going on?

After some investigation, the problem turned out to be that we had simply hit a cap on the agent's ability to process large amounts of data in a response.

I used an existing API that was extremely verbose, which I initially even considered an advantage. The end result, however, was that I somehow managed to overwhelm the model. Overall, there were around 360k characters and 16k words in the response JSON, including call stacks, error frames, and references. Judging by the context window limit of the model I was using (Claude 3.7 Sonnet should support up to 200k tokens), this should have been supported, but the large data dump nevertheless left the agent completely stumped.
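A rough sanity check helps explain why the failure was surprising. Assume the common rule of thumb of about four characters per token. This is an assumption: JSON-heavy text often tokenizes less efficiently, so the real count can be noticeably higher. Under that assumption, 360k characters comes to roughly 90k tokens, well under a 200k limit. The Python sketch below pairs that estimate with a hypothetical `fit_to_budget` guard. The guard keeps a tool response under a chosen token budget by dropping items from the end of a list, and it reports how many items were omitted. It is an illustration, not part of the .NET MCP SDK or the author's server.

```python
import json

# Assumption: ~4 characters per token is a rough heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def fit_to_budget(items: list, max_tokens: int) -> dict:
    """Keep as many leading items as fit within max_tokens (hypothetical helper).

    The response also states how many items were left out, so the agent knows
    to narrow its query instead of assuming there is no more data.
    """
    kept = []
    for item in items:
        candidate = {"items": kept + [item], "omitted": len(items) - len(kept) - 1}
        if estimate_tokens(json.dumps(candidate)) > max_tokens:
            break
        kept.append(item)
    return {"items": kept, "omitted": len(items) - len(kept)}


print(estimate_tokens("x" * 360_000))  # 90000
```

Returning an explicit `omitted` count is a deliberate choice. A silently truncated list could lead the agent to conclude that it has seen everything.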

One strategy would be to switch to a model that supports an even bigger context window. I switched over to the Gemini 2.5 Pro model just to test that theory, as it boasts an outrageous limit of 1 million tokens. Sure enough, the same query now yielded a much more intelligent response:

This is great! The agent was able to parse the errors and find the systematic cause of many of them with some basic reasoning. However, we can't rely on the user choosing a specific model, and to complicate matters, this was output from a relatively low-volume testing environment. What if the dataset were even larger?
To solve this issue, I made some fundamental changes to how the API was structured:

Nested data hierarchy: Keep the initial response focused on high-level details and aggregations. Create a separate API to retrieve the call stacks of specific frames as needed.
Enhance queryability: All of the queries the agent had made so far used a very small page size for the data (10). If we want the agent to be able to access more relevant subsets of the data that fit within the limitations of its context, we need to provide more APIs to query errors along different dimensions, for example: affected methods, error type, priority and impact, etc.
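A minimal Python sketch of both changes follows, using a made-up in-memory data model. The summary query returns only high-level fields plus an ID per error, filters by a few dimensions, and pages the results. A separate drill-down call returns the call stack for one error on demand. The field names (`error_type`, `priority`, `affected_method`, `stack`) and the function names are illustrative assumptions, not the author's actual API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ErrorRecord:
    # Hypothetical shape of one aggregated error; the real API is far richer.
    id: str
    error_type: str
    priority: int
    affected_method: str
    occurrences: int
    stack: list = field(default_factory=list)  # large; only served on request


def query_errors(errors, error_type: Optional[str] = None,
                 min_priority: int = 0, affected_method: Optional[str] = None,
                 page: int = 0, page_size: int = 10) -> dict:
    """Summary-level query: filter by dimension, sort by priority, page the results."""
    matches = [
        e for e in errors
        if (error_type is None or e.error_type == error_type)
        and e.priority >= min_priority
        and (affected_method is None or e.affected_method == affected_method)
    ]
    matches.sort(key=lambda e: e.priority, reverse=True)
    start = page * page_size
    return {
        "total": len(matches),
        "page": page,
        # High-level fields only; stacks are fetched via get_error_stack(id).
        "errors": [
            {"id": e.id, "type": e.error_type, "priority": e.priority,
             "method": e.affected_method, "occurrences": e.occurrences}
            for e in matches[start:start + page_size]
        ],
    }


def get_error_stack(errors, error_id: str) -> dict:
    """Drill-down call: return the call stack for a single error."""
    for e in errors:
        if e.id == error_id:
            return {"id": e.id, "stack": e.stack}
    return {"error": f"Unknown error id '{error_id}'"}
```

Reporting `total` next to the page lets the agent decide whether to page further or narrow the filters, rather than pulling everything at once.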

With the new changes, the tool now consistently analyzes important new exceptions and comes up with fix suggestions. However, I had glossed over another minor detail I needed to sort out before I could really use it reliably.

Lesson 2: What's the time?

Image generated by the author with Midjourney

The keen-eyed reader may have noticed that in the previous example, to retrieve the errors in a specific time range, the agent used the ISO 8601 duration format instead of actual dates and times. So instead of standard 'From' and 'To' parameters with datetime values, the AI sent a duration value, for example seven days, or P7D, to indicate that it wanted to check for errors in the past week.

The reason for this is somewhat strange: the agent might not know the current date and time! You can verify that yourself by asking the agent that simple question. The answer below would have made sense were it not for the fact that I typed that prompt in at around noon on May 4th…

Using duration values turned out to be a great solution that the agent handled quite well. Don't forget to document the expected value and example syntax in the tool parameter description, though!
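A minimal Python sketch of the server side follows. It assumes the tool accepts only the week, day, and time parts of ISO 8601 durations (e.g. `P7D`, `PT12H`, `P1DT6H30M`). Years and months are deliberately rejected, since their length is ambiguous. The important part is that the server resolves the duration against its own clock, so the agent never needs to know the date. The parameter description string and the names `parse_duration` and `resolve_time_range` are illustrative, not taken from the author's code.

```python
import re
from datetime import datetime, timedelta, timezone

# Documented in the tool parameter description so the agent knows the format.
LOOKBACK_DESCRIPTION = (
    "How far back to look, as an ISO 8601 duration. "
    "Examples: 'PT1H' (last hour), 'P1D' (last 24 hours), 'P7D' (last week)."
)

# Subset of ISO 8601 durations: weeks, days, and an optional time part.
_DURATION_RE = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(value: str) -> timedelta:
    """Parse a supported duration, raising ValueError with a self-explaining message."""
    match = _DURATION_RE.match(value.strip().upper())
    parts = match.groupdict() if match else {}
    time_parts = [parts.get(k) for k in ("hours", "minutes", "seconds")]
    has_time_designator = "T" in value.upper()
    if not match or not any(parts.values()) or (
            has_time_designator and not any(time_parts)):
        # Echo the documentation back so the agent can correct itself.
        raise ValueError(f"Invalid duration '{value}'. {LOOKBACK_DESCRIPTION}")
    return timedelta(**{k: int(v) for k, v in parts.items() if v is not None})


def resolve_time_range(lookback: str, now=None):
    """Turn a lookback duration into a concrete (from, to) range using the server clock."""
    now = now or datetime.now(timezone.utc)
    return now - parse_duration(lookback), now


start, end = resolve_time_range(
    "P7D", now=datetime(2025, 5, 4, 12, 0, tzinfo=timezone.utc))
print(start.isoformat())  # 2025-04-27T12:00:00+00:00
```

The fixed `now` in the usage line only makes the output reproducible. In the tool, the default reads the server's UTC clock.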

Lesson 3: When the agent makes a mistake, show it how to do better

In the first example, I was actually taken aback by how the agent was able to decipher the dependencies between the different tool calls in order to provide the right environment identifier. In studying the MCP contract, it figured out that it first had to call another tool to get the list of environment IDs.

However, when responding to other requests, the agent would sometimes take the environment names mentioned in the prompt verbatim. For example, I noticed this in response to the question: compare slow traces for this method between the test and prod environments, are there any significant differences? Depending on the context, the agent would sometimes use the environment names mentioned in the request and send the strings "test" and "prod" as the environment ID.

In my original implementation, my MCP server would silently fail in this scenario, returning an empty response. The agent, upon receiving no data or a generic error, would simply give up and try to solve the request using another strategy. To counter that behavior, I quickly changed my implementation so that if an incorrect value was provided, the JSON response would describe exactly what went wrong, and even provide a list of valid values to save the agent another tool call.

This was enough for the agent: learning from its mistake, it repeated the call with the correct value and somehow also avoided making the same error in the future.
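A minimal Python sketch of that change follows, using a hypothetical in-memory environment list. Instead of returning an empty result, the tool returns a structured error. The error names the bad value, explains what was expected, and lists the valid IDs with their display names. The environment names and IDs, the JSON field names, and the `get_slow_traces` tool name are all made up for illustration.

```python
import json

# Hypothetical environments: display name -> environment ID.
ENVIRONMENTS = {"test": "env-7f3a", "prod": "env-19c2"}


def validate_environment(environment_id: str):
    """Return None if the ID is valid, otherwise an actionable error payload."""
    if environment_id in ENVIRONMENTS.values():
        return None
    return {
        "error": f"'{environment_id}' is not a valid environment ID.",
        "hint": "Pass an environment ID, not an environment name.",
        # Listing valid values saves the agent an extra tool call.
        "validEnvironments": [
            {"name": name, "id": env_id} for name, env_id in ENVIRONMENTS.items()
        ],
    }


def get_slow_traces(environment_id: str, method: str) -> str:
    """Hypothetical tool handler: validate first, then (stubbed) query the backend."""
    error = validate_environment(environment_id)
    if error is not None:
        return json.dumps(error)
    # A real implementation would query the tracing backend here.
    return json.dumps({"environmentId": environment_id, "method": method, "traces": []})


print(get_slow_traces("test", "OrderService.Place"))
```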

Lesson 4: Focus on user intent and not functionality

While it's tempting to simply describe what the API does, generic terms sometimes don't quite let the agent recognize the kinds of requirements this functionality applies to best.

Let's take a simple example: my MCP server has a tool that, for each method, endpoint, or code location, can indicate how it's being used at runtime. Specifically, it uses tracing data to indicate which application flows reach the specific function or method.

The original documentation simply described this functionality:

[McpServerTool,
 Description(
@"For this method, see which runtime flows in the application
(including other microservices and code not in this project)
use this function or method.
This data is based on analyzing distributed tracing.")]
public static async Task<string> GetUsagesForMethod(IMcpService client,
    [Description("The environment id to check for usages")]
    string environmentId,
    [Description("The name of the class. Provide only the class name without the namespace prefix.")]
    string codeClass,
    [Description("The name of the method to check, must specify a specific method to check")]
    string codeMethod)

The above is a functionally accurate description of what this tool does, but it doesn't necessarily make clear what kinds of tasks it might be relevant for. After seeing that the agent wasn't picking this tool up for various prompts I thought it would be quite useful for, I decided to rewrite the tool description, this time emphasizing the use cases:

[McpServerTool,
 Description(
@"Find out how a specific code location is being used and by
which other services/code.
Useful in order to detect possible breaking changes, to check whether
the generated code will fit the current usages,
to generate tests based on the runtime usage of this method,
or to check for related issues on the endpoints triggering this code
after any change to ensure it didn't impact it")]

Updating the text helped the agent realize why the information was useful. For example, before making this change, the agent would not even trigger the tool in response to a prompt similar to the one below. Now, it has become completely seamless, without the user having to directly mention that this tool should be used:

Lesson 5: Document your JSON responses

The JSON standard, at least officially, does not support comments. That means that if the JSON is all the agent has to go on, it might be missing some clues about the context of the data you’re returning. For example, in my aggregated error response, I returned the following score object:

"Score": {"Score":21,
"ScoreParams":{ "Occurrences":1,
"Trend":0,
"Recent":20,
"Unhandled":0,
"Unexpected":0}}

Without proper documentation, any non-clairvoyant agent would be hard-pressed to make sense of what these numbers mean. Thankfully, it is easy to add a comment element at the beginning of the JSON response with additional information about the data provided:

"_comment": "Each error contains a link to the error trace,
which can be retrieved using the GetTrace tool,
information about the affected endpoints, the code and the
relevant stack trace.
Each error in the list represents numerous instances
of the same error and is given a score after it's been
prioritized.
The score reflects the criticality of the error.
The number is between 0 and 100 and is composed of several
parameters, each of which can contribute to the error criticality,
all normalized in relation to the system
and the other methods.
Each score parameter's value represents its contribution to the
overall score. They include:

1. 'Occurrences', representing the number of instances of this error
compared to others.
2. 'Trend', whether this error is escalating in its
frequency.
3. 'Unhandled', representing whether this error is caught
internally or propagates all the way
out of the endpoint scope.
4. 'Unexpected', errors that are in high probability
bugs, for example NullPointerException or
KeyNotFound",
"EnvironmentErrors":[]

This allows the agent to explain to the user what the score means if they ask, and also to feed this explanation into its own reasoning and recommendations.
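One way to keep that documentation from drifting is to attach it in the serialization layer. The Python sketch below uses a hypothetical `with_comment` helper and an abbreviated comment text. It places `_comment` first: Python dicts preserve insertion order and `json.dumps` keeps it, so the explanation is the first thing the agent reads. This is an illustration of the idea, not the author's implementation.

```python
import json

# Abbreviated version of the explanation; the full text would mirror the score docs.
SCORE_COMMENT = (
    "Each error is given a score between 0 and 100 reflecting its criticality. "
    "ScoreParams values are each parameter's contribution to the overall score: "
    "'Occurrences' (instances compared to other errors), 'Trend' (whether the "
    "error is escalating), 'Unhandled' (whether it propagates out of the endpoint "
    "scope), 'Unexpected' (likely bugs such as NullPointerException)."
)


def with_comment(comment: str, payload: dict) -> str:
    """Serialize payload with a leading '_comment' field (hypothetical helper)."""
    return json.dumps({"_comment": comment, **payload}, indent=2)


response = with_comment(SCORE_COMMENT, {"EnvironmentErrors": []})
print(response)
```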

Choosing the right architecture: SSE vs STDIO

There are two architectures you can use when developing an MCP server. The more common and widely supported implementation makes your server available as a command launched by the MCP client. This can be any CLI-triggered command; npx, docker, and python are some common examples. In this configuration, all communication is done via the process's STDIO, and the process itself runs on the client machine. The client is responsible for instantiating the MCP server and managing its lifecycle.
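To make the STDIO model concrete: the client spawns the server process and exchanges JSON-RPC messages with it over stdin and stdout, one message per line. The Python loop below is a minimal sketch of that framing only, not a usable MCP server. It answers `ping` and reports every other method as "Method not found" (JSON-RPC code -32601). A real server would rely on an SDK to handle `initialize`, `tools/list`, `tools/call`, and so on.

```python
import json
import sys


def handle_line(line: str):
    """Handle one newline-delimited JSON-RPC message; return the reply line or None."""
    message = json.loads(line)
    if "id" not in message or "method" not in message:
        return None  # notifications and responses get no reply
    if message["method"] == "ping":
        reply = {"jsonrpc": "2.0", "id": message["id"], "result": {}}
    else:
        reply = {"jsonrpc": "2.0", "id": message["id"],
                 "error": {"code": -32601, "message": "Method not found"}}
    # json.dumps without indent never emits raw newlines, preserving the framing.
    return json.dumps(reply)


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read requests line by line until the client closes the pipe."""
    for line in stdin:
        if line.strip():
            reply = handle_line(line)
            if reply is not None:
                stdout.write(reply + "\n")
                stdout.flush()
```

Because this process lives on the user's machine and is started by the client, every change to it has to be redistributed to each user. That is the drawback discussed next.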

This client-side architecture has one major drawback from my perspective: since the MCP server implementation is run by the client on the local machine, it is much harder to roll out updates or new capabilities. Even if that problem were somehow solved, the tight coupling between the MCP server and the backend APIs it depends on in our applications would further complicate this model in terms of versioning and forward/backward compatibility.

For these reasons, I chose the second type of MCP server: an SSE server hosted as part of our application services. This removes any friction from running CLI commands on the client machine, and it also lets me update and version the MCP server code along with the application code it consumes. In this scenario, the client is given the URL of the SSE endpoint it interacts with. While not all clients currently support this option, there is a clever command-line tool called supergateway that can act as a proxy to the SSE server implementation. That means users can still add the more widely supported STDIO variant and consume the functionality hosted on your SSE backend.
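For clients that only speak STDIO, the client's MCP configuration can launch supergateway as the local command and point it at the hosted SSE endpoint. The Python sketch below generates such a configuration entry. It assumes the `mcpServers` layout (`command`/`args`) used by clients such as Cursor, supergateway's `--sse` option, and a `url` key for clients that connect to SSE directly. Check your client's and supergateway's documentation, as these details vary. The server name and URL are placeholders.

```python
import json


def stdio_proxy_entry(sse_url: str) -> dict:
    """Config entry that runs supergateway locally as an SSE-to-STDIO bridge."""
    return {"command": "npx", "args": ["-y", "supergateway", "--sse", sse_url]}


def sse_entry(sse_url: str) -> dict:
    """Entry for clients that can connect to an SSE endpoint directly (key name varies by client)."""
    return {"url": sse_url}


# Placeholder name and URL; substitute your own hosted endpoint.
SSE_URL = "https://observability.example.com/mcp/sse"
config = {"mcpServers": {"my-observability": stdio_proxy_entry(SSE_URL)}}
print(json.dumps(config, indent=2))
```

Either way, the server code stays on the backend. Only this small pointer lives on the user's machine, so updates ship with the application rather than with each client install.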

MCPs are still new

There are many more lessons and nuances to using this deceptively simple technology. I have found that there is a big gap between implementing a workable MCP and one that can actually integrate with user needs and usage scenarios, even beyond those you've anticipated. Hopefully, as the technology matures, we'll see more posts on best practices.

Want to connect? You can reach me on Twitter at @doppleware or via LinkedIn.
Follow my MCP for dynamic code analysis using observability at https://github.com/digma-ai/digma-mcp-server
