Motivation:

State-of-the-art LLMs are rapidly changing the way we interact with code and our models.

~Insert any number of news headlines describing major economic/software/societal shifts attributable to Claude or ChatGPT.

The nature of our work, computationally-based risk assessment and decision support, has pushed us to stay cognizant of emergent tools. At the moment, it feels almost overwhelming how quickly things are accelerating.

Last fall, Lillian made a nice post, “A beginner’s guide for the LLM-curious (Part 1): A very basic introduction to NLP and LLMs.” We decided this post would be a nice continuation, and an effort to stay up to date with the most cutting-edge developments.

Over the past year, there have been major developments in LLMs and the interfaces through which we interact with them. This post serves to highlight just one of these developments, the Model Context Protocol (MCP).

This post does not attempt to provide step-by-step instructions for constructing a custom MCP, for three reasons:

  • Custom MCPs require custom implementations that are model-specific, and instructions for the Pywr-DRB MCP would not be helpful for someone working with a different model
  • There are plenty of good tutorials online
  • This technology is rapidly developing and any instructions written today may be outdated in 6 months

Instead, in this post, I’ll briefly introduce the concept of an MCP and provide an argument for why you should consider building a custom MCP for your own model codebase. Lastly, I’ll demonstrate the value of using a custom MCP when working with your code base by presenting and using the pywrdrb-mcp MCP that I developed for our Pywr-DRB model.

Disclaimer:

I used Claude Code to support development of the pywrdrb-mcp repo discussed at the end of this post. After all, the MCP was developed by Anthropic and the Claude models are particularly good at developing the MCP source code. These MCPs should always be developed with close human oversight and guidance.

However, the written content and ideas in this blog post are 100% human!

Content:

  • What is a Model Context Protocol (MCP)?
    • Key MCP terminology
    • Quick overview of fastmcp and primitives
  • Why you should develop an MCP for your systems model
    • The problem with large research codebases
    • How a custom MCP helps
  • Demo: Development of the pywrdrb-mcp
    • The Pywr-DRB model
    • pywrdrb-mcp
      • Static analysis with ast
      • Writing MCP primitives
      • Bringing it all together
  • References

What is a Model Context Protocol (MCP)?

In November 2024 Anthropic began promoting the Model Context Protocol (MCP) starting with a public announcement, “Introducing the Model Context Protocol”, describing the MCP as:

“a new standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments. Its aim is to help frontier models produce better, more relevant responses.”

Anthropic’s MCP standard was quickly adopted, and soon even OpenAI had adopted this method for connecting AI models to data.

Rather than trying to cover all of the details here, I will hit a few key ideas but generally refer you to the MCP documentation site: modelcontextprotocol.io

The diagram below is from the documentation, and shows how the MCP serves as a ‘middle man’ when connecting LLM-based interfaces (left side) to external applications, datasets or code repositories (right side).
![[Use custom MCP implementations when working with model code - MCP diagram.png]] Figure source here.

Over the last year, I assumed MCPs were only useful for connecting to major applications like Google Drive or Slack. I had not considered the value of building a custom MCP for a local research codebase.

That realization is really what this post is about. Before getting into the argument for custom MCPs, let’s start with some key terminology.

Key MCP terminology

An MCP server is a lightweight program that exposes data and functionality to an LLM client. From the docs: “MCP server refers to the program that serves context data, regardless of where it runs.”

The server may run locally on your machine and is launched automatically by the LLM client (Claude Code, Claude Desktop, etc.) when a conversation starts.

The important thing to understand is that each MCP server is specific to a particular codebase or application. You build a server that knows the structure, conventions, and data of your model, and you expose that knowledge through a set of defined actions. This is what makes MCPs so valuable for working with basin-specific water resources models, where the unique context of reservoir operations, regulations, etc. is so important.

An MCP server communicates through what are called “primitives”, which are just the different types of actions and data that a server can expose to the LLM. Think of them as 3 types of functions that the LLM can utilize.

MCP defines three core primitives:

  1. Tools may be the most important primitive. A tool is a function (a Python function, in the examples in this post) that the LLM can call during a conversation to retrieve information or perform an action. When the LLM needs to answer a question about your code, it invokes a tool to return accurate information rather than guessing or manually searching the code base.

  2. Resources are pre-computed reference data that the LLM can access. These can include lists, dictionaries, and/or function/class docstrings. If the LLM needs information about a specific function, it can use a resource to retrieve that function’s docstring.

  3. Prompts are reusable templates that guide the LLM through multi-step workflows. For example, “how do I add a new node to this model?” triggers a prompt that walks through the Pywr-DRB-specific pattern for creating and registering a custom Pywr parameter class.
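To make the distinction between the three primitives more concrete, here is a minimal plain-Python sketch that does not use any MCP library. The node names, the `model://nodes` key, and the function names are all hypothetical and exist only for illustration; a real server would register these with an MCP package rather than calling them directly.

```python
import json

# Hypothetical model data standing in for a real codebase.
NODE_DOWNSTREAM = {"reservoir_a": "junction_1", "junction_1": "outlet"}


# Tool: a function the LLM calls with arguments to get a computed answer.
def get_downstream(node_name: str) -> str:
    """Return the node immediately downstream of `node_name` as JSON."""
    answer = {"node": node_name, "downstream": NODE_DOWNSTREAM.get(node_name)}
    return json.dumps(answer)


# Resource: pre-computed reference data, looked up by a URI-like key.
RESOURCES = {
    "model://nodes": sorted(set(NODE_DOWNSTREAM) | set(NODE_DOWNSTREAM.values())),
}


def read_resource(uri: str):
    """Return the pre-computed data stored under `uri`."""
    return RESOURCES[uri]


# Prompt: a reusable template that guides a multi-step workflow.
def how_to_add_node(node_name: str) -> str:
    """Build step-by-step instructions for adding a node (illustrative steps)."""
    return (
        f"You are helping add a node named '{node_name}'.\n"
        "1. Read the resource 'model://nodes' to see existing names.\n"
        "2. Call get_downstream on the intended neighbor to confirm routing.\n"
        "3. Propose the code change and list every file it touches."
    )


print(get_downstream("reservoir_a"))
print(read_resource("model://nodes"))
print(how_to_add_node("new_gauge"))
```

The tool computes an answer from its arguments, the resource simply hands back stored data, and the prompt returns text that steers the LLM rather than answering anything itself.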

There is a set of rapidly emerging open-source tools to support custom MCPs. For this project, I’ve used the fastmcp package.

Quick overview of fastmcp and primitives

The fastmcp package provides a simple decorator-based API where you write normal Python functions and register them as MCP tools. The function’s docstring becomes the tool description that the LLM reads, and the function arguments become the tool’s input parameters. The LLM uses these descriptions to decide when and how to call each tool.
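As a rough illustration of the mechanism described above, the sketch below uses a toy registry (the hypothetical `ToyServer` class) to show how a decorator can capture a function’s docstring and arguments. This is not fastmcp’s implementation, only a standard-library stand-in for the general idea.

```python
import inspect


class ToyServer:
    """Toy stand-in for an MCP server; NOT how fastmcp is implemented."""

    def __init__(self, name: str):
        self.name = name
        self.tools = {}

    def tool(self, func):
        """Register `func` and record the metadata a client could be shown."""
        signature = inspect.signature(func)
        self.tools[func.__name__] = {
            # The docstring doubles as the human/LLM-readable description.
            "description": inspect.getdoc(func) or "",
            # Argument names and annotated types become the input parameters.
            "parameters": {
                name: getattr(param.annotation, "__name__", str(param.annotation))
                for name, param in signature.parameters.items()
            },
            "callable": func,
        }
        return func  # the function itself is left unchanged


server = ToyServer("Demo")


@server.tool
def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b


spec = server.tools["add"]
print(spec["description"])
print(spec["parameters"])
print(spec["callable"](2, 3))
```

The practical consequence is that a clear docstring and typed arguments are not optional polish: they are the only description of the tool that the LLM gets to read.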

The package defines multiple decorators which are used to define the ‘primitives’ described above:

  • @mcp.tool
  • @mcp.resource
  • @mcp.prompt

An example from the fastmcp landing page is helpful for understanding more concretely the utility of an MCP:

from fastmcp import FastMCP
mcp = FastMCP("Demo")

@mcp.tool
def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b

if __name__ == "__main__":
    mcp.run()

If you then enable this Demo MCP within Claude/ChatGPT, the LLM would be able to access and use the add() function.

Now imagine replacing this add() tool with a function that returns your model’s network topology or reservoir operating rules, pulled directly from the source code.

The key takeaway here is that the MCP framework itself is simple. The real work is in deciding what information your LLM needs about your specific model, and writing the tools and resources that deliver it reliably.

Why you should develop an MCP for your systems model

Are you using Claude Code, ChatGPT, Cursor or any of the other LLM-aided tools to support code development or understanding? Be honest.

If so, you should consider using a custom MCP for the following reasons.

The problem with large research codebases

If you work with a water resources systems model (or any large simulation model for that matter), you are familiar with the pattern of these code bases growing more complex over time. When a repository has more than 50 files, an LLM agent will need to spend meaningful time scanning and searching the files in order to retrieve necessary context and information. Sometimes, the LLM will ignore this context and make assumptions, which tends to introduce errors into your code.

When I use an LLM to help with code development, I consistently run into the same issues:

  • Hallucinations or bad assumptions. The LLM draws from general knowledge about water resource models rather than your actual source code, leading to incorrect assumptions about model functionality and internal relationships.

  • Missing or ignored context. Even tools like Claude Code that can read your files will typically scan a few files at a time. They often miss the connections between modules, or overlook operational rules that are defined in CSV data files rather than in Python code.

  • Inconsistency across sessions. Every new conversation starts from scratch. The LLM has no memory of the model architecture, the node naming conventions, or the operational rules you explained to it yesterday.

  • Risk of unintended edits. When an LLM has direct access to your model repository, there is always a risk that it modifies files in ways you did not anticipate.

How a custom MCP helps

A custom MCP addresses each of these problems by giving the LLM a structured, read-only interface to your codebase.

Accurate, source-code-based responses instead of hallucinations. When the LLM needs to know what functions exist or how model nodes are connected, it calls an MCP tool that reads directly from the source code. This information is structured and reliably provided, instead of requiring the LLM to repeatedly re-search the code base.

Consistent context across sessions. The MCP server exposes the same information, tools and resources every time. Whether I start a new conversation today or next week, the LLM has the same reliable interface to access the Pywr-DRB model information.

No unintended write risk. The MCP can be developed in a completely separate repository. It reads from your model code but never modifies it. This separation gives you control over exactly what information the LLM can access and what it can do with it.

Efficiency. Rather than having the LLM read through entire source files looking for relevant information, a well-designed MCP tool can return exactly what is needed in a single call. This saves context window space and produces faster, more focused conversations.

If you are already using LLMs to support your code development, a custom MCP may be a relatively small investment that makes those interactions significantly more reliable and accurate.


Demo: Development of the pywrdrb-mcp

My research for the last several years has focused on water supply risks in the Delaware River Basin (DRB), using a model developed by Andrew Hamilton and myself named Pywr-DRB (Hamilton et al., 2024).

I am constantly working on projects that require deep understanding of the water systems model code. This accurate, model-specific context is critical at any point when interacting with the model, from setup and simulation through understanding and interpretation of model outputs.

When using LLMs to aid in that development, I was repeatedly skeptical that the LLM would (a) understand the nuances of the Pywr-DRB model or (b) avoid overlooking important details like how the FFMP drought levels interact with reservoir release rules.

This motivated me to build a custom MCP for Pywr-DRB, to ensure that the LLM had reliable and accurate Pywr-DRB model context and information.

The Pywr-DRB model

For an in-depth understanding of the model, you should review the original model publication from Hamilton et al. (2024), “Pywr-DRB: An open-source Python model for water availability and drought risk assessment in the Delaware River Basin”.

Alternatively, you can glance through the blog post by Chung-Yi and myself, “Introducing Pywr-DRB – Part 1”, or the Pywr-DRB GitHub README.

For the sake of this MCP-focused post, you need to know that Pywr-DRB is a water resources systems model of the Delaware River Basin, with daily-timestep representation of:

  • Nodal infrastructure network with 17 reservoirs
  • Reservoir operations (17 reservoirs)
  • Supreme Court-mandated minimum streamflow targets
  • Transbasin diversions to New York City (NYC) and New Jersey (NJ)

The codebase is sizable. There are dozens of custom parameter classes, multi-step volume balancing algorithms, and operational rules that are defined across Python source files, CSV data tables, and configuration objects.

pywrdrb-mcp

My requirements going into this project were:

  1. The MCP should provide reliable access to model details (nodes, reservoir details, parameter functionality, network topology) without requiring the LLM to scan through source files
  2. It should never import pywrdrb or require any of its dependencies (numpy, pandas, pywr, etc.)
  3. It should be easy to register with Claude Code, Claude Desktop, or other MCP-compatible tools

The development process can be broken into three main tasks:

  1. Static analysis and indexing of all source code using Python’s ast module
  2. Writing MCP primitives (tools, resources, prompts) that query the indexed information
  3. Launching the server and connecting it to Claude Code or other LLM interfaces

I’ll briefly touch on the first two below. For the full details on point 3, you are referred to the project repo.

Static analysis with ast

One design decision worth explaining is how the MCP reads the Pywr-DRB source code. An easy approach would be to simply import pywrdrb and inspect its objects at runtime. However, Pywr-DRB has many dependencies (pywr, h5py, etc.) that would need to be installed just to run the MCP server. I wanted to avoid that.

Instead, pywrdrb-mcp uses Python’s built-in ast (Abstract Syntax Tree) module. The ast module can parse a Python source file into a tree data structure that represents all of the code’s structure (classes, functions, variables, etc.) without ever executing it.

At startup, the MCP server walks through every .py file in the Pywr-DRB source tree and parses it with ast. It extracts class definitions, method signatures, docstrings, and even evaluates simple data structures like lists and dictionaries using ast.literal_eval().
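The kind of extraction described above can be sketched with the standard library alone. The source string, class name, and constant below are hypothetical stand-ins rather than actual Pywr-DRB code, and the real pywrdrb-mcp indexer likely differs in its details.

```python
import ast

# Hypothetical source text. The import is never executed, so the named
# package does not need to be installed for parsing to succeed.
SOURCE = '''
import some_uninstalled_dependency

EXAMPLE_NODES = ["node_a", "node_b"]

class ExampleParameter:
    """Hypothetical parameter class used only for this sketch."""

    def value(self, timestep, scenario_index):
        """Return the parameter value for one timestep."""
        return 0.0
'''

tree = ast.parse(SOURCE)  # builds the syntax tree without running the code

classes = {}
constants = {}
for node in tree.body:
    if isinstance(node, ast.ClassDef):
        methods = {}
        for item in node.body:
            if isinstance(item, ast.FunctionDef):
                methods[item.name] = {
                    "args": [arg.arg for arg in item.args.args],
                    "doc": ast.get_docstring(item),
                }
        classes[node.name] = {"doc": ast.get_docstring(node), "methods": methods}
    elif (
        isinstance(node, ast.Assign)
        and len(node.targets) == 1
        and isinstance(node.targets[0], ast.Name)
    ):
        try:
            # Only plain literals (lists, dicts, numbers, strings) succeed.
            constants[node.targets[0].id] = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            pass  # skip anything that would require executing code

print(classes["ExampleParameter"]["methods"]["value"]["args"])
print(constants)
```

Because `ast.literal_eval()` refuses anything that is not a plain literal, module-level values computed by function calls are simply skipped rather than executed.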

All of this information is cached in an index object that the tools query during conversations. This indexing during startup is actually very fast, and has the benefit of being up-to-date every time I restart the session.
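A build-once, query-many index along these lines might look like the sketch below. The `SourceIndex` dataclass and `build_index` function are hypothetical names, and the demo runs against a throwaway directory rather than the actual Pywr-DRB source tree.

```python
import ast
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SourceIndex:
    """Hypothetical cache of facts extracted from a source tree."""

    class_docs: dict = field(default_factory=dict)   # class name -> docstring
    class_files: dict = field(default_factory=dict)  # class name -> relative path


def build_index(root: Path) -> SourceIndex:
    """Parse every .py file under `root` once and cache what was found."""
    index = SourceIndex()
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                index.class_docs[node.name] = ast.get_docstring(node)
                index.class_files[node.name] = path.relative_to(root).as_posix()
    return index


# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "parameters").mkdir()
    (root / "parameters" / "example.py").write_text(
        'class ExampleParameter:\n    """Hypothetical class."""\n',
        encoding="utf-8",
    )
    index = build_index(root)

print(index.class_files)
print(index.class_docs)
```

Since the index is rebuilt from the files on disk each time the server starts, edits to the model source are picked up on the next restart without any separate synchronization step.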

Writing MCP primitives

As mentioned above, I used the fastmcp package to define tool, resource and prompt primitives. Each of these primitive classes is located within its own subfolder.

Importantly, an MCP is only as good as its primitive functionality. With that in mind, the design of each tool matters more than the number of tools you create.

Here is a simplified example of an mcp.tool primitive from pywrdrb-mcp.

This tool lets the LLM retrieve a JSON-formatted dictionary containing information about the Pywr-DRB network topology for a specific model node. Specifically, the function returns (a) a list of all upstream nodes, (b) the immediate downstream node and (c) the travel-time lag between the node and its downstream node.

Note that the index object in this example is the custom data structure built by parsing the Pywr-DRB source tree at startup using Python’s ast module, as described above.

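import json

from fastmcp import FastMCP

mcp = FastMCP("pywrdrb-mcp")

# `index` is the AST-based index object built at server startup;
# its construction is omitted from this simplified example.

@mcp.tool()
def get_node_topology(node_name: str) -> str:
    """Look up the river network topology for a specific node.

    Returns upstream nodes, downstream node, travel time lag,
    and USGS/NHM/NWM gage IDs for the given node.
    """
    # The gage ID lookups are left out of this simplified version.
    result = {
        "node": node_name,
        "upstream_nodes": index.upstream_nodes.get(node_name, []),
        "downstream_node": index.immediate_downstream.get(node_name),
        "downstream_lag_days": index.downstream_lags.get(node_name),
    }
    # Serialize to JSON so the LLM receives structured, readable text.
    return json.dumps(result, indent=2)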

The pattern is the same each time you define an MCP primitive: you write a function that queries your model’s data, add the @mcp.tool() decorator, and the LLM gains access to that specific information.

The logic inside each tool is entirely specific to your codebase. Someone building an MCP for a different model would write completely different tools tailored to their own code structure.
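To see the query pattern run end to end without fastmcp or Pywr-DRB installed, here is a sketch that pairs the same function body with a stand-in index. The three-node network, the lag values, and the `collect_upstream` helper are all hypothetical; the real index is built from the Pywr-DRB source and will differ.

```python
import json
from types import SimpleNamespace

# Hypothetical network: node -> immediate downstream node.
immediate_downstream = {
    "reservoir_a": "junction_1",
    "reservoir_b": "junction_1",
    "junction_1": "outlet",
}
# Hypothetical travel-time lags (days) from each node to its downstream node.
downstream_lags = {"reservoir_a": 1, "reservoir_b": 2, "junction_1": 0}


def collect_upstream(node: str) -> list:
    """Return every node that eventually drains to `node`, sorted by name."""
    found = []
    for candidate, downstream in immediate_downstream.items():
        if downstream == node:
            found.append(candidate)
            found.extend(collect_upstream(candidate))
    return sorted(found)


all_nodes = set(immediate_downstream) | set(immediate_downstream.values())
index = SimpleNamespace(
    upstream_nodes={node: collect_upstream(node) for node in all_nodes},
    immediate_downstream=immediate_downstream,
    downstream_lags=downstream_lags,
)


def get_node_topology(node_name: str) -> str:
    """Same lookup logic as the tool, minus the MCP decorator."""
    result = {
        "node": node_name,
        "upstream_nodes": index.upstream_nodes.get(node_name, []),
        "downstream_node": index.immediate_downstream.get(node_name),
        "downstream_lag_days": index.downstream_lags.get(node_name),
    }
    return json.dumps(result, indent=2)


print(get_node_topology("junction_1"))
```

Using `.get()` with defaults means an unknown node name returns empty fields instead of raising an error, which gives the LLM a clean signal that the node does not exist.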

Bringing it all together

Putting the ast-based index and the fastmcp primitives together, the full pywrdrb-mcp project looks like this:

pywrdrb-mcp/
├── src/pywrdrb_mcp/
│   ├── server.py          # FastMCP entry point
│   ├── config.py          # PYWRDRB_ROOT path
│   ├── index/             # AST-based static analysis engine
│   ├── tools/             # 19 MCP tools (8 modules)
│   ├── resources/         # 15 MCP resources
│   ├── prompts/           # 6 MCP prompt templates (how_to_* convention)
│   └── content/           # Hand-written domain knowledge (markdown)
└── tests/

To see how I registered the pywrdrb-mcp with Claude, see the README.md in the project repo.

Finally, a demonstration.

Example LLM interaction when pywrdrb-mcp is available

To give a sense of what this looks like in practice, here is a brief exchange with Claude Opus 4.6 when the pywrdrb-mcp server is connected.

My prompt: Provide a concise explanation of how Pywr-DRB implements flexible flow management (FFMP) operational rules.

After seeing this prompt, Claude automatically recognized that it should use the pywrdrb-mcp; I did not have to instruct it to do so!

It used 7 of the MCP tools to retrieve FFMP-specific implementation as well as details regarding different class objects defined in the model:

![[Use custom MCP implementations when working with model code - claude using mcp.png]]

After retrieving this information from the MCP, it was then able to respond:

Claude Response: Now I have a thorough picture. Here’s how Pywr-DRB implements the FFMP:

The FFMP implementation in Pywr-DRB lives primarily in parameters/ffmp.py and is driven by data tables in data/operational_constants/. It consists of 8 custom Pywr Parameter classes that together encode the drought management, release coordination, and flood control logic for the three NYC reservoirs (Cannonsville, Pepacton, Neversink).

The system operates through several interconnected layers:

  1. Drought Level Determination via Storage Zones The model defines 6 drought levels (L1a through L5) based on aggregate NYC reservoir storage relative to time-varying thresholds. These thresholds are specified as daily profiles of storage fraction (0–1) that vary seasonally. For example, the L2 boundary ranges from 0.75 in winter up to 1.0 in spring/early summer, then declines through the fall. L1a is implicitly “above L1b.” The current drought level at each timestep determines which operational rules apply.

  2. Release Factor Calculation (NYCCombinedReleaseFactor) This is the core drought-response parameter. For each reservoir, it computes a release multiplier that blends two signals: … …

Without the MCP, this question would require the LLM to either (1) search the web for this information, (2) search for the Pywr-DRB source code, scan multiple files, and hope it finds the right information, or (3) guess the answer based on prior context and training.

With the MCP, the answer comes directly from the model source code.

Conclusions

If you work with a codebase and you are using LLMs as part of your development workflow, I would encourage you to consider building a custom MCP for your project. The investment may be relatively small, and the payoff is that every LLM conversation you have about your model starts from a place of accurate, structured information and context.

Thanks for reading, till next time.

References

Hamilton, A. L., Amestoy, T. J., & Reed, P. M. (2024). Pywr-DRB: An open-source Python model for water availability and drought risk assessment in the Delaware River Basin. Environmental Modelling & Software, 106185. https://doi.org/10.1016/j.envsoft.2024.106185