MCP and Agentic AI on Google Cloud Run

We’re have moved from AI that primarily responds via text, to AI that manipulates thinngs. These “agentic AI” systems use tools to do that manipulation. The way they interact with tools is via Model Context Protocol (MCP), this open source standard for helping LLM’s connect and use external data sources, or tools.

How and where you run these tools is the purpose of this article, the TL;DR is: “Google Cloud Run”.

But read on if you want the details, and points to look out for.

Why Cloud Run for Agentic AI?

When it comes to deploying these sophisticated agentic AI Google Cloud Run emerges as a great choice, its serverless nature is super scalable, cost efficient, and requires no ops team to keep it running. Aviato often recommend it as part of our AI Deployment Services.

Further it can easily connect to any Databases or LLM’s running on Google Vertex AI without leaving your network (VPC).

What previously might have required dedicated SRE and DevOps teams can now be tackled by an individual developer, freeing up time to innovate on the actual AI Agent.

Architecting Your Agent on Cloud Run

So, what does a typical agentic AI architecture on Cloud Run look like?

At its core, you’ll have:

Cloud Run service for the User Interface
Cloud Run Service for Agent Development Kit (or Langraph, Langchain, etc) to coordinate the agent’s behavior
A Database — A vector DB for Retrieval-Augmented Generation (RAG)
Short Term memory: Session data or caching with Memorystore
Long Term memory: A vector DB for RAG
Vertex AI with your choice of LLM : Google’s Gemini API, custom models or other foundation models deployed on Vertex AI Endpoints
Cloud Run Service for MCP Servers : Agents use tools to perform specific tasks or to interact with external services, APIs, and websites.

MCP Servers

Cloud Run is a well known pattern for all of this, but with MCP being so new and the focus of this article it might be best to take a step back and explain what Model Context Protocol (MCP) does .

MCP directly addresses the inability of an LLM to use tools; it provides a standardized, structured way for systems to expose their capabilities to language models. Here are some examples:

Git — Read, search, and manipulate Git repositories
Google Maps — Location services, directions, and place details
EverArt — AI image generation using various models
Puppeteer — to browse the web, extract information, or perform actions via clicks and keyboard input
Google Drive — File access and search capabilities for Google Drive

Further the use of MCP lets us change our LLM as new ones are released to improve your agents without rebuilding from scratch, a great benefit of using VertexAI we can do this with a line of code.

Practical MCP Server Deployment on Cloud Run

Here’s a quick overview on how you can get your MCP servers up and running on Cloud Run for non production uses.

Deployment from Container Images

If your MCP server is already packaged as a container image (perhaps from Docker Hub), deploying it is straightforward. You’ll use the command:

gcloud run deploy SERVICE_NAME - image IMAGE_URL - port PORT

For instance, deploying a generic MCP container might look like:

gcloud run deploy my-mcp-server - image us-docker.pkg.dev/cloudrun/container/mcp - port 3000

Deployment from Source

If you are deploying a production use case, this is the recommended approach, if you have the source code for an MCP server (Perhaps from GitHub) you can deploy it directly. Simply clone the repository, navigate into its root directory, and use:

gcloud run deploy SERVICE_NAME - source .

Cloud Run will handle the building and deployment, or you can work this into a CI/CD pipeline for a more production ready use case.

Cloud Run does not support MCP servers that rely on Standard Input/Output (stdio) transport. This constraint implicitly pushes MCP server development towards web-centric, network-addressable services, which aligns better with cloud-native architectures and scalability. Developers should use frameworks like FastMCP (the standard Python SDK wrapper) using transport=”streamable-http” to natively align with Cloud Run’s architecture.

State Management Strategies for Agentic AI on Cloud Run

Fortunately, Google Cloud provides robust solutions for managing the various types of state your agentic AI systems will require:

Short Term Memroy / Caching

For data that needs fast access, like session information or frequently accessed data for an agent, connecting your Cloud Run service to Memorystore for Redis is an excellent option.2

Long-term Memory / Persistent Knowledge

For storing conversational history, user profiles, or other forms of persistent agent knowledge, Firestore offers a scalable, serverless NoSQL database solution.

If your agent deals with structured data or requires the powerful RAG capabilities discussed earlier, Cloud SQL for PostgreSQL or AlloyDB for PostgreSQL are ideal choices or one of the many that work on Google’s Vertex AI RAG Engine.

Orchestration Framework Memory

Many AI orchestration frameworks, such as LangChain, come with built-in memory modules. For example, LangChain’s ConversationBufferMemory can store conversation history to provide context across multiple turns. These often integrate with external stores for persistence.

Table 1: State Management Options for Agentic Systems on Cloud Run

Choosing the right state management approach depends heavily on the specific requirement:

The Challenge of Stateful MCP Servers

As highlighted, MCP servers using Streamable HTTP transport might need to maintain a persistent session context, especially to allow clients to resume interrupted connections. The core challenge here, (as of June 2025), is that many official MCP SDKs lack support for external session persistence, aka storing session state in a dedicated service like Redis. Instead, they often keep the session state in the memory of the server instance.

This makes horizontal scaling problematic, if a client’s subsequent request is routed by a load balancer to a different instance from the one that initiated the session, the session context is lost, and the connection will likely fail. This limitation in current MCP SDKs points to a maturity gap in the ecosystem and until SDKs evolve to better support externalized state, designing MCP servers to be stateless is the more resilient cloud native pattern where feasible.

Cloud Run Session Affinity to the Rescue?

Cloud Run offers a feature called session affinity that can help mitigate this issue.

When enabled, Cloud Run uses a session affinity cookie to attempt to route sequential requests from a particular client to the same revision instance.

You can enable this with a gcloud command:

gcloud run services update SERVICE - session-affinity

Or via the Google Cloud Console or YAML config.

However, it’s crucial to understand that session affinity on Cloud Run is “best effort”.

If the targeted instance is terminated (due to scaling, etc) or becomes overwhelmed (reaching maximum request concurrency, etc), session affinity will be broken, and subsequent requests will be routed to a different instance.

So if the in memory state is absolutely critical and irreplaceable, session affinity alone is not a foolproof guarantee of state preservation.

Addressing SDK and Session Affinity Limitations

Given these constraints, there are 2 practical approaches:

Manual Cookie Handling: If you’re using a client SDK that doesn’t natively support cookies and you need to work with load balancer level sticky sessions you might need to implement manual cookie handling in your client code. This is a workaround, but it can be necessary.
Stateless MCP Server Design (preffered): The most robust cloud native approach and what Aviato Consulting recommend, is to design your MCP servers to be stateless. The MCP specification itself permits a stateless server mode where the server doesn’t maintain session context between requests, and clients aren’t expected to resume dropped connections. This design enables seamless horizontal scaling and is ideal for environments where elasticity and load distribution are critical.

Operationalizing AI

Logging:

Cloud Run integrates with Cloud Logging out of the box. You can benefit from the inbuilt logging mechanisms which also has distributed tracing capabilities to trace each HTTP request lifecycle.

Pay As You Go:

Cloud Run charges you money based on the actual usage, and scales to 0. You can set billing alerts to protect against sudden success of your AI tool.

Security:

While Cloud Run’s IAM and Cloud Armor are fantastic infrastructure level protections, MCP now natively supports OAuth 2.1 authorization workflows.

Cloud Run’s native OIDC ID tokens seamlessly integrate with MCP’s new security spec for service-to-service authentication.

Conclusion

The combination of agentic AI, Model Context Protocol (MCP), and Google Cloud Run offers a powerful platform for intelligent automation, creating a cohesive ecosystem with no operational overhead, cost efficiency (scale to 0), and the ability to handle large volumes of traffic.

UPDATES March 2026

MCP protocol and SDKs now support Background Tasks for long-running operations (e.g., using @mcp.tool(task=True) in FastMCP).
In 2025, Anthropic donated MCP to the Linux Foundation
Modern IDEs (like Claude Code, Antigravity, and Cursor) act as immediate clients capable of bridging to Cloud Run hosted servers.