Previously, we used tool calls to equip our Study Buddy agent with the right capabilities. Now we'll look at how to deploy it so it's accessible to the world through a REST API.
There are two common ways to achieve this:
Google ADK–supported: Leverage Agent Engine (Vertex AI), Cloud Run, or GKE (Google Kubernetes Engine)
Custom FastAPI Deployment: Build your own REST API wrapper
For Google ADK–supported deployment, please refer to the official docs.
In this section, I'll walk through our custom FastAPI deployment in detail.
We've built a complete REST API that wraps our StudyBuddy agent with proper session management. Let's dive deep into how it works.
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   FastAPI App   │───▶ │     Runner      │───▶ │   StudyBuddy    │
│                 │     │                 │     │     Agent       │
├─────────────────┤     ├─────────────────┤     └─────────────────┘
│  Session Mgmt   │     │ Event Handling  │
│ Request/Response│     │ Message Routing │
└─────────────────┘     └─────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────┐
│           InMemorySessionService            │
│  ┌───────────────┬───────────────────────┐  │
│  │    user_id    │       Sessions        │  │
│  │   "student"   │ [session1, session2]  │  │
│  └───────────────┴───────────────────────┘  │
└─────────────────────────────────────────────┘
src/api/
├── __init__.py # Package exports
└── api_server.py # Main FastAPI application
The Runner is the central component that manages the execution of your StudyBuddy agent. Think of it as an orchestra conductor that coordinates between the API requests and the AI agent.
from google.adk.runners import Runner
from google.adk.sessions.in_memory_session_service import InMemorySessionService
# Initialize the StudyBuddy agent runner
runner = Runner(
app_name="School Agents API",
agent=root_agent, # Our StudyBuddy agent
session_service=InMemorySessionService()
)
What the Runner does:
Message Routing: Takes your text queries and converts them to the format the agent expects
Event Handling: Manages the async stream of responses from the AI model
Session Coordination: Works with the session service to maintain conversation context
Error Management: Handles failures gracefully and provides meaningful responses
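To make those responsibilities concrete, here's a minimal sketch that drives the runner directly, with no FastAPI involved. It reuses the `runner` defined above, creates a throwaway session, and collects the streamed events into one reply; the `/query` endpoint later in this section does exactly the same thing with HTTP plumbing around it.

import asyncio
from google.genai import types  # message types used by ADK

async def ask(question: str) -> str:
    # A throwaway session just for this one exchange
    session = await runner.session_service.create_session(
        app_name="School Agents API", user_id="student", state={}
    )
    message = types.Content(role="user", parts=[types.Part.from_text(text=question)])
    chunks = []
    # run_async streams events; we keep only the text parts
    async for event in runner.run_async(
        user_id="student", session_id=session.id, new_message=message
    ):
        if event.content and event.content.parts:
            chunks.extend(p.text for p in event.content.parts if p.text)
    return " ".join(chunks)

print(asyncio.run(ask("What is 2 + 2?")))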
Session management is crucial for maintaining conversation context. Without it, every interaction would be like meeting the AI for the first time.
Session Components:
from typing import Optional
from pydantic import BaseModel

class QueryRequest(BaseModel):
    query: str                         # The student's question
    session_id: Optional[str] = None   # For continuing conversations
# Our session management logic
user_id = "student" # Consistent identifier for all students
if not request.session_id:
# First interaction - create new session
session = await runner.session_service.create_session(
app_name="School Agents API",
user_id=user_id,
state={}
)
session_id = session.id
is_new_session = True
else:
    # Continuing conversation - look up the existing session
    existing_session = None
    try:
        existing_session = await runner.session_service.get_session(
            app_name="School Agents API",
            user_id=user_id,
            session_id=request.session_id
        )
    except Exception:
        pass  # Treat lookup errors the same as a missing session
    if existing_session:
        session_id = request.session_id
        is_new_session = False
    else:
        # Session not found - create a new one so session_id is always set
        session = await runner.session_service.create_session(
            app_name="School Agents API",
            user_id=user_id,
            state={}
        )
        session_id = session.id
        is_new_session = True
Session Flow Visualization:
First Request (no session_id):
┌─────────────┐     ┌────────────────┐     ┌─────────────┐
│   Student   │───▶ │   Create New   │───▶ │   Return    │
│  "Hello!"   │     │    Session     │     │ session_id  │
└─────────────┘     └────────────────┘     └─────────────┘
Follow-up Request (with session_id):
┌─────────────┐     ┌────────────────┐     ┌─────────────┐
│   Student   │───▶ │ Find Existing  │───▶ │  Continue   │
│"What's 2+2?"│     │    Session     │     │Conversation │
└─────────────┘     └────────────────┘     └─────────────┘
The InMemorySessionService stores all active conversations in memory. It's like the AI's short-term memory.
from google.adk.sessions.in_memory_session_service import InMemorySessionService
session_service = InMemorySessionService()
Key Methods:
create_session(): Creates a new conversation thread
get_session(): Retrieves an existing conversation
list_sessions(): Shows all conversations for a user
Important Note: Since it's "in-memory", all sessions are lost when the server restarts. For production, you'd want a persistent session store (Redis, Database, etc.).
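ADK also provides a DatabaseSessionService backed by SQLAlchemy, so the swap is mostly one line of wiring. A minimal sketch, assuming a local SQLite file (any SQLAlchemy-style URL, such as Postgres, should work the same way):

from google.adk.sessions import DatabaseSessionService

# Sessions now survive server restarts
session_service = DatabaseSessionService(db_url="sqlite:///./studybuddy_sessions.db")

runner = Runner(
    app_name="School Agents API",
    agent=root_agent,
    session_service=session_service,
)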
User ID (user_id)
Purpose: Identifies WHO is talking to the AI
Scope: Groups all sessions for a specific user
Our Implementation: user_id = "student" (single user for educational API)
Persistence: Consistent across all interactions
Session ID (session_id)
Purpose: Identifies a SPECIFIC conversation thread
Scope: One continuous conversation
Our Implementation: Generated UUID (e.g., "a7b72840-4783-404e-89d8-f02c546ed3f5")
Persistence: Unique per conversation
Relationship Diagram:
user_id: "student"
├── session_id: "abc-123" → "Math homework help"
├── session_id: "def-456" → "Science questions"
└── session_id: "ghi-789" → "History discussion"
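A quick sketch of how you could inspect that mapping at runtime with list_sessions() (assuming, as in ADK's session service interface, that it returns a response object with a .sessions list):

async def show_threads(user_id: str = "student") -> None:
    response = await runner.session_service.list_sessions(
        app_name="School Agents API", user_id=user_id
    )
    for s in response.sessions:
        print(s.id)  # one line per conversation thread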
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from google.genai import types  # Content/Part message types used in /query
import os
from dotenv import load_dotenv
# Load environment variables (Google API key, etc.)
load_dotenv()
# Create FastAPI app
app = FastAPI(
title="School Agents API",
description="Simple API for interacting with StudyBuddy agent",
version="1.0.0"
)
# Enable CORS for web applications (wildcard origins are convenient for a demo; restrict them in production)
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded
# Initialize rate limiter
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
@app.post("/query")
@limiter.limit("50/day") # Limit to 50 requests per day per IP
async def process_query(request: Request, query_request: QueryRequest):
"""Process query with StudyBuddy agent"""
# Session management (detailed above)
user_id = "student"
# ... session logic ...
# Create message for the agent
message = types.Content(
role='user',
parts=[types.Part.from_text(text=query_request.query)]
)
# Get response from StudyBuddy
response_parts = []
async for event in runner.run_async(
user_id=user_id,
session_id=session_id,
new_message=message
):
if hasattr(event, 'content') and event.content:
for part in event.content.parts:
if hasattr(part, 'text') and part.text:
response_parts.append(str(part.text))
# Return formatted response
response_text = " ".join(response_parts)
return {
"response": response_text,
"session_id": session_id,
"new_session": is_new_session,
"message": "Use this session_id in your next request to maintain conversation context" if is_new_session else "Continuing conversation with existing context"
}
We've added rate limiting to prevent API abuse:
50 requests per day per IP address on the /query endpoint
Automatic blocking when limit exceeded
Clean error responses when rate limit hit
Memory-based tracking (no external dependencies needed)
Rate Limit Options:
# Different rate limit options you can use:
@limiter.limit("50/day") # 50 requests per day
@limiter.limit("10/minute") # 10 requests per minute
@limiter.limit("100/hour") # 100 requests per hour
@limiter.limit("5/second") # 5 requests per second
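The key function is also swappable: Limiter accepts any callable that maps a request to a string key, so limits don't have to be per IP. A hedged sketch keying on a hypothetical X-API-Token header, falling back to the client IP for anonymous callers:

from fastapi import Request
from slowapi import Limiter
from slowapi.util import get_remote_address

def token_or_ip(request: Request) -> str:
    # Clients behind a shared NAT won't exhaust each other's quota
    return request.headers.get("X-API-Token") or get_remote_address(request)

limiter = Limiter(key_func=token_or_ip)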
Add to pyproject.toml:
dependencies = [
# ... other dependencies ...
"slowapi>=0.1.9",
]
# Direct execution (recommended)
python src/api/api_server.py
# Using uvicorn with auto-reload
uvicorn src.api.api_server:app --host 0.0.0.0 --port 8080 --reload
# Using the package export
uvicorn src.api:app --host 0.0.0.0 --port 8080 --reload
# First request - establishes session
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{"query": "Hello, my name is Alice"}'
# Response includes session_id:
# {
# "response": "Hi Alice! Nice to meet you...",
# "session_id": "abc-123-def",
# "new_session": true
# }
# Follow-up request - maintains context
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"query": "What is my name?",
"session_id": "abc-123-def"
}'
# Response remembers context:
# {
# "response": "Your name is Alice!",
# "session_id": "abc-123-def",
# "new_session": false
# }
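The same two-step flow in Python, for readers who prefer it to curl - a minimal sketch using the requests package against a locally running server:

import requests

BASE_URL = "http://localhost:8080"

# First request - no session_id, so the server creates a session
first = requests.post(f"{BASE_URL}/query", json={"query": "Hello, my name is Alice"}).json()
session_id = first["session_id"]  # save it for follow-ups
print(first["response"])

# Follow-up request - same session_id, so context is preserved
follow_up = requests.post(
    f"{BASE_URL}/query",
    json={"query": "What is my name?", "session_id": session_id},
).json()
print(follow_up["response"])  # should remember "Alice"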
While the API is powerful, we need a user-friendly interface for students to interact with StudyBuddy. Let's create a modern chat UI that makes learning feel natural.
For simple chat interfaces, you don't need React or other complex frameworks! Here's why:
✅ Vanilla JS Advantages:
Zero Build Process: No webpack, babel, or complicated setup
Faster Load Times: No framework overhead (React's runtime alone is roughly 40 KB minified)
Direct Deployment: Single HTML file works anywhere
Easier Debugging: No framework abstractions to debug
Lower Learning Curve: Basic HTML/CSS/JS skills sufficient
❌ When You DO Need React:
Complex state management across many components
Large applications with hundreds of interactive elements
Team already experienced with React ecosystem
Need for component reusability across multiple pages
Our Use Case: A simple chat interface with ~10 interactive elements is perfect for vanilla JavaScript.
Our approach uses Server-Side Rendering with FastAPI:
Server-Side Rendering (Our Approach):
@app.get("/", response_class=HTMLResponse)
async def chat_interface():
# Server sends complete HTML to browser
return HTMLResponse(content=html_content)
✅ SSR Benefits:
Instant Load: HTML renders immediately
SEO Friendly: Search engines see complete content
Works Without JS: Basic functionality even if JS disabled
Better Performance: No JS bundle downloading/parsing delay
Client-Side Rendering (React/Vue):
// Browser downloads JS → JS builds DOM → User sees content
ReactDOM.createRoot(document.getElementById('root')).render(<ChatApp />);
❌ CSR Drawbacks for Simple Apps:
Blank Page First: User sees nothing until JS loads
SEO Issues: Search engines struggle with JS-generated content
Performance Overhead: Framework + bundling complexity
Development Complexity: Build tools, transpilation, etc.
Our Choice: SSR for instant loading + enhanced with client-side JS for interactivity.
┌─────────────────────┐
│     HTML/CSS/JS     │  ← Modern chat interface
│   Chat Interface    │
└─────────┬───────────┘
          │
          ▼  JavaScript fetch()
┌─────────────────────┐
│   FastAPI Server    │  ← Serves HTML + handles API
│                     │
├─────────────────────┤
│  GET  /             │  ← Returns chat interface
│  POST /query        │  ← Processes messages
└─────────────────────┘
Now we need to serve this HTML file from our FastAPI server. Update the server to handle the root route:
from fastapi.responses import HTMLResponse
from pathlib import Path
# Root endpoint - serve chat UI
@app.get("/", response_class=HTMLResponse)
async def chat_interface():
"""Serve the StudyBuddy chat interface"""
try:
template_path = Path(__file__).parent / "templates" / "index.html"
with open(template_path, "r", encoding="utf-8") as f:
html_content = f.read()
return HTMLResponse(content=html_content, status_code=200)
except FileNotFoundError:
return HTMLResponse(
content="<h1>StudyBuddy API</h1><p><a href='/docs'>View API Documentation</a></p>",
status_code=200
)
💡 Complete Code Available: The full HTML/CSS/JS implementation with styling, animations, and enhanced features is available in src/api/templates/index.html in your project.
1. Session Management Integration
// Automatic session continuity - no manual session handling needed
const requestBody = { query };
if (sessionId) requestBody.session_id = sessionId;
// Save session ID from API response
sessionId = data.session_id; // Preserved across requests
2. Progressive Enhancement
Works immediately: HTML loads instantly (SSR)
Enhanced with JS: Interactive features load progressively
Graceful degradation: Basic functionality without JavaScript
Mobile responsive: Single CSS file handles all screen sizes
3. Simple State Management
// No complex state management needed - just a few variables
let sessionId = null; // For conversation continuity
let isLoading = false; // Prevent duplicate submissions
4. Zero Dependencies
No npm packages: Everything works with browser APIs
No build process: Direct HTML/CSS/JS deployment
No bundling: Single file deployment
If your chat interface needs these advanced features, consider a framework:
Multi-page application with complex routing
Rich text editing with formatting tools
File uploads with progress tracking
Complex animations requiring state orchestration
Team collaboration features with real-time sync
src/api/
├── templates/
│ └── index.html # Complete chat interface (470 lines)
├── api_server.py # FastAPI server with HTML serving
└── __init__.py # Package exports
Create the template directory:
mkdir -p src/api/templates
# Copy the complete HTML code to src/api/templates/index.html
Start the server:
python src/api/api_server.py
Open your browser to:
http://localhost:8080/
Test conversation flow:
First message: "Hello, my name is Sarah"
Follow-up: "What's my name?" (should remember "Sarah")
New chat: Click "New Chat" to start fresh
Result: A polished chat interface that loads instantly, works on mobile, and maintains conversation context - all without React! (We stop short of calling it production-ready: that would also require security hardening, scalability work, and monitoring.)
We keep the agent's configuration in an external `agent_config.yml` file and the system prompt in a separate `prompts` folder for several reasons:
Modularity and Maintainability: Separating configuration from code allows you to update the agent's name, description, model, or prompt file without modifying the Python code. This makes the system more flexible and easier to maintain.
Readability and Collaboration: Prompts are often lengthy and descriptive. Storing them in Markdown files makes them easier to read, edit, and version control. Multiple team members can collaborate on refining the prompt without touching the codebase.
Separation of Concerns: By externalizing these elements, we follow best practices in software design, keeping the code focused on logic (like loading and creating the agent) while configuration and content are handled separately.
Reusability: This structure allows you to reuse prompts across different agents or projects, and easily experiment with different configurations.
This approach aligns with principles like the Twelve-Factor App methodology, which treats configuration as something separate from code. A concrete sketch of this layout follows below.
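As a sketch (the field names here are illustrative, not the exact schema from the repo), agent_config.yml might look like this:

name: StudyBuddy
description: Friendly homework helper for students
model: gemini-2.0-flash
prompt_file: prompts/study_buddy.md

A small loader (assuming PyYAML is installed) can then build the agent without any hard-coded strings in the code:

import yaml
from pathlib import Path
from google.adk.agents import Agent

def load_agent(config_path: str = "agent_config.yml") -> Agent:
    config = yaml.safe_load(Path(config_path).read_text(encoding="utf-8"))
    # The lengthy system prompt lives in its own Markdown file
    instruction = Path(config["prompt_file"]).read_text(encoding="utf-8")
    return Agent(
        name=config["name"],
        description=config["description"],
        model=config["model"],
        instruction=instruction,
    )

root_agent = load_agent()

With this in place, swapping models or iterating on the prompt is a config edit, not a code change.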
GET /: Modern chat interface (HTML)
POST /query: Main interaction endpoint (JSON API)
GET /docs: Interactive API documentation
GET /health: Health check endpoint
GET /info: API information
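The two utility endpoints are small; a hedged sketch (the exact payloads in the repo may differ):

@app.get("/health")
async def health():
    # Used by the Docker healthcheck shown later in this section
    return {"status": "healthy"}

@app.get("/info")
async def info():
    return {"name": "School Agents API", "version": "1.0.0", "agent": "StudyBuddy"}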
Your .env file should contain:
GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=your-google-api-key-here
Let's containerize our StudyBuddy application for easy deployment anywhere that supports Docker.
Create a Dockerfile in your project root:
# Use Python 3.11 slim image for smaller size
FROM python:3.11-slim
# Set working directory in container
WORKDIR /app
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PYTHONPATH=/app
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install UV package manager for faster dependency resolution
RUN pip install uv
# Copy dependency files first so this layer is cached across source-only changes
COPY pyproject.toml uv.lock ./
# Install dependencies globally (no virtual environment needed in a container)
RUN uv pip install --system fastapi uvicorn[standard] pydantic python-dotenv google-adk slowapi matplotlib numpy python-multipart
# Copy application source code (changes here no longer invalidate the dependency layer)
COPY src/ ./src/
# Bundling .env is a local convenience; pass secrets at runtime in production
COPY .env* ./
# Create non-root user for security
RUN adduser --disabled-password --gecos '' appuser && \
chown -R appuser:appuser /app
USER appuser
# Expose port
EXPOSE 8080
# Command to run the application
CMD ["python", "src/api/api_server.py"]
Create .dockerignore to exclude unnecessary files:
__pycache__/
*.py[cod]
.git/
.venv/
node_modules/
*.log
tests/
docs/
*.md
.DS_Store
Dockerfile*
docker-compose*.yml
Create docker-compose.yml for easier development:
version: '3.8'
services:
studybuddy:
build: .
ports:
- "8080:8080"
environment:
- PORT=8080
- GOOGLE_API_KEY=${GOOGLE_API_KEY}
- GOOGLE_GENAI_USE_VERTEXAI=FALSE
volumes:
- ./src:/app/src:ro # Development mode
- ./.env:/app/.env:ro
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
Option A: Docker Commands
# Build the image
docker build -t studybuddy-chat .
# Run the container
docker run -p 8080:8080 \
-e GOOGLE_API_KEY="your-api-key-here" \
studybuddy-chat
# Or run in background
docker run -d -p 8080:8080 \
--name studybuddy \
-e GOOGLE_API_KEY="your-api-key-here" \
studybuddy-chat
Option B: Docker Compose (Recommended)
# Start the application
docker-compose up
# Start in background
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the application
docker-compose down
Make sure your .env file contains:
GOOGLE_API_KEY=your-google-api-key-here
GOOGLE_GENAI_USE_VERTEXAI=FALSE
PORT=8080
Our Docker setup includes:
Layer caching: Dependencies install in their own cached layer, so source-only changes rebuild quickly
Security: Non-root user, minimal attack surface
Health checks: Automatic container health monitoring
Fast builds: UV package manager for rapid dependency installation
Production ready: Optimized for deployment
The containerized app can be deployed to:
Cloud Platforms:
Google Cloud Run: gcloud run deploy --source .
AWS: Deploy via ECR + ECS, EC2, or EKS
Azure: Azure Container Instances (az container create), an Azure VM, or Azure App Service
Self-Hosted:
Dokploy: Docker-based deployment platform
Docker Swarm: Multi-container orchestration
Kubernetes: For large-scale deployments
VPS: Any server with Docker installed
Quick Cloud Run Deployment:
# Build and deploy to Google Cloud Run
gcloud run deploy studybuddy \
--source . \
--port 8080 \
--allow-unauthenticated \
--set-env-vars GOOGLE_API_KEY="your-key"
Dokploy is a powerful, open-source alternative to Vercel/Netlify for self-hosted deployments. It's perfect for developers who want the convenience of PaaS with the control of self-hosting.
🔗 Official Website: dokploy.com
🐙 GitHub Repository: github.com/Dokploy/dokploy
For a comprehensive step-by-step guide on setting up and deploying with Dokploy, watch this detailed tutorial:
This video covers everything from server setup to deployment, perfect for getting your StudyBuddy app live quickly and cost-effectively.
Our StudyBuddy demo is deployed using Dokploy:
🔗 Live Demo: https://study_buddy.chotuai.in/
📂 Source Code: https://github.com/arjunagi-a-rehman/school-agents/tree/function-calling
Congratulations! 🎉 You've successfully built and deployed a complete AI agent system with:
✅ Custom Agent: StudyBuddy with specialized tools
✅ Function Calling: Math visualization and student management
✅ REST API: FastAPI with session management
✅ Modern UI: Responsive chat interface
✅ Rate Limiting: API protection and abuse prevention
✅ Containerization: Docker deployment ready
✅ Live Deployment: Running live, self-hosted with Dokploy
But we're just getting started! 🚀