{ "cells": [ { "cell_type": "markdown", "id": "5de79491", "metadata": {}, "source": [ "# Llama Stack Quick Start Demo\n", "\n", "This notebook demonstrates how to use Llama Stack to run an agent with tools in two ways:\n", "\n", "- **Option A (section 2):** define a **client-side** weather tool with `@client_tool`; the cell sets **`AGENT_TOOLS`**.\n", "- **Option B (section 2):** run an **MCP** weather tool with **FastMCP** and register it with the server; the register cell sets **`AGENT_TOOLS`**.\n", "- **Section 3** uses the **same** connect / model selection / `Agent` construction / run flow for both options. The only difference is the value of **`AGENT_TOOLS`** passed into `Agent`.\n", "\n", "### Inference backend (`LlamaStackDistribution`)\n", "\n", "- **`VLLM_URL`** should point at a **vLLM OpenAI-compatible** HTTP API for the model in use.\n", "- For **vLLM on KServe**, enable tool calling on the vLLM container by adding extra args, for example:\n", "\n", "```yaml\n", "args:\n", " - --enable-auto-tool-choice\n", " - --tool-call-parser\n", " - # set from vLLM documentation for the deployed model\n", "```\n", "\n", "**MCP prerequisites:** The server distribution must configure the **tool runtime** (the subsystem that executes tool calls for agents) to include the **`model-context-protocol`** provider so MCP tools can be invoked. The MCP URL must be reachable **from the server** (not only from the notebook).\n" ] }, { "cell_type": "markdown", "id": "e5d1fc8c", "metadata": {}, "source": [ "## 1. Install Dependencies\n", "\n", "**Note:** `llama-stack-client` requires Python 3.12 or higher. If your Python version does not meet this requirement, refer to the FAQ section in the documentation: **How to prepare Python 3.12 in Notebook**.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a8f9e5e4", "metadata": {}, "outputs": [], "source": [ "# Use current kernel's Python so PATH does not point to another env\n", "# If download is slow, add: -i https://pypi.tuna.tsinghua.edu.cn/simple\n", "import sys\n", "!{sys.executable} -m pip install \"llama-stack-client>=0.4\" \"requests\" \"fastapi\" \"uvicorn\" \"fastmcp\"" ] }, { "cell_type": "markdown", "id": "baabf4fc", "metadata": {}, "source": [ "\n", "## 2. 
Define Tools\n", "\n", "### Create Llama Stack Client" ] }, { "cell_type": "code", "execution_count": null, "id": "lls-client-init", "metadata": {}, "outputs": [], "source": [ "import os\n", "from llama_stack_client import LlamaStackClient\n", "\n", "# Set LLAMA_STACK_URL to the actual Llama Stack Server URL (cluster Service/Route or port-forward).\n", "# The default below only works when the server is reachable at localhost:8321.\n", "base_url = os.getenv(\"LLAMA_STACK_URL\", \"http://localhost:8321\")\n", "client = LlamaStackClient(base_url=base_url)\n", "print(f\"Llama Stack client created (LLAMA_STACK_URL={base_url})\")\n" ] }, { "cell_type": "markdown", "id": "8d42246c", "metadata": {}, "source": [ "### Select Tool Option\n", "Set `TOOL_OPTION` first to control which tool path section 3 will use.\n", "\n", "- `A`: client-side tool via `@client_tool`\n", "- `B`: MCP tool via FastMCP + toolgroup registration\n", "\n", "Then run the corresponding setup cells below.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "tool-option-selector", "metadata": {}, "outputs": [], "source": [ "# Choose one: \"A\" (client tool) or \"B\" (MCP)\n", "TOOL_OPTION = \"A\"\n", "print(f\"Selected TOOL_OPTION={TOOL_OPTION}\")" ] }, { "cell_type": "markdown", "id": "optA-title-md", "metadata": {}, "source": [ "### Option A: Define client-side tool\n", "\n", "Run this cell only when `TOOL_OPTION = \"A\"`. It defines `get_weather` and sets `AGENT_TOOLS` for the shared flow in section 3.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c57f95e5", "metadata": {}, "outputs": [], "source": [ "import requests\n", "from typing import Dict, Any\n", "from urllib.parse import quote\n", "from llama_stack_client.lib.agents.client_tool import client_tool\n", "\n", "\n", "if globals().get(\"TOOL_OPTION\") != \"A\":\n", "    print('Skip Option A setup (TOOL_OPTION != \"A\")')\n", "else:\n", "    @client_tool\n", "    def get_weather(city: str) -> Dict[str, Any]:\n", "        \"\"\"Get current weather information for a specified city.\n", "\n", "        Uses the wttr.in free weather API to fetch weather data.\n", "\n", "        :param city: City name, e.g., Beijing, Shanghai, Paris\n", "        :returns: Dictionary containing weather information including city, temperature and humidity\n", "        \"\"\"\n", "        try:\n", "            encoded_city = quote(city)\n", "            url = f'https://wttr.in/{encoded_city}?format=j1'\n", "            response = requests.get(url, timeout=10)\n", "            response.raise_for_status()\n", "            data = response.json()\n", "\n", "            current = data['current_condition'][0]\n", "            return {\n", "                'city': city,\n", "                'temperature': f\"{current['temp_C']}°C\",\n", "                'humidity': f\"{current['humidity']}%\",\n", "            }\n", "        except Exception as e:\n", "            return {'error': f'Failed to get weather information: {str(e)}'}\n", "\n", "    AGENT_TOOLS = [get_weather]\n", "    print('Option A: AGENT_TOOLS = [get_weather]')\n" ] }, { "cell_type": "markdown", "id": "1614c719", "metadata": {}, "source": [ "### Option B: MCP tool (FastMCP)\n", "\n", "Start an MCP server with the **`fastmcp`** package (Streamable HTTP on port **8002**) exposing the tool `get_weather_mcp`. Tool calls are executed by the **Llama Stack Server** against this URL.\n", "\n", "Set **`MCP_SERVER_URL`** if the Llama Stack Server runs elsewhere (e.g. in-cluster): the URL must be reachable from that server. 
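\n", "\n", "For example, a minimal sketch (the hostname below is a placeholder) that pins the URL from inside the notebook before running the next cells:\n", "\n", "```python\n", "import os\n", "\n", "# Placeholder address: use any URL the Llama Stack Server can actually reach.\n", "os.environ[\"MCP_SERVER_URL\"] = \"http://mcp-weather.example.svc.cluster.local:8002/mcp\"\n", "```\n", "\n", "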
If unset, the notebook derives a LAN IP for port **8002**.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "e4da207a", "metadata": {}, "outputs": [], "source": [ "if globals().get(\"TOOL_OPTION\") != \"B\":\n", " print('Skip Option B MCP server start (TOOL_OPTION != \"B\")')\n", "else:\n", " import os\n", " import socket\n", " import sys\n", " import time\n", " from pathlib import Path\n", " from subprocess import Popen\n", "\n", " # Start the MCP server in a separate Python process.\n", " # This avoids multiprocessing pickle/spawn issues on Windows/macOS.\n", " server_script = r'''\n", "from urllib.parse import quote\n", "\n", "import requests\n", "from fastmcp import FastMCP\n", "\n", "\n", "mcp = FastMCP(\"demo-weather\")\n", "\n", "\n", "@mcp.tool()\n", "def get_weather_mcp(city: str) -> dict:\n", " \"\"\"Get current weather for a city (wttr.in).\"\"\"\n", " try:\n", " encoded_city = quote(city)\n", " url = f\"https://wttr.in/{encoded_city}?format=j1\"\n", " r = requests.get(url, timeout=10)\n", " r.raise_for_status()\n", " data = r.json()\n", " cur = data[\"current_condition\"][0]\n", " return {\n", " \"city\": city,\n", " \"temperature_c\": cur[\"temp_C\"],\n", " \"humidity\": cur[\"humidity\"],\n", " }\n", " except Exception as e:\n", " return {\"error\": str(e)}\n", "\n", "\n", "if __name__ == \"__main__\":\n", " mcp.run(transport=\"streamable-http\", host=\"0.0.0.0\", port=8002)\n", "'''\n", "\n", " script_path = Path(\"/tmp/fastmcp_weather_server.py\")\n", " script_path.write_text(server_script, encoding=\"utf-8\")\n", "\n", " # Best-effort stop existing process when re-running\n", " if \"mcp_proc\" in globals() and mcp_proc and getattr(mcp_proc, \"poll\", None) and mcp_proc.poll() is None:\n", " try:\n", " mcp_proc.terminate()\n", " mcp_proc.wait(timeout=2)\n", " except Exception:\n", " pass\n", "\n", " mcp_proc = Popen([sys.executable, str(script_path)], env=os.environ.copy())\n", "\n", " # Readiness: wait for local port to accept connections (no fixed sleep)\n", " deadline = time.time() + 20\n", " last_err = None\n", " while time.time() < deadline:\n", " try:\n", " with socket.create_connection((\"127.0.0.1\", 8002), timeout=1):\n", " last_err = None\n", " break\n", " except Exception as e:\n", " last_err = e\n", " time.sleep(0.25)\n", " if last_err is not None:\n", " raise RuntimeError(f\"MCP server did not become ready on 127.0.0.1:8002: {last_err}\")\n", "\n", " MCP_SERVER_URL = os.getenv(\"MCP_SERVER_URL\")\n", " if not MCP_SERVER_URL:\n", " _host = socket.gethostbyname(socket.gethostname())\n", " if _host.startswith(\"127.\"):\n", " _host = os.getenv(\"MCP_SERVER_HOST\", \"127.0.0.1\")\n", " MCP_SERVER_URL = f\"http://{_host}:8002/mcp\"\n", "\n", " os.environ[\"MCP_SERVER_URL\"] = MCP_SERVER_URL\n", " print(f\"✓ MCP (FastMCP) at {MCP_SERVER_URL} — tool get_weather_mcp\")\n" ] }, { "cell_type": "markdown", "id": "3b9ea887", "metadata": {}, "source": [ "### Option B: Register MCP tool group\n", "\n", "Uses `toolgroups.register` with `provider_id=\"model-context-protocol\"`. A **timestamp** is appended to the tool group id so re-running the cell avoids duplicate-id errors. 
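\n", "\n", "To check what is already registered, a quick sketch (assuming the installed client exposes `toolgroups.list()`; the attribute names printed are best-effort guesses):\n", "\n", "```python\n", "for tg in client.toolgroups.list():\n", "    print(getattr(tg, \"identifier\", tg), getattr(tg, \"provider_id\", None))\n", "```\n", "\n", "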
This cell sets **`AGENT_TOOLS`** for the MCP path.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "75ffadf0", "metadata": {}, "outputs": [], "source": [ "import os\n", "import time\n", "\n", "\n", "if globals().get(\"TOOL_OPTION\") != \"B\":\n", "    print('Skip Option B registration (TOOL_OPTION != \"B\")')\n", "else:\n", "    mcp_server_url = os.getenv(\"MCP_SERVER_URL\")\n", "    if not mcp_server_url:\n", "        raise RuntimeError(\"MCP_SERVER_URL is not set. Run Option B MCP server setup first, or export MCP_SERVER_URL.\")\n", "\n", "    toolgroup_id = f\"mcp::demo-weather-{int(time.time())}\"\n", "    client.toolgroups.register(\n", "        toolgroup_id=toolgroup_id,\n", "        provider_id=\"model-context-protocol\",\n", "        mcp_endpoint={\"uri\": mcp_server_url},\n", "    )\n", "\n", "    AGENT_TOOLS = [\n", "        {\n", "            \"type\": \"mcp\",\n", "            \"server_label\": toolgroup_id,\n", "            \"server_url\": mcp_server_url,\n", "        }\n", "    ]\n", "    print(\"Option B: AGENT_TOOLS configured for MCP\")\n" ] }, { "cell_type": "markdown", "id": "fed3605b", "metadata": {}, "source": [ "### Troubleshooting (MCP / tool calling)\n", "\n", "- **400 / message `content` type errors:** Some inference backends expect string `content` while tool turns use structured content. This is a **server–backend compatibility** issue; **vLLM** with `--enable-auto-tool-choice` and a matching `--tool-call-parser` is the supported path for tools here.\n", "- **Alternative:** Prefer **Option A** (client-side tools) if the MCP HTTP endpoint is not reachable from the Llama Stack Server.\n" ] }, { "cell_type": "markdown", "id": "sec4-md", "metadata": {}, "source": [ "## 3. Connect, Create Agent, and Run\n", "\n", "Shared flow for both options. This section uses `TOOL_OPTION` + `AGENT_TOOLS` prepared in section 2.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "394ee5db", "metadata": {}, "outputs": [], "source": [ "import os\n", "from llama_stack_client import Agent\n", "\n", "\n", "if \"AGENT_TOOLS\" not in globals():\n", "    raise RuntimeError(\"AGENT_TOOLS is missing. Run the matching setup cell(s) in section 2 for the selected TOOL_OPTION.\")\n", "\n", "models = client.models.list()\n", "llm_model = next(\n", "    (m for m in models if m.custom_metadata and m.custom_metadata.get(\"model_type\") == \"llm\"),\n", "    None,\n", ")\n", "if not llm_model:\n", "    raise RuntimeError(\"No LLM model found\")\n", "\n", "model_id = llm_model.id\n", "print(f\"Using model: {model_id}\\n\")\n", "\n", "WEATHER_INSTRUCTIONS = (\n", "    \"You are a helpful weather assistant. 
When users ask about weather, use the weather tool to query and answer.\"\n", ")\n", "\n", "agent = Agent(\n", " client,\n", " model=model_id,\n", " instructions=WEATHER_INSTRUCTIONS,\n", " tools=AGENT_TOOLS,\n", ")\n", "\n", "print(\"Agent created successfully\")" ] }, { "cell_type": "markdown", "id": "90c28b81", "metadata": {}, "source": [ "### Run the Agent\n", "\n", "Same session and turn flow regardless of Option A or B.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "70e8d661", "metadata": {}, "outputs": [], "source": [ "# Create session\n", "session_id = agent.create_session('weather-agent-session')\n", "print(f'✓ Session created: {session_id}\\n')\n", "\n", "# First query\n", "print('=' * 60)\n", "print('User> What is the weather like in Beijing today?')\n", "print('-' * 60)\n", "\n", "response_stream = agent.create_turn(\n", " messages=[{'role': 'user', 'content': 'What is the weather like in Beijing today?'}],\n", " session_id=session_id,\n", " stream=True,\n", ")" ] }, { "cell_type": "markdown", "id": "ca2f26f2", "metadata": {}, "source": [ "### Display the Result" ] }, { "cell_type": "code", "execution_count": null, "id": "4728a638", "metadata": {}, "outputs": [], "source": [ "from llama_stack_client.lib.agents.event_logger import AgentEventLogger\n", "\n", "logger = AgentEventLogger()\n", "for printable in logger.log(response_stream):\n", " print(printable, end='', flush=True)\n", "print('\\n')" ] }, { "cell_type": "markdown", "id": "728530b0", "metadata": {}, "source": [ "### Try Different Queries" ] }, { "cell_type": "code", "execution_count": null, "id": "ed8cc5a0", "metadata": {}, "outputs": [], "source": [ "# Second query\n", "print('=' * 60)\n", "print('User> What is the weather in Shanghai?')\n", "print('-' * 60)\n", "\n", "response_stream = agent.create_turn(\n", " messages=[{'role': 'user', 'content': 'What is the weather in Shanghai?'}],\n", " session_id=session_id,\n", " stream=True,\n", ")\n", "\n", "for printable in logger.log(response_stream):\n", " print(printable, end='', flush=True)\n", "print('\\n')" ] }, { "cell_type": "markdown", "id": "6f8d31d0", "metadata": {}, "source": [ "## 4. FastAPI Service Example\n", "\n", "Expose the `llama-stack-client`-based `agent` as a FastAPI web service, so it can be called via HTTP.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a5d732e4", "metadata": {}, "outputs": [], "source": [ "import time\n", "from fastapi import FastAPI\n", "from pydantic import BaseModel\n", "from threading import Thread\n", "from llama_stack_client.lib.agents.event_logger import AgentEventLogger\n", "\n", "\n", "# Create a simple FastAPI app\n", "api_app = FastAPI(title=\"Llama Stack Agent API\")\n", "\n", "\n", "class ChatRequest(BaseModel):\n", " message: str\n", "\n", "\n", "@api_app.post(\"/chat\")\n", "def chat(request: ChatRequest):\n", " \"\"\"Chat endpoint that uses the Llama Stack Agent\"\"\"\n", " session_id = agent.create_session('fastapi-weather-session')\n", "\n", " # Create turn and collect response\n", " response_stream = agent.create_turn(\n", " messages=[{'role': 'user', 'content': request.message}],\n", " session_id=session_id,\n", " stream=True,\n", " )\n", "\n", " # Collect the full response\n", " full_response = \"\"\n", " logger = AgentEventLogger()\n", " for printable in logger.log(response_stream):\n", " full_response += printable\n", "\n", " return {\"response\": full_response}\n", "\n", "\n", "print(\"FastAPI app created. 
Use the next cell to start the server.\")" ] }, { "cell_type": "markdown", "id": "475997ba", "metadata": {}, "source": [ "### Start the FastAPI Server\n", "\n", "**Note**: In a notebook, you can start the server in a background thread. For production, run it as a separate process using `uvicorn`." ] }, { "cell_type": "code", "execution_count": null, "id": "6f5db723", "metadata": {}, "outputs": [], "source": [ "# Start server in background thread (for notebook demonstration)\n", "from uvicorn import Config, Server\n", "\n", "# Create a server instance that can be controlled\n", "config = Config(api_app, host=\"127.0.0.1\", port=8000, log_level=\"info\")\n", "server = Server(config)\n", "\n", "def run_server():\n", " server.run()\n", "\n", "# Use daemon=True so the thread stops automatically when the kernel restarts\n", "# This is safe for notebook demonstrations\n", "# For production, use process managers instead of threads\n", "server_thread = Thread(target=run_server, daemon=True)\n", "server_thread.start()\n", "\n", "# Wait a moment for the server to start\n", "time.sleep(2)\n", "print(\"✓ FastAPI server started at http://127.0.0.1:8000\")" ] }, { "cell_type": "markdown", "id": "715b2d47", "metadata": {}, "source": [ "### Test the API\n", "\n", "Now you can call the API using HTTP requests:" ] }, { "cell_type": "code", "execution_count": null, "id": "407b82af", "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "# Test the API endpoint\n", "response = requests.post(\n", " \"http://127.0.0.1:8000/chat\",\n", " json={\"message\": \"What's the weather in Shanghai?\"},\n", " timeout=60\n", ")\n", "\n", "print(f\"Status Code: {response.status_code}\")\n", "print(\"Response:\")\n", "print(response.json().get('response'))" ] }, { "cell_type": "markdown", "id": "cleanup-all-md", "metadata": {}, "source": [ "## Cleanup\n", "\n", "Run cleanup cells when finished (especially if Option B was used)." 
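 ] }, { "cell_type": "markdown", "id": "cleanup-toolgroup-md", "metadata": {}, "source": [ "### Unregister the MCP tool group (Option B)\n", "\n", "A best-effort sketch: if the Option B register cell ran in this kernel, `toolgroup_id` is still defined and the registration can be removed so repeated runs do not accumulate tool groups. It assumes the installed client exposes `toolgroups.unregister`; the call is wrapped so failures only print a message.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "cleanup-toolgroup", "metadata": {}, "outputs": [], "source": [ "# Best-effort cleanup of the MCP tool group registered in Option B.\n", "# Assumes `client` and `toolgroup_id` are still defined in this kernel session.\n", "if \"toolgroup_id\" in globals() and \"client\" in globals():\n", "    try:\n", "        client.toolgroups.unregister(toolgroup_id=toolgroup_id)\n", "        print(f\"✓ Unregistered tool group: {toolgroup_id}\")\n", "    except Exception as e:\n", "        print(f\"Could not unregister tool group: {e}\")\n", "else:\n", "    print(\"No MCP tool group to unregister in this session.\")"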
] }, { "cell_type": "markdown", "id": "945a776f", "metadata": {}, "source": [ "### Stop the FastAPI Server" ] }, { "cell_type": "code", "execution_count": null, "id": "c7795bba", "metadata": {}, "outputs": [], "source": [ "# Stop the FastAPI server\n", "if 'server' in globals() and server.started:\n", " server.should_exit = True\n", " print(\"✓ FastAPI server shutdown requested.\")\n", "else:\n", " print(\"FastAPI server is not running or has already stopped.\")" ] }, { "cell_type": "markdown", "id": "9d557594", "metadata": {}, "source": [ "### Stop the MCP server process" ] }, { "cell_type": "code", "execution_count": null, "id": "a1679861", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "stopped = False\n", "\n", "# New launcher uses subprocess.Popen stored in mcp_proc\n", "if \"mcp_proc\" in globals() and mcp_proc and getattr(mcp_proc, \"poll\", None) and mcp_proc.poll() is None:\n", " try:\n", " mcp_proc.terminate()\n", " mcp_proc.wait(timeout=2)\n", " stopped = True\n", " except Exception:\n", " pass\n", "\n", "# Backward compatibility for older runs that used multiprocessing\n", "if not stopped and \"mcp_process\" in globals() and getattr(mcp_process, \"is_alive\", None) and mcp_process.is_alive():\n", " try:\n", " mcp_process.terminate()\n", " mcp_process.join(timeout=2)\n", " stopped = True\n", " except Exception:\n", " pass\n", "\n", "if stopped:\n", " print(\"✓ MCP server process stopped.\")\n", "else:\n", " print(\"MCP server process is not running or has already stopped.\")\n", "\n", "# Clear MCP runtime state for clean re-runs\n", "os.environ.pop(\"MCP_SERVER_URL\", None)\n", "if \"MCP_SERVER_URL\" in globals():\n", " del MCP_SERVER_URL\n", "print(\"✓ Cleared MCP_SERVER_URL from env/state.\")\n" ] }, { "cell_type": "markdown", "id": "a3ebed1f", "metadata": {}, "source": [ "## 5. More Resources\n", "\n", "For more resources on developing AI Agents with Llama Stack, see:\n", "\n", "### Official Documentation\n", "- [Llama Stack Documentation](https://llamastack.github.io/docs) - The official Llama Stack documentation covering all usage-related topics, API providers, and core concepts.\n", "- [Llama Stack Core Concepts](https://llamastack.github.io/docs/concepts) - Deep dive into Llama Stack architecture, API stability, and resource management.\n", "\n", "### Code Examples and Projects\n", "- [Llama Stack GitHub Repository](https://github.com/llamastack/llama-stack) - Source code, example applications, distribution configurations, and how to add new API providers.\n", "- [Llama Stack Example Apps](https://github.com/llamastack/llama-stack-apps/) - Official examples demonstrating how to use Llama Stack in various scenarios.\n", "\n", "### Community and Support\n", "- [Llama Stack GitHub Issues](https://github.com/llamastack/llama-stack/issues) - Report bugs, ask questions, and contribute to the project.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python (llama-stack-demo)", "language": "python", "name": "llama-stack-demo" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.11" } }, "nbformat": 4, "nbformat_minor": 5 }