Source: examples/agent-local-llm

## Overview

The local LLM agent runs a quantized language model entirely in-process using Transformers.js (ONNX/WebAssembly). No external server, base URL, or API key is required. The model is downloaded from the HuggingFace Hub on first run and cached locally.

## Project Structure
## Skills
| Skill | Description |
|---|---|
| Convert Temperature | Convert temperature values between Celsius, Fahrenheit, and Kelvin |
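The conversion skill is plain arithmetic; a minimal sketch of the logic (the `convertTemperature` name and unit codes are illustrative, not the example's actual API):

```typescript
type Unit = "C" | "F" | "K";

// Convert by first normalizing to Celsius, then converting to the target unit.
function convertTemperature(value: number, from: Unit, to: Unit): number {
  // Normalize the input to Celsius.
  const celsius =
    from === "C" ? value :
    from === "F" ? (value - 32) * 5 / 9 :
    value - 273.15; // from Kelvin

  // Convert Celsius to the requested unit.
  return to === "C" ? celsius :
         to === "F" ? celsius * 9 / 5 + 32 :
         celsius + 273.15; // to Kelvin
}

console.log(convertTemperature(100, "C", "F")); // 212
```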
## Agent

The model (onnx-community/Qwen2.5-1.5B-Instruct, q4 quantized) is loaded via @browser-ai/transformers-js, an official Vercel AI SDK community provider for Transformers.js.
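Independent of the provider wrapper, the underlying Transformers.js call looks roughly like this (a sketch using the raw `@huggingface/transformers` pipeline API, not the provider's interface; running it downloads the model weights on first use):

```javascript
import { pipeline } from "@huggingface/transformers";

// Download (on first run) and cache the quantized model, then generate locally.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-1.5B-Instruct",
  { dtype: "q4" } // 4-bit quantized ONNX weights
);

const output = await generator("What is 25°C in Fahrenheit?", {
  max_new_tokens: 64,
});
console.log(output[0].generated_text);
```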
## Payment
This agent is free (scheme: "free") — no x402 payment is required to call it.
## Running locally
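The repository's exact scripts aren't shown here, but a typical workflow for this kind of Node-based example would be (script names are assumptions):

```shell
# Install dependencies and start the agent in development mode
npm install
npm run dev   # serves on http://localhost:3000 by default
```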
## Docker deployment

Because the local LLM model (~1 GB of weights) is downloaded and prewarmed at image build time, the container starts serving requests immediately, with no cold-start delay.

### Build the image
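A plausible build command (the image tag is an assumption):

```shell
# Model weights (~1 GB) are fetched and prewarmed during this build step,
# so expect the build to take a while on the first run.
docker build -t agent-local-llm .
```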
### Run the container
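A plausible run command, matching the port used below (image tag as assumed above):

```shell
# Map the container's port 3000 to the host
docker run -p 3000:3000 agent-local-llm
```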
Once running, the agent serves the following endpoints at http://localhost:3000:
| Endpoint | Protocol | Description |
|---|---|---|
| `/.well-known/agent-card.json` | A2A | Agent discovery card |
| `/agent` | A2A | JSON-RPC task handler |
| `/mcp` | MCP | Tool endpoint |
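With the container running, A2A discovery can be verified with a plain HTTP request:

```shell
# Fetch the agent's discovery card
curl http://localhost:3000/.well-known/agent-card.json
```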