Ollama
Run LLMs locally with no API keys
Run powerful LLMs locally on your machine. No API keys, no cloud, no costs - just privacy and control.
Setup
1. Install Ollama
First, install Ollama on your machine:
```shell
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows: download from https://ollama.ai
```

2. Pull a Model
```shell
# Start the Ollama server
ollama serve

# Pull a model (in another terminal)
ollama pull llama3.1      # Best for chat & tools (~4.7GB)
ollama pull llava         # For vision support (~4.7GB)
ollama pull qwen2.5:1.5b  # Lightweight option (~986MB)
```

3. Install Packages
```shell
npm install @yourgpt/copilot-sdk @yourgpt/llm-sdk
```

No additional SDK required! Ollama uses native `fetch`, so no `openai` or other provider SDK is needed.
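The native-fetch point can be made concrete: Ollama exposes a plain HTTP API, so a chat request is just a POST to its `/api/chat` endpoint. A minimal sketch of assembling such a request — the `buildChatRequest` helper is illustrative, not part of either SDK:

```typescript
// Hypothetical helper showing the shape of an Ollama /api/chat request.
// The endpoint and body fields follow Ollama's HTTP API; the helper itself
// is not part of @yourgpt/llm-sdk.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  baseUrl = 'http://localhost:11434',
) {
  return {
    url: `${baseUrl}/api/chat`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // stream: true makes Ollama return newline-delimited JSON chunks
      body: JSON.stringify({ model, messages, stream: true }),
    },
  };
}

// Usage sketch:
// const { url, init } = buildChatRequest('llama3.1', [{ role: 'user', content: 'Hi' }]);
// const res = await fetch(url, init);
```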
4. Usage
```typescript
import { createOllama } from '@yourgpt/llm-sdk/ollama';

const ollama = createOllama();
const model = ollama('llama3.1');

for await (const event of model.stream({
  messages: [{ id: '1', role: 'user', content: 'Hello!' }],
})) {
  if (event.type === 'message:delta') {
    process.stdout.write(event.content);
  }
}
```

5. Streaming (API Route)
```typescript
import { createOllama } from '@yourgpt/llm-sdk/ollama';

const ollama = createOllama();

export async function POST(req: Request) {
  const { messages } = await req.json();
  const model = ollama('llama3.1');

  const stream = new ReadableStream({
    async start(controller) {
      for await (const event of model.stream({
        messages,
        system: 'You are a helpful assistant.',
      })) {
        if (event.type === 'message:delta') {
          controller.enqueue(new TextEncoder().encode(event.content));
        }
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain' },
  });
}
```

Available Models
| Model | Vision | Tools | Context | Size |
|---|---|---|---|---|
| llama3.1 | ❌ | ✅ | 128k | ~4.7GB |
| llama3.2-vision | ✅ | ✅ | 128k | ~4.7GB |
| llava | ✅ | ❌ | 4k | ~4.7GB |
| mistral | ❌ | ✅ | 8k | ~4.1GB |
| mixtral | ❌ | ✅ | 32k | ~26GB |
| qwen2.5:1.5b | ❌ | ✅ | 32k | ~986MB |
| deepseek | ❌ | ✅ | 16k | ~4GB |
| codellama | ❌ | ❌ | 16k | ~3.8GB |
```typescript
// Use any Ollama model
ollama('llama3.1')       // General purpose, tool support
ollama('llava')          // Vision capable
ollama('mistral')        // Fast, good for coding
ollama('qwen2.5:1.5b')   // Lightweight, great for testing
```

Ollama-Specific Options
Ollama supports unique configuration options for fine-tuning model behavior:
```typescript
import { createOllama } from '@yourgpt/llm-sdk/ollama';

const ollama = createOllama({
  baseUrl: 'http://localhost:11434', // Custom server URL
  options: {
    // Context & performance
    num_ctx: 8192,        // Context window size
    num_batch: 512,       // Batch size for processing
    num_gpu: 1,           // Number of GPUs to use

    // Sampling
    temperature: 0.7,     // Creativity (0.0-2.0)
    top_p: 0.9,           // Nucleus sampling
    top_k: 40,            // Top-k sampling

    // Repetition control
    repeat_penalty: 1.1,  // Penalize repetition
    repeat_last_n: 64,    // Look-back window

    // Advanced
    mirostat: 0,          // Mirostat sampling (0, 1, or 2)
    mirostat_eta: 0.1,    // Mirostat learning rate
    mirostat_tau: 5.0,    // Mirostat target entropy
    seed: 42,             // For reproducible outputs
  },
});
```

Tool Calling
Ollama supports tool calling with compatible models (llama3.1, mistral, qwen2):
```typescript
import { createOllama } from '@yourgpt/llm-sdk/ollama';

const ollama = createOllama();
const model = ollama('llama3.1');

for await (const event of model.stream({
  messages: [{ id: '1', role: 'user', content: "What's the weather in San Francisco?" }],
  actions: [
    {
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        city: { type: 'string', required: true, description: 'City name' },
      },
      handler: async ({ city }) => {
        return { temperature: 65, condition: 'foggy', city };
      },
    },
  ],
})) {
  if (event.type === 'action:start') {
    console.log(`\nTool called: ${event.name}`);
  }
  if (event.type === 'action:result') {
    console.log(`Result: ${JSON.stringify(event.result)}`);
  }
  if (event.type === 'message:delta') {
    process.stdout.write(event.content);
  }
}
```

Not all models support tool calling. Use llama3.1, mistral, or qwen2 for best results.
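Ollama's chat API accepts tool definitions as JSON Schema in an OpenAI-style `tools` array. The exact wire format the SDK emits is internal; as a rough sketch, the flat `parameters` map used above could be translated into that shape like this (the `toToolSchema` helper is hypothetical, not the SDK's actual conversion code):

```typescript
// Hypothetical conversion from the flat `parameters` map used in actions
// above to a JSON Schema tool entry. Illustrative only, not SDK internals.
interface ParamSpec {
  type: string;
  required?: boolean;
  description?: string;
}

function toToolSchema(
  name: string,
  description: string,
  parameters: Record<string, ParamSpec>,
) {
  const properties: Record<string, { type: string; description?: string }> = {};
  const required: string[] = [];
  for (const [key, spec] of Object.entries(parameters)) {
    properties[key] = { type: spec.type, description: spec.description };
    if (spec.required) required.push(key);
  }
  return {
    type: 'function',
    function: {
      name,
      description,
      parameters: { type: 'object', properties, required },
    },
  };
}
```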
Vision
Use vision-capable models like LLaVA to analyze images:
```typescript
import { readFileSync } from 'fs';
import { createOllama } from '@yourgpt/llm-sdk/ollama';

const ollama = createOllama();
const model = ollama('llava');

// Read the image and convert it to base64
const imageBuffer = readFileSync('./image.png');
const base64Image = imageBuffer.toString('base64');

for await (const event of model.stream({
  messages: [
    {
      id: '1',
      role: 'user',
      content: 'What do you see in this image?',
      metadata: {
        attachments: [
          {
            type: 'image',
            data: base64Image,
            mimeType: 'image/png',
          },
        ],
      },
    },
  ],
})) {
  if (event.type === 'message:delta') {
    process.stdout.write(event.content);
  }
}
```

With Copilot UI
Use with the Copilot React components:
```typescript
'use client';

import { CopilotProvider } from '@yourgpt/copilot-sdk/react';

export function Providers({ children }: { children: React.ReactNode }) {
  return (
    <CopilotProvider runtimeUrl="/api/chat">
      {children}
    </CopilotProvider>
  );
}
```

```typescript
import { createOllama } from '@yourgpt/llm-sdk/ollama';

const ollama = createOllama();

export async function POST(req: Request) {
  const { messages } = await req.json();
  const model = ollama('llama3.1');

  const stream = new ReadableStream({
    async start(controller) {
      for await (const event of model.stream({ messages })) {
        if (event.type === 'message:delta') {
          controller.enqueue(new TextEncoder().encode(event.content));
        }
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain' },
  });
}
```

Why Ollama?
| Benefit | Description |
|---|---|
| Privacy | All data stays on your machine - nothing sent to external servers |
| No API Keys | No billing, no rate limits, no account required |
| Offline | Works completely offline once models are downloaded |
| No Costs | Run unlimited inferences without paying per token |
| Fast Iteration | No network latency for local development |
| Customizable | Fine-tune with Ollama modelfiles |
Ollama is perfect for development, testing, privacy-sensitive applications, and air-gapped environments.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| OLLAMA_MODEL | llama3.1 | Default model |
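If you want the same fallback behavior in your own glue code, the resolution can be sketched like this — the `resolveOllamaConfig` helper is illustrative, not part of the SDK:

```typescript
// Hypothetical sketch of env-var resolution with the defaults from the
// table above. Not the SDK's actual internals.
function resolveOllamaConfig(env: Record<string, string | undefined>) {
  return {
    baseUrl: env.OLLAMA_BASE_URL ?? 'http://localhost:11434',
    model: env.OLLAMA_MODEL ?? 'llama3.1',
  };
}

// Usage sketch: const cfg = resolveOllamaConfig(process.env);
```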
Troubleshooting
Cannot connect to Ollama
```shell
# Make sure Ollama is running
ollama serve

# Check if it's responding
curl http://localhost:11434/api/tags
```

Model not found
```shell
# Pull the required model
ollama pull llama3.1
ollama pull llava  # for vision
```

Tool calling not working
Not all models support tool calling. Supported models:
- llama3.1 ✅
- mistral ✅
- qwen2 ✅
Models like codellama and gemma2 don't support tools.
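If your app lets users pick a model, a small guard can fail fast before attempting tool calls. A sketch based on the model table above — the `supportsTools` helper and its model set are hypothetical, so extend the set as you verify more models:

```typescript
// Hypothetical guard derived from the model capability table above.
const TOOL_CAPABLE = new Set([
  'llama3.1',
  'llama3.2-vision',
  'mistral',
  'mixtral',
  'qwen2',
  'qwen2.5:1.5b',
  'deepseek',
]);

function supportsTools(model: string): boolean {
  // Also compare on the base name so tags like "mistral:7b" still match.
  const base = model.split(':')[0];
  return TOOL_CAPABLE.has(model) || TOOL_CAPABLE.has(base);
}
```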
Next Steps
- OpenAI - Cloud-based alternative
- Custom Provider - Build your own
- Examples - Full demo project