Update Ollama Setup

2026-01-10 03:42:58 +00:00
parent 0f3a822ff2
commit 1e5e687ee2

@@ -17,10 +17,6 @@ Set up local AI models with Ollama for offline inference. Ollama runs on your ho
- [Step 1: Install Ollama (Clean Install)](#step-1-install-ollama-clean-install)
- [Step 2: Pull Models](#step-2-pull-models)
- [Step 3: Add Model to GT AI OS](#step-3-add-model-to-gt-ai-os)
- [macOS (Apple Silicon M1+)](#macos-apple-silicon-m1)
- [Step 1: Install Ollama](#step-1-install-ollama-2)
- [Step 2: Pull a Model](#step-2-pull-a-model)
- [Step 3: Add Model to GT AI OS](#step-3-add-model-to-gt-ai-os-1)
- [Verify Ollama is Working](#verify-ollama-is-working)

---
@@ -39,7 +35,6 @@ Set up local AI models with Ollama for offline inference. Ollama runs on your ho
|----------|-------------------|
| Ubuntu Linux 24.04 (x86_64) | `http://ollama-host:11434/v1/chat/completions` |
| NVIDIA DGX OS 7 | `http://ollama-host:11434/v1/chat/completions` |
| macOS (Apple Silicon M1+) | `http://host.docker.internal:11434/v1/chat/completions` |
---
@@ -352,65 +347,13 @@ ollama pull gemma3:27b
---
## macOS (Apple Silicon M1+)
### Step 1: Install Ollama
Download the macOS app from https://ollama.com/download, or install the CLI with Homebrew:
```bash
# Note: the curl | sh installer script targets Linux; on macOS use the app or Homebrew
brew install ollama
```
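A quick post-install sanity check (a sketch, assuming the `ollama` CLI landed on your `PATH`):

```bash
# Any version string here confirms the CLI installed correctly
ollama --version

# The desktop app runs the server in the background; if nothing is
# listening on port 11434, start the server manually:
ollama serve
```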
### Step 2: Pull a Model
```bash
ollama pull llama3.1:8b
```
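It's worth confirming the pull completed and noting the exact tag, since Step 3 needs it verbatim:

```bash
# List installed models; the NAME column is the exact string Step 3 expects
ollama list

# Optional one-shot prompt to confirm the model loads and responds
ollama run llama3.1:8b "Reply with OK"
```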
### Step 3: Add Model to GT AI OS
1. Open Control Panel: http://localhost:3001
2. Log in with `gtadmin@test.com` / `Test@123`
3. Go to **Models** → **Add Model**
4. Fill in:
- **Model ID:** `llama3.1:8b` (must match exactly what you pulled)
- **Provider:** `Local Ollama (macOS Apple Silicon)`
- **Endpoint URL:** `http://host.docker.internal:11434/v1/chat/completions`
- **Model Type:** `LLM` (Language Model - this is the most common type for AI agents)
- **Context Length:** Based on your Mac's unified memory (see table below)
- **Max Tokens:** `4096`
5. Click **Save**
6. Go to **Tenant Access** → **Assign Model to Tenant**
7. Select your model, tenant, and rate limit
**Context Length by Mac Memory:**
| Unified Memory | Context Length |
|----------------|----------------|
| 8GB | `8192` |
| 16GB | `32768` |
| 32GB | `65536` |
| 64GB+ | `131072` |
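One caveat: the Context Length field tells GT AI OS how large a window to expect, but Ollama itself serves a smaller default context unless `num_ctx` is raised. A minimal sketch of baking a larger window into a model variant (the `llama3.1:8b-32k` tag is an illustrative name; `32768` matches the 16GB row above):

```bash
# Build a model variant with a 32K context window
cat > Modelfile <<'EOF'
FROM llama3.1:8b
PARAMETER num_ctx 32768
EOF
ollama create llama3.1:8b-32k -f Modelfile
# If you use this variant, its tag becomes the Model ID in GT AI OS
```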
> ⚠️ **Critical: Model ID Must Match Exactly**
>
> The **Model ID** in GT AI OS must match the Ollama model name **exactly** - character for character. Run `ollama list` to see the exact model names. Common mistakes:
> - Extra spaces before or after the ID
> - Missing version tags (e.g., `qwen3-coder` vs `qwen3-coder:30b`)
> - Typos in the model name
>
> **Example:** If `ollama list` shows `llama3.1:8b`, use `llama3.1:8b` exactly as shown.
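To confirm the endpoint URL and Model ID together before saving, you can send a minimal request to the same OpenAI-compatible endpoint GT AI OS will call (a sketch; use `localhost` from the host shell, since `host.docker.internal` only resolves inside containers):

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
    "max_tokens": 64
  }'
# A JSON chat.completion response confirms both the endpoint and the model ID;
# a "model not found" error means the ID does not match `ollama list`.
```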
---
## Verify Ollama is Working

After completing the setup for your platform, follow these verification steps to ensure Ollama is properly configured and accessible by GT AI OS.

### Step 1: Verify Ollama Service is Running
**All Platforms (Ubuntu and DGX):**
Run these commands on your host machine (not inside Docker) to confirm Ollama is running and responding:
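For example, a minimal check of the API on its default port (a sketch; adjust if you changed Ollama's port):

```bash
# A JSON response with a version field confirms the service is up
curl http://localhost:11434/api/version
```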
@@ -426,7 +369,7 @@ This tests the Ollama API. You should see a JSON response with version informati
### Step 2: Verify GPU Acceleration
**Ubuntu x86 and DGX Only**:
While a model is running, check that your NVIDIA GPU is being utilized:
@@ -440,17 +383,11 @@ nvidia-smi
You should see `ollama` or `ollama_llama_server` processes using GPU memory. If you only see CPU usage, revisit Step 1 (NVIDIA driver installation) in your platform's setup.
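A complementary check from Ollama's side is `ollama ps` (requires a model to be loaded, e.g. by an active `ollama run` session):

```bash
# Shows loaded models and where they run; PROCESSOR should read "100% GPU"
ollama ps
```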
**macOS:** Apple Silicon Macs automatically use the GPU via Metal. No verification needed.
### Step 3: Verify GT AI OS Can Reach Ollama

This step confirms that the Docker containers running GT AI OS can communicate with Ollama on your host machine.
**macOS (Apple Silicon M1+):**
```bash
docker exec gentwo-resource-cluster curl http://host.docker.internal:11434/api/version
```
**Ubuntu x86 and DGX:**
```bash
docker exec gentwo-resource-cluster curl http://ollama-host:11434/api/version
```
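In both cases, success looks like a small JSON payload with a `version` field. A curl error such as `Could not resolve host` typically means the hostname (`host.docker.internal` or `ollama-host`) is not resolvable from inside the container.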