Update Ollama Setup
@@ -17,10 +17,6 @@ Set up local AI models with Ollama for offline inference. Ollama runs on your ho
- [Step 1: Install Ollama (Clean Install)](#step-1-install-ollama-clean-install)
- [Step 2: Pull Models](#step-2-pull-models)
- [Step 3: Add Model to GT AI OS](#step-3-add-model-to-gt-ai-os)
- [macOS (Apple Silicon M1+)](#macos-apple-silicon-m1)
- [Step 1: Install Ollama](#step-1-install-ollama-2)
- [Step 2: Pull a Model](#step-2-pull-a-model)
- [Step 3: Add Model to GT AI OS](#step-3-add-model-to-gt-ai-os-1)
- [Verify Ollama is Working](#verify-ollama-is-working)

---

@@ -39,7 +35,6 @@ Set up local AI models with Ollama for offline inference. Ollama runs on your ho
| Platform | Endpoint URL |
|----------|-------------------|
| Ubuntu Linux 24.04 (x86_64) | `http://ollama-host:11434/v1/chat/completions` |
| NVIDIA DGX OS 7 | `http://ollama-host:11434/v1/chat/completions` |
| macOS (Apple Silicon M1+) | `http://host.docker.internal:11434/v1/chat/completions` |

---

@@ -352,65 +347,13 @@ ollama pull gemma3:27b

---

## macOS (Apple Silicon M1+)

### Step 1: Install Ollama

Download the macOS app from https://ollama.com/download, or install the CLI with Homebrew (the `install.sh` script is for Linux only):

```bash
brew install ollama
```
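
With the Homebrew install, the Ollama server may need to be started manually (the desktop app from ollama.com does this automatically). A minimal sketch, assuming the default port:

```bash
# Start the Ollama server if it isn't already running
ollama serve &

# Confirm the API responds
curl http://localhost:11434/api/version
```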

### Step 2: Pull a Model

```bash
ollama pull llama3.1:8b
```
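
To confirm the download and to see the exact name you will need in the next step, list the locally available models:

```bash
# The listed name (e.g. llama3.1:8b) is the exact Model ID to enter in GT AI OS
ollama list
```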

### Step 3: Add Model to GT AI OS

1. Open Control Panel: http://localhost:3001
2. Log in with `gtadmin@test.com` / `Test@123`
3. Go to **Models** → **Add Model**
4. Fill in:
   - **Model ID:** `llama3.1:8b` (must match exactly what you pulled)
   - **Provider:** `Local Ollama (macOS Apple Silicon)`
   - **Endpoint URL:** `http://host.docker.internal:11434/v1/chat/completions`
   - **Model Type:** `LLM` (Language Model - this is the most common type for AI agents)
   - **Context Length:** Based on your Mac's unified memory (see table below)
   - **Max Tokens:** `4096`
5. Click **Save**
6. Go to **Tenant Access** → **Assign Model to Tenant**
7. Select your model, tenant, and rate limit

**Context Length by Mac Memory:**

| Unified Memory | Context Length |
|----------------|----------------|
| 8GB | `8192` |
| 16GB | `32768` |
| 32GB | `65536` |
| 64GB+ | `131072` |

> ⚠️ **Critical: Model ID Must Match Exactly**
>
> The **Model ID** in GT AI OS must match the Ollama model name **exactly** - character for character. Run `ollama list` to see the exact model names. Common mistakes:
> - Extra spaces before or after the ID
> - Missing version tags (e.g., `qwen3-coder` vs `qwen3-coder:30b`)
> - Typos in the model name
>
> **Example:** If `ollama list` shows `llama3.1:8b`, use `llama3.1:8b` exactly as shown.

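To sanity-check the model outside GT AI OS, you can call Ollama's OpenAI-compatible endpoint directly from your Mac. A sketch assuming the default port and the `llama3.1:8b` model pulled above:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```
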
---
## Verify Ollama is Working

After completing the setup for your platform, follow these verification steps to ensure Ollama is properly configured and accessible by GT AI OS.

### Step 1: Verify Ollama Service is Running

**All Platforms (Ubuntu, DGX, macOS):**

Run these commands on your host machine (not inside Docker) to confirm Ollama is running and responding:

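For example, a minimal check assuming Ollama's default port 11434:

```bash
# The API should return a JSON object with Ollama's version
curl http://localhost:11434/api/version

# List the models Ollama has pulled locally
ollama list
```
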
@@ -426,7 +369,7 @@ This tests the Ollama API. You should see a JSON response with version informati

### Step 2: Verify GPU Acceleration

**Ubuntu x86 and DGX Only** (skip this step on macOS):

While a model is running, check that your NVIDIA GPU is being utilized:

@@ -440,17 +383,11 @@ nvidia-smi

You should see `ollama` or `ollama_llama_server` processes using GPU memory. If you only see CPU usage, revisit Step 1 (NVIDIA driver installation) in your platform's setup.

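If the process table is long, you can filter the output to just the Ollama entries:

```bash
# Show only Ollama-related GPU processes
nvidia-smi | grep -i ollama
```
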
**macOS:** Apple Silicon Macs automatically use the GPU via Metal. No verification needed.

### Step 3: Verify GT AI OS Can Reach Ollama

This step confirms that the Docker containers running GT AI OS can communicate with Ollama on your host machine.

**macOS (Apple Silicon M1+):**

```bash
docker exec gentwo-resource-cluster curl http://host.docker.internal:11434/api/version
```

**Ubuntu x86 and DGX:**

```bash
docker exec gentwo-resource-cluster curl http://ollama-host:11434/api/version
```
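
As an optional end-to-end check, you can call the same OpenAI-compatible endpoint GT AI OS uses from inside the container. A sketch assuming the Ubuntu/DGX hostname; substitute whichever model you pulled:

```bash
docker exec gentwo-resource-cluster curl -s http://ollama-host:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "ping"}]}'
```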