From 1e5e687ee200f3cef91326efe049b80188b25ff2 Mon Sep 17 00:00:00 2001
From: daniel
Date: Sat, 10 Jan 2026 03:42:58 +0000
Subject: [PATCH] Update Ollama Setup
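
Remove the macOS (Apple Silicon) platform from the Ollama setup guide:
its endpoint-table row, its install/pull/add-model walkthrough, and its
verification steps, along with the matching table-of-contents entries.
The two remaining platforms, Ubuntu Linux 24.04 (x86_64) and NVIDIA
DGX OS 7, share the same ollama-host endpoint.

As a quick sanity check after applying, the verification commands kept
by this patch can be run as below. This is a sketch assuming Ollama's
default port 11434 and the gentwo-resource-cluster container name used
in the verification section; adjust for your deployment.

    # On the host: the Ollama version endpoint should return JSON
    curl http://localhost:11434/api/version

    # From inside GT AI OS: containers reach Ollama via the ollama-host alias
    docker exec gentwo-resource-cluster curl http://ollama-host:11434/api/version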
---
 Ollama-Setup.md | 67 ++-----------------------------------------------
 1 file changed, 2 insertions(+), 65 deletions(-)

diff --git a/Ollama-Setup.md b/Ollama-Setup.md
index b18bd0b..112c80e 100644
--- a/Ollama-Setup.md
+++ b/Ollama-Setup.md
@@ -17,10 +17,6 @@ Set up local AI models with Ollama for offline inference. Ollama runs on your ho
   - [Step 1: Install Ollama (Clean Install)](#step-1-install-ollama-clean-install)
   - [Step 2: Pull Models](#step-2-pull-models)
   - [Step 3: Add Model to GT AI OS](#step-3-add-model-to-gt-ai-os)
-- [macOS (Apple Silicon M1+)](#macos-apple-silicon-m1)
-  - [Step 1: Install Ollama](#step-1-install-ollama-2)
-  - [Step 2: Pull a Model](#step-2-pull-a-model)
-  - [Step 3: Add Model to GT AI OS](#step-3-add-model-to-gt-ai-os-1)
 - [Verify Ollama is Working](#verify-ollama-is-working)
 
 ---
 
@@ -39,7 +35,6 @@ Set up local AI models with Ollama for offline inference. Ollama runs on your ho
 |----------|-------------------|
 | Ubuntu Linux 24.04 (x86_64) | `http://ollama-host:11434/v1/chat/completions` |
 | NVIDIA DGX OS 7 | `http://ollama-host:11434/v1/chat/completions` |
-| macOS (Apple Silicon M1+) | `http://host.docker.internal:11434/v1/chat/completions` |
 
 ---
 
@@ -352,65 +347,13 @@ ollama pull gemma3:27b
 
 ---
 
-## macOS (Apple Silicon M1+)
-
-### Step 1: Install Ollama
-
-Download from https://ollama.com/download or run:
-
-```bash
-curl -fsSL https://ollama.com/install.sh | sh
-```
-
-### Step 2: Pull a Model
-
-```bash
-ollama pull llama3.1:8b
-```
-
-### Step 3: Add Model to GT AI OS
-
-1. Open Control Panel: http://localhost:3001
-2. Log in with `gtadmin@test.com` / `Test@123`
-3. Go to **Models** → **Add Model**
-4. Fill in:
-   - **Model ID:** `llama3.1:8b` (must match exactly what you pulled)
-   - **Provider:** `Local Ollama (macOS Apple Silicon)`
-   - **Endpoint URL:** `http://host.docker.internal:11434/v1/chat/completions`
-   - **Model Type:** `LLM` (Language Model - this is the most common type for AI agents)
-   - **Context Length:** Based on your Mac's unified memory (see table below)
-   - **Max Tokens:** `4096`
-5. Click **Save**
-6. Go to **Tenant Access** → **Assign Model to Tenant**
-7. Select your model, tenant, and rate limit
-
-**Context Length by Mac Memory:**
-
-| Unified Memory | Context Length |
-|----------------|----------------|
-| 8GB | `8192` |
-| 16GB | `32768` |
-| 32GB | `65536` |
-| 64GB+ | `131072` |
-
-> ⚠️ **Critical: Model ID Must Match Exactly**
->
-> The **Model ID** in GT AI OS must match the Ollama model name **exactly** - character for character. Run `ollama list` to see the exact model names. Common mistakes:
-> - Extra spaces before or after the ID
-> - Missing version tags (e.g., `qwen3-coder` vs `qwen3-coder:30b`)
-> - Typos in the model name
->
-> **Example:** If `ollama list` shows `llama3.1:8b`, use `llama3.1:8b` exactly as shown.
-
----
-
 ## Verify Ollama is Working
 
 After completing the setup for your platform, follow these verification steps to ensure Ollama is properly configured and accessible by GT AI OS.
 
 ### Step 1: Verify Ollama Service is Running
 
-**All Platforms (Ubuntu, DGX, macOS):**
+**All Platforms (Ubuntu and DGX):**
 
 Run these commands on your host machine (not inside Docker) to confirm Ollama is running and responding:
 
@@ -426,7 +369,7 @@ This tests the Ollama API.
 You should see a JSON response with version information.
 
 ### Step 2: Verify GPU Acceleration
 
-**Ubuntu x86 and DGX Only** (skip this step on macOS):
+**Ubuntu x86 and DGX Only**:
 
 While a model is running, check that your NVIDIA GPU is being utilized:
 
@@ -440,17 +383,11 @@ nvidia-smi
 You should see `ollama` or `ollama_llama_server` processes using GPU memory. If you only see CPU usage, revisit Step 1 (NVIDIA driver installation) in your platform's setup.
 
-**macOS:** Apple Silicon Macs automatically use the GPU via Metal. No verification needed.
 
 ### Step 3: Verify GT AI OS Can Reach Ollama
 
 This step confirms that the Docker containers running GT AI OS can communicate with Ollama on your host machine.
 
-**macOS (Apple Silicon M1+):**
-```bash
-docker exec gentwo-resource-cluster curl http://host.docker.internal:11434/api/version
-```
-
 **Ubuntu x86 and DGX:**
 ```bash
 docker exec gentwo-resource-cluster curl http://ollama-host:11434/api/version
 ```