# Ollama Setup

Set up local AI models with Ollama for offline inference. Ollama runs on your host machine (outside Docker) and GT AI OS containers connect to it.

## Table of Contents

- [Recommended Models](#recommended-models)
- [Quick Reference](#quick-reference)
- [Ubuntu Linux 24.04 (x86_64)](#ubuntu-linux-2404-x86_64)
  - [Step 1: Ensure Your NVIDIA Drivers are Properly Installed](#step-1-ensure-your-nvidia-drivers-are-properly-installed)
  - [Step 2: Install Ollama](#step-2-install-ollama)
  - [Step 3: Configure Systemd](#step-3-configure-systemd)
  - [Step 4: Start Service](#step-4-start-service)
  - [Step 5: Pull a Model](#step-5-pull-a-model)
  - [Step 6: Add Model to GT AI OS](#step-6-add-model-to-gt-ai-os)
- [NVIDIA DGX Spark and RTX Pro Systems (DGX OS 7)](#nvidia-dgx-spark-and-rtx-pro-systems-dgx-os-7)
  - [Step 1: Install Ollama (Clean Install)](#step-1-install-ollama-clean-install)
  - [Step 2: Pull Models](#step-2-pull-models)
  - [Step 3: Add Model to GT AI OS](#step-3-add-model-to-gt-ai-os)
- [macOS (Apple Silicon M1+)](#macos-apple-silicon-m1)
  - [Step 1: Install Ollama](#step-1-install-ollama-2)
  - [Step 2: Pull a Model](#step-2-pull-a-model)
  - [Step 3: Add Model to GT AI OS](#step-3-add-model-to-gt-ai-os-1)
- [Verify Ollama is Working](#verify-ollama-is-working)

---

## Recommended Models

| Model | Size | VRAM Required | Best For |
|-------|------|---------------|----------|
| llama3.1:8b | ~4.7GB | 6GB+ | General chat, coding help |
| qwen3-coder:30b | ~19GB | 24GB+ | Code generation, agentic coding |
| gemma3:27b | ~17GB | 20GB+ | General tasks, multilingual |

## Quick Reference

| Platform | Model Endpoint URL |
|----------|-------------------|
| Ubuntu Linux 24.04 (x86_64) | `http://ollama-host:11434/v1/chat/completions` |
| NVIDIA DGX OS 7 | `http://ollama-host:11434/v1/chat/completions` |
| macOS (Apple Silicon M1+) | `http://host.docker.internal:11434/v1/chat/completions` |

---
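
All of the endpoints above point at Ollama's OpenAI-compatible chat completions API; the hostnames only differ in how the GT AI OS containers reach your host. Once you have pulled a model in the platform steps below, a quick host-side smoke test looks roughly like the sketch here (it assumes `llama3.1:8b`; substitute whatever `ollama list` shows):

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Reply with one short sentence."}]
  }'
```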

## Ubuntu Linux 24.04 (x86_64)

### Step 1: Ensure Your NVIDIA Drivers are Properly Installed

If your system has an NVIDIA GPU, you need working drivers for GPU-accelerated inference. If you don't have an NVIDIA GPU, skip to Step 2.

**1. Check if NVIDIA drivers are already installed:**

```bash
nvidia-smi
```

If this command shows your GPU information, your drivers are already working; skip the driver installation in item 2 and go straight to item 3 (install nvtop). If not, continue below.

**2. Install NVIDIA drivers:**

```bash
# Update package list
sudo apt update

# Install the recommended NVIDIA driver
sudo ubuntu-drivers install

# Reboot to load the new driver
sudo reboot
```

After reboot, verify the driver is working:

```bash
nvidia-smi
```

You should see your GPU model, driver version, and CUDA version.

**3. Install nvtop:**

Install the nvtop utility so you can monitor your GPU utilization:

```bash
sudo apt install nvtop
```

Then run nvtop to see live GPU metrics:

```bash
nvtop
```

> **Note:** Ollama automatically detects and uses NVIDIA GPUs when drivers are installed. No additional configuration is needed.

### Step 2: Install Ollama

Install Ollama using the command below. Other installation methods may not function correctly.

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

When the installation completes, the installer confirms that your GPU was detected. If Ollama does not detect your GPU, check your GPU driver configuration (Step 1).

### Step 3: Configure Systemd

Create the override configuration based on your GPU's VRAM. Choose the configuration that matches your GPU.

> **Note:** These configurations are required for GT AI OS to connect properly to Ollama.

**4GB VRAM:**

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d

sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=4096"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
EOF
```

**6GB VRAM:**

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d

sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=8192"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
EOF
```

**8GB VRAM:**

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d

sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=16384"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
EOF
```

**12GB VRAM:**

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d

sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=32768"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=2"
EOF
```

**16GB VRAM:**

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d

sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=65536"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=2"
EOF
```

**32GB+ VRAM:**

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d

sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=131072"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
EOF
```

**Configuration explained:**
- `OLLAMA_HOST=0.0.0.0:11434` - Listen on all network interfaces (required for Docker)
- `OLLAMA_CONTEXT_LENGTH` - Maximum context window size (adjust based on VRAM)
- `OLLAMA_FLASH_ATTENTION=1` - Enable flash attention for better performance
- `OLLAMA_KEEP_ALIVE=4h` - Keep models loaded for 4 hours
- `OLLAMA_MAX_LOADED_MODELS` - Number of models loaded simultaneously (adjust based on VRAM); you can confirm the override file as shown below
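
To confirm the override file is in place (and to see the full effective unit), you can print the unit together with its drop-ins. This is an optional check, not a required step:

```bash
# Shows ollama.service plus any drop-in files such as override.conf
systemctl cat ollama
```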

### Step 4: Start Service

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl restart ollama
```
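
If you want to verify that the service came up and picked up the override settings, the standard systemd checks below are usually enough (optional):

```bash
# Service state and recent activity
systemctl status ollama --no-pager

# Effective environment variables (should include the OLLAMA_* values you configured)
systemctl show ollama --property=Environment | tr ' ' '\n'

# Recent logs, useful if the service failed to start
journalctl -u ollama -n 50 --no-pager
```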

### Step 5: Pull a Model

```bash
ollama pull llama3.1:8b
```
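
Before wiring the model into GT AI OS, it can help to confirm the pull succeeded and that the model answers locally. A minimal smoke test (the prompt text is arbitrary):

```bash
# The NAME column is the exact Model ID you will enter in GT AI OS
ollama list

# One-off prompt to confirm the model loads and responds
ollama run llama3.1:8b "Reply with the single word: ready"
```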

### Step 6: Add Model to GT AI OS

1. Open Control Panel: http://localhost:3001
2. Log in with `gtadmin@test.com` / `Test@123`
3. Go to **Models** → **Add Model**
4. Fill in:
   - **Model ID:** `llama3.1:8b` (must match exactly what you pulled)
   - **Provider:** `Local Ollama (Ubuntu x86 / DGX ARM)`
   - **Endpoint URL:** `http://ollama-host:11434/v1/chat/completions`
   - **Model Type:** `LLM` (Language Model - this is the most common type for AI agents)
   - **Context Length:** Use the value from your systemd config (e.g., `8192` for 6GB VRAM)
   - **Max Tokens:** `4096`
5. Click **Save**
6. Go to **Tenant Access** → **Assign Model to Tenant**
7. Select your model, tenant, and rate limit

> ⚠️ **Critical: Model ID Must Match Exactly**
>
> The **Model ID** in GT AI OS must match the Ollama model name **exactly** - character for character. Run `ollama list` to see the exact model names. Common mistakes:
> - Extra spaces before or after the ID
> - Missing version tags (e.g., `qwen3-coder` vs `qwen3-coder:30b`)
> - Typos in the model name
>
> **Example:** If `ollama list` shows `llama3.1:8b`, use `llama3.1:8b` exactly as shown.

---

## NVIDIA DGX Spark and RTX Pro Systems (DGX OS 7)

DGX systems come with NVIDIA drivers and CUDA pre-installed. Ollama will automatically use the GPUs.

### Step 1: Install Ollama (Clean Install)

Copy and paste the command below to perform a complete clean install of Ollama.

> **Important:** The configuration settings in this script are required for GT AI OS integration on DGX OS 7 systems:
> - `OLLAMA_HOST=0.0.0.0:11434` - Allows Docker containers to connect (required)
> - `OLLAMA_CONTEXT_LENGTH=131072` - 128K context window for long conversations
> - `OLLAMA_FLASH_ATTENTION=1` - Enables flash attention for better GPU performance
> - `OLLAMA_KEEP_ALIVE=4h` - Keeps models loaded to avoid cold start delays
> - `OLLAMA_MAX_LOADED_MODELS=3` - DGX has enough VRAM for multiple models
>
> Do not skip or modify these settings unless you understand the implications.

> ⚠️ **Warning:** This command performs a clean reinstallation of Ollama. Any existing Ollama installation will be removed, including downloaded models. If you wish to preserve your models, back up `/usr/share/ollama/.ollama/models` before proceeding.
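
If you want to keep models you have already downloaded, one way to copy them aside before running the script is sketched below (the source path comes from the warning above; the destination directory is just an example):

```bash
# Copy the existing model store to a backup directory in your home folder
sudo rsync -a /usr/share/ollama/.ollama/models/ ~/ollama-models-backup/
```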

```bash
# Cleanup
sudo systemctl stop ollama 2>/dev/null; sudo pkill ollama 2>/dev/null; sleep 2; \
snap list ollama &>/dev/null && sudo snap remove ollama; \
sudo systemctl disable ollama 2>/dev/null; \
sudo rm -f /etc/systemd/system/ollama.service; \
sudo rm -rf /etc/systemd/system/ollama.service.d; \
sudo rm -f /usr/local/bin/ollama /usr/bin/ollama; \
sudo rm -rf /usr/local/lib/ollama; \
id ollama &>/dev/null && sudo userdel -r ollama 2>/dev/null; \
getent group ollama &>/dev/null && sudo groupdel ollama 2>/dev/null; \
sudo systemctl daemon-reload && \
# Install
curl -fsSL https://ollama.com/install.sh | sh && \
if [ ! -f /etc/systemd/system/ollama.service ]; then
sudo tee /etc/systemd/system/ollama.service > /dev/null <<'EOF'
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

[Install]
WantedBy=default.target
EOF
sudo systemctl daemon-reload
fi && \
# Configure
sudo mkdir -p /etc/systemd/system/ollama.service.d && \
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=131072"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
EOF
# Start
sudo systemctl daemon-reload && \
sudo systemctl enable ollama && \
sudo systemctl start ollama && \
sudo systemctl restart ollama && \
# Verify
sleep 3 && \
systemctl is-active ollama && echo "✓ Service running" && \
curl -s http://localhost:11434/api/version && echo -e "\n✓ API responding" && \
systemctl show ollama --property=Environment | tr ' ' '\n'
```

### Step 2: Pull Models

DGX systems have more VRAM, so you can run larger models:

```bash
ollama pull llama3.1:8b
ollama pull qwen3-coder:30b
ollama pull gemma3:27b
```
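
After the pulls finish, `ollama list` shows what is on disk, and once a model has served a request, `ollama ps` shows which models are loaded and how much memory they are using. Both are optional checks:

```bash
# Models available on disk (NAME is the Model ID used in GT AI OS)
ollama list

# Models currently loaded in memory
ollama ps
```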

### Step 3: Add Model to GT AI OS

1. Open Control Panel: http://localhost:3001
2. Log in with `gtadmin@test.com` / `Test@123`
3. Go to **Models** → **Add Model**
4. Fill in:
   - **Model ID:** `llama3.1:8b` (or `qwen3-coder:30b`, `gemma3:27b`)
   - **Provider:** `Local Ollama (Ubuntu x86 / DGX ARM)`
   - **Endpoint URL:** `http://ollama-host:11434/v1/chat/completions`
   - **Model Type:** `LLM` (Language Model - this is the most common type for AI agents)
   - **Context Length:** `131072`
   - **Max Tokens:** `4096`
5. Click **Save**
6. Go to **Tenant Access** → **Assign Model to Tenant**
7. Select your model, tenant, and rate limit

> ⚠️ **Critical: Model ID Must Match Exactly**
>
> The **Model ID** in GT AI OS must match the Ollama model name **exactly** - character for character. Run `ollama list` to see the exact model names. Common mistakes:
> - Extra spaces before or after the ID
> - Missing version tags (e.g., `qwen3-coder` vs `qwen3-coder:30b`)
> - Typos in the model name
>
> **Example:** If `ollama list` shows `llama3.1:8b`, use `llama3.1:8b` exactly as shown.

---

## macOS (Apple Silicon M1+)

### Step 1: Install Ollama

Download and install the macOS app from https://ollama.com/download (the app runs the Ollama server in the background; the Linux `install.sh` script is not intended for macOS). Alternatively, install with Homebrew and start the background service:

```bash
brew install ollama
brew services start ollama
```
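
Once the app (or the Homebrew service) is running, you can confirm the server is listening on the default port; this is the same check used in the verification section later:

```bash
curl http://localhost:11434/api/version
```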

### Step 2: Pull a Model

```bash
ollama pull llama3.1:8b
```

### Step 3: Add Model to GT AI OS

1. Open Control Panel: http://localhost:3001
2. Log in with `gtadmin@test.com` / `Test@123`
3. Go to **Models** → **Add Model**
4. Fill in:
   - **Model ID:** `llama3.1:8b` (must match exactly what you pulled)
   - **Provider:** `Local Ollama (macOS Apple Silicon)`
   - **Endpoint URL:** `http://host.docker.internal:11434/v1/chat/completions`
   - **Model Type:** `LLM` (Language Model - this is the most common type for AI agents)
   - **Context Length:** Based on your Mac's unified memory (see table below)
   - **Max Tokens:** `4096`
5. Click **Save**
6. Go to **Tenant Access** → **Assign Model to Tenant**
7. Select your model, tenant, and rate limit

**Context Length by Mac Memory:**

| Unified Memory | Context Length |
|----------------|----------------|
| 8GB | `8192` |
| 16GB | `32768` |
| 32GB | `65536` |
| 64GB+ | `131072` |
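
If you are not sure how much unified memory your Mac has, a quick Terminal check is sketched below (rounding may differ slightly from the marketed size):

```bash
# Print installed memory in GB
sysctl -n hw.memsize | awk '{printf "%.0f GB\n", $1/1024/1024/1024}'
```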

> ⚠️ **Critical: Model ID Must Match Exactly**
>
> The **Model ID** in GT AI OS must match the Ollama model name **exactly** - character for character. Run `ollama list` to see the exact model names. Common mistakes:
> - Extra spaces before or after the ID
> - Missing version tags (e.g., `qwen3-coder` vs `qwen3-coder:30b`)
> - Typos in the model name
>
> **Example:** If `ollama list` shows `llama3.1:8b`, use `llama3.1:8b` exactly as shown.

---

## Verify Ollama is Working

After completing the setup for your platform, follow these verification steps to ensure Ollama is properly configured and accessible by GT AI OS.

### Step 1: Verify Ollama Service is Running

**All Platforms (Ubuntu, DGX, macOS):**

Run these commands on your host machine (not inside Docker) to confirm Ollama is running and responding:

```bash
ollama list
```

This shows all models you have pulled. You should see `llama3.1:8b` (or other models you installed).

```bash
curl http://localhost:11434/api/version
```

This tests the Ollama API. You should see a JSON response with version information like `{"version":"0.x.x"}`.

### Step 2: Verify GPU Acceleration

**Ubuntu x86 and DGX Only** (skip this step on macOS):

While a model is running, check that your NVIDIA GPU is being utilized:

```bash
nvtop
```

or

```bash
nvidia-smi
```
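
For a non-interactive check, `nvidia-smi` can also list just the GPU compute processes and their memory use (optional):

```bash
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```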

You should see `ollama` or `ollama_llama_server` processes using GPU memory. If you only see CPU usage, revisit Step 1 (NVIDIA driver installation) in your platform's setup.

**macOS:** Apple Silicon Macs automatically use the GPU via Metal. No verification needed.

### Step 3: Verify GT AI OS Can Reach Ollama

This step confirms that the Docker containers running GT AI OS can communicate with Ollama on your host machine.

**macOS (Apple Silicon M1+):**

```bash
docker exec gentwo-resource-cluster curl http://host.docker.internal:11434/api/version
```

**Ubuntu x86 and DGX:**

```bash
docker exec gentwo-resource-cluster curl http://ollama-host:11434/api/version
```

You should see the same JSON version response. If you get a connection error, check that:

- Ollama is running (`ollama list` works)
- On Ubuntu/DGX: The systemd config has `OLLAMA_HOST=0.0.0.0:11434`
- GT AI OS containers are running (`docker ps | grep gentwo`)
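
As an optional end-to-end check, you can also send a chat completion through the same endpoint URL that GT AI OS is configured with (Ubuntu/DGX shown; on macOS substitute `host.docker.internal` for `ollama-host`). The model name must match one you pulled:

```bash
docker exec gentwo-resource-cluster curl -s http://ollama-host:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "Reply with the word: ok"}]}'
```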

### Step 4: Test in the Application

Once all verification steps pass, test the full integration:

1. Open Tenant App: http://localhost:3002
2. Create a new agent or edit an existing one
3. Select your Ollama model (e.g., `llama3.1:8b`) from the model dropdown
4. Send a test message and verify you get a response

If the agent doesn't respond, check the model configuration in Control Panel → Models and ensure the Model ID matches exactly what `ollama list` shows.
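
If you need more detail while troubleshooting, container logs can show whether requests are reaching Ollama. Which container holds the relevant logs depends on your deployment, but the resource cluster used in the checks above is a reasonable place to start:

```bash
docker logs --tail 50 gentwo-resource-cluster
```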