Add Projects for NVIDIA NIMs and Nemotron using Local Ollama

2026-01-10 03:26:41 +00:00
parent 33cbd13760
commit 1fdb517463

# Projects for NVIDIA NIMs and Nemotron using Local Ollama
A step-by-step runbook for setting up cloud and local AI models on NVIDIA DGX Spark.
---
## Prerequisites
**You MUST complete these steps before following this guide:**
| Step | Guide | What You'll Do |
|------|-------|----------------|
| 1 | [Installation Guide](Installation#nvidia-dgx-os-7-grace-blackwell-architecture) | Install GT AI OS on your NVIDIA DGX Spark system |
| 2 | [Control Panel Guide](Control-Panel-Guide) | Create your admin account, delete the default account, and configure your tenant |
**Verify your installation works:**
1. Open http://localhost:3001 (Control Panel) - you should see the login page
2. Open http://localhost:3002 (Tenant App) - you should see the login page
3. Log in with your admin credentials (or default: `gtadmin@test.com` / `Test@123`)
**Not working?** Go back to the [Installation Guide](Installation) first.
---
## What This Runbook Covers
By the end, you will have:
- NVIDIA NIM cloud models configured (Kimi K2 for advanced AI tasks)
- Ollama installed with local Nemotron models on your NVIDIA DGX Spark
- Four demo agents ready to use
**Estimated time:** 30-45 minutes
---
## Part 1: Get Your NVIDIA NIM API Key
NVIDIA NIM gives you access to powerful AI models in the cloud via NVIDIA DGX Cloud.
### Step 1.1: Create an NVIDIA Developer Account
1. Open your web browser
2. Go to: **https://build.nvidia.com/**
3. Click the **Sign In** button (top right corner)
4. Click **Create Account** if you don't have one
5. Fill in your details and create your account
6. Check your email for a verification link
7. Click the link to verify your account
### Step 1.2: Generate Your API Key
1. Go to: **https://build.nvidia.com/**
2. Sign in with your account
3. Click on any model card (e.g., click on "Kimi K2")
4. Click **Get API Key** button
5. Copy the API key that appears
6. **Save this key** - you will need it in the next step
### Step 1.3: Add the API Key to GT AI OS
1. Open Control Panel: **http://localhost:3001**
2. Log in with your admin credentials
3. Click **API Keys** in the left sidebar
4. Click **Add API Key**
5. Fill in:
- **Provider:** Select **NVIDIA**
- **API Key:** Paste your NVIDIA API key
6. Click **Save**
7. Click **Test** next to your new key
8. You should see a green checkmark or "Valid" status
**Verification:** If the test fails, check that you copied the complete API key.
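You can also exercise the key directly from a terminal. This is a sketch, not part of GT AI OS: the endpoint and model name below are NVIDIA's public NIM values at the time of writing, and `NVIDIA_API_KEY` is assumed to hold the key you copied.
```bash
# Smoke-test an NVIDIA NIM API key from the shell (assumes NVIDIA_API_KEY is set).
NIM_URL="https://integrate.api.nvidia.com/v1/chat/completions"
BODY='{"model":"moonshotai/kimi-k2-instruct","messages":[{"role":"user","content":"Say hello"}],"max_tokens":32}'

if [ -n "$NVIDIA_API_KEY" ]; then
  curl -s "$NIM_URL" \
    -H "Authorization: Bearer $NVIDIA_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$BODY"
else
  echo "NVIDIA_API_KEY is not set; export it first"
fi
```
A valid key returns a JSON chat completion; an authorization error usually means the key was truncated when you pasted it.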
---
## Part 2: Install Ollama on NVIDIA DGX Spark
Ollama lets you run AI models locally. NVIDIA DGX Spark systems come with NVIDIA drivers pre-installed, so Ollama will automatically use the GPU.
### Step 2.1: Install Ollama
1. Open a terminal on your NVIDIA DGX Spark
2. Run this command:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
3. Wait for installation to complete
4. You should see: `Ollama has been installed successfully`
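If you want to double-check the install from the same terminal, here is a quick sketch (nothing GT-AI-OS-specific, just the CLI and your PATH):
```bash
# Post-install check: confirm the ollama CLI landed on PATH.
if command -v ollama >/dev/null 2>&1; then
  OLLAMA_STATE="installed: $(ollama --version)"
else
  OLLAMA_STATE="missing"
fi
echo "$OLLAMA_STATE"
```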
### Step 2.2: Configure Ollama for GT AI OS
Create the configuration so GT AI OS can connect to Ollama:
```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=131072"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
EOF
```
**What this does:**
- `OLLAMA_HOST=0.0.0.0:11434` - Allows GT AI OS Docker containers to connect
- `OLLAMA_CONTEXT_LENGTH=131072` - 128K token context window
- `OLLAMA_FLASH_ATTENTION=1` - Better performance
- `OLLAMA_KEEP_ALIVE=4h` - Keeps models loaded for faster responses
- `OLLAMA_MAX_LOADED_MODELS=3` - Multiple models can be loaded (NVIDIA DGX Spark has plenty of VRAM)
### Step 2.3: Start Ollama Service
```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```
### Step 2.4: Verify Ollama is Running
```bash
ollama list
```
You should see an empty list (no models yet). If you get an error, wait 10 seconds and try again.
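The service can also be probed over HTTP, which is closer to how GT AI OS talks to it. A minimal check, assuming Ollama's default port 11434:
```bash
# Probe Ollama's HTTP API on the default port.
OLLAMA_URL="http://localhost:11434/api/version"
if curl -sf "$OLLAMA_URL"; then
  echo    # newline after the JSON body
else
  echo "Ollama is not reachable at $OLLAMA_URL yet; wait a few seconds and retry"
fi
```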
---
## Part 3: Download Nemotron Models
NVIDIA Nemotron models are optimized for NVIDIA hardware.
### Step 3.1: Download Nemotron Mini
This is the faster, smaller model (~4GB):
```bash
ollama pull nemotron-mini:latest
```
Wait for the download to complete (5-15 minutes, depending on internet speed).
### Step 3.2: Download Nemotron Full
This is the more powerful model (~25GB):
```bash
ollama pull nemotron:latest
```
Wait for the download to complete (15-45 minutes, depending on internet speed).
### Step 3.3: Verify Models Downloaded
```bash
ollama list
```
You should see:
```
NAME SIZE
nemotron-mini:latest 4.1 GB
nemotron:latest 25.3 GB
```
### Step 3.4: Test the Models
Test Nemotron Mini:
```bash
ollama run nemotron-mini:latest "Hello, are you working?"
```
You should get a friendly response. When you pass a prompt on the command line, `ollama run` exits after replying; press `Ctrl+D` only if you land in an interactive session.
Test Nemotron Full:
```bash
ollama run nemotron:latest "What is 2 + 2?"
```
You should get an answer, and the command exits after replying.
---
## Part 4: Add Nemotron Models to GT AI OS
Now configure GT AI OS to use your local Ollama models.
### Step 4.1: Add Nemotron Mini Model
1. Open Control Panel: **http://localhost:3001**
2. Log in with your admin credentials
3. Click **Models** in the left sidebar
4. Click **Add Model**
5. Fill in these exact values:
| Field | Value |
|-------|-------|
| **Model ID** | `nemotron-mini:latest` |
| **Name** | `Ollama Nemotron Mini` |
| **Provider** | `Local Ollama (Ubuntu x86 / DGX ARM)` |
| **Model Type** | `LLM` |
| **Endpoint URL** | `http://ollama-host:11434/v1/chat/completions` |
| **Context Window** | `8192` |
| **Max Tokens** | `4096` |
6. Click **Save**
### Step 4.2: Add Nemotron Full Model
1. Click **Add Model** again
2. Fill in:
| Field | Value |
|-------|-------|
| **Model ID** | `nemotron:latest` |
| **Name** | `Ollama Nemotron` |
| **Provider** | `Local Ollama (Ubuntu x86 / DGX ARM)` |
| **Model Type** | `LLM` |
| **Endpoint URL** | `http://ollama-host:11434/v1/chat/completions` |
| **Context Window** | `32768` |
| **Max Tokens** | `8192` |
3. Click **Save**
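Before assigning the models to a tenant, you can hit the same OpenAI-compatible endpoint GT AI OS will use. A sketch: from the DGX itself the host is `localhost`, while the `ollama-host` name in the tables above is what the GT AI OS containers resolve.
```bash
# Exercise Ollama's OpenAI-compatible chat endpoint (same path GT AI OS uses).
# From the host use localhost; GT AI OS containers use the ollama-host alias.
ENDPOINT="http://localhost:11434/v1/chat/completions"
BODY='{"model":"nemotron-mini:latest","messages":[{"role":"user","content":"Hello"}],"max_tokens":32}'

if curl -sf "$ENDPOINT" -H "Content-Type: application/json" -d "$BODY"; then
  echo    # newline after the JSON body
else
  echo "Request failed; check that Ollama is running and the model is pulled"
fi
```
A JSON chat completion here means the Endpoint URL values in the tables above will work once GT AI OS can reach the host.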
### Step 4.3: Assign Models to Your Tenant
1. Click **Tenant Access** in the left sidebar (or find it under Models)
2. Click **Assign Model to Tenant**
3. Select:
- **Model:** `nemotron-mini:latest`
- **Tenant:** Your tenant name
- **Rate Limit:** Choose a rate limit (e.g., Standard)
4. Click **Assign**
5. Repeat for `nemotron:latest`
---
## Part 5: Import Demo Agents
We provide four pre-built agents that demonstrate both NVIDIA NIM (cloud) and Ollama (local) capabilities.
### Step 5.1: Download the Agent Files
Download the CSV files for the agents you want to import:
| Agent | Download | Provider |
|-------|----------|----------|
| Python Coding Micro Project | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/python_coding_microproject.csv) | NVIDIA NIM |
| Kali Linux Simulation Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/kali_linux_shell_simulator.csv) | NVIDIA NIM |
| Nemotron Mini Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/nemotron-mini-agent.csv) | Ollama |
| Nemotron Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/nemotron-agent.csv) | Ollama |
Click the download link and the file will download automatically.
### Step 5.2: Import Agents into GT AI OS
1. Open Tenant App: **http://localhost:3002**
2. Log in with your credentials
3. Click **Agents** in the left sidebar
4. Click **Import** button
5. Click **Choose File** and select `python_coding_microproject.csv`
6. Click **Import**
7. Repeat steps 4-6 for each CSV file
### Step 5.3: Verify Agents Appear
In the **Agents** page, you should now see:
- **Python Coding Micro Project** - Uses NVIDIA NIM (cloud)
- **Kali Linux Simulation Agent** - Uses NVIDIA NIM (cloud)
- **Nemotron Mini Agent** - Uses Ollama (local)
- **Nemotron Agent** - Uses Ollama (local)
---
## Part 6: Test Everything
### Test an NVIDIA NIM Agent (Cloud)
1. In Tenant App, click **Agents**
2. Click **Python Coding Micro Project**
3. Click **Chat** or start a conversation
4. Type: `Help me make a simple Python program`
5. Press Enter
6. You should get Python code with explanations
### Test an Ollama Agent (Local)
1. Click **Agents**
2. Click **Nemotron Mini Agent**
3. Click **Chat**
4. Type: `What can you help me with?`
5. Press Enter
6. You should get a response from your local Nemotron model
---
## Agent Reference
### Cloud Agents (NVIDIA NIM)
These agents use NVIDIA NIM cloud inference:
| Agent | Model | What It Does |
|-------|-------|--------------|
| **Python Coding Micro Project** | `moonshotai/kimi-k2-instruct` | Python/Streamlit coding tutor with working code examples |
| **Kali Linux Simulation Agent** | `moonshotai/kimi-k2-instruct` | Simulates pentesting tools (MASSCAN, NMAP, Nikto) for training |
### Local Agents (Ollama)
These agents run entirely on your NVIDIA DGX Spark:
| Agent | Model | What It Does |
|-------|-------|--------------|
| **Nemotron Mini Agent** | `nemotron-mini:latest` | Fast general-purpose assistant |
| **Nemotron Agent** | `nemotron:latest` | Advanced reasoning and coding |
---
## Troubleshooting
### "Connection refused" when using Ollama agents
The agent can't connect to Ollama.
**Check Ollama is running:**
```bash
sudo systemctl status ollama
```
**If stopped, start it:**
```bash
sudo systemctl start ollama
```
**Verify it's accessible:**
```bash
curl http://localhost:11434/api/version
```
### "Model not found" error
GT AI OS can't find the model.
**Check the model ID matches exactly:**
```bash
ollama list
```
The Model ID in GT AI OS must match exactly what `ollama list` shows (e.g., `nemotron-mini:latest` not `nemotron-mini`).
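To avoid copy-paste mistakes, you can print just the IDs, exactly as GT AI OS expects them (a one-liner sketch; the `awk` skips the header row of `ollama list`):
```bash
# Print only the model IDs, including the :tag suffix, one per line.
if command -v ollama >/dev/null 2>&1; then
  ollama list | awk 'NR > 1 { print $1 }'
else
  echo "ollama CLI not found on PATH"
fi
```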
### NVIDIA NIM agents return errors
**Check your API key:**
1. Go to Control Panel → **API Keys**
2. Click **Test** next to your NVIDIA key
3. If it fails, regenerate your key at https://build.nvidia.com/
### Ollama is slow
**Check GPU is being used:**
```bash
nvidia-smi
```
While using an Ollama model, you should see `ollama` or `ollama_llama_server` using GPU memory.
**If not using GPU:**
```bash
# Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
```
---
## Related Guides
- [Ollama Setup](Ollama-Setup) - More Ollama configuration options
- [Control Panel Guide](Control-Panel-Guide) - Full admin configuration
- [Tenant App Guide](Tenant-App-Guide) - Using agents and chat
- [Demo Agents](Demo-Agents) - More pre-built agents
- [Troubleshooting](Troubleshooting) - Common issues