# Projects for NVIDIA NIMs and Nemotron using Local Ollama

A step-by-step runbook for setting up cloud and local AI models on NVIDIA DGX Spark.

---

## Prerequisites

**You MUST complete these steps before following this guide:**

| Step | Guide | What You'll Do |
|------|-------|----------------|
| 1 | [Installation Guide](Installation#nvidia-dgx-os-7-grace-blackwell-architecture) | Install GT AI OS on your NVIDIA DGX Spark system |
| 2 | [Control Panel Guide](Control-Panel-Guide) | Create your admin account, delete the default account, configure your tenant |

**Verify your installation works:**

1. Open http://localhost:3001 (Control Panel) - you should see the login page
2. Open http://localhost:3002 (Tenant App) - you should see the login page
3. Log in with your admin credentials (or the default: `gtadmin@test.com` / `Test@123`)

**Not working?** Go back to the [Installation Guide](Installation) first.

---
## What This Runbook Covers

By the end, you will have:

- NVIDIA NIM cloud models configured (Kimi K2 for advanced AI tasks)
- Ollama installed with local Nemotron models on your NVIDIA DGX Spark
- Four demo agents ready to use

**Estimated time:** 30-45 minutes

---
## Part 1: Get Your NVIDIA NIM API Key

NVIDIA NIM gives you access to powerful AI models in the cloud via NVIDIA DGX Cloud.

### Step 1.1: Create an NVIDIA Developer Account

1. Open your web browser
2. Go to: **https://build.nvidia.com/**
3. Click the **Sign In** button (top right corner)
4. Click **Create Account** if you don't have one
5. Fill in your details and create your account
6. Check your email for a verification link
7. Click the link to verify your account

### Step 1.2: Generate Your API Key

1. Go to: **https://build.nvidia.com/**
2. Sign in with your account
3. Click on any model card (e.g., "Kimi K2")
4. Click the **Get API Key** button
5. Copy the API key that appears
6. **Save this key** - you will need it in the next step

### Step 1.3: Add the API Key to GT AI OS

1. Open Control Panel: **http://localhost:3001**
2. Log in with your admin credentials
3. Click **API Keys** in the left sidebar
4. Click **Add API Key**
5. Fill in:
   - **Provider:** Select **NVIDIA**
   - **API Key:** Paste your NVIDIA API key
6. Click **Save**
7. Click **Test** next to your new key
8. You should see a green checkmark or "Valid" status

**Verification:** If the test fails, check that you copied the complete API key.
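
Optional: you can also sanity-check the key from a terminal before adding it. A minimal sketch, assuming the standard OpenAI-compatible NVIDIA endpoint (`https://integrate.api.nvidia.com/v1`) and that your key is exported as `NVIDIA_API_KEY`:

```bash
# Assumption: the key is validated against the OpenAI-compatible
# endpoint at integrate.api.nvidia.com. A valid key returns HTTP 200;
# a bad or truncated key returns HTTP 401.
export NVIDIA_API_KEY="nvapi-..."   # paste your full key here

curl -s -o /dev/null -w "HTTP %{http_code}\n" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  https://integrate.api.nvidia.com/v1/models
```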
---

## Part 2: Install Ollama on NVIDIA DGX Spark

Ollama lets you run AI models locally. NVIDIA DGX Spark systems come with NVIDIA drivers pre-installed, so Ollama will automatically use your GPUs.

### Step 2.1: Install Ollama

1. Open a terminal on your NVIDIA DGX Spark
2. Run this command:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

3. Wait for the installation to complete
4. You should see: `Ollama has been installed successfully`
### Step 2.2: Configure Ollama for GT AI OS

Create the configuration so GT AI OS can connect to Ollama:

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d

sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=131072"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
EOF
```

**What this does:**
- `OLLAMA_HOST=0.0.0.0:11434` - Allows GT AI OS Docker containers to connect
- `OLLAMA_CONTEXT_LENGTH=131072` - 128K token context window
- `OLLAMA_FLASH_ATTENTION=1` - Better performance
- `OLLAMA_KEEP_ALIVE=4h` - Keeps models loaded for faster responses
- `OLLAMA_MAX_LOADED_MODELS=3` - Multiple models can be loaded (NVIDIA DGX Spark has plenty of VRAM)

### Step 2.3: Start Ollama Service

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```
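
To confirm the override actually took effect after the reload, you can inspect the merged unit definition; a quick sketch using standard systemd tooling:

```bash
# The Environment= lines from override.conf should appear in the
# merged unit definition shown by systemctl cat
systemctl cat ollama | grep Environment

# Or query the live unit properties directly
systemctl show ollama --property=Environment
```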
### Step 2.4: Verify Ollama is Running

```bash
ollama list
```

You should see an empty list (no models yet). If you get an error, wait 10 seconds and try again.
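
Because GT AI OS connects from Docker containers rather than over loopback, it is also worth confirming that Ollama is listening on `0.0.0.0` and answering HTTP; a quick sketch:

```bash
# Ollama should be bound to 0.0.0.0:11434 (or *:11434),
# not 127.0.0.1:11434, so containers can reach it
ss -ltn | grep 11434

# The HTTP API should answer with a version string on the same port
curl -s http://localhost:11434/api/version
```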
---

## Part 3: Download Nemotron Models

NVIDIA Nemotron models are optimized for NVIDIA hardware.

### Step 3.1: Download Nemotron Mini

This is the faster, smaller model (~4GB):

```bash
ollama pull nemotron-mini:latest
```

Wait for the download to complete (5-15 minutes depending on your internet speed).

### Step 3.2: Download Nemotron Full

This is the more powerful model (~25GB):

```bash
ollama pull nemotron:latest
```

Wait for the download to complete (15-45 minutes depending on your internet speed).
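
Together the two models need roughly 30GB of disk. If a pull fails partway, check free space first; a small sketch (the path shown is the default model directory for a systemd install of Ollama - an assumption, so adjust if you have set `OLLAMA_MODELS`):

```bash
# Check free space where Ollama stores models; fall back to the
# root filesystem if the default directory does not exist
df -h /usr/share/ollama 2>/dev/null || df -h /
```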
### Step 3.3: Verify Models Downloaded

```bash
ollama list
```

You should see:

```
NAME                   SIZE
nemotron-mini:latest   4.1 GB
nemotron:latest        25.3 GB
```

### Step 3.4: Test the Models

Test Nemotron Mini:
```bash
ollama run nemotron-mini:latest "Hello, are you working?"
```

You should get a friendly response. Press `Ctrl+D` to exit.

Test Nemotron Full:
```bash
ollama run nemotron:latest "What is 2 + 2?"
```

You should get an answer. Press `Ctrl+D` to exit.
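
GT AI OS will reach these models through Ollama's OpenAI-compatible HTTP API rather than the CLI, so it is worth exercising that path too; a minimal sketch against the `/v1/chat/completions` route:

```bash
# Call the OpenAI-compatible endpoint the same way GT AI OS will
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nemotron-mini:latest",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```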
---

## Part 4: Add Nemotron Models to GT AI OS

Now configure GT AI OS to use your local Ollama models.

### Step 4.1: Add Nemotron Mini Model

1. Open Control Panel: **http://localhost:3001**
2. Log in with your admin credentials
3. Click **Models** in the left sidebar
4. Click **Add Model**
5. Fill in these exact values:

| Field | Value |
|-------|-------|
| **Model ID** | `nemotron-mini:latest` |
| **Name** | `Ollama Nemotron Mini` |
| **Provider** | `Local Ollama (Ubuntu x86 / DGX ARM)` |
| **Model Type** | `LLM` |
| **Endpoint URL** | `http://ollama-host:11434/v1/chat/completions` |
| **Context Window** | `8192` |
| **Max Tokens** | `4096` |

6. Click **Save**

### Step 4.2: Add Nemotron Full Model

1. Click **Add Model** again
2. Fill in:

| Field | Value |
|-------|-------|
| **Model ID** | `nemotron:latest` |
| **Name** | `Ollama Nemotron` |
| **Provider** | `Local Ollama (Ubuntu x86 / DGX ARM)` |
| **Model Type** | `LLM` |
| **Endpoint URL** | `http://ollama-host:11434/v1/chat/completions` |
| **Context Window** | `32768` |
| **Max Tokens** | `8192` |

3. Click **Save**

### Step 4.3: Assign Models to Your Tenant

1. Click **Tenant Access** in the left sidebar (or find it under Models)
2. Click **Assign Model to Tenant**
3. Select:
   - **Model:** `nemotron-mini:latest`
   - **Tenant:** Your tenant name
   - **Rate Limit:** Choose a rate limit (e.g., Standard)
4. Click **Assign**
5. Repeat for `nemotron:latest`
---

## Part 5: Import Demo Agents

We provide four pre-built agents that demonstrate both NVIDIA NIM (cloud) and Ollama (local) capabilities.

### Step 5.1: Download the Agent Files

Download the CSV files for the agents you want to import:

| Agent | Download | Provider |
|-------|----------|----------|
| Python Coding Micro Project | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/python_coding_microproject.csv) | NVIDIA NIM |
| Kali Linux Simulation Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/kali_linux_shell_simulator.csv) | NVIDIA NIM |
| Nemotron Mini Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/nemotron-mini-agent.csv) | Ollama |
| Nemotron Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/nemotron-agent.csv) | Ollama |

Click each download link; the file downloads automatically.
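
If you are working in a terminal on the DGX Spark instead of a browser, a small sketch that fetches all four CSVs with curl (same URLs as the table above):

```bash
# Fetch all four agent CSVs from the v2.0.33 release
BASE=https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33
for f in python_coding_microproject.csv \
         kali_linux_shell_simulator.csv \
         nemotron-mini-agent.csv \
         nemotron-agent.csv; do
  curl -fsSL -o "$f" "$BASE/$f"
done
ls -lh *.csv
```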
### Step 5.2: Import Agents into GT AI OS

1. Open Tenant App: **http://localhost:3002**
2. Log in with your credentials
3. Click **Agents** in the left sidebar
4. Click the **Import** button
5. Click **Choose File** and select `python_coding_microproject.csv`
6. Click **Import**
7. Repeat steps 4-6 for each CSV file

### Step 5.3: Verify Agents Appear

On the **Agents** page, you should now see:
- **Python Coding Micro Project** - Uses NVIDIA NIM (cloud)
- **Kali Linux Simulation Agent** - Uses NVIDIA NIM (cloud)
- **Nemotron Mini Agent** - Uses Ollama (local)
- **Nemotron Agent** - Uses Ollama (local)

---
## Part 6: Test Everything

### Test an NVIDIA NIM Agent (Cloud)

1. In the Tenant App, click **Agents**
2. Click **Python Coding Micro Project**
3. Click **Chat** or start a conversation
4. Type: `Help me make a simple Python program`
5. Press Enter
6. You should get Python code with explanations

### Test an Ollama Agent (Local)

1. Click **Agents**
2. Click **Nemotron Mini Agent**
3. Click **Chat**
4. Type: `What can you help me with?`
5. Press Enter
6. You should get a response from your local Nemotron model

---

## Agent Reference

### Cloud Agents (NVIDIA NIM)

These agents use NVIDIA NIM cloud inference:

| Agent | Model | What It Does |
|-------|-------|--------------|
| **Python Coding Micro Project** | `moonshotai/kimi-k2-instruct` | Python/Streamlit coding tutor with working code examples |
| **Kali Linux Simulation Agent** | `moonshotai/kimi-k2-instruct` | Simulates pentesting tools (MASSCAN, NMAP, Nikto) for training |

### Local Agents (Ollama)

These agents run entirely on your NVIDIA DGX Spark:

| Agent | Model | What It Does |
|-------|-------|--------------|
| **Nemotron Mini Agent** | `nemotron-mini:latest` | Fast general-purpose assistant |
| **Nemotron Agent** | `nemotron:latest` | Advanced reasoning and coding |

---
## Troubleshooting

### "Connection refused" when using Ollama agents

The agent can't connect to Ollama.

**Check Ollama is running:**
```bash
sudo systemctl status ollama
```

**If stopped, start it:**
```bash
sudo systemctl start ollama
```

**Verify it's accessible:**
```bash
curl http://localhost:11434/api/version
```

### "Model not found" error

GT AI OS can't find the model.

**Check the model ID matches exactly:**
```bash
ollama list
```

The Model ID in GT AI OS must match exactly what `ollama list` shows (e.g., `nemotron-mini:latest`, not `nemotron-mini`).
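
You can also read the installed model IDs over the same HTTP API that GT AI OS uses; a quick sketch (assumes `jq` is installed - drop the pipe to see the raw JSON):

```bash
# List installed models via the HTTP API and print their exact IDs
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
```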
### NVIDIA NIM agents return errors

**Check your API key:**
1. Go to Control Panel → **API Keys**
2. Click **Test** next to your NVIDIA key
3. If it fails, regenerate your key at https://build.nvidia.com/

### Ollama is slow

**Check the GPU is being used:**
```bash
nvidia-smi
```

While using an Ollama model, you should see `ollama` or `ollama_llama_server` using GPU memory.
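
The service log is another useful signal: Ollama reports the compute devices it detects at startup. A quick sketch (the exact log wording varies across Ollama versions, so treat the grep pattern as a starting point):

```bash
# Look for GPU/CUDA detection lines in the recent service log
journalctl -u ollama -n 100 --no-pager | grep -iE 'gpu|cuda'
```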
**If not using GPU:**
```bash
# Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
```

---

## Related Guides

- [Ollama Setup](Ollama-Setup) - More Ollama configuration options
- [Control Panel Guide](Control-Panel-Guide) - Full admin configuration
- [Tenant App Guide](Tenant-App-Guide) - Using agents and chat
- [Demo Agents](Demo-Agents) - More pre-built agents
- [Troubleshooting](Troubleshooting) - Common issues