Add Projects for NVIDIA NIMs and Nemotron using Local Ollama

2026-01-10 03:26:41 +00:00
parent 33cbd13760
commit 1fdb517463

# Projects for NVIDIA NIMs and Nemotron using Local Ollama
A step-by-step runbook for setting up cloud and local AI models on NVIDIA DGX Spark.
---
## Prerequisites
**You MUST complete these steps before following this guide:**
| Step | Guide | What You'll Do |
|------|-------|----------------|
| 1 | [Installation Guide](Installation#nvidia-dgx-os-7-grace-blackwell-architecture) | Install GT AI OS on your NVIDIA DGX Spark system |
| 2 | [Control Panel Guide](Control-Panel-Guide) | Create your admin account, delete the default account, and configure your tenant |
**Verify your installation works:**
1. Open http://localhost:3001 (Control Panel) - you should see the login page
2. Open http://localhost:3002 (Tenant App) - you should see the login page
3. Log in with your admin credentials (or default: `gtadmin@test.com` / `Test@123`)
**Not working?** Go back to the [Installation Guide](Installation) first.
---
## What This Runbook Covers
By the end, you will have:
- NVIDIA NIM cloud models configured (Kimi K2 for advanced AI tasks)
- Ollama installed with local Nemotron models on your NVIDIA DGX Spark
- Four demo agents ready to use
**Estimated time:** 30-45 minutes
---
## Part 1: Get Your NVIDIA NIM API Key
NVIDIA NIM gives you access to powerful AI models in the cloud via NVIDIA DGX Cloud.
### Step 1.1: Create an NVIDIA Developer Account
1. Open your web browser
2. Go to: **https://build.nvidia.com/**
3. Click the **Sign In** button (top right corner)
4. Click **Create Account** if you don't have one
5. Fill in your details and create your account
6. Check your email for a verification link
7. Click the link to verify your account
### Step 1.2: Generate Your API Key
1. Go to: **https://build.nvidia.com/**
2. Sign in with your account
3. Click on any model card (e.g., click on "Kimi K2")
4. Click **Get API Key** button
5. Copy the API key that appears
6. **Save this key** - you will need it in the next step
### Step 1.3: Add the API Key to GT AI OS
1. Open Control Panel: **http://localhost:3001**
2. Log in with your admin credentials
3. Click **API Keys** in the left sidebar
4. Click **Add API Key**
5. Fill in:
- **Provider:** Select **NVIDIA**
- **API Key:** Paste your NVIDIA API key
6. Click **Save**
7. Click **Test** next to your new key
8. You should see a green checkmark or "Valid" status
**Verification:** If the test fails, check that you copied the complete API key.
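You can also exercise the key directly from a terminal. This is a sketch, not part of GT AI OS: the endpoint and model name below are NVIDIA's public NIM values at the time of writing, and `NVIDIA_API_KEY` is assumed to hold the key you copied.
```bash
# Smoke-test an NVIDIA NIM API key from the shell (assumes NVIDIA_API_KEY is set).
NIM_URL="https://integrate.api.nvidia.com/v1/chat/completions"
BODY='{"model":"moonshotai/kimi-k2-instruct","messages":[{"role":"user","content":"Say hello"}],"max_tokens":32}'

if [ -n "$NVIDIA_API_KEY" ]; then
  curl -s "$NIM_URL" \
    -H "Authorization: Bearer $NVIDIA_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$BODY"
else
  echo "NVIDIA_API_KEY is not set; export it first"
fi
```
A valid key returns a JSON chat completion; an authorization error usually means the key was truncated when you pasted it.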
---
## Part 2: Install Ollama on NVIDIA DGX Spark
Ollama lets you run AI models locally. NVIDIA DGX Spark systems come with NVIDIA drivers pre-installed, so Ollama will automatically use the GPU.
### Step 2.1: Install Ollama
1. Open a terminal on your NVIDIA DGX Spark
2. Run this command:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
3. Wait for installation to complete
4. You should see: `Ollama has been installed successfully`
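If you want to double-check the install from the same terminal, here is a quick sketch (nothing GT-AI-OS-specific, just the CLI and your PATH):
```bash
# Post-install check: confirm the ollama CLI landed on PATH.
if command -v ollama >/dev/null 2>&1; then
  OLLAMA_STATE="installed: $(ollama --version)"
else
  OLLAMA_STATE="missing"
fi
echo "$OLLAMA_STATE"
```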
### Step 2.2: Configure Ollama for GT AI OS
Create the configuration so GT AI OS can connect to Ollama:
```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=131072"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
EOF
```
**What this does:**
- `OLLAMA_HOST=0.0.0.0:11434` - Allows GT AI OS Docker containers to connect
- `OLLAMA_CONTEXT_LENGTH=131072` - 128K token context window
- `OLLAMA_FLASH_ATTENTION=1` - Better performance
- `OLLAMA_KEEP_ALIVE=4h` - Keeps models loaded for faster responses
- `OLLAMA_MAX_LOADED_MODELS=3` - Multiple models can be loaded (NVIDIA DGX Spark has plenty of VRAM)
### Step 2.3: Start Ollama Service
```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```
### Step 2.4: Verify Ollama is Running
```bash
ollama list
```
You should see an empty list (no models yet). If you get an error, wait 10 seconds and try again.
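The service can also be probed over HTTP, which is closer to how GT AI OS talks to it. A minimal check, assuming Ollama's default port 11434:
```bash
# Probe Ollama's HTTP API on the default port.
OLLAMA_URL="http://localhost:11434/api/version"
if curl -sf "$OLLAMA_URL"; then
  echo    # newline after the JSON body
else
  echo "Ollama is not reachable at $OLLAMA_URL yet; wait a few seconds and retry"
fi
```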
---
## Part 3: Download Nemotron Models
NVIDIA Nemotron models are optimized for NVIDIA hardware.
### Step 3.1: Download Nemotron Mini
This is the faster, smaller model (~4GB):
```bash
ollama pull nemotron-mini:latest
```
Wait for the download to complete (5-15 minutes, depending on internet speed).
### Step 3.2: Download Nemotron Full
This is the more powerful model (~25GB):
```bash
ollama pull nemotron:latest
```
Wait for the download to complete (15-45 minutes, depending on internet speed).
### Step 3.3: Verify Models Downloaded
```bash
ollama list
```
You should see:
```
NAME SIZE
nemotron-mini:latest 4.1 GB
nemotron:latest 25.3 GB
```
### Step 3.4: Test the Models
Test Nemotron Mini:
```bash
ollama run nemotron-mini:latest "Hello, are you working?"
```
You should get a friendly response. When you pass a prompt on the command line, `ollama run` exits after replying; press `Ctrl+D` only if you land in an interactive session.
Test Nemotron Full:
```bash
ollama run nemotron:latest "What is 2 + 2?"
```
You should get an answer, and the command exits after replying.
---
## Part 4: Add Nemotron Models to GT AI OS
Now configure GT AI OS to use your local Ollama models.
### Step 4.1: Add Nemotron Mini Model
1. Open Control Panel: **http://localhost:3001**
2. Log in with your admin credentials
3. Click **Models** in the left sidebar
4. Click **Add Model**
5. Fill in these exact values:
| Field | Value |
|-------|-------|
| **Model ID** | `nemotron-mini:latest` |
| **Name** | `Ollama Nemotron Mini` |
| **Provider** | `Local Ollama (Ubuntu x86 / DGX ARM)` |
| **Model Type** | `LLM` |
| **Endpoint URL** | `http://ollama-host:11434/v1/chat/completions` |
| **Context Window** | `8192` |
| **Max Tokens** | `4096` |
6. Click **Save**
### Step 4.2: Add Nemotron Full Model
1. Click **Add Model** again
2. Fill in:
| Field | Value |
|-------|-------|
| **Model ID** | `nemotron:latest` |
| **Name** | `Ollama Nemotron` |
| **Provider** | `Local Ollama (Ubuntu x86 / DGX ARM)` |
| **Model Type** | `LLM` |
| **Endpoint URL** | `http://ollama-host:11434/v1/chat/completions` |
| **Context Window** | `32768` |
| **Max Tokens** | `8192` |
3. Click **Save**
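Before assigning the models to a tenant, you can hit the same OpenAI-compatible endpoint GT AI OS will use. A sketch: from the DGX itself the host is `localhost`, while the `ollama-host` name in the tables above is what the GT AI OS containers resolve.
```bash
# Exercise Ollama's OpenAI-compatible chat endpoint (same path GT AI OS uses).
# From the host use localhost; GT AI OS containers use the ollama-host alias.
ENDPOINT="http://localhost:11434/v1/chat/completions"
BODY='{"model":"nemotron-mini:latest","messages":[{"role":"user","content":"Hello"}],"max_tokens":32}'

if curl -sf "$ENDPOINT" -H "Content-Type: application/json" -d "$BODY"; then
  echo    # newline after the JSON body
else
  echo "Request failed; check that Ollama is running and the model is pulled"
fi
```
A JSON chat completion here means the Endpoint URL values in the tables above will work once GT AI OS can reach the host.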
### Step 4.3: Assign Models to Your Tenant
1. Click **Tenant Access** in the left sidebar (or find it under Models)
2. Click **Assign Model to Tenant**
3. Select:
- **Model:** `nemotron-mini:latest`
- **Tenant:** Your tenant name
- **Rate Limit:** Choose a rate limit (e.g., Standard)
4. Click **Assign**
5. Repeat for `nemotron:latest`
---
## Part 5: Import Demo Agents
We provide four pre-built agents that demonstrate both NVIDIA NIM (cloud) and Ollama (local) capabilities.
### Step 5.1: Download the Agent Files
Download the CSV files for the agents you want to import:
| Agent | Download | Provider |
|-------|----------|----------|
| Python Coding Micro Project | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/python_coding_microproject.csv) | NVIDIA NIM |
| Kali Linux Simulation Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/kali_linux_shell_simulator.csv) | NVIDIA NIM |
| Nemotron Mini Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/nemotron-mini-agent.csv) | Ollama |
| Nemotron Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/nemotron-agent.csv) | Ollama |
Click the download link and the file will download automatically.
### Step 5.2: Import Agents into GT AI OS
1. Open Tenant App: **http://localhost:3002**
2. Log in with your credentials
3. Click **Agents** in the left sidebar
4. Click **Import** button
5. Click **Choose File** and select `python_coding_microproject.csv`
6. Click **Import**
7. Repeat steps 4-6 for each CSV file
### Step 5.3: Verify Agents Appear
In the **Agents** page, you should now see:
- **Python Coding Micro Project** - Uses NVIDIA NIM (cloud)
- **Kali Linux Simulation Agent** - Uses NVIDIA NIM (cloud)
- **Nemotron Mini Agent** - Uses Ollama (local)
- **Nemotron Agent** - Uses Ollama (local)
---
## Part 6: Test Everything
### Test an NVIDIA NIM Agent (Cloud)
1. In Tenant App, click **Agents**
2. Click **Python Coding Micro Project**
3. Click **Chat** or start a conversation
4. Type: `Help me make a simple Python program`
5. Press Enter
6. You should get Python code with explanations
### Test an Ollama Agent (Local)
1. Click **Agents**
2. Click **Nemotron Mini Agent**
3. Click **Chat**
4. Type: `What can you help me with?`
5. Press Enter
6. You should get a response from your local Nemotron model
---
## Agent Reference
### Cloud Agents (NVIDIA NIM)
These agents use NVIDIA NIM cloud inference:
| Agent | Model | What It Does |
|-------|-------|--------------|
| **Python Coding Micro Project** | `moonshotai/kimi-k2-instruct` | Python/Streamlit coding tutor with working code examples |
| **Kali Linux Simulation Agent** | `moonshotai/kimi-k2-instruct` | Simulates pentesting tools (MASSCAN, NMAP, Nikto) for training |
### Local Agents (Ollama)
These agents run entirely on your NVIDIA DGX Spark:
| Agent | Model | What It Does |
|-------|-------|--------------|
| **Nemotron Mini Agent** | `nemotron-mini:latest` | Fast general-purpose assistant |
| **Nemotron Agent** | `nemotron:latest` | Advanced reasoning and coding |
---
## Troubleshooting
### "Connection refused" when using Ollama agents
The agent can't connect to Ollama.
**Check Ollama is running:**
```bash
sudo systemctl status ollama
```
**If stopped, start it:**
```bash
sudo systemctl start ollama
```
**Verify it's accessible:**
```bash
curl http://localhost:11434/api/version
```
### "Model not found" error
GT AI OS can't find the model.
**Check the model ID matches exactly:**
```bash
ollama list
```
The Model ID in GT AI OS must match exactly what `ollama list` shows (e.g., `nemotron-mini:latest` not `nemotron-mini`).
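To avoid copy-paste mistakes, you can print just the IDs, exactly as GT AI OS expects them (a one-liner sketch; the `awk` skips the header row of `ollama list`):
```bash
# Print only the model IDs, including the :tag suffix, one per line.
if command -v ollama >/dev/null 2>&1; then
  ollama list | awk 'NR > 1 { print $1 }'
else
  echo "ollama CLI not found on PATH"
fi
```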
### NVIDIA NIM agents return errors
**Check your API key:**
1. Go to Control Panel → **API Keys**
2. Click **Test** next to your NVIDIA key
3. If it fails, regenerate your key at https://build.nvidia.com/
### Ollama is slow
**Check GPU is being used:**
```bash
nvidia-smi
```
While using an Ollama model, you should see `ollama` or `ollama_llama_server` using GPU memory.
**If not using GPU:**
```bash
# Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
```
---
## Related Guides
- [Ollama Setup](Ollama-Setup) - More Ollama configuration options
- [Control Panel Guide](Control-Panel-Guide) - Full admin configuration
- [Tenant App Guide](Tenant-App-Guide) - Using agents and chat
- [Demo Agents](Demo-Agents) - More pre-built agents
- [Troubleshooting](Troubleshooting) - Common issues