From 1fdb51746361178b0e9d95a8e53d4d4fc38c50aa Mon Sep 17 00:00:00 2001
From: daniel
Date: Sat, 10 Jan 2026 03:26:41 +0000
Subject: [PATCH] Add Projects for NVIDIA NIMs and Nemotron using Local Ollama

---
 ...IA-NIMs-and-Nemotron-using-Local-Ollama.md | 387 ++++++++++++++++++
 1 file changed, 387 insertions(+)
 create mode 100644 Projects-for-NVIDIA-NIMs-and-Nemotron-using-Local-Ollama.md

diff --git a/Projects-for-NVIDIA-NIMs-and-Nemotron-using-Local-Ollama.md b/Projects-for-NVIDIA-NIMs-and-Nemotron-using-Local-Ollama.md
new file mode 100644
index 0000000..299408a
--- /dev/null
+++ b/Projects-for-NVIDIA-NIMs-and-Nemotron-using-Local-Ollama.md
@@ -0,0 +1,387 @@

# Projects for NVIDIA NIMs and Nemotron using Local Ollama

A step-by-step runbook for setting up cloud and local AI models on NVIDIA DGX Spark.

---

## Prerequisites

**You MUST complete these steps before following this guide:**

| Step | Guide | What You'll Do |
|------|-------|----------------|
| 1 | [Installation Guide](Installation#nvidia-dgx-os-7-grace-blackwell-architecture) | Install GT AI OS on your NVIDIA DGX Spark system |
| 2 | [Control Panel Guide](Control-Panel-Guide) | Create your admin account, delete the default account, configure your tenant |

**Verify your installation works:**
1. Open http://localhost:3001 (Control Panel) - you should see the login page
2. Open http://localhost:3002 (Tenant App) - you should see the login page
3. Log in with your admin credentials (or the default: `gtadmin@test.com` / `Test@123`)

**Not working?** Go back to the [Installation Guide](Installation) first.
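You can also confirm both services respond without opening a browser. This is a quick sketch, assuming `curl` is installed and the default ports above:

```shell
# Probe the Control Panel (3001) and Tenant App (3002). A 200- or 300-series
# HTTP code means the service is up; "000" means it is unreachable.
for port in 3001 3002; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:${port}" 2>/dev/null) || true
  echo "port ${port}: HTTP ${code:-000}"
done
```

If either port reports `000`, the corresponding container is not listening yet - go back to the Installation Guide before continuing.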
---

## What This Runbook Covers

By the end, you will have:
- NVIDIA NIM cloud models configured (Kimi K2 for advanced AI tasks)
- Ollama installed with local Nemotron models on your NVIDIA DGX Spark
- Four demo agents ready to use

**Estimated time:** 30-45 minutes

---

## Part 1: Get Your NVIDIA NIM API Key

NVIDIA NIM gives you access to powerful AI models hosted in the cloud through NVIDIA's API catalog at build.nvidia.com.

### Step 1.1: Create an NVIDIA Developer Account

1. Open your web browser
2. Go to: **https://build.nvidia.com/**
3. Click the **Sign In** button (top right corner)
4. Click **Create Account** if you don't have one
5. Fill in your details and create your account
6. Check your email for a verification link
7. Click the link to verify your account

### Step 1.2: Generate Your API Key

1. Go to: **https://build.nvidia.com/**
2. Sign in with your account
3. Click on any model card (e.g., click on "Kimi K2")
4. Click the **Get API Key** button
5. Copy the API key that appears
6. **Save this key** - you will need it in the next step

### Step 1.3: Add the API Key to GT AI OS

1. Open the Control Panel: **http://localhost:3001**
2. Log in with your admin credentials
3. Click **API Keys** in the left sidebar
4. Click **Add API Key**
5. Fill in:
   - **Provider:** Select **NVIDIA**
   - **API Key:** Paste your NVIDIA API key
6. Click **Save**
7. Click **Test** next to your new key
8. You should see a green checkmark or "Valid" status

**Verification:** If the test fails, check that you copied the complete API key.

---

## Part 2: Install Ollama on NVIDIA DGX Spark

Ollama lets you run AI models locally. NVIDIA DGX Spark systems come with NVIDIA drivers pre-installed, so Ollama will automatically use your GPU.

### Step 2.1: Install Ollama

1. Open a terminal on your NVIDIA DGX Spark
2. Run this command:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

3. Wait for the installation to complete
4. You should see: `Ollama has been installed successfully`

### Step 2.2: Configure Ollama for GT AI OS

Create the configuration so GT AI OS can connect to Ollama:

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d

sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=131072"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
EOF
```

**What this does:**
- `OLLAMA_HOST=0.0.0.0:11434` - Allows GT AI OS Docker containers to connect
- `OLLAMA_CONTEXT_LENGTH=131072` - 128K-token context window
- `OLLAMA_FLASH_ATTENTION=1` - Better performance
- `OLLAMA_KEEP_ALIVE=4h` - Keeps models loaded for faster responses
- `OLLAMA_MAX_LOADED_MODELS=3` - Allows multiple models to stay loaded (NVIDIA DGX Spark has plenty of unified memory)

### Step 2.3: Start the Ollama Service

```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
```

### Step 2.4: Verify Ollama is Running

```bash
ollama list
```

You should see an empty list (no models yet). If you get an error, wait 10 seconds and try again.

---

## Part 3: Download Nemotron Models

NVIDIA Nemotron models are optimized for NVIDIA hardware.

### Step 3.1: Download Nemotron Mini

This is the faster, smaller model (~4 GB):

```bash
ollama pull nemotron-mini:latest
```

Wait for the download to complete (5-15 minutes depending on internet speed).

### Step 3.2: Download Nemotron Full

This is the more powerful model (~25 GB):

```bash
ollama pull nemotron:latest
```

Wait for the download to complete (15-45 minutes depending on internet speed).
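Once the downloads finish, you can also confirm the models are visible over Ollama's HTTP API - the same interface GT AI OS connects to in Part 4. This is a sketch, assuming the service from Step 2.3 is running and listening on port 11434:

```shell
# Query Ollama's REST API for installed models; fall back to a warning
# if the API is unreachable (service stopped, wrong port, etc.).
tags=$(curl -s http://localhost:11434/api/tags 2>/dev/null) || true
if echo "${tags}" | grep -q "nemotron"; then
  echo "Nemotron models are visible over the API"
else
  echo "No Nemotron models found - is the Ollama service running?"
fi
```

If the warning prints even though `ollama list` shows the models, recheck the `OLLAMA_HOST` setting from Step 2.2 and restart the service.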
### Step 3.3: Verify the Models Downloaded

```bash
ollama list
```

You should see:
```
NAME                   SIZE
nemotron-mini:latest   4.1 GB
nemotron:latest        25.3 GB
```

### Step 3.4: Test the Models

Test Nemotron Mini:
```bash
ollama run nemotron-mini:latest "Hello, are you working?"
```

You should get a friendly response. Because the prompt was passed on the command line, the command exits on its own after answering.

Test Nemotron Full:
```bash
ollama run nemotron:latest "What is 2 + 2?"
```

You should get an answer, and the command again exits after responding.

---

## Part 4: Add Nemotron Models to GT AI OS

Now configure GT AI OS to use your local Ollama models.

### Step 4.1: Add the Nemotron Mini Model

1. Open the Control Panel: **http://localhost:3001**
2. Log in with your admin credentials
3. Click **Models** in the left sidebar
4. Click **Add Model**
5. Fill in these exact values:

| Field | Value |
|-------|-------|
| **Model ID** | `nemotron-mini:latest` |
| **Name** | `Ollama Nemotron Mini` |
| **Provider** | `Local Ollama (Ubuntu x86 / DGX ARM)` |
| **Model Type** | `LLM` |
| **Endpoint URL** | `http://ollama-host:11434/v1/chat/completions` |
| **Context Window** | `8192` |
| **Max Tokens** | `4096` |

6. Click **Save**

### Step 4.2: Add the Nemotron Full Model

1. Click **Add Model** again
2. Fill in:

| Field | Value |
|-------|-------|
| **Model ID** | `nemotron:latest` |
| **Name** | `Ollama Nemotron` |
| **Provider** | `Local Ollama (Ubuntu x86 / DGX ARM)` |
| **Model Type** | `LLM` |
| **Endpoint URL** | `http://ollama-host:11434/v1/chat/completions` |
| **Context Window** | `32768` |
| **Max Tokens** | `8192` |

3. Click **Save**

### Step 4.3: Assign Models to Your Tenant

1. Click **Tenant Access** in the left sidebar (or find it under Models)
2. Click **Assign Model to Tenant**
3. Select:
   - **Model:** `nemotron-mini:latest`
   - **Tenant:** Your tenant name
   - **Rate Limit:** Choose a rate limit (e.g., Standard)
4. Click **Assign**
5. Repeat for `nemotron:latest`

---

## Part 5: Import Demo Agents

We provide four pre-built agents that demonstrate both NVIDIA NIM (cloud) and Ollama (local) capabilities.

### Step 5.1: Download the Agent Files

Download the CSV files for the agents you want to import:

| Agent | Download | Provider |
|-------|----------|----------|
| Python Coding Micro Project | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/python_coding_microproject.csv) | NVIDIA NIM |
| Kali Linux Simulation Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/kali_linux_shell_simulator.csv) | NVIDIA NIM |
| Nemotron Mini Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/nemotron-mini-agent.csv) | Ollama |
| Nemotron Agent | [Download CSV](https://github.com/GT-Edge-AI-Internal/gt-ai-os-community/releases/download/v2.0.33/nemotron-agent.csv) | Ollama |

Each download link saves the file directly.

### Step 5.2: Import Agents into GT AI OS

1. Open the Tenant App: **http://localhost:3002**
2. Log in with your credentials
3. Click **Agents** in the left sidebar
4. Click the **Import** button
5. Click **Choose File** and select `python_coding_microproject.csv`
6. Click **Import**
7. Repeat steps 4-6 for each CSV file

### Step 5.3: Verify the Agents Appear

On the **Agents** page, you should now see:
- **Python Coding Micro Project** - Uses NVIDIA NIM (cloud)
- **Kali Linux Simulation Agent** - Uses NVIDIA NIM (cloud)
- **Nemotron Mini Agent** - Uses Ollama (local)
- **Nemotron Agent** - Uses Ollama (local)

---

## Part 6: Test Everything

### Test an NVIDIA NIM Agent (Cloud)

1. In the Tenant App, click **Agents**
2. Click **Python Coding Micro Project**
3. Click **Chat** or start a conversation
4. Type: `Help me make a simple Python program`
5. Press Enter
6.
You should get Python code with explanations + +### Test an Ollama Agent (Local) + +1. Click **Agents** +2. Click **Nemotron Mini Agent** +3. Click **Chat** +4. Type: `What can you help me with?` +5. Press Enter +6. You should get a response from your local Nemotron model + +--- + +## Agent Reference + +### Cloud Agents (NVIDIA NIM) + +These agents use NVIDIA NIM cloud inference: + +| Agent | Model | What It Does | +|-------|-------|--------------| +| **Python Coding Micro Project** | `moonshotai/kimi-k2-instruct` | Python/Streamlit coding tutor with working code examples | +| **Kali Linux Simulation Agent** | `moonshotai/kimi-k2-instruct` | Simulates pentesting tools (MASSCAN, NMAP, Nikto) for training | + +### Local Agents (Ollama) + +These agents run entirely on your NVIDIA DGX Spark: + +| Agent | Model | What It Does | +|-------|-------|--------------| +| **Nemotron Mini Agent** | `nemotron-mini:latest` | Fast general-purpose assistant | +| **Nemotron Agent** | `nemotron:latest` | Advanced reasoning and coding | + +--- + +## Troubleshooting + +### "Connection refused" when using Ollama agents + +The agent can't connect to Ollama. + +**Check Ollama is running:** +```bash +sudo systemctl status ollama +``` + +**If stopped, start it:** +```bash +sudo systemctl start ollama +``` + +**Verify it's accessible:** +```bash +curl http://localhost:11434/api/version +``` + +### "Model not found" error + +GT AI OS can't find the model. + +**Check the model ID matches exactly:** +```bash +ollama list +``` + +The Model ID in GT AI OS must match exactly what `ollama list` shows (e.g., `nemotron-mini:latest` not `nemotron-mini`). + +### NVIDIA NIM agents return errors + +**Check your API key:** +1. Go to Control Panel → **API Keys** +2. Click **Test** next to your NVIDIA key +3. 
If it fails, regenerate your key at https://build.nvidia.com/ + +### Ollama is slow + +**Check GPU is being used:** +```bash +nvidia-smi +``` + +While using an Ollama model, you should see `ollama` or `ollama_llama_server` using GPU memory. + +**If not using GPU:** +```bash +# Reinstall Ollama +curl -fsSL https://ollama.com/install.sh | sh +sudo systemctl restart ollama +``` + +--- + +## Related Guides + +- [Ollama Setup](Ollama-Setup) - More Ollama configuration options +- [Control Panel Guide](Control-Panel-Guide) - Full admin configuration +- [Tenant App Guide](Tenant-App-Guide) - Using agents and chat +- [Demo Agents](Demo-Agents) - More pre-built agents +- [Troubleshooting](Troubleshooting) - Common issues