Table of Contents
- Projects for NVIDIA NIMs and Nemotron using Local Ollama
- Prerequisites
- What This Runbook Covers
- Part 1: Get Your NVIDIA NIM API Key
- Step 1.1: Create an NVIDIA Developer Account
- Step 1.2: Generate Your API Key
- Step 1.3: Add the API Key to GT AI OS
- Part 2: Install Ollama on NVIDIA DGX Spark
- Step 2.1: Install Ollama
- Step 2.2: Configure Ollama for GT AI OS
- Step 2.3: Start Ollama Service
- Step 2.4: Verify Ollama is Running
- Part 3: Download Nemotron Models
- Step 3.1: Download Nemotron Mini
- Step 3.2: Download Nemotron Full
- Step 3.3: Verify Models Downloaded
- Step 3.4: Test the Models
- Part 4: Add Nemotron Models to GT AI OS
- Step 4.1: Add Nemotron Mini Model
- Step 4.2: Add Nemotron Full Model
- Step 4.3: Assign Models to Your Tenant
- Part 5: Import Demo Agents
- Step 5.1: Download the Agent Files
- Step 5.2: Import Agents into GT AI OS
- Step 5.3: Verify Agents Appear
- Part 6: Test Everything
- Agent Reference
- Troubleshooting
- "Connection refused" when using Ollama agents
- "Model not found" error
- NVIDIA NIM agents return errors
- Ollama is slow
- Related Guides
Projects for NVIDIA NIMs and Nemotron using Local Ollama
A step-by-step runbook for setting up cloud and local AI models on NVIDIA DGX Spark.
Prerequisites
You MUST complete these steps before following this guide:
| Step | Guide | What You'll Do |
|---|---|---|
| 1 | Installation Guide | Install GT AI OS on your NVIDIA DGX Spark system |
| 2 | Control Panel Guide | Create your admin account, delete default account, configure tenant |
Verify your installation works:
- Open http://localhost:3001 (Control Panel) - you should see the login page
- Open http://localhost:3002 (Tenant App) - you should see the login page
- Log in with your admin credentials (or the default: gtadmin@test.com / Test@123)
Not working? Go back to the Installation Guide first.
What This Runbook Covers
By the end, you will have:
- NVIDIA NIM cloud models configured (Kimi K2 for advanced AI tasks)
- Ollama installed with local Nemotron models on your NVIDIA DGX Spark
- Four demo agents ready to use
Estimated time: 30-45 minutes
Part 1: Get Your NVIDIA NIM API Key
NVIDIA NIM gives you access to powerful AI models in the cloud via NVIDIA DGX Cloud.
Step 1.1: Create an NVIDIA Developer Account
- Open your web browser
- Go to: https://build.nvidia.com/
- Click the Sign In button (top right corner)
- Click Create Account if you don't have one
- Fill in your details and create your account
- Check your email for a verification link
- Click the link to verify your account
Step 1.2: Generate Your API Key
- Go to: https://build.nvidia.com/
- Sign in with your account
- Click on any model card (e.g., click on "Kimi K2")
- Click Get API Key button
- Copy the API key that appears
- Save this key - you will need it in the next step
Step 1.3: Add the API Key to GT AI OS
- Open Control Panel: http://localhost:3001
- Log in with your admin credentials
- Click API Keys in the left sidebar
- Click Add API Key
- Fill in:
- Provider: Select NVIDIA
- API Key: Paste your NVIDIA API key
- Click Save
- Click Test next to your new key
- You should see a green checkmark or "Valid" status
Verification: If the test fails, check that you copied the complete API key.
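Optional: you can also sanity-check the key from a terminal. The NVIDIA NIM API is OpenAI-compatible, so a plain curl call works; this is a minimal sketch that assumes you've exported the key as NVIDIA_API_KEY first:
# export NVIDIA_API_KEY="nvapi-..." before running this
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "moonshotai/kimi-k2-instruct", "messages": [{"role": "user", "content": "Say hello."}], "max_tokens": 64}'
A valid key returns a JSON chat completion; an HTTP 401 means the key was truncated or revoked.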
Part 2: Install Ollama on NVIDIA DGX Spark
Ollama lets you run AI models locally. NVIDIA DGX Spark systems come with NVIDIA drivers pre-installed, so Ollama will automatically use your GPU.
Step 2.1: Install Ollama
- Open a terminal on your NVIDIA DGX Spark
- Run this command:
curl -fsSL https://ollama.com/install.sh | sh
- Wait for installation to complete
- You should see:
Ollama has been installed successfully
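To confirm the binary landed on your PATH:
ollama --version
This prints the installed version string and exits. If the command isn't found, open a new terminal so the updated PATH takes effect.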
Step 2.2: Configure Ollama for GT AI OS
Create the configuration so GT AI OS can connect to Ollama:
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=131072"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
EOF
What this does:
- OLLAMA_HOST=0.0.0.0:11434 - allows GT AI OS Docker containers to connect
- OLLAMA_CONTEXT_LENGTH=131072 - 128K-token context window
- OLLAMA_FLASH_ATTENTION=1 - better performance
- OLLAMA_KEEP_ALIVE=4h - keeps models loaded for faster responses
- OLLAMA_MAX_LOADED_MODELS=3 - lets multiple models stay loaded at once (the NVIDIA DGX Spark has plenty of unified memory)
Step 2.3: Start Ollama Service
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
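To confirm systemd picked up the override from Step 2.2, inspect the effective unit and its environment:
# The drop-in file should be listed beneath the base unit
systemctl cat ollama
# Shows the Environment= values the service is actually running with
systemctl show ollama --property=Environment
If the override.conf entries are missing, re-run Step 2.2 and then sudo systemctl daemon-reload.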
Step 2.4: Verify Ollama is Running
ollama list
You should see an empty list (no models yet). If you get an error, wait 10 seconds and try again.
Part 3: Download Nemotron Models
NVIDIA Nemotron models are optimized for NVIDIA hardware.
Step 3.1: Download Nemotron Mini
This is the faster, smaller model (~4GB):
ollama pull nemotron-mini:latest
Wait for download to complete (5-15 minutes depending on internet speed).
Step 3.2: Download Nemotron Full
This is the more powerful model (~25GB):
ollama pull nemotron:latest
Wait for download to complete (15-45 minutes depending on internet speed).
Step 3.3: Verify Models Downloaded
ollama list
You should see:
NAME SIZE
nemotron-mini:latest 4.1 GB
nemotron:latest 25.3 GB
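Sizes may differ slightly between releases. For more detail than the size, ollama show prints a model's architecture, parameter count, and context length, which is handy when filling in the Context Window fields in Part 4:
ollama show nemotron-mini:latest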
Step 3.4: Test the Models
Test Nemotron Mini:
ollama run nemotron-mini:latest "Hello, are you working?"
You should get a friendly response; with a prompt on the command line, ollama run prints the answer and exits on its own. (If you start an interactive session by omitting the prompt, press Ctrl+D to exit.)
Test Nemotron Full:
ollama run nemotron:latest "What is 2 + 2?"
You should get an answer.
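GT AI OS will talk to Ollama through its OpenAI-compatible endpoint (the Endpoint URL you'll enter in Part 4), so it's worth confirming that path works too. A minimal check, run on the DGX Spark itself with localhost standing in for the ollama-host alias the GT AI OS containers use:
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nemotron-mini:latest", "messages": [{"role": "user", "content": "Reply with the word ready."}]}'
You should get back a JSON chat completion containing the model's reply.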
Part 4: Add Nemotron Models to GT AI OS
Now configure GT AI OS to use your local Ollama models.
Step 4.1: Add Nemotron Mini Model
- Open Control Panel: http://localhost:3001
- Log in with your admin credentials
- Click Models in the left sidebar
- Click Add Model
- Fill in these exact values:
| Field | Value |
|---|---|
| Model ID | nemotron-mini:latest |
| Name | Ollama Nemotron Mini |
| Provider | Local Ollama (Ubuntu x86 / DGX ARM) |
| Model Type | LLM |
| Endpoint URL | http://ollama-host:11434/v1/chat/completions |
| Context Window | 8192 |
| Max Tokens | 4096 |
- Click Save
Step 4.2: Add Nemotron Full Model
- Click Add Model again
- Fill in:
| Field | Value |
|---|---|
| Model ID | nemotron:latest |
| Name | Ollama Nemotron |
| Provider | Local Ollama (Ubuntu x86 / DGX ARM) |
| Model Type | LLM |
| Endpoint URL | http://ollama-host:11434/v1/chat/completions |
| Context Window | 32768 |
| Max Tokens | 8192 |
- Click Save
Step 4.3: Assign Models to Your Tenant
- Click Tenant Access in the left sidebar (or find it under Models)
- Click Assign Model to Tenant
- Select:
  - Model: nemotron-mini:latest
  - Tenant: your tenant name
  - Rate Limit: choose a rate limit (e.g., Standard)
- Click Assign
- Repeat for nemotron:latest
Part 5: Import Demo Agents
We provide four pre-built agents that demonstrate both NVIDIA NIM (cloud) and Ollama (local) capabilities.
Step 5.1: Download the Agent Files
Download the CSV files for the agents you want to import:
| Agent | Download | Provider |
|---|---|---|
| Python Coding Micro Project | Download CSV | NVIDIA NIM |
| Kali Linux Simulation Agent | Download CSV | NVIDIA NIM |
| Nemotron Mini Agent | Download CSV | Ollama |
| Nemotron Agent | Download CSV | Ollama |
Click the download link and the file will download automatically.
Step 5.2: Import Agents into GT AI OS
- Open Tenant App: http://localhost:3002
- Log in with your credentials
- Click Agents in the left sidebar
- Click Import button
- Click Choose File and select python_coding_microproject.csv
- Click Import
- Repeat steps 4-6 for each CSV file
Step 5.3: Verify Agents Appear
In the Agents page, you should now see:
- Python Coding Micro Project - Uses NVIDIA NIM (cloud)
- Kali Linux Simulation Agent - Uses NVIDIA NIM (cloud)
- Nemotron Mini Agent - Uses Ollama (local)
- Nemotron Agent - Uses Ollama (local)
Part 6: Test Everything
Test an NVIDIA NIM Agent (Cloud)
- In Tenant App, click Agents
- Click Python Coding Micro Project
- Click Chat or start a conversation
- Type: Help me make a simple Python program
- Press Enter
- You should get Python code with explanations
Test an Ollama Agent (Local)
- Click Agents
- Click Nemotron Mini Agent
- Click Chat
- Type: What can you help me with?
- Press Enter
- You should get a response from your local Nemotron model
Agent Reference
Cloud Agents (NVIDIA NIM)
These agents use NVIDIA NIM cloud inference:
| Agent | Model | What It Does |
|---|---|---|
| Python Coding Micro Project | moonshotai/kimi-k2-instruct | Python/Streamlit coding tutor with working code examples |
| Kali Linux Simulation Agent | moonshotai/kimi-k2-instruct | Simulates pentesting tools (MASSCAN, NMAP, Nikto) for training |
Local Agents (Ollama)
These agents run entirely on your NVIDIA DGX Spark:
| Agent | Model | What It Does |
|---|---|---|
| Nemotron Mini Agent | nemotron-mini:latest | Fast general-purpose assistant |
| Nemotron Agent | nemotron:latest | Advanced reasoning and coding |
Troubleshooting
"Connection refused" when using Ollama agents
The agent can't connect to Ollama.
Check Ollama is running:
sudo systemctl status ollama
If stopped, start it:
sudo systemctl start ollama
Verify it's accessible:
curl http://localhost:11434/api/version
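The curl above only checks the loopback interface, while the GT AI OS containers connect over the Docker network. To confirm Ollama is listening on all interfaces (the OLLAMA_HOST=0.0.0.0:11434 setting from Step 2.2), try a non-loopback address; this sketch assumes hostname -I returns your machine's primary IP first:
curl "http://$(hostname -I | awk '{print $1}'):11434/api/version"
If the localhost call works but this one is refused, the override from Step 2.2 isn't active - repeat Steps 2.2 and 2.3.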
"Model not found" error
GT AI OS can't find the model.
Check the model ID matches exactly:
ollama list
The Model ID in GT AI OS must match exactly what ollama list shows (e.g., nemotron-mini:latest not nemotron-mini).
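A quick way to compare the installed names against what you entered:
# The full NAME column value, tag included, is what belongs in Model ID
ollama list | grep -i nemotron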
NVIDIA NIM agents return errors
Check your API key:
- Go to Control Panel → API Keys
- Click Test next to your NVIDIA key
- If it fails, regenerate your key at https://build.nvidia.com/
Ollama is slow
Check GPU is being used:
nvidia-smi
While using an Ollama model, you should see ollama or ollama_llama_server using GPU memory.
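You can also ask Ollama directly. ollama ps lists the loaded models and whether each is running on GPU or CPU:
# The PROCESSOR column should read something like "100% GPU"
ollama ps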
If not using GPU:
# Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
Related Guides
- Ollama Setup - More Ollama configuration options
- Control Panel Guide - Full admin configuration
- Tenant App Guide - Using agents and chat
- Demo Agents - More pre-built agents
- Troubleshooting - Common issues