Table of Contents
- Projects for NVIDIA NIMs and Nemotron using Local Ollama
- Prerequisites
- What This Runbook Covers
- Part 1: Get Your NVIDIA NIM API Key
- Step 1.1: Create an NVIDIA Developer Account
- Step 1.2: Generate Your API Key
- Step 1.3: Add the API Key to GT AI OS
- Part 2: Install Ollama on NVIDIA DGX Spark
- Step 2.1: Install Ollama
- Step 2.2: Configure Ollama for GT AI OS
- Step 2.3: Start Ollama Service
- Step 2.4: Verify Ollama is Running
- Part 3: Download Nemotron Models
- Step 3.1: Download Nemotron Mini
- Step 3.2: Download Nemotron Full
- Step 3.3: Verify Models Downloaded
- Step 3.4: Test the Models
- Part 4: Add Nemotron Models to GT AI OS
- Step 4.1: Add Nemotron Mini Model
- Step 4.2: Add Nemotron Full Model
- Step 4.3: Assign Models to Your Tenant
- Part 5: Import Demo Agents
- Step 5.1: Download the Agent Files
- Step 5.2: Import Agents into GT AI OS
- Step 5.3: Verify Agents Appear
- Part 6: Test Everything
- Agent Reference
- Troubleshooting
- "Connection refused" when using Ollama agents
- "Model not found" error
- NVIDIA NIM agents return errors
- Ollama is slow
- Related Guides
Projects for NVIDIA NIMs and Nemotron using Local Ollama
A step-by-step runbook for setting up cloud and local AI models on NVIDIA DGX Spark.
Prerequisites
You MUST complete these steps before following this guide:
| Step | Guide | What You'll Do |
|---|---|---|
| 1 | Installation Guide | Install GT AI OS on your NVIDIA DGX Spark system |
| 2 | Control Panel Guide | Create your admin account, delete default account, configure tenant |
Verify your installation works:
- Open http://localhost:3001 (Control Panel) - you should see the login page
- Open http://localhost:3002 (Tenant App) - you should see the login page
- Log in with your admin credentials (or the default: gtadmin@test.com / Test@123)
Not working? Go back to the Installation Guide first.
What This Runbook Covers
By the end, you will have:
- NVIDIA NIM cloud models configured (Kimi K2 for advanced AI tasks)
- Ollama installed with local Nemotron models on your NVIDIA DGX Spark
- Four demo agents ready to use
Estimated time: 30-45 minutes
Part 1: Get Your NVIDIA NIM API Key
NVIDIA NIM gives you access to powerful AI models in the cloud via NVIDIA DGX Cloud.
Step 1.1: Create an NVIDIA Developer Account
- Open your web browser
- Go to: https://build.nvidia.com/
- Click the Sign In button (top right corner)
- Click Create Account if you don't have one
- Fill in your details and create your account
- Check your email for a verification link
- Click the link to verify your account
Step 1.2: Generate Your API Key
- Go to: https://build.nvidia.com/
- Sign in with your account
- Click on any model card (e.g., click on "Kimi K2")
- Click Get API Key button
- Copy the API key that appears
- Save this key - you will need it in the next step
Step 1.3: Add the API Key to GT AI OS
- Open Control Panel: http://localhost:3001
- Log in with your admin credentials
- Click API Keys in the left sidebar
- Click Add API Key
- Fill in:
- Provider: Select NVIDIA
- API Key: Paste your NVIDIA API key
- Click Save
- Click Test next to your new key
- You should see a green checkmark or "Valid" status
Verification: If the test fails, check that you copied the complete API key.
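Optional: you can also sanity-check the key from a terminal. The NVIDIA NIM API is OpenAI-compatible, so a plain curl call works; this is a minimal sketch that assumes you've exported the key as NVIDIA_API_KEY first:
# export NVIDIA_API_KEY="nvapi-..." before running this
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "moonshotai/kimi-k2-instruct", "messages": [{"role": "user", "content": "Say hello."}], "max_tokens": 64}'
A valid key returns a JSON chat completion; an HTTP 401 means the key was truncated or revoked.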
Part 2: Install Ollama on NVIDIA DGX Spark
Ollama lets you run AI models locally. NVIDIA DGX Spark systems come with NVIDIA drivers pre-installed, so Ollama will automatically use your GPU.
Step 2.1: Install Ollama
- Open a terminal on your NVIDIA DGX Spark
- Run this command:
curl -fsSL https://ollama.com/install.sh | sh
- Wait for installation to complete
- You should see:
Ollama has been installed successfully
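To confirm the binary landed on your PATH:
ollama --version
This prints the installed version string and exits. If the command isn't found, open a new terminal so the updated PATH takes effect.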
Step 2.2: Configure Ollama for GT AI OS
Create the configuration so GT AI OS can connect to Ollama:
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=131072"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
EOF
What this does:
- OLLAMA_HOST=0.0.0.0:11434 - allows GT AI OS Docker containers to connect
- OLLAMA_CONTEXT_LENGTH=131072 - 128K-token context window
- OLLAMA_FLASH_ATTENTION=1 - better performance
- OLLAMA_KEEP_ALIVE=4h - keeps models loaded for faster responses
- OLLAMA_MAX_LOADED_MODELS=3 - lets multiple models stay loaded at once (the NVIDIA DGX Spark has plenty of unified memory)
Step 2.3: Start Ollama Service
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
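To confirm systemd picked up the override from Step 2.2, inspect the effective unit and its environment:
# The drop-in file should be listed beneath the base unit
systemctl cat ollama
# Shows the Environment= values the service is actually running with
systemctl show ollama --property=Environment
If the override.conf entries are missing, re-run Step 2.2 and then sudo systemctl daemon-reload.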
Step 2.4: Verify Ollama is Running
ollama list
You should see an empty list (no models yet). If you get an error, wait 10 seconds and try again.
Part 3: Download Nemotron Models
NVIDIA Nemotron models are optimized for NVIDIA hardware.
Step 3.1: Download Nemotron Mini
This is the faster, smaller model (~4GB):
ollama pull nemotron-mini:latest
Wait for download to complete (5-15 minutes depending on internet speed).
Step 3.2: Download Nemotron Full
This is the more powerful model (~25GB):
ollama pull nemotron:latest
Wait for download to complete (15-45 minutes depending on internet speed).
Step 3.3: Verify Models Downloaded
ollama list
You should see:
NAME SIZE
nemotron-mini:latest 4.1 GB
nemotron:latest 25.3 GB
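Sizes may differ slightly between releases. For more detail than the size, ollama show prints a model's architecture, parameter count, and context length, which is handy when filling in the Context Window fields in Part 4:
ollama show nemotron-mini:latest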
Step 3.4: Test the Models
Test Nemotron Mini:
ollama run nemotron-mini:latest "Hello, are you working?"
You should get a friendly response; with a prompt on the command line, ollama run prints the answer and exits on its own. (If you start an interactive session by omitting the prompt, press Ctrl+D to exit.)
Test Nemotron Full:
ollama run nemotron:latest "What is 2 + 2?"
You should get an answer.
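GT AI OS will talk to Ollama through its OpenAI-compatible endpoint (the Endpoint URL you'll enter in Part 4), so it's worth confirming that path works too. A minimal check, run on the DGX Spark itself with localhost standing in for the ollama-host alias the GT AI OS containers use:
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "nemotron-mini:latest", "messages": [{"role": "user", "content": "Reply with the word ready."}]}'
You should get back a JSON chat completion containing the model's reply.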
Part 4: Add Nemotron Models to GT AI OS
Now configure GT AI OS to use your local Ollama models.
Step 4.1: Add Nemotron Mini Model
- Open Control Panel: http://localhost:3001
- Log in with your admin credentials
- Click Models in the left sidebar
- Click Add Model
- Fill in these exact values:
| Field | Value |
|---|---|
| Model ID | nemotron-mini:latest |
| Name | Ollama Nemotron Mini |
| Provider | Local Ollama (Ubuntu x86 / DGX ARM) |
| Model Type | LLM |
| Endpoint URL | http://ollama-host:11434/v1/chat/completions |
| Context Window | 8192 |
| Max Tokens | 4096 |
- Click Save
Step 4.2: Add Nemotron Full Model
- Click Add Model again
- Fill in:
| Field | Value |
|---|---|
| Model ID | nemotron:latest |
| Name | Ollama Nemotron |
| Provider | Local Ollama (Ubuntu x86 / DGX ARM) |
| Model Type | LLM |
| Endpoint URL | http://ollama-host:11434/v1/chat/completions |
| Context Window | 32768 |
| Max Tokens | 8192 |
- Click Save
Step 4.3: Assign Models to Your Tenant
- Click Tenant Access in the left sidebar (or find it under Models)
- Click Assign Model to Tenant
- Select:
  - Model: nemotron-mini:latest
  - Tenant: your tenant name
  - Rate Limit: choose a rate limit (e.g., Standard)
- Click Assign
- Repeat for nemotron:latest
Part 5: Import Demo Agents
We provide four pre-built agents that demonstrate both NVIDIA NIM (cloud) and Ollama (local) capabilities.
Step 5.1: Download the Agent Files
Download the CSV files for the agents you want to import:
| Agent | Download | Provider |
|---|---|---|
| Python Coding Micro Project | Download CSV | NVIDIA NIM |
| Kali Linux Simulation Agent | Download CSV | NVIDIA NIM |
| Nemotron Mini Agent | Download CSV | Ollama |
| Nemotron Agent | Download CSV | Ollama |
Click the download link and the file will download automatically.
Step 5.2: Import Agents into GT AI OS
- Open Tenant App: http://localhost:3002
- Log in with your credentials
- Click Agents in the left sidebar
- Click Import button
- Click Choose File and select python_coding_microproject.csv
- Click Import
- Repeat steps 4-6 for each CSV file
Step 5.3: Verify Agents Appear
In the Agents page, you should now see:
- Python Coding Micro Project - Uses NVIDIA NIM (cloud)
- Kali Linux Simulation Agent - Uses NVIDIA NIM (cloud)
- Nemotron Mini Agent - Uses Ollama (local)
- Nemotron Agent - Uses Ollama (local)
Part 6: Test Everything
Test an NVIDIA NIM Agent (Cloud)
- In Tenant App, click Agents
- Click Python Coding Micro Project
- Click Chat or start a conversation
- Type: Help me make a simple Python program
- Press Enter
- You should get Python code with explanations
Test an Ollama Agent (Local)
- Click Agents
- Click Nemotron Mini Agent
- Click Chat
- Type: What can you help me with?
- Press Enter
- You should get a response from your local Nemotron model
Agent Reference
Cloud Agents (NVIDIA NIM)
These agents use NVIDIA NIM cloud inference:
| Agent | Model | What It Does |
|---|---|---|
| Python Coding Micro Project | moonshotai/kimi-k2-instruct | Python/Streamlit coding tutor with working code examples |
| Kali Linux Simulation Agent | moonshotai/kimi-k2-instruct | Simulates pentesting tools (MASSCAN, NMAP, Nikto) for training |
Local Agents (Ollama)
These agents run entirely on your NVIDIA DGX Spark:
| Agent | Model | What It Does |
|---|---|---|
| Nemotron Mini Agent | nemotron-mini:latest | Fast general-purpose assistant |
| Nemotron Agent | nemotron:latest | Advanced reasoning and coding |
Troubleshooting
"Connection refused" when using Ollama agents
The agent can't connect to Ollama.
Check Ollama is running:
sudo systemctl status ollama
If stopped, start it:
sudo systemctl start ollama
Verify it's accessible:
curl http://localhost:11434/api/version
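The curl above only checks the loopback interface, while the GT AI OS containers connect over the Docker network. To confirm Ollama is listening on all interfaces (the OLLAMA_HOST=0.0.0.0:11434 setting from Step 2.2), try a non-loopback address; this sketch assumes hostname -I returns your machine's primary IP first:
curl "http://$(hostname -I | awk '{print $1}'):11434/api/version"
If the localhost call works but this one is refused, the override from Step 2.2 isn't active - repeat Steps 2.2 and 2.3.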
"Model not found" error
GT AI OS can't find the model.
Check the model ID matches exactly:
ollama list
The Model ID in GT AI OS must match exactly what ollama list shows (e.g., nemotron-mini:latest not nemotron-mini).
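A quick way to compare the installed names against what you entered:
# The full NAME column value, tag included, is what belongs in Model ID
ollama list | grep -i nemotron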
NVIDIA NIM agents return errors
Check your API key:
- Go to Control Panel → API Keys
- Click Test next to your NVIDIA key
- If it fails, regenerate your key at https://build.nvidia.com/
Ollama is slow
Check GPU is being used:
nvidia-smi
While using an Ollama model, you should see ollama or ollama_llama_server using GPU memory.
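You can also ask Ollama directly. ollama ps lists the loaded models and whether each is running on GPU or CPU:
# The PROCESSOR column should read something like "100% GPU"
ollama ps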
If not using GPU:
# Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
Related Guides
- Ollama Setup - More Ollama configuration options
- Control Panel Guide - Full admin configuration
- Tenant App Guide - Using agents and chat
- Demo Agents - More pre-built agents
- Troubleshooting - Common issues