Projects for NVIDIA NIMs and Nemotron using Local Ollama
daniel edited this page 2026-01-10 03:26:41 +00:00

A step-by-step runbook for setting up cloud and local AI models on NVIDIA DGX Spark.


Prerequisites

You MUST complete these steps before following this guide:

Step   Guide                 What You'll Do
1      Installation Guide    Install GT AI OS on your NVIDIA DGX Spark system
2      Control Panel Guide   Create your admin account, delete the default account, and configure your tenant

Verify your installation works:

  1. Open http://localhost:3001 (Control Panel) - you should see the login page
  2. Open http://localhost:3002 (Tenant App) - you should see the login page
  3. Log in with your admin credentials (or default: gtadmin@test.com / Test@123)

Not working? Go back to the Installation Guide first.
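If you prefer a scripted check, the two URLs above can be probed with curl (ports taken from this guide):

```shell
# Probe each GT AI OS UI and report OK/FAIL; a FAIL means the stack is
# not up yet and you should revisit the Installation Guide.
check_url() {
  # $1: label, $2: URL - succeeds as long as the server answers at all
  if curl -fsS --max-time 5 -o /dev/null "$2"; then
    echo "OK   $1 ($2)"
  else
    echo "FAIL $1 ($2)"
  fi
}

check_url "Control Panel" "http://localhost:3001"
check_url "Tenant App"    "http://localhost:3002"
```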


What This Runbook Covers

By the end, you will have:

  • NVIDIA NIM cloud models configured (Kimi K2 for advanced AI tasks)
  • Ollama installed with local Nemotron models on your NVIDIA DGX Spark
  • Four demo agents ready to use

Estimated time: 30-45 minutes


Part 1: Get Your NVIDIA NIM API Key

NVIDIA NIM gives you access to powerful AI models in the cloud via NVIDIA DGX Cloud.

Step 1.1: Create an NVIDIA Developer Account

  1. Open your web browser
  2. Go to: https://build.nvidia.com/
  3. Click the Sign In button (top right corner)
  4. Click Create Account if you don't have one
  5. Fill in your details and create your account
  6. Check your email for a verification link
  7. Click the link to verify your account

Step 1.2: Generate Your API Key

  1. Go to: https://build.nvidia.com/
  2. Sign in with your account
  3. Click on any model card (e.g., click on "Kimi K2")
  4. Click Get API Key button
  5. Copy the API key that appears
  6. Save this key - you will need it in the next step

Step 1.3: Add the API Key to GT AI OS

  1. Open Control Panel: http://localhost:3001
  2. Log in with your admin credentials
  3. Click API Keys in the left sidebar
  4. Click Add API Key
  5. Fill in:
    • Provider: Select NVIDIA
    • API Key: Paste your NVIDIA API key
  6. Click Save
  7. Click Test next to your new key
  8. You should see a green checkmark or "Valid" status

Verification: If the test fails, check that you copied the complete API key.
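You can also sanity-check the key from a terminal, independent of GT AI OS. This is a sketch that assumes NVIDIA's OpenAI-compatible endpoint at integrate.api.nvidia.com; export your key as NVIDIA_API_KEY before running it:

```shell
# Classify the HTTP status from a key check; 401/403 usually means a
# truncated or wrong key, the most common copy-paste failure.
classify_status() {
  case "$1" in
    200)     echo "API key is valid" ;;
    401|403) echo "API key rejected (HTTP $1) - re-copy or regenerate it" ;;
    *)       echo "Unexpected response (HTTP $1) - check network or proxy" ;;
  esac
}

# NVIDIA_API_KEY must be exported in this shell before running the check.
status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
  -H "Authorization: Bearer ${NVIDIA_API_KEY:-unset}" \
  https://integrate.api.nvidia.com/v1/models)
classify_status "$status"
```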


Part 2: Install Ollama on NVIDIA DGX Spark

Ollama lets you run AI models locally. NVIDIA DGX Spark systems come with NVIDIA drivers pre-installed, so Ollama will automatically use your GPUs.

Step 2.1: Install Ollama

  1. Open a terminal on your NVIDIA DGX Spark
  2. Run this command:
curl -fsSL https://ollama.com/install.sh | sh
  3. Wait for the installation to complete
  4. You should see: Ollama has been installed successfully

Step 2.2: Configure Ollama for GT AI OS

Create the configuration so GT AI OS can connect to Ollama:

sudo mkdir -p /etc/systemd/system/ollama.service.d

sudo tee /etc/systemd/system/ollama.service.d/override.conf > /dev/null <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=131072"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=4h"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
EOF

What this does:

  • OLLAMA_HOST=0.0.0.0:11434 - Allows GT AI OS Docker containers to connect
  • OLLAMA_CONTEXT_LENGTH=131072 - 128K token context window
  • OLLAMA_FLASH_ATTENTION=1 - Better performance
  • OLLAMA_KEEP_ALIVE=4h - Keeps models loaded for faster responses
  • OLLAMA_MAX_LOADED_MODELS=3 - Multiple models can be loaded (NVIDIA DGX Spark has plenty of VRAM)
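Before restarting in the next step, you can confirm that systemd sees the drop-in; the path below matches the tee command above:

```shell
# Check that the override drop-in is listed among the unit's sources.
OVERRIDE=/etc/systemd/system/ollama.service.d/override.conf

if command -v systemctl >/dev/null; then
  if systemctl cat ollama 2>/dev/null | grep -q "$OVERRIDE"; then
    echo "override loaded"
  else
    echo "override not found - re-run Step 2.2"
  fi
fi
```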

Step 2.3: Start Ollama Service

sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama

Step 2.4: Verify Ollama is Running

ollama list

You should see an empty list (no models yet). If you get an error, wait 10 seconds and try again.
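If ollama list keeps erroring, the wait can be scripted by polling the HTTP API; the version endpoint and port come from the Ollama defaults plus the OLLAMA_HOST value set above:

```shell
# Poll /api/version until the Ollama server answers or attempts run out.
wait_for_ollama() {
  # $1: max attempts (roughly one per second)
  attempts=0
  while [ "$attempts" -lt "$1" ]; do
    if curl -fsS --max-time 2 http://localhost:11434/api/version >/dev/null 2>&1; then
      echo "ollama is up"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 1
  done
  echo "ollama did not respond after $1 attempts"
  return 1
}

wait_for_ollama 10 || true   # don't abort a calling script on timeout
```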


Part 3: Download Nemotron Models

NVIDIA Nemotron models are optimized for NVIDIA hardware.

Step 3.1: Download Nemotron Mini

This is the faster, smaller model (~4GB):

ollama pull nemotron-mini:latest

Wait for download to complete (5-15 minutes depending on internet speed).

Step 3.2: Download Nemotron Full

This is the more powerful model (~25GB):

ollama pull nemotron:latest

Wait for download to complete (15-45 minutes depending on internet speed).

Step 3.3: Verify Models Downloaded

ollama list

You should see something like (exact sizes vary by release):

NAME                    SIZE
nemotron-mini:latest    4.1 GB
nemotron:latest         25.3 GB
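To verify the same thing from a script, for example before provisioning, you can grep the ollama list output for both tags:

```shell
# Succeeds only if every expected model tag appears in `ollama list`.
check_models() {
  # $1: output of `ollama list`; remaining args: expected model tags
  out="$1"; shift
  missing=0
  for m in "$@"; do
    if printf '%s\n' "$out" | grep -q "^$m"; then
      echo "OK      $m"
    else
      echo "MISSING $m"
      missing=1
    fi
  done
  return $missing
}

if command -v ollama >/dev/null; then
  check_models "$(ollama list)" nemotron-mini:latest nemotron:latest
fi
```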

Step 3.4: Test the Models

Test Nemotron Mini:

ollama run nemotron-mini:latest "Hello, are you working?"

You should get a friendly response. (With a prompt passed on the command line, ollama run prints the reply and exits on its own; Ctrl+D is only needed to leave the interactive session.)

Test Nemotron Full:

ollama run nemotron:latest "What is 2 + 2?"

You should get the answer; the command exits on its own after a one-shot prompt.
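For scripted, non-interactive testing, the same models can be queried over Ollama's OpenAI-compatible HTTP API. The payload helper below is illustrative and assumes prompts without embedded double quotes:

```shell
# Build a minimal OpenAI-style chat payload for the given model/prompt.
chat_payload() {
  # $1: model tag, $2: user prompt (no embedded double quotes)
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$(chat_payload nemotron-mini:latest 'What is 2 + 2?')" \
  || echo "Ollama not reachable on :11434"
```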


Part 4: Add Nemotron Models to GT AI OS

Now configure GT AI OS to use your local Ollama models.

Step 4.1: Add Nemotron Mini Model

  1. Open Control Panel: http://localhost:3001
  2. Log in with your admin credentials
  3. Click Models in the left sidebar
  4. Click Add Model
  5. Fill in these exact values:
Field            Value
Model ID         nemotron-mini:latest
Name             Ollama Nemotron Mini
Provider         Local Ollama (Ubuntu x86 / DGX ARM)
Model Type       LLM
Endpoint URL     http://ollama-host:11434/v1/chat/completions
Context Window   8192
Max Tokens       4096
  6. Click Save

Step 4.2: Add Nemotron Full Model

  1. Click Add Model again
  2. Fill in:
Field            Value
Model ID         nemotron:latest
Name             Ollama Nemotron
Provider         Local Ollama (Ubuntu x86 / DGX ARM)
Model Type       LLM
Endpoint URL     http://ollama-host:11434/v1/chat/completions
Context Window   32768
Max Tokens       8192
  3. Click Save

Step 4.3: Assign Models to Your Tenant

  1. Click Tenant Access in the left sidebar (or find it under Models)
  2. Click Assign Model to Tenant
  3. Select:
    • Model: nemotron-mini:latest
    • Tenant: Your tenant name
    • Rate Limit: Choose a rate limit (e.g., Standard)
  4. Click Assign
  5. Repeat for nemotron:latest
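A quick way to confirm the exact Model IDs GT AI OS must use is to ask the OpenAI-compatible endpoint itself. This queries it locally on :11434; inside the GT AI OS Docker network the same service is reached via the ollama-host endpoint configured above:

```shell
# Report OK/MISS for each expected model ID in the /v1/models response.
check_ids() {
  # $1: JSON from /v1/models; remaining args: expected model IDs
  json="$1"; shift
  for m in "$@"; do
    if printf '%s' "$json" | grep -q "\"$m\""; then
      echo "OK   $m"
    else
      echo "MISS $m"
    fi
  done
}

json=$(curl -s --max-time 5 http://localhost:11434/v1/models) || json=""
check_ids "$json" nemotron-mini:latest nemotron:latest
```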

Part 5: Import Demo Agents

We provide four pre-built agents that demonstrate both NVIDIA NIM (cloud) and Ollama (local) capabilities.

Step 5.1: Download the Agent Files

Download the CSV files for the agents you want to import:

Agent                         Download       Provider
Python Coding Micro Project   Download CSV   NVIDIA NIM
Kali Linux Simulation Agent   Download CSV   NVIDIA NIM
Nemotron Mini Agent           Download CSV   Ollama
Nemotron Agent                Download CSV   Ollama

Click the download link and the file will download automatically.

Step 5.2: Import Agents into GT AI OS

  1. Open Tenant App: http://localhost:3002
  2. Log in with your credentials
  3. Click Agents in the left sidebar
  4. Click Import button
  5. Click Choose File and select python_coding_microproject.csv
  6. Click Import
  7. Repeat steps 4-6 for each CSV file

Step 5.3: Verify Agents Appear

In the Agents page, you should now see:

  • Python Coding Micro Project - Uses NVIDIA NIM (cloud)
  • Kali Linux Simulation Agent - Uses NVIDIA NIM (cloud)
  • Nemotron Mini Agent - Uses Ollama (local)
  • Nemotron Agent - Uses Ollama (local)

Part 6: Test Everything

Test an NVIDIA NIM Agent (Cloud)

  1. In Tenant App, click Agents
  2. Click Python Coding Micro Project
  3. Click Chat or start a conversation
  4. Type: Help me make a simple Python program
  5. Press Enter
  6. You should get Python code with explanations

Test an Ollama Agent (Local)

  1. Click Agents
  2. Click Nemotron Mini Agent
  3. Click Chat
  4. Type: What can you help me with?
  5. Press Enter
  6. You should get a response from your local Nemotron model

Agent Reference

Cloud Agents (NVIDIA NIM)

These agents use NVIDIA NIM cloud inference:

Agent                         Model                         What It Does
Python Coding Micro Project   moonshotai/kimi-k2-instruct   Python/Streamlit coding tutor with working code examples
Kali Linux Simulation Agent   moonshotai/kimi-k2-instruct   Simulates pentesting tools (MASSCAN, NMAP, Nikto) for training

Local Agents (Ollama)

These agents run entirely on your NVIDIA DGX Spark:

Agent                 Model                  What It Does
Nemotron Mini Agent   nemotron-mini:latest   Fast general-purpose assistant
Nemotron Agent        nemotron:latest        Advanced reasoning and coding

Troubleshooting

"Connection refused" when using Ollama agents

The agent can't connect to Ollama.

Check Ollama is running:

sudo systemctl status ollama

If stopped, start it:

sudo systemctl start ollama

Verify it's accessible:

curl http://localhost:11434/api/version

"Model not found" error

GT AI OS can't find the model.

Check the model ID matches exactly:

ollama list

The Model ID in GT AI OS must match exactly what ollama list shows (e.g., nemotron-mini:latest not nemotron-mini).

NVIDIA NIM agents return errors

Check your API key:

  1. Go to Control Panel → API Keys
  2. Click Test next to your NVIDIA key
  3. If it fails, regenerate your key at https://build.nvidia.com/

Ollama is slow

Check GPU is being used:

nvidia-smi

While using an Ollama model, you should see ollama or ollama_llama_server using GPU memory.

If not using GPU:

# Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
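The GPU check can also be scripted. ollama ps (available in recent Ollama releases) shows a PROCESSOR column; this heuristic treats "100% GPU" as confirmation of full GPU offload:

```shell
# Inspect `ollama ps` output for a fully GPU-resident model.
gpu_check() {
  # $1: output of `ollama ps`
  if printf '%s\n' "$1" | grep -q "100% GPU"; then
    echo "models are running on the GPU"
  else
    echo "no fully-GPU model loaded - run a prompt first, then re-check"
  fi
}

if command -v ollama >/dev/null; then
  gpu_check "$(ollama ps)"
fi
```

Note that ollama ps only lists models currently loaded in memory, so run a prompt against a model first.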