I just built my own GPU dashboard, hope you like it


  • 30+ GPU Metrics - Utilization, temperature, memory, power, clocks, encoder/decoder stats, and more per GPU
  • Multi-GPU Support - Automatic detection and independent monitoring of all NVIDIA GPUs
  • Live Historical Charts - Real-time graphs with statistics (min/max/avg), threshold indicators, and contextual tooltips
  • Process Monitoring - Track active GPU processes with memory usage and PIDs
  • Clean UI - Responsive interface with glassmorphism design and smooth animations
  • WebSocket Updates - Live push updates on a 2-second interval for near real-time monitoring
  • Docker Deployment - One-command setup with NVIDIA Container Toolkit support
  • Zero Configuration - Works out of the box with any NVIDIA GPU

Monitored Metrics

  • GPU & Memory Utilization (%)
  • Core & Memory Temperature (°C)
  • Memory Usage (Used/Free/Total MB)
  • Power Draw & Limits (W)
  • Fan Speed (%)
  • Clock Speeds (Graphics, SM, Memory, Video MHz)
  • PCIe Generation & Lane Width (Current/Max)
  • Performance State (P-State)
  • Compute Mode
  • Encoder/Decoder Sessions & Stats
  • Driver & VBIOS Version
  • Throttle Status Detection
  • Host CPU & RAM Usage
  • Active GPU Processes with Memory Tracking
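Metrics like these map directly onto nvidia-smi's CSV query interface, which is presumably how the dashboard collects them. A minimal sketch of building and parsing such a query (the field subset and helper names here are illustrative, not the project's exact code):

```python
# Subset of the fields the dashboard tracks; nvidia-smi accepts them
# via --query-gpu (see `nvidia-smi --help-query-gpu` for the full list).
FIELDS = [
    "index", "utilization.gpu", "utilization.memory",
    "temperature.gpu", "memory.used", "memory.total",
    "power.draw", "fan.speed", "clocks.gr", "clocks.mem",
]

def build_query_cmd(fields=FIELDS):
    """Build the nvidia-smi command that emits one CSV row per GPU."""
    return [
        "nvidia-smi",
        f"--query-gpu={','.join(fields)}",
        "--format=csv,noheader,nounits",
    ]

def parse_row(row, fields=FIELDS):
    """Turn one CSV row into a {field: value} dict."""
    return dict(zip(fields, (v.strip() for v in row.split(","))))

# Example row in the shape nvidia-smi prints (sample values):
sample = "0, 42, 12, 65, 4096, 24576, 180.5, 55, 1850, 10501"
print(parse_row(sample)["temperature.gpu"])  # -> 65
```

Running `build_query_cmd()` under `subprocess.run` once per tick is enough to feed the whole metrics list above.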

Docker Deployment (Recommended)

git clone https://github.com/psalias2006/gpu-hot
cd gpu-hot
docker-compose up --build

Access the dashboard at http://localhost:1312

Local Installation

pip install -r requirements.txt
python app.py

Access the dashboard at http://localhost:1312

Requirements

  • NVIDIA GPU with drivers installed (verify with nvidia-smi)
  • Docker & Docker Compose (for containerized deployment)
  • NVIDIA Container Toolkit (for Docker GPU access)
  • Python 3.8+ (for local development)

NVIDIA Container Toolkit Setup

Required for Docker deployment to access GPUs.

Installation:

Follow the official installation guide for your distribution:
📖 NVIDIA Container Toolkit Installation Guide

The guide includes instructions for Ubuntu, Debian, RHEL, CentOS, Fedora, and other distributions.

Verify Installation:

docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

Project Structure

gpu-hot/
├── app.py              # Flask application with WebSocket server
├── templates/
│   └── index.html      # Web dashboard with live charts
├── requirements.txt    # Python dependencies
├── Dockerfile          # Container configuration
├── docker-compose.yml  # Docker Compose setup
└── README.md           # Documentation

HTTP:

  • GET / - Dashboard interface
  • GET /api/gpu-data - Current GPU metrics (JSON)

WebSocket:

  • gpu_data - Real-time metrics broadcast (2s interval)
  • connect / disconnect - Connection events
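The gpu_data broadcast can be pictured as a simple poll-and-emit loop. Here is a hedged sketch with a fake Socket.IO object standing in for the real server; the FakeSocketIO class, read_metrics callable, and max_ticks bound are illustrative and not taken from app.py:

```python
class FakeSocketIO:
    """Minimal stand-in for a Flask-SocketIO server, used only for the demo."""
    def __init__(self):
        self.emitted = []

    def emit(self, event, payload):
        self.emitted.append((event, payload))

    def sleep(self, seconds):
        pass  # the real server yields to eventlet here

def broadcast_loop(socketio, read_metrics, interval=2.0, max_ticks=None):
    """Poll metrics and emit them as 'gpu_data' every `interval` seconds.

    `max_ticks` bounds the loop for demonstration; the real loop runs forever.
    """
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        socketio.emit("gpu_data", read_metrics())  # event name from the list above
        socketio.sleep(interval)
        ticks += 1

fake = FakeSocketIO()
broadcast_loop(fake, lambda: {"gpu0": {"util": 42}}, max_ticks=3)
print(len(fake.emitted))  # -> 3
```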

Environment Variables:

  • NVIDIA_VISIBLE_DEVICES - GPU visibility (default: all)
  • NVIDIA_DRIVER_CAPABILITIES - GPU capabilities (default: all)
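In docker-compose.yml these would typically appear under the service's environment block; a sketch with the stated defaults (the service name is an assumption):

```yaml
services:
  gpu-hot:
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
```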

Customization in app.py:

  • Update interval: Modify eventlet.sleep(2) for refresh rate
  • Port: Change in socketio.run() (default: 1312)
  • Chart history: Adjust data retention (default: 30 points)

Docker Image:
  • Base: nvidia/cuda:12.1-devel-ubuntu22.04
  • Self-contained nvidia-smi (no host mounting required)
  • Health checks every 30s
  • Automatic restart on failure
  • Exposes port 1312

Adding New Metrics:

  1. Modify nvidia-smi query in parse_nvidia_smi()
  2. Update frontend to display new metrics
  3. Add chart configuration if needed
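Step 1 above amounts to appending a field to the --query-gpu list. A small hypothetical helper (extend_query is not part of the project) shows the idea:

```python
def extend_query(query: str, new_field: str) -> str:
    """Append a field to a comma-separated --query-gpu field list,
    skipping it if it is already present (hypothetical helper)."""
    fields = [f.strip() for f in query.split(",")]
    if new_field not in fields:
        fields.append(new_field)
    return ",".join(fields)

# e.g. adding the current PCIe link generation:
base = "index,utilization.gpu,temperature.gpu"
print(extend_query(base, "pcie.link.gen.current"))
# -> index,utilization.gpu,temperature.gpu,pcie.link.gen.current
```

After the query is extended, the parsing code and the frontend chart configuration need matching entries for the new field.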

Local Development:

pip install -r requirements.txt
python app.py

Access at http://localhost:1312 (Dashboard) or http://localhost:1312/api/gpu-data (API)

  • Verify NVIDIA drivers: nvidia-smi
  • Install NVIDIA Container Toolkit (see Installation section)
  • Restart Docker daemon: sudo systemctl restart docker
  • Test GPU access: docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
  • Check host GPU access: nvidia-smi
  • Verify Container Toolkit: nvidia-ctk --version
  • Review logs: docker-compose logs
  • Configure Docker runtime: sudo nvidia-ctk runtime configure --runtime=docker
  • Check port 1312 accessibility
  • Review browser console for errors
  • Verify firewall settings
# In app.py
socketio.run(app, host='0.0.0.0', port=1312, debug=True)

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/NewFeature)
  3. Commit your changes (git commit -m 'Add NewFeature')
  4. Push to the branch (git push origin feature/NewFeature)
  5. Open a Pull Request

See the LICENSE file for full details.

  • NVIDIA for nvidia-smi
  • Flask & Socket.IO teams
  • Chart.js for visualization
  • Open-source community
