System Overview

Introduction

The GPU cluster at TAMUQ is part of the raad2 cluster. It is equipped with NVIDIA Tesla V100 GPUs and Intel Xeon Skylake processors. Users who want to accelerate AI, HPC, or data science applications can benefit significantly from this resource, and the most commonly used GPU packages are already installed on the system.

GPUs: 2 x NVIDIA Tesla V100 per node
GPU nodes: gfx[1-4]
Memory: 192 GB per node
NVIDIA Tensor Cores: 640 per GPU
NVIDIA CUDA Cores: 5,120 per GPU
CPU: Intel Xeon Gold 6140
CPU base frequency: 2.30 GHz
CPU max turbo frequency: 3.70 GHz
Sockets: 2 per server
Cores per socket: 18
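
The figures above can be verified from a shell on one of the GPU nodes. Below is a minimal sketch using the standard nvidia-smi and lscpu utilities; exact output fields may differ between driver and OS versions:

 # List the GPUs visible on the node (expected: 2 x Tesla V100)
 nvidia-smi --query-gpu=index,name,memory.total --format=csv
 
 # Summarize the CPU layout (expected: 2 sockets x 18 cores, Xeon Gold 6140)
 lscpu | grep -E 'Model name|Socket|Core'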

Job scheduler

The GPU cluster uses Slurm as its job scheduler.

Workload manager: Slurm 20.11.7
Queue: gpu
Local SSD storage: /tmp
Per-user GPU limit: 1 GPU per job
Per-user CPU limit: 18 CPUs per job
Per-user memory limit: 92 GB per job
Default job walltime: 1 hour
Maximum job walltime: 24 hours
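
The limits above map directly onto Slurm batch directives. Below is a minimal job-script sketch that requests the per-job maximums on the gpu queue; the job name, output file, and application command are placeholders to be replaced with your own:

 #!/bin/bash
 #SBATCH --job-name=gpu-example      # placeholder job name
 #SBATCH --output=gpu-example.%j.out # placeholder output file (%j = job ID)
 #SBATCH --partition=gpu             # the 'gpu' queue
 #SBATCH --gres=gpu:1                # 1 GPU per job (per-user limit)
 #SBATCH --cpus-per-task=18          # up to 18 CPUs per job
 #SBATCH --mem=92G                   # up to 92 GB of memory per job
 #SBATCH --time=24:00:00             # maximum walltime is 24 hours
 
 # Show the GPU assigned to this job
 nvidia-smi
 
 # Replace with your own application launch, e.g.:
 # srun ./my_gpu_application

Submit the script with sbatch and check its status with squeue -u $USER.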