System Overview -- RAAD2

From TAMUQ Research Computing User Documentation Wiki

Our flagship system, "raad2", is a Cray XC40 with 4,128 Intel Xeon Haswell cores. Put into production in early 2017, it has 172 compute nodes with an aggregate peak performance of more than 120 TFLOPS (Linpack). The nodes are connected by Cray's proprietary Aries interconnect in a dragonfly topology. Each compute node has two sockets, each holding a 12-core processor, for a total of 24 physical cores per node, along with 128 GB of RAM.
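The headline figures above can be cross-checked with a little arithmetic. The following is a minimal sketch that uses only the numbers quoted in this section; the variable names are illustrative, and the memory-per-core value is simply RAM divided by cores, not a scheduler limit.

```python
# Figures taken from the system description above.
NODES = 172
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 12
RAM_PER_NODE_GB = 128

cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET   # physical cores in one node
total_cores = NODES * cores_per_node                   # cores across the whole machine
total_ram_gb = NODES * RAM_PER_NODE_GB                 # aggregate compute-node memory
ram_per_core_gb = RAM_PER_NODE_GB / cores_per_node     # even share if all cores are used

print(cores_per_node)             # 24
print(total_cores)                # 4128
print(total_ram_gb)               # 22016
print(round(ram_per_core_gb, 2))  # 5.33
```

A job that uses all 24 cores of a node therefore has roughly 5.3 GB of memory per core to work with, before accounting for whatever the operating system itself consumes.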

The Cray is served by a Lustre (DDN EXAScaler) shared storage system with a usable capacity of 800 TB and a peak aggregate read bandwidth of about 16 GB/s. Per-client bandwidth peaks at 2.5 GB/s for reads and, concurrently, 2.5 GB/s for writes.
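To put the bandwidth figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the quoted peak rates are sustained (real workloads rarely achieve this), reads the client figure as a per-client limit, and uses decimal units (1 TB = 1000 GB). The function names are illustrative only.

```python
import math

AGGREGATE_READ_GB_S = 16.0  # peak aggregate read bandwidth of the filesystem
CLIENT_READ_GB_S = 2.5      # peak read bandwidth, taken here as a per-client limit


def clients_to_saturate(aggregate_gb_s, per_client_gb_s):
    """Smallest number of clients whose combined peak rate reaches the aggregate."""
    return math.ceil(aggregate_gb_s / per_client_gb_s)


def read_time_seconds(size_gb, bandwidth_gb_s):
    """Idealized time to read size_gb at a constant bandwidth."""
    return size_gb / bandwidth_gb_s


print(clients_to_saturate(AGGREGATE_READ_GB_S, CLIENT_READ_GB_S))  # 7
print(read_time_seconds(1000, CLIENT_READ_GB_S))                   # 400.0
print(read_time_seconds(1000, AGGREGATE_READ_GB_S))                # 62.5
```

Under these assumptions, a single client needs about 400 seconds to read 1 TB, while reads spread across seven or more clients could in principle approach the 16 GB/s aggregate limit.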

Raad2 uses SLURM as its workload manager to allocate and manage compute resources.
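As an illustration of how a job might request resources that match the node layout, the sketch below generates a simple SLURM batch script requesting whole nodes with one task per core. The job name, time limit, and executable are hypothetical placeholders, and no partition or QOS is set because this page does not name any; check the site's job submission documentation for the required options.

```python
CORES_PER_NODE = 24  # from the system description
MAX_NODES = 172      # total compute nodes in the system


def make_batch_script(job_name, nodes, walltime, command,
                      tasks_per_node=CORES_PER_NODE):
    """Return the text of a SLURM batch script using standard sbatch options."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"nodes must be between 1 and {MAX_NODES}")
    if not 1 <= tasks_per_node <= CORES_PER_NODE:
        raise ValueError(f"tasks_per_node must be between 1 and {CORES_PER_NODE}")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
        "",
        f"srun {command}",  # launch the tasks on the allocated nodes
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: 2 nodes (48 tasks) for 10 minutes.
print(make_batch_script("demo", 2, "00:10:00", "./my_app"))
```

The generated text can be saved to a file and submitted with `sbatch`. Requesting 24 tasks per node matches the physical core count, so each task gets its own core.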

Hostname                raad2.qatar.tamu.edu
Batch Scheduler         SLURM 20.02.7
Operating System        SLES 15.3 (Linux)
Compute Nodes           172 (single cabinet)
Cores & Mem per Node    24 cores, 128 GB
Interconnect            Aries Dragonfly
Parallel Filesystem     Lustre (800 TB usable)