System Overview -- RAAD2

From TAMUQ Research Computing User Documentation Wiki

Our flagship system, "raad2", is a Cray XC40 with 4,128 Intel Xeon Haswell cores. Put into production in early 2017, it has 172 compute nodes with an aggregate peak performance of more than 120 TFLOPS (Linpack). The system uses the proprietary Aries interconnect in a dragonfly topology. Each compute node has 2 sockets with 12 physical cores per processor chip, for a total of 24 cores per node, along with 128GB of RAM.
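The node-level figures above determine the system totals. The short Python sketch below reproduces the arithmetic using only numbers quoted on this page; the variable names are illustrative, and the memory-per-core value is a simple average rather than an enforced limit.

```python
# Figures quoted in the overview above.
NODES = 172
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 12
MEM_PER_NODE_GB = 128

# Derived quantities.
cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET   # 2 x 12
total_cores = NODES * cores_per_node                   # whole system
total_mem_gb = NODES * MEM_PER_NODE_GB                 # aggregate RAM
mem_per_core_gb = MEM_PER_NODE_GB / cores_per_node     # average share per core

print(f"Cores per node  : {cores_per_node}")
print(f"Total cores     : {total_cores}")
print(f"Total memory    : {total_mem_gb} GB")
print(f"Memory per core : {mem_per_core_gb:.2f} GB")
```

With the values above this gives 24 cores per node and 4,128 cores in total, matching the quoted core count, plus 22,016 GB of aggregate RAM (roughly 5.33 GB per core).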

The Cray is served by a Lustre (DDN EXAScaler) shared storage system with 800TB of usable capacity and a peak aggregate read bandwidth of about 16GB/s across the filesystem. Peak client bandwidth can reach 2.5GB/s for reads concurrently with 2.5GB/s for writes.

Raad2 uses SLURM as its workload manager to allocate and manage compute resources.
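Because SLURM allocates resources by node and task count, it can help to work out how many nodes a job needs before submitting it. The Python sketch below is a rough illustration only: it assumes one task per physical core (24 per node, as described above), and the helper names `nodes_needed` and `sbatch_header` are hypothetical, not part of SLURM. The directives it prints (`--ntasks`, `--nodes`, `--ntasks-per-node`) are standard `sbatch` options, but site-specific settings such as partition, account, and time limit are deliberately left out.

```python
import math

CORES_PER_NODE = 24  # from this page: 2 sockets x 12 physical cores


def nodes_needed(ntasks: int, cores_per_node: int = CORES_PER_NODE) -> int:
    """Smallest node count that fits ntasks at one task per physical core."""
    if ntasks < 1:
        raise ValueError("ntasks must be at least 1")
    return math.ceil(ntasks / cores_per_node)


def sbatch_header(ntasks: int) -> str:
    """Build illustrative #SBATCH lines for a job with ntasks tasks.

    Assumption: no partition, account, or time limit is set here;
    those are site-specific and not given on this page.
    """
    nodes = nodes_needed(ntasks)
    tasks_per_node = min(ntasks, CORES_PER_NODE)
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
    ])


if __name__ == "__main__":
    print(sbatch_header(100))
```

For example, a 100-task job does not fit on 4 nodes (96 cores), so the sketch requests 5 nodes with at most 24 tasks on each.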

Batch Scheduler            SLURM 20.02.7
Operating System           SLES 15.3 (Linux)
Compute Nodes              172 (single cabinet)
Cores & Memory per Node    24 cores, 128GB
Interconnect               Aries Dragonfly
Parallel Filesystem        Lustre (800TB usable)