

This page describes the current version of the computer. For a description of earlier implementations, follow this link

LBTS-εpsilon

A compact single-user supercomputer

31 Tflops (at FP32 precision)
256 GB RAM (max in single motherboard)
12000+ math coprocessors (FPUs)
100% allocated to the research of LBTS



General description


LBTS-εpsilon is our own small supercomputer. Its design combines "relatively affordable and small" (hence the name) with "massively parallel supercomputer".


Being small and affordable brings a key advantage: we do not have to share the computer with other research groups, so its resources are allocated 100% to us. As a consequence, and despite being a modest machine compared with today's most powerful computers, LBTS-εpsilon gives us forefront numerical calculation capabilities (see Comparisons).


[Figure: architecture of LBTS-Epsilon]

The computer has a hybrid architecture composed of CPUs (x86_64 instruction set, 64 cores distributed over three nodes) plus GPGPU cards configured as mathematical coprocessors (nVidia CUDA, 12288 cores in total across the three nodes, one FPU per core). The system is optimized for math operations at FP32 precision, for which it delivers a peak computing speed of 31 Tflops. (It is capable of 2.7 Tflops at FP64 precision; for comparison, a typical x86_64 core is capable of about 0.01 Tflops at FP64 and 0.02 Tflops at FP32.)
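To put these peak figures in perspective, the short sketch below expresses them as an equivalent number of typical x86_64 cores. All the numbers are the ones quoted on this page; the variable and function names are purely illustrative, and the result is only a nominal comparison of peak speeds, not a benchmark.

```python
# Peak speeds quoted on this page, in Tflops.
SYSTEM_PEAK_FP32 = 31.0
SYSTEM_PEAK_FP64 = 2.7
CPU_CORE_FP32 = 0.02   # typical x86_64 core
CPU_CORE_FP64 = 0.01   # typical x86_64 core


def cpu_core_equivalent(system_tflops, core_tflops):
    """Number of typical CPU cores needed to match a given peak speed."""
    return system_tflops / core_tflops


fp32_equiv = cpu_core_equivalent(SYSTEM_PEAK_FP32, CPU_CORE_FP32)
fp64_equiv = cpu_core_equivalent(SYSTEM_PEAK_FP64, CPU_CORE_FP64)

print(f"FP32: about {fp32_equiv:.0f} typical x86_64 cores")
print(f"FP64: about {fp64_equiv:.0f} typical x86_64 cores")
```

The much larger FP32 ratio reflects the fact that the machine is tuned for single-precision work.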


Comparisons

Other supercomputers usually serve hundreds of researchers simultaneously, so each investigation benefits from only a small fraction of the total computing capabilities. The 31 Tflops math performance of LBTS-εpsilon suffices to nominally enter Spain's top-10 list of supercomputers but, in practice, the main benefit of our setup derives from its single-user character. This is illustrated by the table below, which compares LBTS-εpsilon with some typical supercomputers. In that table, the main figures of interest are the Tflops per user and the maximum RAM in a single motherboard. The first determines the speed seen by each computing job. The second determines the size of the largest problem that can be handled efficiently by the computer (i.e., without the performance bottlenecks of node interconnects and/or scratch disk accesses). LBTS-εpsilon is very competitive in both aspects.

Computer            Peak computing   Number of math      Typical number   Tflops     Max RAM in
                    speed (FP32)     cores (i.e., FPUs)  of users         per user   single board

LBTS-εpsilon        31 Tflops        12288               1                31         256 GB
BSC-MareNostrum     2034 Tflops      48896               58 [1]           35         32 GB
CESGA-Finisterrae   32 Tflops        2528                345 [2]          0.1        128 GB (1 TB [3])

BSC-MareNostrum: top computer in Spain, 29th worldwide in 2013.
CESGA-Finisterrae: top computer in Galicia.

Table valid as of May 2015. Footnotes:
[1] Number of research projects accepted by the access committee in February 2015.
[2] Number of users according to the 2012 annual report.
[3] The 1 TB space is accessible only with special permission and from a single node capable of 0.5 Tflops.
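
The "Tflops per user" column is simply the peak speed divided by the typical number of users. A minimal sketch of that arithmetic, using only the figures from the table (the dictionary layout and names are illustrative):

```python
# (peak FP32 speed in Tflops, typical number of users), as listed in the table.
systems = {
    "LBTS-Epsilon": (31.0, 1),
    "BSC-MareNostrum": (2034.0, 58),
    "CESGA-Finisterrae": (32.0, 345),
}


def tflops_per_user(peak_tflops, users):
    """Nominal share of the peak speed available to each user."""
    return peak_tflops / users


per_user = {name: tflops_per_user(peak, users)
            for name, (peak, users) in systems.items()}

for name, value in per_user.items():
    print(f"{name}: {value:.1f} Tflops per user")
```

This is a nominal average: it assumes the peak speed is shared evenly among the listed users or projects, which real job schedulers do not guarantee.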



Additional info

LBTS-εpsilon has a mixed CPU+GPU architecture, distributed over only three nodes. The "node BIG" is built upon an HP Proliant DL585G7 server with 48 AMD Opteron CPU cores and 256 GB of main RAM. The "node small 1" and "node small 2" are also based on Proliant servers. We made custom modifications to the power supply subsystem of these computers so that they can accommodate more CUDA cards than originally specified by the manufacturer; the specific cards (nVidia GTX660Ti or GTX570, depending on the node) were chosen for their advantageous performance-per-watt figures.

The two small secondary nodes are used for the crucial purpose of developing and optimizing the software routines for each research project. Optimizing for CUDA coprocessors usually involves calling the corresponding specialized versions of most scientific libraries (CUBLAS, MAGMA, HiPLAR, etc.) or compiling the code against CUDA-enabled math libraries.

If a calculation demands it, further GPU cards can be moved from the secondary nodes to the larger one, for a total of 27.8 Tflops and 11376 cores in the main node.
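
As a rough check of what such a reconfiguration implies, the sketch below subtracts the main-node totals from the system totals quoted earlier on this page. It assumes that both sets of figures count the same GPU cards in the same way; the names are illustrative.

```python
# Figures quoted on this page.
TOTAL_CORES = 12288       # CUDA cores across the three nodes
TOTAL_TFLOPS = 31.0       # peak FP32 speed of the whole system
MAIN_NODE_CORES = 11376   # main node after moving the extra cards
MAIN_NODE_TFLOPS = 27.8

# What would be left on the two secondary nodes (assumption: the
# difference between system totals and main-node totals stays there).
remaining_cores = TOTAL_CORES - MAIN_NODE_CORES
remaining_tflops = round(TOTAL_TFLOPS - MAIN_NODE_TFLOPS, 1)
main_node_share = MAIN_NODE_TFLOPS / TOTAL_TFLOPS

print(f"Left on the secondary nodes: {remaining_cores} cores, "
      f"about {remaining_tflops} Tflops")
print(f"Share of the peak speed in the main node: {main_node_share:.0%}")
```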




The LBTS laboratory is a partner of the ENERMAT transnational network


Page last modified on December 16, 2015, at 01:36 PM.