LBTS-εpsilon (version 2015)
This page describes an old version of the computer (corresponding to the year 2015).
For a description of the current implementation, follow this link
A compact single-user supercomputer
31 Tflops (at FP32 precision)
256 GB RAM (max in single motherboard)
12000+ math coprocessors (FPUs)
100% allocated to the research of LBTS
LBTS-εpsilon is our own small supercomputer, whose design combines "relatively affordable and small" (hence the name) with "massively parallel supercomputer".
Being small and affordable brings the key advantage of not having to share the computer with other research groups: its resources are allocated 100% to us. As a consequence, although it is a modest machine compared with today's most powerful computers, LBTS-εpsilon gives us forefront numerical calculation capabilities (see comparisons).
The computer has a hybrid architecture composed of CPUs (x86_64 instruction set, 64 cores distributed across three nodes) plus GPGPU cards configured as mathematical coprocessors (nVidia CUDA, 12288 cores in total across the three nodes, one FPU per core). The system is optimized for math operations at FP32 precision, at which it delivers a peak computing speed of 31 Tflops. (It is capable of 2.7 Tflops at FP64 precision; for comparison, a typical x86_64 core is capable of about 0.01 Tflops at FP64 and 0.02 Tflops at FP32.)
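To put these figures side by side, the following minimal sketch converts the quoted peak speeds into "typical x86_64 core" equivalents and into an average peak per GPU core. It uses only the numbers given in the paragraph above; the function and constant names are illustrative, and the per-core average assumes that the whole 31 Tflops peak is attributable to the 12288 GPU cores, which the text does not state explicitly.

```python
# Peak figures quoted in the text, in Tflops.
SYSTEM_PEAK_TFLOPS = {"fp32": 31.0, "fp64": 2.7}
# Approximate throughput of one typical x86_64 core, as quoted in the text.
CPU_CORE_TFLOPS = {"fp32": 0.02, "fp64": 0.01}
TOTAL_GPU_CORES = 12288


def equivalent_cpu_cores(precision: str) -> int:
    """Number of typical x86_64 cores needed to match the system's peak."""
    return round(SYSTEM_PEAK_TFLOPS[precision] / CPU_CORE_TFLOPS[precision])


def gflops_per_gpu_core() -> float:
    """Average FP32 peak per GPU core in Gflops (1 Tflops = 1000 Gflops).

    Assumption: the full FP32 peak is attributed to the GPU cores.
    """
    return SYSTEM_PEAK_TFLOPS["fp32"] * 1000.0 / TOTAL_GPU_CORES


if __name__ == "__main__":
    for precision in ("fp32", "fp64"):
        print(precision, equivalent_cpu_cores(precision), "CPU-core equivalents")
    print(f"{gflops_per_gpu_core():.2f} Gflops per GPU core (FP32 peak)")
```

Under these assumptions the peak corresponds to roughly 1550 typical CPU cores at FP32 and 270 at FP64, which is why the machine is tuned for single-precision workloads.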
Other supercomputers usually serve hundreds of researchers simultaneously, so each investigation benefits from only a small fraction of the total computing capability. The 31 Tflops math performance of LBTS-εpsilon suffices to nominally enter Spain's top-10 list of supercomputers but, in practice, the main benefit of our setup derives from its single-user character. This is illustrated by the table below, which compares LBTS-εpsilon with some typical supercomputers. In that table, the main figures of interest are the Tflops per user and the maximum RAM in a single motherboard. The first determines the speed seen by each computing job. The second determines the size of the largest problem that the computer can handle efficiently (i.e., without the performance bottlenecks of node interconnects and/or scratch disk accesses). LBTS-εpsilon is very competitive in both aspects.
| computer | peak computing speed (FP32) | number of math cores (i.e., FPUs) | typical number of users | Tflops per user | max RAM in single board |
|---|---|---|---|---|---|
| LBTS-εpsilon | 31 Tflops | 12288 | 1 | 31 | 256 GB |
| (top computer in Spain, 29th worldwide in 2013) | 2034 Tflops | 48896 | 58 [1] | 35 | 32 GB |
| (top computer in Galicia) | 32 Tflops | 2528 | 345 [2] | 0.1 | 128 GB (1 TB [3]) |

Table valid as of May 2015. Footnotes:
[1] Number of research projects accepted by the access committee in February 2015.
[2] Number of users according to the 2012 annual report.
[3] The 1 TB space is accessible only with special permission and from a single node capable of 0.5 Tflops.
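The two figures of interest in the table can be reproduced with a few lines of arithmetic. The sketch below recomputes the Tflops per user as peak speed divided by the number of users, and gives a rough upper bound on the largest square FP32 matrix that fits in the RAM of a single board. The function names are illustrative, "GB" is read here as 2^30 bytes (an assumption), and the bound ignores the memory needed by the operating system and by any workspace arrays.

```python
from math import isqrt

# (label, peak FP32 Tflops, typical number of users, max RAM per board in GB),
# copied from the comparison table.
SYSTEMS = [
    ("LBTS-εpsilon", 31.0, 1, 256),
    ("top computer in Spain", 2034.0, 58, 32),
    ("top computer in Galicia", 32.0, 345, 128),
]

BYTES_PER_FP32 = 4
BYTES_PER_GB = 2**30  # assumption: "GB" interpreted as binary gigabytes


def tflops_per_user(peak_tflops: float, users: int) -> float:
    """Peak speed seen by each user if the machine is shared evenly."""
    return peak_tflops / users


def largest_square_fp32_matrix(ram_gb: int) -> int:
    """Largest n such that one n x n FP32 matrix fits in ram_gb (upper bound)."""
    return isqrt(ram_gb * BYTES_PER_GB // BYTES_PER_FP32)


if __name__ == "__main__":
    for label, peak, users, ram_gb in SYSTEMS:
        per_user = tflops_per_user(peak, users)
        n = largest_square_fp32_matrix(ram_gb)
        print(f"{label}: {per_user:.1f} Tflops per user, "
              f"largest single-board FP32 matrix about {n} x {n}")
```

The even-sharing model is a simplification: real schedulers allocate resources per job, so the per-user figure is best read as an order-of-magnitude indicator.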
LBTS-εpsilon has a mixed CPU+GPU architecture, distributed across only three nodes. The "node BIG" is built upon an HP Proliant DL585G7 server with 48 AMD Opteron CPU cores and 256 GB of main RAM. The "node small 1" and "node small 2" are also based on Proliant servers. We made custom modifications to the power supply subsystem of these computers so that they can accommodate more CUDA cards than originally specified by the manufacturer; the specific cards (nVidia GTX660Ti or GTX570, depending on the node) were chosen for their advantageous performance-per-watt figures.
The two small secondary nodes serve the crucial purpose of developing and optimizing the software routines for each research project. Optimizing for CUDA coprocessors usually involves calling the corresponding specialized versions of the most common scientific libraries (CUBLAS, MAGMA, HiPLAR, etc.) or compiling the code against CUDA-enabled math libraries.
If a calculation demands it, further GPU cards can be moved from the secondary nodes to the larger one, for a total of 27.8 Tflops and 11376 cores in the main node.