




LBTS-εpsilon (version 2014)

This page describes an old version of the computer (as of 2014).
For a description of the current implementation, see the page on the current version of LBTS-εpsilon.





A compact single-user supercomputer

20 Tflops (at FP32 precision)
256 GB RAM (max in single motherboard)
9460 math cores (CPU+GPU)
100% allocated to the research of LBTS

LBTS-εpsilon is our own small supercomputer, whose design combines "relatively affordable and small" (hence the name) with "massively parallel supercomputer".


Smallness and affordability give LBTS-εpsilon its key advantage: we do not have to share the computer with other research groups, so 100% of its resources are allocated to us. As a consequence, and despite being a modest machine compared with today's most powerful computers, LBTS-εpsilon gives us forefront numerical computing capabilities.

Other supercomputers usually serve hundreds of researchers simultaneously, so each investigation benefits from only a very small fraction of the total computing capability. LBTS-εpsilon has a peak computing speed of 20 Tflops. While this already places it well within the top-10 list in Spain, its main benefit derives from its single-user design. This is illustrated by the table below, which compares LBTS-εpsilon with some typical supercomputers. The figures of main interest are the Tflops per user and the maximum RAM in a single motherboard: the first determines the speed seen by each computing job, and the second the size of the largest problem the computer can handle efficiently (i.e., without the performance bottlenecks of node interconnects and/or scratch-disk accesses). LBTS-εpsilon is competitive in both respects.
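As a rough illustration of what "size of the largest problem" means, assume (for illustration only) that the problem is a dense single-precision n×n matrix, 4 bytes per entry, that must fit entirely in the RAM M of one board, and read "GB" as 2^30 bytes. Workspace and the operating system are ignored, so these are upper bounds, not measured limits:

```latex
n_{\max} = \sqrt{\frac{M}{4\,\mathrm{B}}}, \qquad
M = 256\,\mathrm{GB} = 2^{38}\,\mathrm{B} \;\Rightarrow\; n_{\max} = \sqrt{2^{36}} = 2^{18} = 262\,144, \qquad
M = 32\,\mathrm{GB} = 2^{35}\,\mathrm{B} \;\Rightarrow\; n_{\max} = \sqrt{2^{33}} \approx 92\,682.
```

Under these assumptions, a 256 GB board holds a matrix about √8 ≈ 2.8 times larger in each dimension than a 32 GB node.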

Tflops = 10^12 math operations per second
(at 32-bit floating-point precision, throughout this page)
A typical x86_64 core is capable of about 0.02 Tflops
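Taking these two approximate figures at face value, the peak speed of LBTS-εpsilon corresponds to roughly a thousand typical x86_64 cores. This is a peak-to-peak comparison only; sustained speed depends on how well each code uses the GPUs.

```latex
20\ \text{Tflops} = 2 \times 10^{13}\ \text{operations/s}, \qquad
\frac{20\ \text{Tflops}}{0.02\ \text{Tflops per core}} = 1000\ \text{cores}
```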

computer | peak computing speed | number of math cores | typical number of users | Tflops per user | max RAM in single board
LBTS-εpsilon | 20 Tflops | 9460 | 1 | 20 | 256 GB
BSC-MareNostrum (top computer in Spain, 29th worldwide in 2013) | 2034 Tflops | 48896 | 59 [1] | 34 | 32 GB
CESGA-Finisterrae (top computer in Galicia) | 32 Tflops | 2528 | 345 [2] | 0.1 | 128 GB (1 TB [3])

Table valid as of October 2013. Footnotes:
[1] Number of research projects accepted by the access committee in June 2013.
[2] Number of users according to the 2012 annual report.
[3] The 1 TB space is accessible only with special permission and from a single node capable of 0.5 Tflops.
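The "Tflops per user" column is simply the peak speed divided by the typical number of users. The sketch below reproduces it from the table's own figures, under the simplifying assumption that the machine is split evenly among its users; the names `SYSTEMS` and `tflops_per_user` are illustrative.

```python
# Figures copied from the comparison table (valid as of October 2013).
# For MareNostrum the "users" figure is the number of accepted projects (footnote [1]).
SYSTEMS = {
    "LBTS-εpsilon": (20.0, 1),
    "BSC-MareNostrum": (2034.0, 59),
    "CESGA-Finisterrae": (32.0, 345),
}


def tflops_per_user(peak_tflops: float, users: int) -> float:
    """Share of the peak speed seen by each user, assuming an even split."""
    return peak_tflops / users


for name, (peak, users) in SYSTEMS.items():
    print(f"{name:18s} {tflops_per_user(peak, users):8.2f} Tflops per user")
```

Rounded, the results match the table's 20, 34 and 0.1 Tflops per user.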


Architecture

LBTS-εpsilon has a mixed CPU+GPU architecture distributed over only two nodes. The main node is built on an HP ProLiant DL585G7 server with 48 AMD Opteron CPU cores and 256 GB of main RAM, to which we connected 5 GPU cards configured as math coprocessors. The cards are Nvidia GTX660Ti, with 1344 CUDA processors each. We chose them for their favorable performance-per-watt figures: we first tested the higher-end GTX570 cards, but because the host server could safely power only three of them, the final aggregated speed would have been far below that of the current setup.
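The power argument can be checked with a back-of-the-envelope estimate of nominal single-precision peak, computed as cores × clock × 2 (one fused multiply-add per cycle). Only the 1344 CUDA processors per GTX660Ti come from this page; the clock rates and the GTX570 core count are assumed nominal vendor specifications, and the helper name is hypothetical.

```python
# Rough nominal FP32 peak: cores x clock x 2 flops per cycle (one fused multiply-add).
# Clock values and the GTX570 core count are assumed vendor specs, not figures from this page.


def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical single-precision peak of one GPU card, in Tflops."""
    flops_per_second = cuda_cores * clock_mhz * 1e6 * 2
    return flops_per_second / 1e12


GTX660TI = peak_fp32_tflops(1344, 915)  # 1344 CUDA processors (from the text), 915 MHz base clock (assumed)
GTX570 = peak_fp32_tflops(480, 1464)    # 480 CUDA cores and 1464 MHz shader clock (both assumed)

print(f"5 x GTX660Ti: {5 * GTX660TI:5.1f} Tflops")
print(f"3 x GTX570:   {3 * GTX570:5.1f} Tflops")
```

Under these assumptions, five GTX660Ti cards give about 12.3 Tflops of nominal peak against about 4.2 Tflops for three GTX570 cards, consistent with the choice described above. Sustained speed in real codes is lower and depends on the workload.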

A smaller secondary node is used mainly for the crucial task of developing and optimizing the software routines for each research project. Optimizing for CUDA coprocessors usually involves calling the GPU-specialized counterparts of standard scientific libraries, such as CUBLAS (BLAS), MAGMA (LAPACK-style dense linear algebra) or HiPLAR (linear algebra for R).
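As a minimal sketch of what "calling the specialized version" of a library can look like (not the routines actually used on LBTS-εpsilon), the code below assumes the Python library CuPy, which is not mentioned on this page and whose matrix product is executed by cuBLAS on NVIDIA GPUs. The helper names `get_array_module` and `gemm` are hypothetical. The same call falls back to NumPy, and hence to the CPU BLAS, when CuPy or a CUDA device is unavailable.

```python
import numpy as np


def get_array_module(prefer_gpu: bool = True):
    """Return CuPy if it is installed and a CUDA device is usable, otherwise NumPy."""
    if prefer_gpu:
        try:
            import cupy as cp  # assumption: CuPy is installed on the GPU node

            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp
        except Exception:  # CuPy missing, or no usable CUDA driver/device
            pass
    return np


def gemm(a_host, b_host, prefer_gpu: bool = True) -> np.ndarray:
    """Single-precision matrix product C = A @ B, returned as a NumPy array."""
    xp = get_array_module(prefer_gpu)
    a = xp.asarray(a_host, dtype=xp.float32)
    b = xp.asarray(b_host, dtype=xp.float32)
    c = a @ b  # CuPy: cuBLAS GEMM on the GPU; NumPy: the CPU BLAS library
    return c if xp is np else xp.asnumpy(c)
```

The point of the design is that the scientific code calls one function, while the choice between GPU and CPU library is made in a single place.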

If a calculation demands it, the GPU cards can be moved from the secondary node to the main one, for a total of 7 GPU cards working in the main node.
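The core counts follow directly from the figures above (1344 CUDA processors per card, 48 CPU cores, and 7 − 5 = 2 cards in the secondary node). The tally below is an illustrative sketch with hypothetical names. Note that the 9460 headline figure is 4 higher than the 7-card main-node total; presumably those are the secondary node's own CPU cores, although this page does not say so.

```python
CUDA_PER_CARD = 1344  # GTX660Ti CUDA processors per card (from the text)
MAIN_CPU_CORES = 48   # AMD Opteron cores in the DL585G7 main node (from the text)


def main_node_math_cores(gpu_cards: int) -> int:
    """CPU cores plus CUDA processors available in the main node."""
    return MAIN_CPU_CORES + gpu_cards * CUDA_PER_CARD


print(main_node_math_cores(5))  # usual setup: 6768
print(main_node_math_cores(7))  # secondary node's cards moved in: 9456
```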

[Figure: architecture of LBTS-εpsilon]



Page last modified on December 16, 2015, at 01:35 PM.