Stats section computing facilities
Four different types of compute facilities are available to users in the Stats section:
- general purpose Linux compute systems
- a large Hadoop cluster and two small test Hadoop clusters
- large compute servers for special projects
- storage servers
Because many of the research projects undertaken in this section overlap to some extent, or use complementary computing technologies, these compute facilities are interlinked to varying degrees so that data stored in one research area can be made available to another. Computing demands in Stats can be very reactive, with new facilities sometimes needed at short notice, so a flexible environment that supports this is required.
All of the compute systems, Hadoop clusters and compute servers run the server edition of Ubuntu Linux and have a wide range of mathematical and statistical software packages installed in addition to the standard Linux applications. The four storage servers run FreeBSD UNIX, using ZFS as the disk storage pool technology.
General purpose compute systems
- Five high performance general purpose compute systems are available to all users in the section:
- fallas 12 CPU cores, 24 GB memory
- festival 8 CPU cores, 32 GB memory
- fiesta 12 CPU cores, 16 GB memory
- fira 8 CPU cores, 32 GB memory
- hustler 12 CPU cores, 24 GB memory
- and a further two systems specifically for MSc project use have recently been added:
- apollo 12 CPU cores, 192 GB memory
- artemis 12 CPU cores, 192 GB memory
Users in the Stats research section can log into these systems remotely using ssh and run jobs such as R, Matlab, Maple and many other packages, as described for the Maths compute cluster. Note, however, that unlike the main compute cluster, the Stats compute systems do not use job scheduling or any kind of user/task control, so you are free to run whatever you wish, whenever you wish.
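Since there is no scheduler on these machines, long-running jobs are typically started by hand over ssh and left running in the background. A sketch of one common pattern (the username and script name below are placeholders, and the full domain suffix for the hostnames is not given in this document):

```shell
# On your own machine: log in to one of the general purpose systems
# (replace "username" with your Stats account name):
ssh username@fallas

# On fallas: check the current load before starting a heavy job, since
# there is no job scheduler to arbitrate between users:
uptime

# Start a long-running R script that survives logout, with all output
# captured in a log file ("analysis.R" is a placeholder):
nohup Rscript analysis.R > analysis.log 2>&1 &
```

Being a shared, unscheduled environment, it is worth checking load before launching anything large; the same pattern applies on festival, fiesta, fira and hustler.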
At present, home directories on these systems are by default on the college's ICNFS service, but for reasons of performance and available storage space the department's computing is gradually moving away from this to our own in-house fileservers. All of the Maths general access servers - silos 1-4, calculus, clustor and clustor2 - are mounted on these compute systems and can be used instead of ICNFS. In addition, data from other parts of the Stats compute facilities is available to you on these systems as follows:
- if you have an account on the Stats modal and medial compute servers, your local home directory on those systems will be available on fallas, festival, etc as follows:
- for modal: /home/modal/username
- for medial: /home/medial/username
- if you have an account on the Stats Hadoop cluster, your Hadoop HDFS storage can be accessed under /home/hadoop
- if you are working with Netflow data and are a member of the netflow group, you'll find the netflow data archive under /home/netflow_2013, /home/netflow_2014, etc where each folder contains the netflow data available for that year.
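Putting the list above together, a quick way to check which of these data areas you can actually see from one of the general purpose systems is simply to list the mount points (the username here is a placeholder, and which directories exist for you depends on your accounts and group memberships):

```shell
# Your modal/medial home directories, if you have accounts there:
ls /home/modal/username
ls /home/medial/username

# Your Hadoop HDFS storage, if you have a Bazooka cluster account:
ls /home/hadoop

# The Netflow archives, if you are a member of the netflow group:
ls /home/netflow_2013 /home/netflow_2014
```

A "No such file or directory" or "Permission denied" response simply means that data area is not configured for your account.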
Bazooka Hadoop cluster
A 16 node Hadoop cluster is available to Stats users - this is often called the Bazooka cluster since the head node which research users log into to use the cluster is known as bazooka.ma. For those interested in numbers(!), this cluster provides 512 processor cores, 1739 GB of memory and 155.48 terabytes of storage. Another node called athena, intended primarily for teaching courses, was added to this cluster in June 2019.
- If you want to use the Bazooka Hadoop cluster, just ask for an account; this will include both a local conventional home directory on bazooka and a Hadoop home directory whose storage is distributed throughout the entire cluster using the HDFS filesystem. Your Hadoop HDFS directory is available on all the general purpose compute systems as well as on the modal and medial compute servers (if you have an account on these) - you'll find it under /home/hadoop on all Stats systems. On request, your Hadoop home directory can also be mounted on your own desktop system(s), but for security reasons these need to be named systems with wired Ethernet network connections and static IP addresses registered in the college's HDB (hosts database).
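Once logged in to bazooka, the standard Hadoop command-line tools can be used to move data between your conventional home directory and your HDFS home directory. A minimal sketch (the file and directory names are placeholders):

```shell
# List the contents of your HDFS home directory:
hdfs dfs -ls

# Copy a local file into your HDFS home directory:
hdfs dfs -put mydata.csv .

# Copy a result file back out of HDFS to local disk:
hdfs dfs -get results/part-00000 results.txt

# Check how much HDFS space your files are using:
hdfs dfs -du -h
```

Remember that HDFS storage is distributed across all the cluster nodes, so large datasets should live in HDFS rather than in your local home directory on bazooka.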
Mortar and Churchill Hadoop test clusters
There are two other small 4-node Hadoop clusters, with 8 processor cores, 32 GB of memory and 27 TB of storage; these are normally used for test and development purposes but can be made available for general use on request.
Large compute servers for special projects
- Five compute servers, known as modal, model, medial, madul and midal, are available, each with 64 CPU cores (four 16-core AMD Opteron CPUs) and 512 GB of memory; in addition, modal and medial have plenty of local disk storage (19 TB on modal, 10 TB on medial) with both local and remote home directories. These servers are used mainly for cyber security-related projects, with accounts set up on request. madul and midal were added in February 2020 and have the latest Ubuntu 18.04, Matlab R2019a and R 3.6.2 versions installed, while model, added in March 2020, now has R version 4.0.0 installed.
Storage servers
- Four dedicated fileservers are installed in the Stats section. Two have over 10 TB capacity and are named fusion and enkidu; fusion can be used by any Stats user, with accounts set up on request, while enkidu is reserved for security-related projects. The other two servers, flowdata3 and an identical mirror server (flowdata3-backup), each with a capacity of 60 TB, contain the Netflow data archive.
- Data on fusion is available on all of the other Stats compute systems under /home/fusion. flowdata3 is connected in a similar way and contains several distinct Netflow archives named netflow_2013, netflow_2016, etc; these can be found under /home/netflow_2013, /home/netflow_2016 and so on, on connected compute systems.
- Note that not all datasources are accessible to all users - this is for security reasons with access being configured at both individual user and group levels on a "need to have" basis.
Because of the size of the datasets now being worked with in the Stats section, the network bandwidth - that is, the speed of the interconnections between the Stats systems - is an important issue. With the exception of hustler, all of the Stats compute systems are accommodated in the Maths server room where three dedicated internal networks - one each for modal/medial storage, Hadoop cluster data and Netflow data - have now been installed to enhance both bandwidth and security. In addition to a normal college gigabit (1000 Mbits/second) network connection, each system now has additional connections to each of these three separate gigabit networks to ensure quick data transfer between systems. (hustler is in a staff office on another floor where it is not possible to connect systems to server room networks).
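To see why dedicated networks matter at these data volumes, a rough back-of-envelope calculation helps (assuming ideal, sustained gigabit throughput and ignoring protocol overhead):

```shell
# 1 TB = 10^12 bytes = 8 * 10^12 bits; a gigabit link carries
# 10^9 bits per second, so under ideal conditions:
seconds=$(( 1000000000000 * 8 / 1000000000 ))
echo "${seconds} seconds per terabyte"       # 8000 seconds

hours=$(( seconds / 3600 ))
echo "roughly ${hours} hours per terabyte"   # roughly 2 hours
```

At over two hours per terabyte even in the ideal case, sharing a single network between home directories, Hadoop traffic and Netflow transfers would quickly become a bottleneck, which is why separate gigabit networks were installed for each.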
Research Computing Manager,
Department of Mathematics
last updated: 18.05.2020