New servers
Our VMware infrastructure is being upgraded so we got some hand-me-downs. 2 x BL460 servers with 8 cores and 32GB of RAM each. These have been added to the 400M series. This brings our HPC cluster up to just over 150 cores online.
$year++;
Research Portal Project
Top UCT researcher, Ed Rybicki, will be assisting in the creation of a Research Portal for UCT researchers. The HPC tools and resources will be integrated into this portal and we hope to see the first phase of this project completed in early 2012. You can read more on the development of the Research Portal here.
Submitting jobs with an environment tool called 'modules'
We've been busy looking for new ways to reduce complexity in our HPC environment. One long-established tool helps you set up your environment when multiple versions of the same application exist. The tool is called “modules” - http://sourceforge.net/projects/modules/files/Modules/modules-3.2.8/
Modules lets you select a particular version of an application when several are installed. For example, if the system had gcc 4.1.2 installed but you wanted gcc 4.6.1, you would use modules to set up the 4.6.1 environment.
Let's list the modules available on the head node. Executing “module avail” at the shell should list the following.
---------------------------------- /opt/exp_soft/Modules/3.2.8/modulefiles -----------------------------------
module-gcc46
To activate gcc 4.6.1, execute “module add module-gcc46”. This returns nothing to the prompt, but when you then execute “gcc --version” you should see a different version of gcc. The include and library paths should also point to the new gcc location.
An example job submission script would look something like the following.
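Under the hood, loading a module mostly rewrites environment variables such as PATH, LD_LIBRARY_PATH and CPATH. The sketch below imitates that effect for the gcc 4.6.1 prefix used on this cluster. fake_module_add is a hypothetical helper made up for illustration and is not part of the modules package; the real module-gcc46 modulefile may set other variables as well.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for what "module add" does (NOT a real modules command):
# prepend an install prefix's bin, lib64/lib and include directories
# to the relevant search paths.
fake_module_add() {
    local prefix="$1"
    PATH="$prefix/bin:$PATH"
    # ${VAR:+...} avoids a trailing ':' (an empty entry means "current directory")
    LD_LIBRARY_PATH="$prefix/lib64:$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    CPATH="$prefix/include${CPATH:+:$CPATH}"
    export PATH LD_LIBRARY_PATH CPATH
}

fake_module_add /opt/exp_soft/gcc-4.6.1
# The first PATH entry is now the new compiler's bin directory
echo "$PATH" | cut -d: -f1
```

The real tool also records what it changed, which is what allows “module rm” to undo it cleanly; a hand-rolled function like this one cannot.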
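#!/bin/bash
#PBS -N TestJob
#PBS -m e
#PBS -M e-mail_address
#PBS -l nodes=1:series400:ppn=2
#PBS -V
# Show who the job runs as and the default compiler version
whoami
gcc --version
# Switch to gcc 4.6.1 and confirm that the version has changed
module add module-gcc46
gcc --version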
Notice the #PBS -V directive, which exports your current environment to the job. If you load a module for an application and submit a job without -V set, the module is not present on the compute nodes and your job will most likely fail.
Executing “module list” lists the modules currently loaded in your environment, and “module --help” lists the usage commands.
Mail notifications for HPC jobs
We have introduced mail notifications within PBS so that users can be notified when a job begins, aborts and completes. The following additional directives need to be set at the top of your job submission script.
#PBS -m bae
#PBS -M e-mail@domain.com
The first directive, "-m", sets the notification options: (b) begins, (a) aborts, (e) completes or ends. You can set all of the options or just one. So if you only want to be notified when jobs complete, specify the "e" option in the directive.
#PBS -m e
The second directive, "-M", specifies the e-mail address to which the notifications are sent.
Depending on the number of jobs you submit, we strongly suggest creating a separate mail folder into which your notifications can be filtered. Nobody likes a clunky inbox :-)
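If you generate job scripts from other scripts, a small helper can produce the two mail directives and catch typos in the option string before qsub sees them. pbs_mail_directives is a hypothetical name, and it deliberately accepts only the b, a and e options described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PBS mail directives for an option string
# (any combination of b, a, e) and an e-mail address.
pbs_mail_directives() {
    local opts="$1" addr="$2"
    # Reject an empty string or any character other than b, a or e
    if [[ -z "$opts" || "$opts" == *[!bae]* ]]; then
        echo "invalid -m options: '$opts' (use b, a and/or e)" >&2
        return 1
    fi
    printf '#PBS -m %s\n#PBS -M %s\n' "$opts" "$addr"
}

# Prints "#PBS -m bae" followed by "#PBS -M e-mail@domain.com"
pbs_mail_directives bae e-mail@domain.com
```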
Happy qsub'ing
Keeping up appearances
One of Scientific Linux's stronger points is its stability. However, it achieves this by lagging considerably behind in its suite of default packages, even those provided by the EPEL repository. This is all well and good until some enthusiastic coven of programmers decides that their creation cannot exist without calling upon the most bleeding-edge collection of dependencies.
With this in mind we've provided a more modern version of the gcc compiler (4.6.1) to obviate the need for that awkward #include <iron-age.h> library call. However, it still needs some paper clips and string to hold it all together, so you'll need the following in your .bashrc file:
export LD_LIBRARY_PATH=/opt/exp_soft/mpc-0.9/lib:/opt/exp_soft/mpfr-3.0.1/lib:/opt/exp_soft/gmp-5.0.2/lib:/opt/exp_soft/gcc-4.6.1/lib:/opt/exp_soft/gcc-4.6.1/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
COMPILE_OF=/opt/exp_soft/gcc-4.6.1/bin:/opt/exp_soft/gcc-4.6.1/lib:/opt/exp_soft/binutils-2.21.1/bin
export PATH=$COMPILE_OF:$PATH
export CPATH=/opt/exp_soft/gcc-4.6.1/include
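Because nested interactive shells inherit the environment and then re-read .bashrc, plain prepends like the ones above add the same directories again in every sub-shell. A duplicate-safe variant could look like the sketch below; path_prepend is a hypothetical helper, not something installed on the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical ~/.bashrc helper: prepend a directory to a colon-separated
# variable only if it is not already present, then export the variable.
path_prepend() {
    local var="$1" dir="$2"
    local current="${!var}"            # indirect expansion: value of $var
    case ":$current:" in
        *":$dir:"*) ;;                 # already present, leave untouched
        *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
    esac
    export "$var"
}

# Prepended in reverse so gcc's bin directory ends up first
for d in /opt/exp_soft/binutils-2.21.1/bin /opt/exp_soft/gcc-4.6.1/bin; do
    path_prepend PATH "$d"
done
path_prepend CPATH /opt/exp_soft/gcc-4.6.1/include
```

The same call works for LD_LIBRARY_PATH, one directory at a time.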
And yes, we're aware that 4.6.2 was released (or escaped) less than a month ago.
At some stage we may explore other operating systems to support our HPC infrastructure; however, this all depends on how our investigation into cloud-based provisioning pans out. As is usually the case, it's swings and roundabouts, and providing HPC completely independent of OS and architecture is a noble but elusive pursuit.
Google reCaptcha service via caching or proxy servers
Google's reCAPTCHA service is great for keeping out the web bots that enter false information into web forms, and so for eliminating spam. However, most organizations route web traffic through proxy or caching servers, either to cache content over a saturated internet link or for authentication and moderation. Unfortunately the reCAPTCHA library doesn't cater for proxy servers, so we needed to update it with a bit of PHP proxy information. The error message was “Could not open socket”, which means the library could not connect to the reCAPTCHA server on port 80. There are two likely causes: either a host-based firewall is blocking port 80, or outbound web traffic has to go through a caching server running on one of the standard proxy ports, 3128 or 8080.
Locate the PHP library file “recaptchalib.php” and update the “The reCAPTCHA server URL's” section with the definitions below. Set USE_PROXY from false to true and enter your proxy server's host and port.
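The two causes can be told apart from the web server's shell by checking which ports it can actually reach. The sketch below is a hypothetical diagnostic helper (can_connect is not part of recaptchalib); it relies on bash's /dev/tcp pseudo-device and GNU coreutils' timeout, and the proxy hostname in the usage comments is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: return 0 if a TCP connection to host:port can be
# opened within 5 seconds, non-zero otherwise.
can_connect() {
    local host="$1" port="$2"
    timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Usage (substitute your own proxy host and port):
#   can_connect www.google.com 80      || echo "direct port 80 blocked"
#   can_connect proxy.example.org 3128 && echo "proxy port reachable"
```

If the direct check fails but the proxy port answers, the library needs the proxy settings below.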
define("PROXY_HOST", ""); //define proxy
define("PROXY_PORT", ""); //define port
define("USE_PROXY", false); //set this true if you want to use proxy.
Now that the PHP script knows where the proxy server is, the HTTP POST to the reCAPTCHA server has to be sent through it. Replace the existing _recaptcha_http_post() function in recaptchalib.php with the code below.
/**
 * Submits an HTTP POST to a reCAPTCHA server, optionally via an HTTP proxy
 * @param string $host
 * @param string $path
 * @param array $data
 * @param int $port
 * @return array response ([0] = headers, [1] = body)
 */
function _recaptcha_http_post($host, $path, $data, $port = 80) {
        $req = _recaptcha_qsencode ($data);

        if (USE_PROXY) {
                // A proxy needs the absolute URI (including any non-default port)
                $target = "http://" . $host . ($port != 80 ? ":" . $port : "") . $path;
                $fs = @fsockopen(PROXY_HOST, PROXY_PORT, $errno, $errstr, 10);
        } else {
                // A direct connection sends only the path
                $target = $path;
                $fs = @fsockopen($host, $port, $errno, $errstr, 10);
        }
        if (false === $fs) {
                die ('Could not open socket' . (USE_PROXY ? ' to proxy' : '') . ": $errstr ($errno)");
        }

        $http_request  = "POST $target HTTP/1.0\r\n";
        $http_request .= "Host: $host\r\n";
        $http_request .= "Content-Type: application/x-www-form-urlencoded;\r\n";
        $http_request .= "Content-Length: " . strlen($req) . "\r\n";
        $http_request .= "User-Agent: reCAPTCHA/PHP\r\n";
        $http_request .= "\r\n";
        $http_request .= $req;

        $response = '';
        fwrite($fs, $http_request);
        while ( !feof($fs) )
                $response .= fgets($fs, 1160); // One TCP-IP packet
        fclose($fs);

        // Split only at the first blank line so the body stays intact
        $response = explode("\r\n\r\n", $response, 2);
        return $response;
}
The reCAPTCHA library should now send its HTTP POST requests via your proxy server.
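The essential difference between the two branches of the PHP function is the request line. The sketch below shows both forms side by side. request_line is a hypothetical helper, and the path /recaptcha/api/verify is assumed to be the verify endpoint used by the library version we had.

```shell
#!/usr/bin/env bash
# Illustration only: build the HTTP/1.0 request line for a direct request
# versus one sent through a proxy. request_line is a hypothetical helper.
request_line() {
    local host="$1" path="$2" use_proxy="$3"
    if [ "$use_proxy" = true ]; then
        # Through a proxy: absolute URI, so the proxy knows where to forward it
        printf 'POST http://%s%s HTTP/1.0\r\n' "$host" "$path"
    else
        # Direct: path only; the server is identified by the Host header
        printf 'POST %s HTTP/1.0\r\n' "$path"
    fi
}

request_line www.google.com /recaptcha/api/verify true
request_line www.google.com /recaptcha/api/verify false
```

To check that the proxy itself forwards web requests, curl's -x (--proxy) option can send a test request through it.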
New worker nodes
We've added two new nodes, 213 and 214, to the 200 series. Both nodes have 8GB of RAM. This brings our cluster up to 136 cores.
RAM upgrade and more worker nodes
The RAM of worker node 209 has been upgraded from 4 to 8GB and the node moved back to the 200 series. We will be deploying a few more 200 series and at least 1 more 400M series worker nodes in the near future.
Gluster wobble
So we've been making use of GlusterFS for a while now and generally it's been great. Gluster allows us to present unused space on a number of networked servers (our HPC worker nodes) as a single disk pool and make this available to researchers. We currently have two scratch areas, each made up of a group of worker nodes: the 200 series together make up scratch01 and the 400 series together make up scratch02.
We learned a while back that it's important to get disk striping right. When scratch02 was added, the striping was set to 1, which meant that each file was written to an individual worker node. The worker node file systems therefore filled up one after another (especially when large files were written), rather than all the worker node disks filling up simultaneously but far more slowly. Additionally, in future iterations of our HPC clusters we'll partition free space and OS areas separately to avoid contention for critical file space.
Our more recent issue was a bit more esoteric. Gluster is an abstraction of disk space, but there is another layer of abstraction, hidden from users, known as peering: the gluster daemons on the worker nodes communicate amongst themselves to advertise availability and resources. The peering of our 200 and 400 series is intermingled, which means that the 200 and 400 series nodes are aware of each other at a peering level, even though they never interact. In theory this is not a problem. However, as in all things in life, theory and reality can diverge, and there are a number of lessons we'll take with us when we start working on the next iteration of our HPC cluster.
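To see why the stripe count matters, here is a simplified model, an assumption rather than GlusterFS source code: a striped volume cuts each file into fixed-size blocks and deals them round-robin across its bricks. The 128 KiB block size is assumed (we believe it was the default for striped volumes in GlusterFS 3.x), and brick_for_offset is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Simplified model (assumption): block i of a file goes to brick i % stripe_count.
# brick_for_offset prints the brick index holding a given byte offset.
brick_for_offset() {
    local offset="$1" block_size="$2" stripe_count="$3"
    echo $(( (offset / block_size) % stripe_count ))
}

# Where do the first 8 blocks of a file land with 128 KiB blocks?
for stripe in 1 4; do
    printf 'stripe %s:' "$stripe"
    for ((i = 0; i < 8; i++)); do
        printf ' %s' "$(brick_for_offset $((i * 131072)) 131072 "$stripe")"
    done
    echo
done
# stripe 1: 0 0 0 0 0 0 0 0   (the whole file lands on one brick)
# stripe 4: 0 1 2 3 0 1 2 3   (the file is spread across four bricks)
```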
That being said, scratch02 is available again.
The error, logged in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log, is shown below:
socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer
Since the 200 and 400 series peers are all treated as one gluster resource, it is difficult to restart only the volumes built from 400 series bricks. The solution was to unmount all volumes and restart the gluster daemon on all peers, after which the gluster volumes mounted successfully. We've since implemented daemon monitoring on all cluster nodes.
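As one hedged example of the kind of check that can sit alongside daemon monitoring, the sketch below scans the glusterd log for the socket error shown above. check_glusterd_log is a hypothetical helper, not our actual monitoring setup; its exit codes follow the common Nagios plugin convention (0 = OK, 1 = WARNING, 3 = UNKNOWN).

```shell
#!/usr/bin/env bash
# Hypothetical check for the glusterd "Transport endpoint is not connected" error.
GLUSTERD_LOG=/var/log/glusterfs/etc-glusterfs-glusterd.vol.log

check_glusterd_log() {
    local log="${1:-$GLUSTERD_LOG}" count
    if [ ! -r "$log" ]; then
        echo "UNKNOWN: cannot read $log"
        return 3
    fi
    # grep -c prints 0 (and exits 1) when nothing matches; '|| true' keeps set -e happy
    count=$(grep -c 'Transport endpoint is not connected' "$log" || true)
    if [ "$count" -gt 0 ]; then
        echo "WARNING: $count socket error(s) in $log"
        return 1
    fi
    echo "OK: no socket errors in $log"
}
```

Because the log keeps growing, a real check would only look at entries since the last run or since the last log rotation.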
Disk space increase
We've added an NFS mount to the Computational Biology JBOD. This adds 7+TB of disk space, which should suffice until the ICTS cheap storage node arrives.
100kH
Yesterday we finally passed the 100,000-hour mark. The total number of computational hours is far higher (almost 13 years of computational work), but the 100kH figure counts only genuine research work, with test and development jobs stripped out.
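For scale, here is a quick conversion from hours to years of continuous single-process computation. It assumes a year of 365.25 × 24 = 8766 hours, and hours_to_years is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Convert computational hours to years of continuous work
# (assumes 365.25 days x 24 h = 8766 h per year).
hours_to_years() {
    awk -v h="$1" 'BEGIN { printf "%.1f\n", h / (365.25 * 24) }'
}

hours_to_years 100000   # prints 11.4
hours_to_years 113958   # prints 13.0
```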
ZA-UCT-ICTS status
Our Grid cluster currently has an issue with gLite services. While our overall infrastructure is fine (SAGrid site advertising and WMS are working properly), no jobs will be able to run on UCT's Grid cluster until this is resolved. We have logged a call via GGUS with the Africa ROC.
Jobs are still running on the HPC cluster as this cluster is not part of the Grid infrastructure.
SANREN fault
We have received the following communication regarding the SANREN link. During this time access to Grid resources may be degraded.
Hi All,
We have received further communications regarding the fault that we are currently experiencing.
The fault has, as stated, been confirmed to be on the TE-NORTH segment of the cable between Egypt and France, and based on what SEACOM has just sent us, this is confirmed as a problem on the undersea portion of the cable that will require physical work.
The expected time of repair is 5 to 10 days, as the boats require permits to sail, which will take up to 5 days, and there is then an approximate one-week time frame after the permits are granted to actually sail, get to the problem point and repair.
As a result of this, during this period we will continue to utilize the DR bandwidth solution. TENET has contracted for 2 gigabits of DR bandwidth, and while we acknowledge that this is probably going to be slightly saturated during this period, we believe that it should be enough capacity to keep the network running with only a small degradation of service.
In order to mitigate congestion on the network during this period, TENET has changed the bandwidth limiting device to acknowledge that it only has 2 gigabits of bandwidth available. This means that the burst ratios we apply on campuses will only apply if the bandwidth is available within the 2 gigabits. As a result:
If a site has 100mbit bandwidth limit, with a 10% burst level (as we apply in default situations):
The bandwidth controller will attempt to give the site their 100mbit per second as a priority. If the link is running below 2 gigabits (i.e. late at night), the site will receive their 10% burst and be able to run at 110mbit. If the link is heavily utilized, the 10% burst will not be available.
I have included SEACOM’s latest communication at the bottom of the email.
Many Thanks
Andrew Alston
TENET – Chief Technology Officer
SEACOM Communication:
As previously advised SEACOM's Mediterranean partner network is experiencing a service affecting outage between Abu Talat and Marseilles. This is currently impacting all of your services to Europe.
We have now received an indication of the ETR for the fault: permitting is expected to be completed within one week and the repairs are then expected to take 5 days. The repair vessel has been notified of the callout and mobilization will occur once permits are received.
SEACOM is continuing to work on additional restoration routes. As options become available your account manager will be in contact to discuss. Should you have specific requirements or wish to discuss options please contact your account manager.
We will continue to keep you informed on progress of the repair.
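The burst rule in TENET's e-mail can be modelled roughly as follows. This is a simplification (the real bandwidth controller's logic isn't described beyond the example in the e-mail), allowed_rate is a hypothetical helper, and all figures are in Mbit/s with the 2 gigabit DR link as the default capacity.

```shell
#!/usr/bin/env bash
# Rough model: a site always gets its committed rate, and gets the extra
# burst only while the shared link is running below capacity.
allowed_rate() {
    local limit="$1" burst_pct="$2" link_load="$3" link_cap="${4:-2000}"
    if [ "$link_load" -lt "$link_cap" ]; then
        echo $(( limit + limit * burst_pct / 100 ))
    else
        echo "$limit"
    fi
}

allowed_rate 100 10 900    # quiet link, e.g. late at night: prints 110
allowed_rate 100 10 2000   # saturated link: prints 100
```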