Big Bytes

HPC School 2012

The CHPC invites applications from suitably qualified candidates to attend the above HPC school and enter the competition.  Its purpose is to introduce South African students to fundamental knowledge of high performance computing techniques and select regional teams for the CHPC's Student Cluster Competition in December 2012. 

GCC 4.7 made available on the ICTS HPC Cluster

GCC 4.7 has been compiled and installed on the ICTS HPC cluster. To use it, execute "module add module-gcc47" from the bash prompt on the HPC head node; this enables it with the correct paths to the various libraries.

We made it available just in case students from CS ask why we are not bleeding edge enough :-)

Installation was trivial and followed the documentation in an earlier blog post which I wrote up. It can be found here - Compile / Install / Deploy OpenFOAM 2.0

Enjoy !  

Research Portal Project

Top UCT researcher, Ed Rybicki, will be assisting in the creation of a Research Portal for UCT researchers.  The HPC tools and resources will be integrated into this portal and we hope to see the first phase of this project completed in early 2012.  You can read more on the development of the Research Portal here.

Submitting jobs with an environment tool called 'modules'

We've been busy looking for new ways to reduce complexity in our HPC environment. One tool which has been around for many years assists with setting up your environment where multiple versions of the same application exist. The tool is called “modules” - http://sourceforge.net/projects/modules/files/Modules/modules-3.2.8/


Modules allows the user to select a particular version of an application where several exist. For example, if the system had gcc 4.1.2 installed but you wanted gcc 4.6.1 loaded then you would use modules to set up the 4.6.1 environment.

Let us list the modules which are available on the head node. Executing “ module avail “ at the shell should list the following.

---------------------------------- /opt/exp_soft/Modules/3.2.8/modulefiles -----------------------------------

module-gcc46

 

To activate gcc 4.6.1, execute “module load module-gcc46”. This prints nothing at the prompt, but when you then execute “gcc --version” you should see a different version of gcc configured. The include paths and library paths should reflect the new location for gcc as well.

An example of a job submission script would look something like the following below.

#PBS -N TestJob

#PBS -m e

#PBS -M e-mail_address

#PBS -l nodes=1:series400:ppn=2

#PBS -V

whoami

gcc --version

module load module-gcc46

gcc --version


Notice the #PBS -V directive, which exports your current environment variables to the job. If you load a module for an application and submit a job without -V set, the environment set up by the module is not present on the compute nodes and your job will most likely fail.

Executing “module list” lists the modules currently loaded in your environment. “module --help” lists the usage commands.

Google reCaptcha service via caching or proxy servers

Google's reCAPTCHA service is great for getting rid of the web bots which enter false information into web forms, and so for eliminating spam. Most organizations use proxy or caching servers, either to cache content because of a saturated internet link or for authentication and moderation. The “recaptcha” library unfortunately doesn't cater for the use of proxy servers, so we needed to update the library with a bit of PHP proxy information. The error message was “Could not open socket”. This means that it had a problem connecting to port 80. There are two likely causes: either a host-based firewall has blocked port 80, or you have a caching server in place running on one of the standard proxy ports, 3128 or 8080.

Locate the PHP library file “recaptchalib.php” and update the “The reCAPTCHA server URL's” section with the information below. Change USE_PROXY from false to true and enter the proxy server information.

 

define("PROXY_HOST", "");   //define proxy
define("PROXY_PORT", ""); //define port
define("USE_PROXY", false); //set this true if you want to use proxy.

Now that we have told our PHP script where to look for the proxy server we need to tell the HTTP POST to submit via the proxy server to the reCAPTCHA server. Add the code below to the recaptchalib.php file. 

/**
 * Submits an HTTP POST to a reCAPTCHA server
 * @param string $host
 * @param string $path
 * @param array $data
 * @param int $port
 * @return array response
 */
function _recaptcha_http_post($host, $path, $data, $port = 80) {

    if (USE_PROXY) {
        // Via the proxy: the request line carries the full URL
        $req = _recaptcha_qsencode ($data);
        $http_request = "POST http://" . $host . $path . " HTTP/1.0\r\n";
        $http_request .= "Host: $host\r\n";
        $http_request .= "Content-Type: application/x-www-form-urlencoded;\r\n";
        $http_request .= "Content-Length: " . strlen($req) . "\r\n";
        $http_request .= "User-Agent: reCAPTCHA/PHP\r\n";
        $http_request .= "\r\n";
        $http_request .= $req;

        $response = '';

        // Connect to the proxy server rather than to the reCAPTCHA host
        if( false == ( $fs = @fsockopen(PROXY_HOST, PROXY_PORT, $errno, $errstr, 10) ) ) {
            echo $errno . $errstr;
            die ('Could not open socket to proxy');
        }
    }
    else {
        // Direct connection: the request line carries only the path
        $req = _recaptcha_qsencode ($data);

        $http_request = "POST $path HTTP/1.0\r\n";
        $http_request .= "Host: $host\r\n";
        $http_request .= "Content-Type: application/x-www-form-urlencoded;\r\n";
        $http_request .= "Content-Length: " . strlen($req) . "\r\n";
        $http_request .= "User-Agent: reCAPTCHA/PHP\r\n";
        $http_request .= "\r\n";
        $http_request .= $req;

        $response = '';
        if( false == ( $fs = @fsockopen($host, $port, $errno, $errstr, 10) ) ) {
            die ('Could not open socket');
        }
    }

    fwrite($fs, $http_request);

    while ( !feof($fs) )
        $response .= fgets($fs, 1160); // One TCP-IP packet
    fclose($fs);
    $response = explode("\r\n\r\n", $response, 3);
    //print_r($response);
    return $response;
}

 

 

The reCaptcha service should now send HTTP POST requests via your proxy server. 
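The essential difference between the two branches of the function above is the request line: through a proxy the POST names the full URL, while on a direct connection it names only the path. Purely as an illustration, here is a minimal sketch of that distinction in C. The function name build_request_line and the host and path used to call it are hypothetical and are not part of recaptchalib.php.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only.
 * Writes the HTTP/1.0 request line for a POST into buf.
 * If use_proxy is non-zero the full URL is used, which is the form a
 * forward proxy expects; otherwise only the path is sent.
 * Returns the number of characters written, or -1 if buf is too small. */
int build_request_line(char *buf, size_t size, int use_proxy,
                       const char *host, const char *path)
{
    int n;

    if (use_proxy)
        n = snprintf(buf, size, "POST http://%s%s HTTP/1.0\r\n", host, path);
    else
        n = snprintf(buf, size, "POST %s HTTP/1.0\r\n", path);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

In both cases the Host header and the body stay the same; only the request line and the socket you connect to (proxy or reCAPTCHA server) change.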

 


Node decommission

We will be decommissioning our test node hpc300 as it will become part of our Stratus Lab test bed.  More on this project later...

Update on nodes: we've added a new 200 series node, srvslnhpc212, with 8GB RAM.  We've also created a subseries, 200T, for node srvslnhpc209.  The reason behind this is that 209 still only has 4GB of RAM.  This node can still be used, but only when requested explicitly, rather than by automatic series selection.

We will be deploying more 200 series servers later in the month and will be incorporating a RAM upgrade for node 209 into this deployment.

Compile / Install / Deploy OpenFOAM 2.0

Big-Bytes decided to write up a quick tutorial on how to install OpenFOAM 2 in user space. Administrators and users constantly battle with dependencies, especially with the latest software releases. The problem begins when a system administrator installs and configures a Linux system. All is well in the beginning and users are happy, but at some point users begin to complain about outdated versions of software dependencies. Some system dependencies can be upgraded rather quickly but others cannot, and a user space compile becomes unavoidable. This was the case with GCC, its linker and the dependencies required to build GCC.

OpenFOAM 2 requires GCC 4.4 or above to compile correctly, and on a Scientific Linux 5.4 system only GCC 4.1.2 was available. It is not recommended to upgrade a system-wide compiler such as GCC, and it is also not trivial to upgrade a Linux machine while having to deal with the demands of users.


Compile GCC

Create the directories $HOME/contrib and $HOME/contrib/build. You can of course name these directories any way you like.
Download all the dependencies for GCC, listed below, and extract them into contrib.

Compile the dependencies before GCC itself. GMP should be compiled first, as the other dependencies depend on it.

Dependency Compile and Install

    GMP:
  • Extract with tar xfvz gmp.tar.gz
  • Change into the directory and compile with ./configure --prefix=$HOME/contrib/gmp-complete
  • make && make install

     MPFR:
  • Extract with tar xfvz mpfr.tar.gz
  • Change into the directory and compile with ./configure --prefix=$HOME/contrib/mpfr-complete --with-gmp-build=<directory in which the source is in, the extracted directory and not the compiled directory>
  • make && make install

    MPC:
  • Extract with tar xfvz mpc.tar.gz
  • Change into the directory and compile with ./configure --prefix=$HOME/contrib/mpc-complete --with-gmp=$HOME/contrib/gmp-complete --with-mpfr=$HOME/contrib/mpfr-complete
  • make && make install

   

     BinUtils:

  • Extract with bunzip2 binutils.tar.bz2 && tar xfv binutils.tar
  • Change into the directory and compile with ./configure --prefix=$HOME/contrib/binutils-complete
  • make && make install
  • make target_header_dir=$HOME/contrib/binutils-complete/include -C libiberty


    Library and Path location updates

Now that we have completed all the dependency requirements it is important to update the library paths and system paths. Add the following to $HOME/.bashrc ( User specific functions).

export LD_LIBRARY_PATH=$HOME/contrib/mpc-complete/lib:$HOME/contrib/mpfr-complete/lib:$HOME/contrib/gmp-complete/lib:$HOME/contrib/gcc-complete/lib/:$HOME/contrib/gcc-complete/lib64:$LD_LIBRARY_PATH

COMPILE_OF=$HOME/contrib/gcc-complete/bin:$HOME/contrib/gcc-complete/lib:$HOME/contrib/binutils-complete/bin
export PATH=$COMPILE_OF:$PATH
export CPATH=$HOME/contrib/gcc-complete/include


Log out and log back in to refresh the environment. It is important to execute " which ld " and " which gcc " to ensure that your environment is set up correctly. To confirm that you have the correct version of GCC installed, execute " gcc --version ".


       Compile latest version of GCC

  • Download the latest version of GCC from a mirror site into $HOME/contrib and extract there.
  • Change directory into $HOME/contrib/build. Notice that we are NOT executing the "./configure" script from within the extracted directory but rather from a build directory. It is extremely important to adhere to this rule.
  • cd $HOME/contrib/build
  • ../gcc-4.6.1/configure --prefix=$HOME/contrib/gcc-complete/ --enable-languages=c,c++ --with-gmp=$HOME/contrib/gmp-complete --with-mpc=$HOME/contrib/mpc-complete/ --with-mpfr=$HOME/contrib/mpfr-complete/  ( NB: notice the importance of the two periods before the configure script )
  • make -j 4 ( This assumes that you would like the compile of GCC to use 4 CPU cores )
  • make install
  • make clean ( This will erase all the intermediate files )
  • $HOME/contrib/gcc-complete/bin/gcc --version should show the correct version.
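As an additional check that the new compiler, and not the system one, is actually being picked up, a small C function can report the version of the compiler that built it. This is only a sketch: the function name compiler_version is hypothetical, while __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ are the standard macros predefined by GCC.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only.
 * Writes the version of the GNU-compatible compiler that built this file
 * into buf as "major.minor.patch", or "unknown" for other compilers.
 * Returns the number of characters written (as reported by snprintf). */
int compiler_version(char *buf, size_t size)
{
#if defined(__GNUC__)
    return snprintf(buf, size, "%d.%d.%d",
                    __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
    return snprintf(buf, size, "unknown");
#endif
}
```

Calling this from a small main() and printing the buffer should give the same version that gcc --version reports, provided the file was built with the gcc found first in your PATH.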


 

Install and Compile OpenFOAM 2

You will need to do this as your normal user, not root, in order to maintain access to gcc 4.6.  Compile in $HOME as per default and copy the compiled package to the shared software area later.
  • Create directory $HOME/OpenFOAM
  • Download the OpenFOAM 2 and ThirdParty compressed archives from the OpenFOAM website
  • Extract both into the $HOME/OpenFOAM directory with the " tar xfvz <filename>" command
  • Change directory into  $HOME/OpenFOAM/OpenFOAM-2.0.1/etc
  • Open up the " bashrc " file and locate the " foamInstall=$HOME " variable. If you are installing OpenFOAM into an alternate location you may update the location here.
  • Save the file and exit
  • Add the following to $HOME/.bashrc - "source  $HOME/OpenFOAM/OpenFOAM-2.0.1/etc/bashrc"
  • Set the following environment variable so that the OpenFOAM build uses several CPU cores in parallel (4 in this example): " export WM_NCOMPPROCS=4 "
  • Change directory into $HOME/OpenFOAM/ThirdParty-2.0.1/
  • Execute the file " ./Allwmake "
  • When completed successfully, execute ". $HOME/OpenFOAM/OpenFOAM-2.0.1/etc/bashrc" (note the space after the leading period). This will refresh the shell environment.
  • Change directory into $HOME/OpenFOAM/OpenFOAM-2.0.1/ and execute " ./Allwmake "
If all goes well you should have a fully compiled version of OpenFOAM. Tutorials are available on the OpenFOAM website.

 

SAGrid LFC Service Reinstated

After a disk crash yesterday the LFC service has been reinstated.

GPGPU Training

Interested in GPGPU? The CHPC are holding a workshop - 19 July 2011 8:30 AM - which will focus on OpenCL. Lunch is included. 

http://www.meetup.com/GPGPU-ZA/events/23227161/

First Light

OpenMPI is now configured on the new cluster.  There was an issue with the installation, in that the package was pre-configured to expect Infiniband which we do not have (yet).  However after several hours spent battling with it we found the configuration parameter to bypass this requirement and MPI jobs are now running.

Our newest HPC user is currently submitting jobs on the cluster using MPI compiled C code.  Things seem to be running smoothly and we'll continue to monitor the job progress over the weekend.  While we've been running live user jobs for the last two months this is actually a major step for us as it represents a maturation in our ability to provision an independent cluster from the ground up, with user and software support in under 48 hours.

We are also anticipating increasing the CPU count by 8 early next week with the addition of two extra servers.  We will use the new kit in a proof of concept arrangement to test partitioning the cluster to segregate resources for specific user groups.

OpenMPI

Installed openmpi 1.4-4 as well as the MPI development suite on the test cluster.

Configured host based authentication to obviate the ssh password authentication.

Compiled a test case program.  Seeing a 50% speed improvement when splitting the workload over 3 worker nodes (all communication is over simulated Ethernet in a virtual environment).

In our current model the executable does not need to be copied to the worker nodes as it appears in the NFS mounted file system.  The user only needs to execute mpirun with the correct parameters.

Work to do:

 - test with Torque/PBS and qsub directives in order to reserve worker nodes and CPUs appropriately.

 - develop and practice data sharing methodologies.

Multi threaded programming and SAGrid CRL error

Investigated mixed cluster options with multiple queues for Torque.

Brief test of OMP - successful.

Started investigating MPI.

Fixed an issue with SAGrid:  The WMS had lost access to the internet due to a proxy change, hence CRL downloads were failing.  This resulted in an end point failure when users attempted to create proxies. This has now been fixed and CRLs are once again being downloaded.  gLite services have been restarted and users can now delegate proxies.

Parallel code, benefits and pit-falls

Most high end platforms for high performance computing are equipped with multi-core CPUs.  In order to fully utilize the CPUs, multiple jobs must be run on each platform or the code must be changed to utilize multiple CPUs.  There are several methods used to take advantage of multiple CPUs: OpenMP, MPI, MPICH etc.  The simpler approaches utilize one server and all its CPUs in a shared memory model; the more complex approach is to split the code across several servers with a master process handling communication between the shared memories and aggregating the results.  Either way, well written code split across multiple CPUs can generally increase job efficiency.

There are obviously several caveats: some code cannot be 'parallelized' due to the nature of the algorithm, the code should be correctly optimized, disk IO should be reduced, and in the multi-server model network latency can become a significant delaying factor.

Below is a graph of job completion times, where a lower (faster) result is better.  The first bar is the time for the job to complete using only one processor.  This is a simple array calculation compiled in C++ running on a BL460 blade with dual quad-core CPUs.  The single CPU iterative job completes in 20 seconds.  Next the code is compiled with OpenMP (the omp.h header), allowing it to parallelize the array calculation loops.  Unexpectedly the time to complete is longer than for the iterative job.  This is because the job was only allowed to run on one core: the overhead of the OpenMP runtime managing multiple threads on a single core is what caused the increase in run time.
MPI

By increasing the number of cores on which the job is allowed to run we see an immediate increase in speed and reduction in job time.  This is unfortunately not a linear improvement due to communication latency, in this case in the processor cache.  OMP allows more threads to run than there are physical cores, which is fine for the purpose of testing.  Additionally one can run more than one multi-threaded job per server.  These practices however should be avoided as they cause processor contention as the tasks are switched in and out of CPU context.  This behaviour is clearly seen in the last two job runs.
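For readers who have not used OpenMP, the kind of loop parallelization described above can be sketched as follows. This is not the code used for the measurements in the graph; the function sum_of_squares is a hypothetical stand-in for a simple array calculation.

```c
/* Hypothetical example, for illustration only.
 * Sums the squares of n doubles. When compiled with OpenMP enabled
 * (for example gcc -fopenmp) the loop iterations are divided among the
 * available threads and the per-thread partial sums are combined by the
 * reduction clause. Without OpenMP the pragma is ignored and the loop
 * runs serially, giving the same result up to floating-point rounding. */
double sum_of_squares(const double *x, long n)
{
    double total = 0.0;
    long i;

#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
        total += x[i] * x[i];

    return total;
}
```

The number of threads can be set with the OMP_NUM_THREADS environment variable. Keeping it at or below the number of physical cores avoids the contention described above.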

C vs Fortran

Surprisingly, given the proliferation of C code, Fortran out-performs C in many areas.  Fortran allows better numerical array manipulation, provides a rich set of highly optimized precision numeric functions making it more predictable and faster than C and also provides extremely efficient IO functionality.

However something to keep in mind when writing Fortran applications or porting code is "row versus column order".  Fortran and C differ in their methods of storing arrays in linear memory.  Fortran uses column-major order (as does Matlab) while C uses row-major order.  While there is no intrinsic benefit in either approach, a lack of understanding of row versus column ordering can lead to speed degradation.  This is because the elements of the array that are being traversed in RAM are not contiguous when using the incorrect method, and for very large arrays the data may not be cached.  This is especially true for large higher dimension arrays.

As a simple example consider the 2 dimensional array:

Array

Fortran would store the array in memory as follows:

While C would store the array as follows:

A programmer should take care when formulating "for loops" to ensure that the array traversal variables i and j are ordered correctly. For example the C code:

    for (i=0; i<MAXi; i++)
        {
        for (j=0; j<MAXj; j++)
            {
                /* arithmetic calculation on array[i][j] */
            }
        }

...is optimal as the primary array elements are addressed in "row major order" by the outer loop.
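To make the two layouts concrete, take a hypothetical 2 x 3 array with rows (1 2 3) and (4 5 6), chosen here purely for illustration. Row-major order stores it in memory as 1 2 3 4 5 6, while column-major order stores it as 1 4 2 5 3 6. The sketch below computes the linear offset of an element under each scheme and sums a row-major array in the cache-friendly order. All function names are assumptions made for this example, and indices are 0-based as in C.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* Linear offset of element (i, j) in row-major order (the C layout). */
size_t row_major_offset(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}

/* Linear offset of element (i, j) in column-major order (the Fortran layout). */
size_t col_major_offset(size_t i, size_t j, size_t rows)
{
    return j * rows + i;
}

/* Sums a rows x cols array stored in row-major order. The inner loop runs
 * over j, so consecutive iterations touch consecutive memory addresses. */
double sum_row_major(const double *a, size_t rows, size_t cols)
{
    double total = 0.0;
    size_t i, j;

    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            total += a[row_major_offset(i, j, cols)];

    return total;
}
```

Swapping the two loops would still give the same sum, but each step of the inner loop would then jump cols elements ahead in memory, which is the slow access pattern discussed above. For column-major data the situation is reversed.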

The graph below shows what happens when identical array computations are performed using the incorrect array addressing scheme (a lower number is better).  The blue graph shows the time taken to complete a program compiled in Fortran.  Here column major addressing is clearly faster than row major.  The red graph shows the time taken to run the same code compiled in C.  Here it is clear that the inverse is true: the column major scheme takes slightly longer to run than row major.

C vs Fortran

Additionally, it can be seen that the Fortran code is significantly and consistently faster than C.  These tests were performed repeatedly and an average taken to avoid inconsistencies caused by data caching.  Additionally the tests were run using the OMP library to make use of multiple processors.  The findings were consistent and independent of the number of cores used.  The ICTS cluster provides both GNU Fortran and C compilers.

Catania application porting

The ICTS HPC team spent a month in Catania at the Institute for Nuclear Physics, working as part of a South African scientific application porting team.  Once again the trip was supported by EPIKH and was extremely successful.

After the Africa-2 application course in Johannesburg earlier in the year a number of applications were put forward by South African scientists for conversion to Grid format.  The South African team consisted of the ICTS specialists, Andrew Lewis and Timothy Carr, as well as Albert van Eck from the University of the Free State.  The team was headed up by Dr Bruce Becker of the Meraka Institute.  By working at the INFN the team had direct access to Grid specialists in porting, software and gLite middleware.

The conversion course was extremely successful and saw a number of applications being converted and deployed to South African grid sites as packaged RPM modules.  Some progress was also made in understanding how MPI is used in the Grid environment.  Unfortunately during the course the SEACOM link was interrupted, however we still had access to the GILDA training laboratory.


Portion of the Ebola virus rendered with NAMD

Once again the team took the opportunity to do some sightseeing and take in as much of the local culture and history as time allowed.  We visited Taormina in the North of Sicily, as well as Castelmola.  The highlight was a weekend trip to the Italian mainland to visit Sorrento and the ancient Roman town of Pompeii, an unforgettable experience.


Pompeii, with Vesuvius in the background

We would like to thank the INFN team, Andrea Cort, Fabrizio Pistgna, Emidio Giorgio, Valeria Ardezonne and Dr Roberto Barbera for their hospitality and enthusiastic support.


Clockwise from left: Timothy Carr, Andrew Lewis, Valeria Ardizonne, Albert van Eck, Dr Roberto Barbera.

We would also like to thank Sakkie van Rensburg, Andre le Roux and Eugene van Rooyen for making this opportunity available to us.
