Autodock
We've installed AutoDock 4.2 on the cluster. It's a popular molecular modeling simulation package and is one of the most cited docking packages in the research community. We've installed both the 32-bit and 64-bit versions.

We've recently taken on some new users and a new package. Researchers in UCT's Blast Impact and Survivability Unit will be making use of LS-DYNA, a strongly coupled multi-physics solver, on ICTS's HPC cluster.
Our first user is a second-year Master's student doing a thesis in the field of structural impact loading. Our second user is a PhD student investigating the effect of different levels of blast confinement on the response of deformable plates.
We've upgraded our Crux tandem mass spectrometry analysis software from version 1.35 to 1.36 to take advantage of new functionality.
Big-Bytes decided to write up a quick tutorial on how to install OpenFOAM 2 in user space. Administrators and users constantly battle with dependencies, especially with the latest software releases. The problem begins when a system administrator installs and configures a Linux system. All is well in the beginning and users are happy, but at some point users begin to complain about outdated dependency versions. Some system dependencies can be upgraded rather quickly, but others cannot, and a user-space compile becomes necessary. This was the case with GCC, its linkers and the dependencies required to build GCC.
OpenFOAM 2 requires GCC 4.4 or above to compile correctly, and on a Scientific Linux 5.4 system only GCC 4.1.2 was available. It is not recommended to upgrade a system-wide compiler such as GCC, and it's also not trivial to upgrade a Linux machine while having to deal with the demands of users.
Compile GCC
Create the directories $HOME/contrib and $HOME/contrib/build. You can of course name these directories any way you like.
Download all the dependencies for GCC, which are listed below, and extract them into contrib.
BinUtils:
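As a rough sketch of how the pieces fit together, the function below prints the configure arguments each package might be given, assuming every dependency is installed into its own <name>-complete prefix under $HOME/contrib (the layout the paths further down rely on). The function name dep_configure_args and the CONTRIB variable are hypothetical, and the --with-gmp, --with-mpfr and --with-mpc options should be checked against the configure --help output of the versions you download.

```shell
#!/bin/sh
# Sketch only: prints configure arguments, it does not build anything.
# CONTRIB is a hypothetical override; it defaults to $HOME/contrib.

dep_configure_args() {
    root="${CONTRIB:-$HOME/contrib}"
    case "$1" in
        gmp|binutils)
            # No dependencies on the other packages in this list.
            echo "--prefix=$root/$1-complete" ;;
        mpfr)
            # MPFR needs to find the GMP built above.
            echo "--prefix=$root/mpfr-complete --with-gmp=$root/gmp-complete" ;;
        mpc)
            # MPC needs both GMP and MPFR.
            echo "--prefix=$root/mpc-complete --with-gmp=$root/gmp-complete --with-mpfr=$root/mpfr-complete" ;;
        gcc)
            # GCC is pointed at all three maths libraries.
            echo "--prefix=$root/gcc-complete --with-gmp=$root/gmp-complete --with-mpfr=$root/mpfr-complete --with-mpc=$root/mpc-complete" ;;
        *)
            echo "unknown package: $1" >&2
            return 1 ;;
    esac
}

# Print the plan in dependency order.
for pkg in gmp mpfr mpc binutils gcc; do
    echo "$pkg: ./configure $(dep_configure_args "$pkg")"
done
```

Each printed line would then be run from that package's source or build directory, followed by make and make install, before moving on to the next package.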
Library and Path location updates
Now that we have completed all the dependency requirements, it's important to update the library paths and system paths. Add the following to $HOME/.bashrc (user-specific functions).
export LD_LIBRARY_PATH=$HOME/contrib/mpc-complete/lib:$HOME/contrib/mpfr-complete/lib:$HOME/contrib/gmp-complete/lib:$HOME/contrib/gcc-complete/lib/:$HOME/contrib/gcc-complete/lib64:$LD_LIBRARY_PATH
COMPILE_OF=$HOME/contrib/gcc-complete/bin:$HOME/contrib/gcc-complete/lib:$HOME/contrib/binutils-complete/bin
export PATH=$COMPILE_OF:$PATH
export CPATH=$HOME/contrib/gcc-complete/include
Log out and log in again to refresh the environment. It's important to execute "which ld" and "which gcc" to ensure that your environment is set up correctly. To confirm that you have the correct version of GCC installed, execute "gcc --version".
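The checks above can be scripted. The following is a minimal sketch, not part of the original procedure: version_ge is a hypothetical helper that compares dotted version numbers, command -v stands in for which, and 4.4 is the minimum GCC version mentioned earlier.

```shell
#!/bin/sh
# version_ge A B: succeeds when dotted version A >= B.
# Only numeric fields are compared (e.g. 4.6.1 against 4.4).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Show which binaries the shell will actually pick up.
gcc_path=$(command -v gcc || true)
ld_path=$(command -v ld || true)
echo "gcc: ${gcc_path:-not found}"
echo "ld:  ${ld_path:-not found}"

# Compare the compiler version against the 4.4 minimum.
if [ -n "$gcc_path" ]; then
    gcc_version=$(gcc -dumpversion)
    if version_ge "$gcc_version" 4.4; then
        echo "GCC $gcc_version is new enough"
    else
        echo "GCC $gcc_version is older than 4.4; check PATH in .bashrc"
    fi
fi
```

If the reported paths do not point into $HOME/contrib, the new PATH entries are probably not being picked up.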
Compile latest version of GCC
From the ABySS web site: ABySS (Assembly By Short Sequences) is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
We decided to document the installation of ABySS 1.2.7.
First, you will need to download a dependency called Sparsehash. The download can be found here.
Google Sparsehash Installation
If you have administrative permissions then follow this section
- ./configure
- make
- make install
NB: This installs Sparsehash into /usr/local/include
For non administrative users
- ./configure --prefix=$HOME/user
- make
- make install
ABySS Installation
- ./configure --prefix=/opt/exp_soft/sagrid/abyss-1.2.7 --with-mpi=/usr/lib64/openmpi CPPFLAGS=-I/opt/exp_soft/sagrid/abyss-1.2.7/google-sparsehash-1.11/include
The CPPFLAGS entry is used to reference the location of the dependency installed earlier.
NB: Ensure that the following is correct before continuing. The config.log file generated by ./configure will indicate whether the CPPFLAGS entry was used successfully.
checking google/sparse_hash_map usability... yes
checking google/sparse_hash_map presence... yes
checking for google/sparse_hash_map... yes
- make
If you are running an earlier version of GCC you may see an error to the effect of:
cc1plus: warnings being treated as errors
DistanceEst.cpp:392: warning: ignoring #pragma omp task
Update your make command to:
- make AM_CXXFLAGS=-Wall
- make install
Installation is now complete. The binaries are available under /opt/exp_soft/sagrid/abyss-1.2.7, or whichever location you passed to --prefix.
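The Sparsehash check that precedes the make step can also be automated. The sketch below assumes the terminal output of ./configure was saved to a file; the file name configure.out and the function name sparsehash_detected are hypothetical.

```shell
#!/bin/sh
# sparsehash_detected FILE: succeeds when FILE (captured ./configure output)
# reports that the google/sparse_hash_map header was found.
sparsehash_detected() {
    grep -q 'checking for google/sparse_hash_map\.\.\. yes' "$1"
}

# Intended use (configure.out is a hypothetical file name):
#   ./configure <options as above> 2>&1 | tee configure.out
#   sparsehash_detected configure.out && make
```

Chaining make onto the check means the build only starts once the header has been found.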
Today the combined work effort of our clusters reached 10 years' worth of processing time, which is the equivalent of one processor working continuously for 10 years. We only started taking on serious workloads in April this year, so while this is not a huge milestone compared to other well-established institutes, it's a gratifying achievement given our relatively short exposure to HPC.
The graphs below show the total number of jobs submitted to our two clusters. Many of the jobs submitted to the grid cluster are probe jobs from international projects checking on our system status, so while there have been more jobs on this cluster, most of the work has been done on our HPC cluster.
HPC cluster:

Grid cluster:
Given our new application porting strategy for grid based projects we are hoping to see a significant increase in jobs submitted to our grid cluster in the next few months.
HPV infection is causally linked to cervical and other genital cancers. The human papillomavirus (HPV) research group at IIDMM, UCT is using next-generation sequencing (NGS) to explore the diversity of this virus in HIV-infected individuals. NGS is a powerful new technology that allows us to examine viral diversity directly in clinical specimens. This technology is transforming the world of genomics, but many biologists are struggling to keep up with the analysis of all the data produced, both in terms of having the required computing power and the technical know-how. Thanks go to the HPC team at ICTS, who have set up several programs for assembling and analysing NGS data. We look forward to exciting results.
Dr. Tracy L. Meiring
Over the past week we became aware of an issue with Matlab and its desire to consume as many CPUs as possible, despite the fact that we were no longer running the parallel library. This thread outlines the problem we were experiencing and this thread gives the solution, namely calling Matlab with the -singleCompThread argument. In the process list we still noticed 2 threads, but one was set to 'Sleep' while only one consumed actual CPU time. The designated CPU seemed to be static; in other words, the thread did not jump from one CPU to another, and the resource consumption reported by qstat and SNMP was consistent. Obviously from a user's perspective it is desirable to consume as many CPUs as possible. However, in a shared environment, taking the assignment of resources away from the scheduler will invariably lead to resource contention and overall performance degradation.
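As an illustration, a single-core job file using this argument could be generated as follows. This is a sketch under assumptions: write_matlab_job, my_analysis and matlab_job.pbs are hypothetical names, and the #PBS resource line uses Torque-style syntax that may differ on other schedulers.

```shell
#!/bin/sh
# write_matlab_job SCRIPT OUT: write a one-core PBS job file to OUT that runs
# the Matlab script SCRIPT (given without the .m suffix) on a single thread.
write_matlab_job() {
    cat > "$2" <<EOF
#!/bin/bash
#PBS -l nodes=1:ppn=1
cd "\$PBS_O_WORKDIR"
matlab -nodisplay -nosplash -singleCompThread -r "$1; exit"
EOF
}

# Example: generate and display a job file for a script called my_analysis.m.
write_matlab_job my_analysis matlab_job.pbs
cat matlab_job.pbs
```

Requesting one core and pinning Matlab to one computational thread keeps the job's real usage in line with what the scheduler has allocated.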
We recently ported Velvet, a short sequence assembler, to our cluster. It can be compiled with multi-threading via OMP, however there does not seem to be an elegant way to control its behaviour in a clustered environment other than to use PBS directives to book an entire node. The problem here is that a user would need a completely free node to start with. We'll see how well this works, however we may elect to recompile the application without OMP support.
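One untested alternative is sketched below. It assumes that Velvet's OpenMP build honours the standard OMP_NUM_THREADS environment variable, and that the scheduler provides a Torque-style $PBS_NODEFILE with one line per allocated core on a single node. The helper name cores_in_allocation is hypothetical.

```shell
#!/bin/sh
# cores_in_allocation FILE: count the non-empty lines in a node file,
# which is assumed to list one line per allocated core.
cores_in_allocation() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Inside a job, match the thread count to the cores the scheduler granted;
# outside a job, fall back to a single thread.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    OMP_NUM_THREADS=$(cores_in_allocation "$PBS_NODEFILE")
else
    OMP_NUM_THREADS=1
fi
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

The Velvet commands would then be launched from the same job script, so a job requesting nodes=1:ppn=4 would run with four threads rather than claiming every core on the node.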
In addition to Velvet we installed the Velvet Optimiser. This is very simple to install, however one of the requirements is apparently BioPerl. This install was a bit more complex and required an install base directive to ensure that the scripts ended up in the NFS application mount.
Velvet is memory hungry: on a node with 4GB of RAM an assembly took over an hour, whereas on a node with 24GB of RAM the same assembly took just over a minute. We're hoping that this application is swiftly ported to Grid format, as our site has a distinct lack of large-memory machines.

We noticed that while MrBayes jobs have been running OK, the sumt process segfaults. This is not a huge crisis, as the step can be finished up in post-processing. However, our user had already solved this issue on his desktop by applying a patch for the 64-bit version of MrBayes MPI, and applying the same patch resolved the problem on the cluster.
Jack Viljoen, a Master's student in the Botany department, is working on the evolution of Cape Sedges.
For reconstructing the evolutionary trees of groups like the Cypereae and Schoeneae (Fig. 1),

.... we use Markov chain Monte Carlo (MCMC) sampling to estimate parameter values related to models of DNA sequence evolution. This involves running programs like MrBayes for about 5–50 million iterations, in order to explore parameter space to find the region of highest likelihood (Fig. 2a), and to sample this region sufficiently to get good parameter estimates (Fig. 2b).
This process is made much more efficient by running multiple instances of the sampler simultaneously on different CPUs, and we are grateful to the HPC unit at UCT for allowing us to do just that.
Our monthly ICT maintenance slots allow system administrators at ICTS to carry out configuration changes and system and security updates. Unfortunately this maintenance slot involves a complete power shutdown, so no ICT services will be available between 09:00 and 17:00.
We are asking all our HPC and Grid users to ensure that they checkpoint their jobs before Sunday.
We spent this week porting a suite of packages for short sequence analysis: Velvet, Oases, FastX, FastQC and Maq.
We've only tested Velvet so far. It's memory hungry and seems to run best on the 400 series blades, by several orders of magnitude. It's also OMP capable, but not MPI, which means instances need to be launched carefully in order to avoid hogging CPU resources.
We recently installed Crux on our cluster for researchers in the IIDMM department at UCT. Unfortunately it's not MPI aware and only runs on single cores. Additionally, the binary we downloaded did not run on the cluster due to a library mismatch. However, the site allows one to apply for a free academic license to download the source code. Once we'd recompiled the software it ran just fine. We are still awaiting the first results, and we'll publish test images and data as they come in.
Subsequent to the installation of MrBayes we decided to deploy BLAST on the HPC cluster to complement the package on our Grid cluster. BLAST (Basic Local Alignment Search Tool) is a free bioinformatics package for genome searches and is hosted by the National Center for Biotechnology Information.
We hope to install an MPI version of BLAST in the future but this will probably require the assistance of researchers to identify the best method of database sharing.