Big Bytes

Autodock

We've installed AutoDock 4.2 on the cluster.  It's a popular molecular modeling simulation package and one of the most cited docking packages in the research community.  Both the 32-bit and 64-bit versions are installed.


BISRU and LS-DYNA

We've recently taken on some new users and a new package.  Researchers in UCT's Blast Impact and Survivability Research Unit (BISRU) will be making use of LS-DYNA, a strongly coupled multi-physics solver, on ICTS's HPC cluster.

Our first user is a second-year Master's student writing a thesis in the field of structural impact loading.  Our second user is a PhD student investigating how different levels of blast confinement affect the response of deformable plates.

New version of Crux available

We've upgraded our Crux tandem mass spectrometry analysis software from version 1.35 to 1.36 to take advantage of new functionality.

Compile / Install / Deploy OpenFOAM 2.0

Big Bytes decided to write up a quick tutorial on how to install OpenFOAM 2 in user space. Administrators and users constantly battle with dependencies, especially with the latest software releases. The problem begins when a system administrator installs and configures a Linux system. All is well in the beginning and users are happy, but at some point users begin to complain about outdated dependency versions. Some system dependencies can be upgraded quickly, but others cannot, and a user-space compile becomes unavoidable. This was the case with GCC, its linker and the dependencies required to build GCC.

OpenFOAM 2 requires GCC 4.4 or later to compile correctly, and on a Scientific Linux 5.4 system only GCC 4.1.2 was available. Upgrading a system-wide compiler such as GCC is not recommended, and it's also not trivial to upgrade a Linux machine while dealing with the demands of users.
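Before starting a user-space build it can help to check whether the compiler on your PATH already meets the 4.4 requirement. The following is a minimal sketch, not part of our original procedure: the version_ge helper is a name we made up for illustration, and it assumes "gcc -dumpversion" prints a dotted version number.

```shell
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Missing fields count as 0, so "4.4" compares like "4.4.0".
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]")
        nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

required=4.4   # minimum GCC version quoted above for OpenFOAM 2
if command -v gcc >/dev/null 2>&1; then
    found=$(gcc -dumpversion)
    if version_ge "$found" "$required"; then
        echo "gcc $found is new enough"
    else
        echo "gcc $found is older than $required; a user-space build is needed"
    fi
else
    echo "gcc not found on PATH"
fi
```

On the Scientific Linux 5.4 system described above, this check would take the "older than" branch for GCC 4.1.2.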


Compile GCC

Create the directories $HOME/contrib and $HOME/contrib/build. You can of course name these directories any way you like.
Download all the dependencies for GCC (listed below) and extract them into $HOME/contrib.

Compile the dependencies before GCC itself. Compile GMP first, as MPFR and MPC depend on it.

Dependency Compile and Install

    GMP:
  • Extract with tar xfvz gmp.tar.gz
  • Change into the directory and compile with ./configure --prefix=$HOME/contrib/gmp-complete
  • make && make install

     MPFR:
  • Extract with tar xfvz mpfr.tar.gz
  • Change into the directory and compile with ./configure --prefix=$HOME/contrib/mpfr-complete --with-gmp-build=<directory in which the source is in, the extracted directory and not the compiled directory>
  • make && make install

    MPC:
  • Extract with tar xfvz mpc.tar.gz
  • Change into the directory and compile with ./configure --prefix=$HOME/contrib/mpc-complete --with-gmp=$HOME/contrib/gmp-complete --with-mpfr=$HOME/contrib/mpfr-complete
  • make && make install

   

     BinUtils:

  • Extract with bunzip2 binutils.tar.bz2 && tar xfv binutils.tar
  • Change into the directory and compile with ./configure --prefix=$HOME/contrib/binutils-complete
  • make && make install
  • make target_header_dir=$HOME/contrib/binutils-complete/include -C libiberty
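
The dependency steps above can be sketched as a single script. This is an illustrative dry run rather than the exact script we used: the run, build_dep and build_all helpers and the *_SRC variables are hypothetical names, and the source directory locations are assumptions you would adjust to match your extracted tarballs. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
CONTRIB=${CONTRIB:-$HOME/contrib}
DRY_RUN=${DRY_RUN:-1}

# Hypothetical locations of the extracted source trees
GMP_SRC=${GMP_SRC:-$CONTRIB/gmp}
MPFR_SRC=${MPFR_SRC:-$CONTRIB/mpfr}
MPC_SRC=${MPC_SRC:-$CONTRIB/mpc}
BINUTILS_SRC=${BINUTILS_SRC:-$CONTRIB/binutils}

# run CMD...: print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_dep NAME SRCDIR [CONFIGURE_ARGS...]
# Configures into $CONTRIB/NAME-complete, then builds and installs.
build_dep() {
    dep_name=$1
    dep_src=$2
    shift 2
    (
        run cd "$dep_src" &&
        run ./configure --prefix="$CONTRIB/${dep_name}-complete" "$@" &&
        run make &&
        run make install
    )
}

# GMP goes first because MPFR and MPC are configured against it
build_all() {
    build_dep gmp "$GMP_SRC" &&
    build_dep mpfr "$MPFR_SRC" --with-gmp-build="$GMP_SRC" &&
    build_dep mpc "$MPC_SRC" --with-gmp="$CONTRIB/gmp-complete" \
        --with-mpfr="$CONTRIB/mpfr-complete" &&
    build_dep binutils "$BINUTILS_SRC"
}

build_all
```

Chaining the steps with && means a failed configure or make stops the sequence instead of carrying on with a half-built dependency.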


    Library and Path location updates

Now that all the dependencies are built, it's important to update the library paths and system paths. Add the following to $HOME/.bashrc (in the user-specific section).

export LD_LIBRARY_PATH=$HOME/contrib/mpc-complete/lib:$HOME/contrib/mpfr-complete/lib:$HOME/contrib/gmp-complete/lib:$HOME/contrib/gcc-complete/lib/:$HOME/contrib/gcc-complete/lib64:$LD_LIBRARY_PATH

COMPILE_OF=$HOME/contrib/gcc-complete/bin:$HOME/contrib/gcc-complete/lib:$HOME/contrib/binutils-complete/bin
export PATH=$COMPILE_OF:$PATH
export CPATH=$HOME/contrib/gcc-complete/include


Log out and log back in to refresh the environment. Then run "which ld" and "which gcc" to confirm that the environment is set up correctly; both should resolve to paths under $HOME/contrib once the corresponding builds are installed. To confirm that the correct version of GCC is in use, run "gcc --version".
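
The "which" checks above can be automated. A small sketch follows, assuming the $HOME/contrib layout used in this tutorial; resolves_under is a helper name we invented for illustration.

```shell
# resolves_under TOOL PREFIX: succeed if TOOL is found on PATH inside PREFIX
resolves_under() {
    tool_path=$(command -v "$1") || return 1
    case $tool_path in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

CONTRIB=${CONTRIB:-$HOME/contrib}
for tool in gcc ld; do
    if resolves_under "$tool" "$CONTRIB"; then
        echo "$tool: OK ($(command -v "$tool"))"
    else
        echo "$tool: not picked up from $CONTRIB (found: $(command -v "$tool" || echo none))"
    fi
done
```

If either tool is reported as not picked up, the usual suspects are a PATH line that was not exported or a shell that has not re-read $HOME/.bashrc.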


       Compile latest version of GCC

  • Download the latest version of GCC from a mirror site into $HOME/contrib and extract there.
  • Change directory into $HOME/contrib/build. Notice that we are NOT executing the "./configure" script from within the extracted source directory but from a separate build directory. It's extremely important to adhere to this rule.
  • cd $HOME/contrib/build
  • ../gcc-4.6.1/configure --prefix=$HOME/contrib/gcc-complete/ --enable-languages=c,c++ --with-gmp=$HOME/contrib/gmp-complete --with-mpc=$HOME/contrib/mpc-complete/ --with-mpfr=$HOME/contrib/mpfr-complete/  ( NB: notice the importance of the two periods before the configure script )
  • make -j 4 (this runs 4 parallel jobs, on the assumption that you want the GCC compile to use 4 CPU cores)
  • make install
  • make clean ( This will erase all the intermediate files )
  • $HOME/contrib/gcc-complete/bin/gcc --version should show the correct version.
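
Since the separate build directory matters so much, a quick guard before running configure may be worthwhile. The sketch below is our own illustration: check_out_of_tree is a hypothetical helper, and it also rejects a build directory nested inside the source tree, which as far as we understand GCC's installation notes is likewise unsupported.

```shell
# check_out_of_tree SRC BUILD
# Returns 0 if BUILD is a directory outside SRC, 1 if BUILD is SRC itself
# or lies inside it, and 2 if either directory does not exist.
check_out_of_tree() {
    [ -d "$1" ] && [ -d "$2" ] || return 2
    src_abs=$(cd "$1" && pwd -P) || return 2
    bld_abs=$(cd "$2" && pwd -P) || return 2
    case $bld_abs in
        "$src_abs"|"$src_abs"/*) return 1 ;;
        *) return 0 ;;
    esac
}

CONTRIB=${CONTRIB:-$HOME/contrib}
GCC_SRC=${GCC_SRC:-$CONTRIB/gcc-4.6.1}
BUILD_DIR=${BUILD_DIR:-$CONTRIB/build}

if check_out_of_tree "$GCC_SRC" "$BUILD_DIR"; then
    echo "OK: run the configure script in $GCC_SRC from within $BUILD_DIR"
else
    echo "Not ready: check that $GCC_SRC and $BUILD_DIR exist and are separate directories"
fi
```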


 

Install and Compile OpenFOAM 2

You will need to do this as your normal user, not root, in order to maintain access to gcc 4.6.  Compile in $HOME as per default and copy the compiled package to the shared software area later.
  • Create directory $HOME/OpenFOAM
  • Download the OpenFOAM 2 and ThirdParty archives from the OpenFOAM website
  • Extract both into the $HOME/OpenFOAM directory with the " tar xfvz <filename>" command
  • Change directory into  $HOME/OpenFOAM/OpenFOAM-2.0.1/etc
  • Open the "bashrc" file and locate the "foamInstall=$HOME" variable. If you are installing OpenFOAM into an alternate location, update the location here.
  • Save the file and exit
  • Add the following to $HOME/.bashrc - "source  $HOME/OpenFOAM/OpenFOAM-2.0.1/etc/bashrc"
  • Set the following environment variable so that the OpenFOAM build runs in parallel on multiple CPU cores (4 in this example): "export WM_NCOMPPROCS=4"
  • Change directory into $HOME/OpenFOAM/ThirdParty-2.0.1/
  • Execute the file " ./Allwmake "
  • When it has completed successfully, execute "source $HOME/OpenFOAM/OpenFOAM-2.0.1/etc/bashrc". This will refresh the shell environment.
  • Change directory into $HOME/OpenFOAM/OpenFOAM-2.0.1/ and execute " ./Allwmake "
If all goes well you should have a fully compiled version of OpenFOAM. Tutorials are available on the OpenFOAM website.
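
Rather than hard-coding 4, the value of WM_NCOMPPROCS could be derived from the machine. This is a hedged sketch: detect_cores is a name we made up, and it assumes "getconf _NPROCESSORS_ONLN" is supported on your system, falling back to 1 if it is not. On a shared login node you may prefer a smaller number than the full core count.

```shell
# detect_cores: print the number of online CPU cores, or 1 if unknown
detect_cores() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || cores=1
    case $cores in
        ''|*[!0-9]*) cores=1 ;;   # empty or non-numeric answer
    esac
    [ "$cores" -ge 1 ] || cores=1
    echo "$cores"
}

WM_NCOMPPROCS=$(detect_cores)
export WM_NCOMPPROCS
echo "WM_NCOMPPROCS=$WM_NCOMPPROCS"
```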

 

Compiling ABySS

From the ABySS web site: ABySS (Assembly By Short Sequences) is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.

We decided to document the installation of ABySS 1.2.7.

First, you will need to download a dependency called " Sparsehash ". The download can be found here.

Google Sparsehash Installation
If you have administrative permissions then follow this section
        - ./configure
        - make
        - make install

NB: This installs the sparsehash headers into /usr/local/include
   
For non administrative users
        - ./configure --prefix=$HOME/user
        - make
        - make install 

ABySS Installation 

Download

 - ./configure --prefix=/opt/exp_soft/sagrid/abyss-1.2.7 --with-mpi=/usr/lib64/openmpi CPPFLAGS=-I/opt/exp_soft/sagrid/abyss-1.2.7/google-sparsehash-1.11/include

The CPPFLAGS entry is used to reference the location of the dependency installed earlier.

NB: Ensure that the following is correct before continuing. The output of configure (also recorded in config.log) indicates whether the CPPFLAGS entry was used successfully:

checking google/sparse_hash_map usability... yes
checking google/sparse_hash_map presence... yes
checking for google/sparse_hash_map... yes


- make

If you are running an earlier version of GCC you may see an error to the effect of:

cc1plus: warnings being treated as errors
DistanceEst.cpp:392: warning: ignoring #pragma omp task

In that case, update your make command as follows:


- make AM_CXXFLAGS=-Wall
- make install


Installation is now complete, and the binaries are available under /opt/exp_soft/sagrid/abyss-1.2.7, or whichever location you specified with --prefix=.
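
The sparsehash check from the configure step can also be scripted. The sketch below assumes you saved the configure output to a file by piping it through tee; the file name configure.out and the helper sparsehash_found are our own choices for illustration, not something ABySS provides.

```shell
# sparsehash_found FILE: succeed if the saved configure output reports
# that the google/sparse_hash_map header was found
sparsehash_found() {
    grep -qF 'checking for google/sparse_hash_map... yes' "$1"
}

LOG=${LOG:-configure.out}   # hypothetical file holding the configure output
if [ -r "$LOG" ]; then
    if sparsehash_found "$LOG"; then
        echo "sparsehash detected; safe to run make"
    else
        echo "sparsehash NOT detected; check the CPPFLAGS include path"
    fi
else
    echo "No $LOG found; run configure with its output piped through: tee $LOG"
fi
```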

10 years of computing

Today the combined work of our clusters reached 10 years' worth of processing time, the equivalent of one processor working continuously for 10 years.  We only started taking on serious workloads in April this year, so while this is not a huge milestone compared to other well-established institutes, it's a gratifying achievement given our relatively short exposure to HPC.

The graphs below show the total number of jobs submitted to our two clusters.  Many of the jobs submitted to the grid cluster are probe jobs from international projects checking on our system status, so while there have been more jobs on this cluster, most of the work has been done on our HPC cluster.

HPC cluster: 

HPC jobs

Grid cluster:

Grid jobs

Given our new application porting strategy for grid based projects we are hoping to see a significant increase in jobs submitted to our grid cluster in the next few months.

 

Deep sequencing of human papillomavirus


HPV infection is causally linked to cervical and other genital cancers.   The human papillomavirus (HPV) research group at IIDMM, UCT is using next-generation sequencing (NGS) to explore the diversity of this virus in HIV-infected individuals. NGS is a powerful new technology that allows us to examine viral diversity directly in clinical specimens. This technology is transforming the world of genomics, but many biologists are struggling to keep up with the analysis of all the data produced, both in terms of having the required computing power and the technical know-how.  Thanks go to the HPC team at ICTS, who have set up several programs for assembling and analysing NGS data.  We look forward to exciting results.

Dr. Tracy L. Meiring

Matlab on a diet

Over the past week we became aware of an issue with Matlab and its desire to consume as many CPUs as possible, despite the fact that we were no longer running the parallel library.  This thread outlines the problem we were experiencing and this thread gives the solution, namely calling Matlab with the -singleCompThread argument.  In the process list we still noticed 2 threads, but one was set to 'Sleep' while only one consumed actual CPU time.  The designated CPU seemed to be static; in other words, the thread did not jump from one CPU to another, and the resource consumption reported via qstat and SNMP was consistent.  Obviously from a user's perspective it is desirable to consume as many CPUs as possible; however, a shared environment where the assignment of resources is removed from the scheduler will invariably lead to resource contention and overall performance degradation.
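
For job scripts, the single-threaded invocation might look something like the sketch below. Only -singleCompThread comes from the fix described above; the -nodisplay, -nosplash and -r options are standard Matlab startup flags that we are assuming suit a batch job, and matlab_cmd and my_analysis are hypothetical names used for illustration.

```shell
# matlab_cmd SCRIPT: print a non-interactive, single-threaded Matlab command
# line for SCRIPT (an M-file name without the .m extension)
matlab_cmd() {
    printf 'matlab -nodisplay -nosplash -singleCompThread -r "%s; exit"\n' "$1"
}

# Show the command a job script would run for a hypothetical my_analysis.m
matlab_cmd my_analysis
```

Printing the command first makes it easy to confirm that the flag is present before the job is submitted to the scheduler.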

Velvet and BioPerl

We recently ported Velvet, a short sequence assembler, to our cluster.  It can be compiled with multi-threading via OMP, however there does not seem to be an elegant way to control its behaviour in a clustered environment other than to use PBS directives to book an entire node.  The problem here is that a user would need a completely free node to start with.  We'll see how well this works, however we may elect to recompile the application without OMP support.
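
One option that may be worth testing before recompiling is to cap the thread count from inside the job script. The sketch below is untested against Velvet on our cluster: it assumes the Velvet build honours the standard OpenMP variable OMP_NUM_THREADS and that the scheduler provides $PBS_NODEFILE with one line per allocated slot. The allocated_cores helper is a name we invented.

```shell
# allocated_cores [NODEFILE]: count the slots PBS assigned to this job.
# Falls back to 1 when no readable node file is available.
allocated_cores() {
    nodefile=${1:-${PBS_NODEFILE:-}}
    if [ -n "$nodefile" ] && [ -r "$nodefile" ]; then
        n=$(grep -c . "$nodefile")      # number of non-empty lines
        if [ "$n" -ge 1 ]; then
            echo "$n"
            return 0
        fi
    fi
    echo 1
}

# Limit OpenMP to the cores the scheduler actually granted
OMP_NUM_THREADS=$(allocated_cores)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

If this works as hoped, a job could request a few cores on a partly used node instead of waiting for a completely free one.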

In addition to Velvet we installed the Velvet Optimiser.  This is very simple to install, however one of the requirements is apparently BioPerl.  This install was a bit more complex and required an install base directive to ensure that the scripts ended up in the NFS application mount.

Velvet is memory hungry: on a node with 4GB of RAM an assembly took over an hour, whereas on a node with 24GB of RAM the same assembly took just over a minute.  We're hoping that this application is swiftly ported to Grid format, as our site has a distinct lack of large-memory machines.


MrBayes Sumt SegFaults

We noticed that while MrBayes jobs have been running OK, the sumt process segfaults.  This is not a huge crisis, as the summary can be finished in post-processing.  However, our user had already solved this issue on his desktop by applying a patch for the 64-bit MPI version of MrBayes, and applying the same patch on the cluster resolved the problem.

Evolution of the Cape Sedges

Jack Viljoen, a Master's student in the Botany department, is working on the evolution of Cape Sedges.

Mr Bayes

For reconstructing the evolutionary trees of groups like the Cypereae and Schoeneae (Fig. 1), we use Markov chain Monte Carlo (MCMC) sampling to estimate parameter values related to models of DNA sequence evolution.  This involves running programs like MrBayes for about 5–50 million iterations, in order to explore parameter space to find the region of highest likelihood (Fig. 2a), and to sample this region sufficiently to get good parameter estimates (Fig. 2b).

Mr Bayes


This process is made much more efficient by running multiple instances of the sampler simultaneously on different CPUs, and we are grateful to the HPC unit at UCT for allowing us to do just that.

Maintenance slot scheduled for Sunday - 19 July 2011

Our monthly ICT maintenance slots allow system administrators at ICTS to carry out configuration changes and system and security updates. Unfortunately this maintenance slot involves a complete power shutdown, so no ICT services will be available between 09:00 and 17:00.

We are asking all our HPC and Grid users to ensure that they checkpoint their jobs before Sunday.

Sequencing suite

We spent this week porting a suite of packages for short sequence analysis: Velvet, Oases, FastX, FastQC and Maq.

We've only tested Velvet so far. It's memory hungry and seems to run best on the 400 series blades, by several orders of magnitude. It's also OMP capable, but not MPI capable, which means instances need to be launched carefully in order to avoid hogging CPU resources.

 

Crux tandem mass spectrometry

We recently installed Crux on our cluster for researchers in the IIDMM department at UCT.  Unfortunately it's not MPI aware and only runs on single cores.  Additionally, the binary we downloaded did not run on the cluster due to a library mismatch.  However, the site allows one to apply for a free academic license to download the source code.  Once we'd recompiled the software it ran just fine.  We're still awaiting the first results and will publish test images and data as they come in.

Blast!

Subsequent to the installation of MrBayes we decided to deploy BLAST on the HPC cluster to complement the package on our Grid cluster.  BLAST (Basic Local Alignment Search Tool) is a free bioinformatics package for genome searches and is hosted by the National Center for Biotechnology Information.

We hope to install an MPI version of BLAST in the future but this will probably require the assistance of researchers to identify the best method of database sharing.
