
Building BLAST with Arm Compiler



How to build BLAST with Arm Compiler.

BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. For more information, see the BLAST website.

The following components are used for this build:

Component             Version
NCBI BLAST            2.7.1
Arm Compiler for HPC  18.4
LMDB                  0.9.22
Operating system      RHEL 7.5
Hardware              Cavium ThunderX2
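Before you start, you can confirm that the tools used in the procedure are on PATH, which avoids a build that fails partway through. The sketch below is a hypothetical helper and is not part of BLAST or Arm Compiler. The tool list in the usage comment is an assumption taken from the commands in this guide.

```shell
# Hypothetical preflight helper (not part of BLAST): report every named tool
# that is missing from PATH. Returns 0 only if all of them are found.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (tool list assumed from the commands in this guide):
# require_tools wget tar git patch make armclang armclang++
```

Run the helper in the shell you will build from. It prints each missing tool to stderr and returns a non-zero status if any are absent.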

Procedure

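  1. Download and unpack the BLAST source code, then set BLAST_TOP to the top-level source directory. The LATEST directory holds only the newest BLAST+ release. If ncbi-blast-2.7.1+-src.tar.gz is no longer there, download it from the 2.7.1 directory next to LATEST.

    wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.7.1+-src.tar.gz
    tar xzf ncbi-blast-2.7.1+-src.tar.gz
    cd ncbi-blast-2.7.1+-src
    export BLAST_TOP=$(pwd)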
    
  2. Set the C and C++ compilers to armclang and armclang++.

    export CC=armclang
    export CXX=armclang++
    
  3. Clone the LMDB repository into $BLAST_TOP/LMDB, set the environment variable LMDB_PATH to that directory, and check out version 0.9.22.

    git clone https://github.com/LMDB/lmdb.git $BLAST_TOP/LMDB
    export LMDB_PATH=$BLAST_TOP/LMDB
    cd $LMDB_PATH
    git checkout LMDB_0.9.22
    
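  4. Patch the BLAST configure scripts for AArch64 and armclang, and patch ncbifloat.h for Clang-based compilers.
    The first patch has two effects. It makes configure use $CC instead of gcc as the build-machine compiler (CC_FOR_BUILD). It also leaves ARCH_CFLAGS empty on aarch64 hosts, so the -m64 default from the GCC case is not applied.
    The second patch stops ncbifloat.h from defining ISNAN_CONSTEXPR as _GLIBCXX_CONSTEXPR when __clang__ is defined, as it is for armclang.
    The backslashes before $ prevent the shell from expanding variables inside the here-documents.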

    cd $BLAST_TOP
    patch -p0 <<EOF
    --- ./c++/src/build-system/configure    2018-03-09 15:52:57.387959311 +0900
    +++ ./c++/src/build-system/configure    2018-03-09 16:00:21.301852097 +0900
    @@ -6300,7 +6300,7 @@
       test -z "\$as_dir" && as_dir=.
       for ac_exec_ext in '' \$ac_executable_extensions; do
       if { test -f "\$as_dir/\$ac_word\$ac_exec_ext" && \$as_executable_p "\$as_dir/\$ac_word\$ac_exec_ext"; }; then
    -    ac_cv_prog_ac_ct_CC_FOR_BUILD="gcc"
    +    ac_cv_prog_ac_ct_CC_FOR_BUILD=\${CC}
         echo "\$as_me:\$LINENO: found \$as_dir/\$ac_word\$ac_exec_ext" >&5
         break 2
       fi
    @@ -8437,6 +8437,9 @@
         mips*:GCC )
           ARCH_CFLAGS="-mips64"
           ;;
    +    aarch64:* )
    +      ARCH_CFLAGS=
    +      ;;
         *:GCC )
           # May not work prior to GCC 3.1.
           ARCH_CFLAGS="-m64"
    --- ./c++/src/build-system/configure.ac 2018-03-09 15:52:57.381959125 +0900
    +++ ./c++/src/build-system/configure.ac 2018-03-09 16:01:25.954757724 +0900
    @@ -1606,6 +1606,9 @@
         mips*:GCC )
           ARCH_CFLAGS="-mips64"
           ;;
    +    aarch64:* )
    +      ARCH_CFLAGS=
    +      ;;
         *:GCC )
           # May not work prior to GCC 3.1.
           ARCH_CFLAGS="-m64"

    EOF

    patch --ignore-whitespace -p0 <<EOF
    --- c++/include/corelib/ncbifloat.h     2018-05-08 13:23:48.685315074 +0100
    +++ c++/include/corelib/ncbifloat_new.h 2018-05-08 13:23:32.455419771 +0100
    @@ -69,7 +69,7 @@
     #    undef isnan
     #  endif
     #  if __cplusplus >= 201103L  &&  defined(_GLIBCXX_CONSTEXPR)  \\
    -    &&  !defined(__MIC__) 
    +    &&  !defined(__MIC__) && !defined (__clang__)
     #    define ISNAN_CONSTEXPR _GLIBCXX_CONSTEXPR
     #  else
     #    define ISNAN_CONSTEXPR
    EOF
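  5. Configure and build BLAST from the c++ directory with Arm Compiler for HPC. The executables are written to $BLAST_TOP/c++/ReleaseMT/bin.
    Running make -j without a job count places no limit on the number of parallel jobs. If the build runs out of memory, give an explicit count, for example make all_r -j 32.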

    cd $BLAST_TOP/c++
    ./configure 
    cd ReleaseMT/build
    make all_r -j
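The Testing steps below run makeblastdb and blastp by name, so the directory from step 5 must be on PATH. The following is a minimal sketch that checks the build output and then adds that directory to PATH. It assumes the default ReleaseMT build root used above, and the function name is hypothetical.

```shell
# Hypothetical helper: confirm that the build produced the programs used in
# the Testing steps, then prepend their directory to PATH.
add_blast_bin_to_path() {
  bin_dir=$1
  for prog in makeblastdb blastp; do
    if [ ! -x "$bin_dir/$prog" ]; then
      echo "not found or not executable: $bin_dir/$prog" >&2
      return 1
    fi
  done
  PATH="$bin_dir:$PATH"
  export PATH
}

# Usage, after step 5 (path assumed from the default ReleaseMT build root):
# add_blast_bin_to_path "$BLAST_TOP/c++/ReleaseMT/bin"
```

Checking before changing PATH means that a failed or partial build is reported immediately, so you do not get a "command not found" error later in the Testing steps.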
    

Testing

  1. Download the Swiss-Prot protein database in FASTA format from the NCBI FTP site (https://www.ncbi.nlm.nih.gov/), decompress it, and rename it with a .fa extension:

    wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/swissprot.gz
    # Decompress swissprot.gz and rename the result with a .fa extension
    gzip -d ./swissprot.gz
    mv swissprot swissprot.fa
  2. Download the protein_query.fasta file to use as the input query.

  3. Create a directory named data, and move the database and the query file into it:

    mkdir data
    mv ./swissprot.fa data/.
    mv ./protein_query.fasta data/.
  4. Build and index the protein database with makeblastdb, which is in $BLAST_TOP/c++/ReleaseMT/bin:

    makeblastdb -in data/swissprot.fa -dbtype prot

    makeblastdb writes the following database files next to the input file, data/swissprot.fa:

    • data/swissprot.fa.pin

    • data/swissprot.fa.phr

    • data/swissprot.fa.psq

  5. Search the query against the indexed swissprot database with blastp and save the alignments to a file:

    blastp -query data/protein_query.fasta -db data/swissprot.fa > output_protein_alignments.txt
  6. Download the reference file, reference_protein_alignments.txt, and compare it with the output file. The two files must be identical except for their headers.

    diff output_protein_alignments.txt reference_protein_alignments.txt
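The diff in step 6 also reports the header differences, which are expected. The following sketch compares the two reports while ignoring those headers. It assumes that the header is everything before the first Query= line of each report. The helper names are hypothetical. If your output has other run-specific lines that differ, you must extend the awk filter.

```shell
# Hypothetical helper: print a BLAST report from its first "Query=" line
# onward, dropping the program header that precedes it.
strip_blast_header() {
  awk 'found || /^Query=/ { found = 1; print }' "$1"
}

# Compare two reports with their headers removed.
# Returns diff's status: 0 if the bodies match, 1 if they differ, 2 on error.
compare_blast_reports() {
  out_body=$(mktemp) || return 2
  ref_body=$(mktemp) || { rm -f "$out_body"; return 2; }
  strip_blast_header "$1" > "$out_body"
  strip_blast_header "$2" > "$ref_body"
  diff "$out_body" "$ref_body"
  rc=$?
  rm -f "$out_body" "$ref_body"
  return "$rc"
}

# Usage:
# compare_blast_reports output_protein_alignments.txt reference_protein_alignments.txt && echo "match"
```

If a file contains no Query= line, the filter removes the whole file. Check that both reports are non-empty before you trust a match.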