URL: http://blast.ncbi.nlm.nih.gov/
Categories: application, open-source
Genomics; C++;
BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches.
Version
Ncbi blast 2.7.1 Arm - Ncbi blast 2.7.1 GNU
Build Details For Version 2.7.1 Arm
Configuration
- Ncbi blast 2.7.1
- LMDB 0.9.22
- Arm compiler version 18.4
Build instructions
Download and unpack the package
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.7.1/ncbi-blast-2.7.1+-src.tar.gz
tar xzf ncbi-blast-2.7.1+-src.tar.gz
cd ncbi-blast-2.7.1+-src
export BLAST_TOP=`pwd`
Compiler configuration
export CC=armclang
export CXX=armclang++
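Before configuring, it can help to confirm that the compilers named in CC and CXX are actually on the PATH. The following is a minimal sketch; `check_tool` is a hypothetical helper name, not part of BLAST or the Arm toolchain.

```shell
# Hypothetical helper: succeed only if the named tool can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Usage, after exporting CC and CXX as above:
#   check_tool "$CC" && check_tool "$CXX"
```

If a compiler is reported missing, load or install the toolchain before running configure.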
Build configuration
Clone the LMDB repository and set the environment variable LMDB_PATH
to the location of the cloned repository. Then check out the 0.9.22 release:
git clone https://github.com/LMDB/lmdb.git $BLAST_TOP/LMDB
export LMDB_PATH=$BLAST_TOP/LMDB
cd $LMDB_PATH
git checkout LMDB_0.9.22
BLAST's configure scripts need to be patched for AArch64 and armclang:
cd $BLAST_TOP
patch -p0 <<EOF
--- ./c++/src/build-system/configure 2018-03-09 15:52:57.387959311 +0900
+++ ./c++/src/build-system/configure 2018-03-09 16:00:21.301852097 +0900
@@ -6300,7 +6300,7 @@
test -z "\$as_dir" && as_dir=.
for ac_exec_ext in '' \$ac_executable_extensions; do
if { test -f "\$as_dir/\$ac_word\$ac_exec_ext" && \$as_executable_p "\$as_dir/\$ac_word\$ac_exec_ext"; }; then
- ac_cv_prog_ac_ct_CC_FOR_BUILD="gcc"
+ ac_cv_prog_ac_ct_CC_FOR_BUILD=\${CC}
echo "\$as_me:\$LINENO: found \$as_dir/\$ac_word\$ac_exec_ext" >&5
break 2
fi
@@ -8437,6 +8437,9 @@
mips*:GCC )
ARCH_CFLAGS="-mips64"
;;
+ aarch64:* )
+ ARCH_CFLAGS=
+ ;;
*:GCC )
# May not work prior to GCC 3.1.
ARCH_CFLAGS="-m64"
--- ./c++/src/build-system/configure.ac 2018-03-09 15:52:57.381959125 +0900
+++ ./c++/src/build-system/configure.ac 2018-03-09 16:01:25.954757724 +0900
@@ -1606,6 +1606,9 @@
mips*:GCC )
ARCH_CFLAGS="-mips64"
;;
+ aarch64:* )
+ ARCH_CFLAGS=
+ ;;
*:GCC )
# May not work prior to GCC 3.1.
ARCH_CFLAGS="-m64"
EOF
patch --ignore-whitespace -p0 <<EOF
--- c++/include/corelib/ncbifloat.h 2018-05-08 13:23:48.685315074 +0100
+++ c++/include/corelib/ncbifloat_new.h 2018-05-08 13:23:32.455419771 +0100
@@ -69,7 +69,7 @@
# undef isnan
# endif
# if __cplusplus >= 201103L && defined(_GLIBCXX_CONSTEXPR) \\
- && !defined(__MIC__)
+ && !defined(__MIC__) && !defined (__clang__)
# define ISNAN_CONSTEXPR _GLIBCXX_CONSTEXPR
# else
# define ISNAN_CONSTEXPR
EOF
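The heredoc patches above modify the source tree as soon as they run. A more cautious variant, assuming GNU patch and assuming the patch text has first been saved to a file, is to do a dry run and apply the patch only if that succeeds. `apply_checked` and the file name in the usage comment are hypothetical. Note that text copied out of the unquoted heredocs must have the backslash before each `$` removed when it is stored in a plain file.

```shell
# Hypothetical helper: apply a patch file only if a dry run succeeds.
# Extra arguments (for example --ignore-whitespace) are passed through to patch.
apply_checked() {
    patch_file=$1
    shift
    if patch --dry-run -p0 "$@" < "$patch_file" >/dev/null; then
        patch -p0 "$@" < "$patch_file"
    else
        echo "patch would not apply cleanly: $patch_file" >&2
        return 1
    fi
}

# Usage from $BLAST_TOP, with the patch text saved to a file of your choosing:
#   apply_checked configure-aarch64.patch
```

A failed dry run leaves the tree untouched, so the patch can be inspected before trying again.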
Build BLAST with Arm Compiler for HPC
Configure and build BLAST from the c++ directory.
cd $BLAST_TOP/c++
./configure
cd ReleaseMT/build
make all_r -j
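With GNU make, `-j` without a number places no limit on the number of simultaneous jobs, which can exhaust memory on machines with many cores. A minimal sketch of a bounded alternative follows; `build_jobs` is a hypothetical helper, and using one job per online core is an assumption rather than a recommendation from the BLAST build system.

```shell
# Hypothetical helper: print a job count for make, falling back to 1
# if the number of online cores cannot be determined.
build_jobs() {
    n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "$n"
}

# Usage from $BLAST_TOP/c++/ReleaseMT/build:
#   make all_r -j "$(build_jobs)"
```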
The generated executables are in $BLAST_TOP/c++/ReleaseMT/bin
A test example is provided at the end.
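The test commands call makeblastdb and blastp without a path, so the bin directory needs to be on the PATH first. A minimal sketch, with `use_blast_bin` as a hypothetical helper that only changes PATH when the directory really contains an executable blastp:

```shell
# Hypothetical helper: put a BLAST bin directory at the front of PATH,
# but only if it actually contains an executable blastp.
use_blast_bin() {
    if [ -x "$1/blastp" ]; then
        PATH="$1:$PATH"
        export PATH
    else
        echo "no executable blastp in $1" >&2
        return 1
    fi
}

# Usage:
#   use_blast_bin "$BLAST_TOP/c++/ReleaseMT/bin"
```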
Build Details For Version 2.7.1 GNU
Configuration
- Ncbi blast 2.7.1
- LMDB 0.9.22
- GCC version 7.1
Build instructions
Download and unpack the package
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.7.1/ncbi-blast-2.7.1+-src.tar.gz
tar xzf ncbi-blast-2.7.1+-src.tar.gz
cd ncbi-blast-2.7.1+-src
export BLAST_TOP=`pwd`
Compiler configuration
export CC=gcc
export CXX=g++
Build configuration
Clone the LMDB repository and set the environment variable LMDB_PATH
to the location of the cloned repository. Then check out the 0.9.22 release:
git clone https://github.com/LMDB/lmdb.git $BLAST_TOP/LMDB
export LMDB_PATH=$BLAST_TOP/LMDB
cd $LMDB_PATH
git checkout LMDB_0.9.22
BLAST's configure scripts will need to be patched for AArch64:
cd $BLAST_TOP
patch -p0 <<EOF
--- ./c++/src/build-system/configure 2018-03-09 15:52:57.387959311 +0900
+++ ./c++/src/build-system/configure 2018-03-09 16:00:21.301852097 +0900
@@ -6300,7 +6300,7 @@
test -z "\$as_dir" && as_dir=.
for ac_exec_ext in '' \$ac_executable_extensions; do
if { test -f "\$as_dir/\$ac_word\$ac_exec_ext" && \$as_executable_p "\$as_dir/\$ac_word\$ac_exec_ext"; }; then
- ac_cv_prog_ac_ct_CC_FOR_BUILD="gcc"
+ ac_cv_prog_ac_ct_CC_FOR_BUILD=\${CC}
echo "\$as_me:\$LINENO: found \$as_dir/\$ac_word\$ac_exec_ext" >&5
break 2
fi
@@ -8437,6 +8437,9 @@
mips*:GCC )
ARCH_CFLAGS="-mips64"
;;
+ aarch64:* )
+ ARCH_CFLAGS=
+ ;;
*:GCC )
# May not work prior to GCC 3.1.
ARCH_CFLAGS="-m64"
--- ./c++/src/build-system/configure.ac 2018-03-09 15:52:57.381959125 +0900
+++ ./c++/src/build-system/configure.ac 2018-03-09 16:01:25.954757724 +0900
@@ -1606,6 +1606,9 @@
mips*:GCC )
ARCH_CFLAGS="-mips64"
;;
+ aarch64:* )
+ ARCH_CFLAGS=
+ ;;
*:GCC )
# May not work prior to GCC 3.1.
ARCH_CFLAGS="-m64"
EOF
Build BLAST with GCC
Configure and build BLAST from the c++ directory.
cd $BLAST_TOP/c++
./configure
cd ReleaseMT/build
make all_r -j
The generated executables are in $BLAST_TOP/c++/ReleaseMT/bin
A test example is provided at the end.
Testing
First, download a protein database. This example uses Swiss-Prot, a widely used protein database (https://en.wikipedia.org/wiki/UniProt#UniProtKB.2FSwiss-Prot). More databases are available at ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/swissprot.gz
# Decompress swissprot and rename it with a .fa extension
gzip -d ./swissprot.gz
mv swissprot swissprot.fa
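A quick sanity check on the decompressed file is to count its FASTA records, since each record starts with a `>` header line. A minimal sketch, with `count_fasta_records` as a hypothetical helper name:

```shell
# Hypothetical helper: count the sequences in a FASTA file by counting
# the lines that start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage:
#   count_fasta_records swissprot.fa
```

A count of zero suggests the download or decompression did not produce a valid FASTA file.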
Download the protein_query.fasta file to use as the query protein.
Create a directory named data and move the database and the query file into it:
mkdir data
mv ./swissprot.fa data/.
mv ./protein_query.fasta data/.
Build and index the protein database:
makeblastdb -in data/swissprot.fa -dbtype prot
You should see the following files in the data directory:
- swissprot.fa
- swissprot.fa.pin
- swissprot.fa.phr
- swissprot.fa.psq
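The presence of the index files listed above can be checked with a short script. This is a sketch only; `check_db_files` is a hypothetical helper, and the three extensions are the ones named in the list.

```shell
# Hypothetical helper: verify that the .pin, .phr and .psq files exist
# and are non-empty for the given database path.
check_db_files() {
    db=$1
    status=0
    for ext in pin phr psq; do
        if [ -s "$db.$ext" ]; then
            echo "ok: $db.$ext"
        else
            echo "missing or empty: $db.$ext" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_db_files data/swissprot.fa
```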
Search the indexed swissprot database and write the protein alignments to a file:
blastp -query data/protein_query.fasta -db data/swissprot.fa > output_protein_alignments.txt
Finally, compare the output (output_protein_alignments.txt) with the reference file (reference_protein_alignments.txt):
diff output_protein_alignments.txt reference_protein_alignments.txt
There should be no differences other than the header.
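To make the comparison pass or fail cleanly, the header can be removed from both files before running diff. The sketch below assumes the header is everything before the first line starting with `Query=` in the default text report; `strip_header` and `diff_reports` are hypothetical helpers. Any other run-specific lines that differ will still show up.

```shell
# Hypothetical helper: print a BLAST text report from its first "Query="
# line onward, dropping the header block that precedes it.
strip_header() {
    sed -n '/^Query=/,$p' "$1"
}

# Hypothetical helper: compare two reports with their headers removed.
# Returns the exit status of diff (0 when the remaining text is identical).
diff_reports() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    strip_header "$1" > "$tmp_a"
    strip_header "$2" > "$tmp_b"
    rc=0
    diff "$tmp_a" "$tmp_b" || rc=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$rc"
}

# Usage:
#   diff_reports output_protein_alignments.txt reference_protein_alignments.txt
```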