Bioinformatics Training Platform (BTP) User Guide
Monash eResearch Centre have been partnering with Bioplatforms Australia (BPA) and CSIRO for delivering hands-on bioinformatics workshops to Australian researchers. This partnership started back in 2012 when the first next-generation sequencing workshop was delivered at Monash University. The Bioinformatics Training Platform (BTP) has been developed as a broadly accessible solution for delivering these workshops using cloud computing infrastructures and addressing the increasing demand for training workshops in Australia.
The standard NGS workshop delivered by BPA and CSIRO is composed of a number of training modules. Each training module is composed of the following components: analysis tools, datasets and training materials. The analysis tools are essential to a hands-on bioinformatics workshop as it teaches scientists how such tools are used to interrogate and interpret results. Most of the analysis tools used by the BTP are open source and easily accessible on various platforms. The BTP maintains these analysis tools on BPA-CSIRO GitHub Project. To support the hands-on bioinformatics workshop, analysis and reference datasets are used together for analysis. Datasets are stored and managed on the NeCTAR object storage service. The training materials (presentations and handouts) are also developed and maintained as part of the BTP. A high level overview of the BTP architecture is show below:
A workshop is composed of a number of training modules. Each training module has an associated analysis tools and datasets metadata. These metadata are used by the orchestration system for automation of tools installation during BTP image creation and datasets syncing during BTP instance deployments.
The BTP as a solution is composed of a number of components: Analysis Tools, Datasets and Training Materials. These components are encapsulated on each of the training modules. A brief summary of each one are included below.
Analysis tools are essential for delivering hands-on training workshops. The BTP teaches trainees how these tools can be used to help interrogate and interpret their analysis results. Most of these analysis tools are available as stand alone executables while others have to be built from source. The most common analysis tools used and packaged in the BTP are listed in the table below:
|AMOS Hawkeye||Genome data visualization|
|BEDTools||Genome data manipulation|
|BLAT||Sequence location lookup in the genome|
|CummeRbund||RNA-Seq analysis using R|
|DESeq2||Differential gene expression analysis using R|
|edgeR||Empirical gene expression analysis using R|
|FASTX||Toolkit for short reads preprocessing|
|IGV||Interactive exploration of genomic data|
|igvtools||Preprocessing of data before loading to IGV|
|MUMmer||Rapid genome alignment, a dependency for AMOS|
|PeakAnalyzer||Multi-peak data analysis|
|Picard||Sequence data analysis|
|SAMtools||For manipulating alignments in the SAM format|
|Skewer||Adapter trimmer for paired-end reads|
These analysis tools are packaged and installed into the BTP Image upon creation. This process is described in detail in the [BTP Workflows][#btp-workflows] section below.
Packaging of Analysis Tools
To make maintenance and deployment of the BTP easier, most of the analysis tools included in the BTP are packaged as stand-alone
*.deb installers, installable on Ubuntu-based operating systems.
fpm-cookery is used to create the
*.deb installers using recipes. The current BTP Images are based on Ubuntu Precise LTS which is supported until 2017. The BTP maintains a set of recipes on GitHub which is also integrated with Travis CI for continuous integration. Whenever a tool recipe is updated or new recipe is added into the repository, Travis CI ensures that the recipes builds correctly.
.deb installers can also be manually created on a client machine. This step is not mandatory for creating the BTP images and deployment of BTP instances. One can clone the BTP Tools repository, and using
fpm-cookery, create the
Clone the BTP Tools repository:
git clone https://github.com/BPA-CSIRO-Workshops/btp-tools.git
.deb installer for a particular analysis tool can then be built using
fpm-cookery will read the instructions in the recipe and build the
.deb installer. In the example above,
fpm-cookery will get the source code for
bwa from GitHub, then will proceed with the standard configuration, compilation and building of the tool.
fpm-cookery will place the resulting
.deb installer inside the
Datasets used by the BTP are stored on NeCTAR Object Storage (Swift). Each training modules has a corresponding publicly readable container where its datasets are accessible. The containers for the NGS training modules are listed in the table below:
|Introduction to CLI||NGSDataCommandLine|
|De novo Assembly||NGSDataDeNovo|
The complete container URL are used to pull down the datasets into the BTP instances. The URLs are defined on the datasets metadata file, which is used by Puppet to pull down the datasets into the BTP instances upon deployment.
Creating the BTP Image
Prerequisites for Creating BTP Images
- OpenStack Glance Command Line Client
- NeCTAR OpenStack Credentials
- internet access
The BTP uses the Packer tool for creating new virtual machine images compatible with NeCTAR Research Cloud.
Get Workshop Repository
A collection of Packer recipes are included as part of the orchestration module inside the BPA-CSIRO Workshop NGS Repository. This repository must be cloned together with the training submodules on the client machine where the BTP image will be created:
git clone --recurse-submodules https://github.com/BPA-CSIRO-Workshops/btp-workshop-ngs.git
Once the workshop repository and the training submodules have been cloned, the local copy will have the following contents structure:
The numberings (e.g
015_preamble) are used by the build system to order by which the training modules are ordered in the training handouts. Each training module has an associated handouts subdirectory where the LaTeX file for that module is located. Going inside the
orchestration/packer subdirectory will show the available Packer recipes including the one compatible with the NeCTAR Research Cloud,
btp-qemu.json. The available Packer recipes for the BTP are:
Before launching Packer, the content of the recipe file
btp-qemu.json can be viewed. Descriptions for each recipe section are highlighted below.
The image creation process can then be started by feeding the recipe into Packer. This assumes that the Packer is already installed on the client (build) machine. Packer can then be launched, with the recipe filename passed as a command line argument:
packer build btp-qemu.json
Packer will start the image creation process and the first thing this will do is to download the latest available alternate Ubuntu ISO from the internet. So the client machine must have access to the internet to download the ISO file.
Deploying BTP Instances
Prerequisites for Deploying BTP Instances
- NeCTAR OpenStack Credentials
- OpenStack Nova Command Line Client
- NoMachine (NX) Client