Ultra High-Throughput Sequence Pipeline

Challenge: Ultra high-throughput sequencing (UHTS) technologies are rapidly changing the way we approach fundamental questions in biology. Advances in nanotechnology and sequencing chemistry have together revolutionized our ability to obtain low-cost, high-throughput DNA sequence data. As these technologies advance, they pose new challenges for standardizing sequence information and automating computational tasks.

The primary objective of the Ultra High-Throughput Working Group is to establish an informatics pipeline that allows members of the plant science community to process UHTS data (e.g., Illumina, SOLiD) through simple, user-friendly interfaces. iPlant developers work with the UHTS Working Group to build tools that let users import UHTS files and output data matrices, using the algorithms the community commonly applies to DNA and RNA data sets.

iPlant’s sequence analysis effort enables users to upload DNA or RNA sequencing data from their desktop, a remote server, or the NCBI Sequence Read Archive, and then view, manage, and perform basic analyses on the data in a user-centric workspace. Data management capabilities include annotating data with metadata and pre-processing sequence data to remove non-biological artifacts of sequence production (e.g., linkers and primers). Using the processed sequence data, scientists can run two basic analytical workflows quickly and without complex command-line utilities.
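
The pre-processing step described above can be illustrated with a minimal Python sketch that trims a 3' adapter from a read by exact matching. This is not the iPlant implementation: the function name `trim_adapter`, the `min_overlap` parameter, and the adapter sequence in the usage note are assumptions chosen for illustration, and production trimmers also tolerate mismatches and use base qualities.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching (illustrative only).

    Handles two cases: the full adapter occurs inside the read, or only a
    prefix of the adapter (at least min_overlap bases) remains at the read end.
    """
    # Case 1: the whole adapter is present; keep everything before it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends partway through the adapter; look for the longest
    # adapter prefix that is also a suffix of the read.
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    # No adapter found; return the read unchanged.
    return read
```

For example, with a hypothetical adapter `AGATCGGAAGAGC`, the read `ACGTACGTAGATC` ends in the first five adapter bases, so the sketch returns `ACGTACGT`.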

Workflows

Variant detection - Supports DNA sequence data and lets users detect single nucleotide polymorphisms (SNPs) in a test sequence relative to a reference sequence. The workflow takes a library of short reads and a reference sequence as input and outputs a list of SNPs.
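
To make the input and output of this workflow concrete, here is a toy Python sketch of pileup-based SNP calling. It assumes the reads have already been aligned to the reference without gaps, and the function name `call_snps` and the thresholds `min_depth` and `min_fraction` are hypothetical. The source does not say which aligner or caller the workflow uses, so this only illustrates the general idea.

```python
from collections import Counter, defaultdict


def call_snps(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Call SNPs from ungapped alignments (illustrative only).

    aligned_reads: iterable of (start, sequence) pairs, where start is the
    0-based reference position of the first base of the read.
    Returns a list of (1-based position, reference base, observed base, depth).
    """
    # Build a pileup: for each reference position, count the bases observed.
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, support = counts.most_common(1)[0]
        # Report a SNP when coverage is sufficient and the dominant base
        # differs from the reference in a large enough share of reads.
        if (depth >= min_depth
                and base != reference[pos]
                and support / depth >= min_fraction):
            snps.append((pos + 1, reference[pos], base, depth))
    return snps
```

Given the reference `ACGTACGTAC` and three reads that all carry a T at the fifth reference base, the sketch reports a single A-to-T difference at position 5 with depth 3.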

Transcript quantification - Supports RNA sequence data and quantifies transcripts relative to a reference genome. Initially, users will be able to choose from the following reference genomes as the basis for their analyses: Arabidopsis thaliana, Zea mays, Arabidopsis lyrata, Brachypodium distachyon, Oryza sativa Nipponbare, Oryza sativa indica, Populus trichocarpa, Sorghum bicolor, and Vitis vinifera.
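
As an illustration of what a quantification step might compute, the Python sketch below normalizes per-transcript read counts to RPKM (reads per kilobase of transcript per million mapped reads), a commonly used expression measure. The source does not state which measure the workflow reports, so RPKM is an assumption here, as are the function name `rpkm` and the transcript identifiers in the usage note.

```python
def rpkm(read_counts, transcript_lengths):
    """Convert raw read counts to RPKM (illustrative only).

    read_counts: dict mapping transcript ID -> number of mapped reads.
    transcript_lengths: dict mapping transcript ID -> length in bases.
    RPKM = count * 1e9 / (length_in_bases * total_mapped_reads)
    """
    total_mapped = sum(read_counts.values())
    if total_mapped == 0:
        # No mapped reads: every transcript has zero expression.
        return {tx: 0.0 for tx in read_counts}
    return {
        tx: count * 1e9 / (transcript_lengths[tx] * total_mapped)
        for tx, count in read_counts.items()
    }
```

For instance, hypothetical transcripts `tx1` (1,000 bases, 500 reads) and `tx2` (3,000 bases, 1,500 reads) receive the same RPKM, because the longer transcript's higher read count is offset by its length.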

Additional workflows are under development to support discovery of novel RNA transcripts, comparative RNA-seq analyses, and automated functional annotation of discovered polymorphisms.

Current cyberinfrastructure (CI) services (more come online regularly)

  • Bioinformatics software available through the iPlant Discovery Environment
    • Sequence alignments and phylogenetic tree building
    • Phylogenetic and evolutionary analyses
    • Ultra high-throughput sequence processing and variant detection
    • QTL mapping and genome-wide association studies
    • Functional analyses
    • Clustering and network analyses
    • ChIP-seq studies
    • Utility tools and scripts
    • Full list at https://pods.iplantcollaborative.org/wiki/display/DEman0p4/Tools+list
  • Access to collaboration tools
    • Public and private wiki spaces, mailing lists
    • Video conferencing setup and support
  • Data hosting - Access to mirroring, backup, and recovery services at petascale
  • Web and application hosting
  • Access to persistent virtual machines
    • Algorithm development
    • Software prototyping
  • Command-line access to production and experimental supercomputers and archive systems
  • Access to an online bug tracking and issue system
  • Git/SVN code hosting within iPlant and through SourceForge and GitHub

Working Group Members

Name                 Role                Institution
Tom Brutnell         Working Group Lead  Donald Danforth Plant Science Center
Justin Borevitz      Collaborator        University of Chicago
Todd Mockler         Collaborator        Oregon State University
Pat Schnable         Collaborator        Iowa State University
Michele Morgante     Collaborator        Università degli Studi di Udine
Bob Schmitz          Collaborator        Salk Institute for Biological Studies
Lin Wang             Collaborator        Cornell
Blake Meyers         Collaborator        Delaware Biotechnology Institute
Matt Hudson          Collaborator        University of Illinois
Scott Jackson        Collaborator        Purdue University
Brad Barbazuk        Collaborator        University of Florida
Greg May             Collaborator        National Center for Genome Resources
Zhenyuan (Jerry) Lu  Collaborator        iPlant Collaborative, Cold Spring Harbor Laboratory
Liya Wang            Collaborator        iPlant Collaborative, Cold Spring Harbor Laboratory
Chunlao Tang         Collaborator        iPlant Collaborative, Cold Spring Harbor Laboratory
Chris Jordan         Collaborator        iPlant Collaborative, Texas Advanced Computing Center