Overview
A Discovery Environment is...
...a term used by the iPlant Collaborative to describe a software system that allows Grand Challenge team participants to access the relevant data sets, integrate across them to identify connections, visualize them in ways that allow the ‘big picture’ to appear, manipulate the data with analytic tools, and share results by facilitating computational steering.
Our model for Discovery Environments (DEs) is the Internet ‘mashup’, also known as a Web 2.0 application, which allows community members to build content in a democratic way, to make and label connections between different types of content, and to integrate a variety of different types of information in a single user interface. The wildly successful Wikipedia project is one such application. It allows users to create and interconnect knowledge using the shared metaphor of an encyclopedia. Google Maps is another well-known mashup application. It provides the community with a common reference system, a detailed geographic map of the world, and encourages people to link in their own datasets indexed by GPS location. Datasets contributed by independent groups, e.g., average housing prices compiled by one group and mean SAT scores of students enrolled in school districts compiled by another, can become dramatically more useful when linked together by a common coordinate system. Mashups allow new patterns among data to be revealed and allow one to form hypotheses of causality that would be impossible to form if the datasets were examined separately.
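The idea of linking independently compiled datasets through a common coordinate system can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the district identifiers, the figures, and the helper name `link_by_key` are hypothetical and do not come from any iPlant software or real dataset.

```python
from typing import Dict, Hashable, Tuple


def link_by_key(*layers: Dict[Hashable, object]) -> Dict[Hashable, Tuple[object, ...]]:
    """Join independently compiled datasets on their shared coordinate.

    Each layer maps a coordinate (here a hypothetical district ID) to a value.
    Only coordinates present in every layer appear in the result.
    """
    if not layers:
        return {}
    # Keys common to all layers: the shared "coordinate system".
    shared = set(layers[0]).intersection(*layers[1:])
    return {key: tuple(layer[key] for layer in layers) for key in sorted(shared)}


# Hypothetical data compiled by two independent groups.
housing_prices = {"district-01": 310_000, "district-02": 185_000, "district-03": 240_000}
mean_sat_scores = {"district-01": 1180, "district-03": 1050, "district-04": 990}

linked = link_by_key(housing_prices, mean_sat_scores)
for district, (price, sat) in linked.items():
    print(district, price, sat)
```

Only the districts present in both datasets survive the join, which is what lets the two layers be examined side by side. A real DE would use a richer coordinate system (a genome position, an ontology term, a GPS location) in place of the simple string keys assumed here.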
DEs are the cyber equivalent of the iPlant Collaborative physical meeting spaces. During the formative phases, they will provide a way to exchange ideas and prototypes and collaboratively create and refine an approach. During the production phase, they will provide a collaborative environment in which to exchange ideas, integrate datasets, share protocols and explore algorithmic approaches. Ultimately, they will be a way to publish the Grand Challenge teams’ research findings to the world and to invite participation from the wider community.
iPlant’s development team will create Discovery Environment mashups for plant biology by:
(1) identifying long-lived data sets that will serve as shared coordinate system frameworks for integrating disparate data sets, and
(2) providing the community with software services that enable the layering of data sets on top of these frameworks, in a distributed, community-controlled manner.
The particular dataset frameworks identified by staff and the CI Advisory Team will depend on the Grand Challenge projects that are chosen by the community, but illustrative examples include annotated genomes, named sets of genes, their aliases and cross-species orthology relationships, phylogenetic trees, protein structures, anatomical descriptions of plant tissues and/or developmental stages, annotated collections of microscopic images, machine-readable descriptions of biochemical or regulatory pathways, and geographical descriptions of species distributions. For example, the Trait Evolution working group (of the iPlant Tree of Life (iPToL) Grand Challenge project) requires extensive cross-species gene comparisons, so we might build a Homology Registry that allows community members to assert (and dispute) phylogenetic relationships among members of gene families based on different types of evidence such as sequence conservation and synteny. The first iteration of a DE will allow for the upload of trees and trait data and will then analyze them and report the results using an independent contrasts method.
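To make the tree-plus-trait analysis concrete, here is a minimal Python sketch of an independent contrasts calculation. It assumes the method meant is Felsenstein's phylogenetically independent contrasts on a fully bifurcating tree with branch lengths; the text does not specify the variant, and the `Node` class, the function name, and the example tree and trait values are all hypothetical, not part of any iPlant DE.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A node in a bifurcating tree; leaves have a name and no children."""
    length: float                    # branch length leading to this node
    name: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def independent_contrasts(root: Node, traits: Dict[str, float]) -> List[float]:
    """Return standardized contrasts, one per internal node (post-order)."""
    contrasts: List[float] = []

    def visit(node: Node) -> Tuple[float, float]:
        # Returns (trait estimate at node, adjusted branch length).
        if node.left is None or node.right is None:
            return traits[node.name], node.length
        x_l, v_l = visit(node.left)
        x_r, v_r = visit(node.right)
        total = v_l + v_r
        # Contrast between the two daughters, scaled by its expected spread.
        contrasts.append((x_l - x_r) / math.sqrt(total))
        # Node estimate: average weighted by inverse branch length.
        estimate = (x_l * v_r + x_r * v_l) / total
        # Lengthen the parent branch to account for estimation uncertainty.
        return estimate, node.length + v_l * v_r / total

    visit(root)
    return contrasts


# Hypothetical tree ((A:1, B:1):1, C:2) with made-up trait values.
tree = Node(
    length=0.0,
    left=Node(length=1.0, left=Node(1.0, "A"), right=Node(1.0, "B")),
    right=Node(length=2.0, name="C"),
)
trait_values = {"A": 1.0, "B": 3.0, "C": 5.0}
result = independent_contrasts(tree, trait_values)
print(result)
```

A tree with n tips yields n - 1 contrasts. Contrasts computed this way for two traits can then be compared with each other, for example by regression through the origin, which is the usual next step and is left out of this sketch.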
As another example, a Grand Challenge project that involves dissection of signal transduction pathways might be provided with a wiki-like environment that allows Grand Challenge team members to assemble a comprehensive description of plant G-protein coupled receptor kinases. Such a description would combine written text with embedded media that show the position of the kinases on several genomes, the evolutionary trees that relate the kinases across species, kinetic models of signaling cascades driven by the family, and a map showing the geographical distribution of allelic variants of plant kinases.
Whenever possible, iPlant’s DEs will be based on existing software products and will be coordinated with groups performing similar work. For example, if a DE requires a common coordinate system based on an ontology, we will use an existing ontology such as the Plant Ontology (Ilic et al. 2007), if feasible, and coordinate the effort with the National Center for Biomedical Ontology. Likewise, DEs based on a genome assembly and annotation will leverage interfaces developed by existing repositories such as TAIR (Garcia-Hernandez et al. 2002), Gramene (Jaiswal et al. 2006), and NCBI (Wheeler et al. 2007), rather than attempt to replace the functionality of those resources. There are numerous efforts in the bioinformatics and broader web development communities to create mashup systems, several of which would make good foundations for specific DEs, including QEDWiki, the Taverna workflow management system (Oinn et al. 2004), AJAX GBrowse (for genome-based collaboration), and Galaxy2 (Giardine et al. 2005).

