
Leaflet Issue 10-02

Greetings from the iPlant Collaborative! We’ve been busy beta-testing preliminary Discovery Environments and tools to address the iPlant Tree of Life Grand Challenge Project and making preparations for the iPlant 2010 Conference in Las Vegas, where we look forward to seeing some of you! As always, we welcome your comments and suggestions for our newsletter; please send them to feedback@iplantcollaborative.org.



An iPlant Sabbatical: Image Analysis to Quantify Plant Development Phenotypes

By Edgar Spalding, University of Wisconsin at Madison, spalding@wisc.edu and B.S. Manjunath, University of California at Santa Barbara, manj@ece.ucsb.edu.


Digital images can capture the development of plant form with high spatial and temporal resolution. The tricky part is extracting quantitative information about specific processes from an image series. This is the essence of image analysis. When successful, the approach has the potential to advance the detection and understanding of mutant phenotypes and natural variation, leading to a quantitative and eventually predictive understanding of how the genotype relates to the phenotype.

iPlant believes making this approach more accessible to the community will have a broad impact on post-genomic plant research, and to this end has fostered a new collaboration between the Phytomorph project (Edgar Spalding, PI) at the University of Wisconsin (http://www.botany.wisc.edu/phytomorph.htm) and the Bisque project (Bio-Image Semantic Query User Environment; B.S. Manjunath, PI) at the University of California – Santa Barbara (http://www.bioimage.ucsb.edu/bisque). Bisque is a web-based platform specifically designed to provide researchers with annotation, visualization, quantitative analysis and organizational tools for 5D image data. Bisque builds upon flexible and hierarchical textual and graphical annotations that allow users to extend both the data model and analyses to adapt the system to their needs. Together these empower researchers to create, develop, and share novel bioimage analyses.

The project’s goal is to combine Bisque’s sophisticated database environment for storing and manipulating biological images and metadata (experimental conditions, etc.) with computer algorithms from the Phytomorph project. The result would allow scientists to automate specific measurements that are typically performed manually (and therefore with much lower throughput and precision) in many important contemporary research projects. An algorithm by itself is useful only to those who have it and the capacity to run it. A database and user environment are most useful when equipped to perform a needed analysis. The Bisque-iPlant-Phytomorph collaboration aims to put the two together in a workflow that takes images and metadata and returns the processed results the biologist needs, and to deliver that tool to the community (Fig. 1).

Figure 1. Bisque workflow

The first phase is approaching beta-test level. An algorithm for tracking seedling root growth rate and tip angle over time is being integrated as a “mini application” in Bisque (Fig. 2). Gravitropism - yes, “there is an app for that”. The same is being done with an algorithm that quantifies seed shape and size metrics (Fig. 3). In development is an algorithm for monitoring the production, growth rate, and angles of branching root structures as a function of time (Fig. 4).


Figure 2. The first (left) and last (right) image of a set of maize seedlings undergoing gravitropism. The 3-h time course was captured by acquiring images every 3 min.

Figure 3. Arabidopsis seed size and morphology are quantified by an image analysis algorithm being integrated as a Bisque application. Area, major axis, and minor axis are the main features extracted for each seed in the population. The inset shows enlargements of seeds from two Arabidopsis ecotypes that are parents of a population used to map seed trait QTL with phenotype data produced by the algorithm.
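For readers curious how features like these can be computed, here is a minimal sketch in Python. It is not the Phytomorph algorithm, whose details are not given here. It swaps in a standard technique (second-order image moments of a binary seed mask) and assumes the seed has already been segmented from the background. The function name `seed_metrics` and the list-of-rows mask format are illustrative assumptions.

```python
import math

def seed_metrics(mask):
    """Return (area, major_axis, minor_axis) in pixel units for a binary mask.

    `mask` is a list of rows; truthy entries are seed pixels (an assumed format).
    Axis lengths are those of the ellipse with the same second moments as the
    seed region, a common stand-in for measured seed length and width.
    """
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no seed pixels")

    # Centroid of the seed region.
    xc = sum(x for x, _ in pixels) / area
    yc = sum(y for _, y in pixels) / area

    # Normalized central second moments (the covariance of pixel coordinates).
    mu20 = sum((x - xc) ** 2 for x, _ in pixels) / area
    mu02 = sum((y - yc) ** 2 for _, y in pixels) / area
    mu11 = sum((x - xc) * (y - yc) for x, y in pixels) / area

    # Eigenvalues of the 2x2 covariance matrix give the variance along each axis.
    mean = (mu20 + mu02) / 2
    spread = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)

    # For a filled ellipse, full axis length = 4 * sqrt(variance along that axis).
    major = 4 * math.sqrt(mean + spread)
    minor = 4 * math.sqrt(max(mean - spread, 0.0))
    return area, major, minor

# A toy 4 x 2 pixel "seed" surrounded by background.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
area, major, minor = seed_metrics(toy_mask)
print(area, round(major, 3), round(minor, 3))
```

Converting pixel units to millimeters would need a calibration factor from the imaging setup, which is outside this sketch.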


Figure 4. An Arabidopsis plant growing on the surface of a vertical agar Petri plate produces new branches and elongates existing root laterals over a 22-h time course captured in images acquired every 15 min.

This is an interdisciplinary collaboration between plant biologists and computer scientists. The principal people, in addition to Manjunath, Ambuj Singh (UCSB), Edgar Spalding (Wisconsin), and the iPlant leadership, are Kris Kvilekval (Bisque), Logan Johnson (Phytomorph), Dmitry Fedorov (Bisque), and Nathan Miller (Phytomorph). An interdisciplinary collaboration depends on three things: people committed to the mission, time, and funding. iPlant has been instrumental in bringing together the people, who first met at an iPlant Grand Challenge workshop at Biosphere 2 in 2008. Time was provided by a sabbatical leave for Spalding. Funding has come from NSF Plant Genome Research Project support of Phytomorph and from iPlant support of personnel, including Spalding’s sabbatical.

 

 

DNA Subway: Fast Track to Gene Annotation and Genome Analysis

By Dave Micklos, micklos@cshl.edu, and Uwe Hilgert, hilgert@cshl.edu, Cold Spring Harbor Laboratory, iPlant Education, Outreach, and Training (EOT)



The iPlant Collaborative announces DNA Subway, a bioinformatics workspace that makes genome analysis broadly available to biology students and educators. DNA Subway captures the essence of iPlant’s educational goal: to develop computer infrastructure that allows anyone to work with the same data, using the same tools, and at the same time as high-level plant researchers.

Developed by iPlant staff at Cold Spring Harbor Laboratory’s Dolan DNA Learning Center (DNALC), DNA Subway presents complex bioinformatics and visualization tools – predominantly open-source software – in an intuitive and appealing interface. “Riding” different lines, users can predict and annotate genes in up to 100,000 base pairs of DNA (Red Line), and prospect entire plant genomes for specific genes (Yellow Line). Additional lines are being developed to analyze next-generation transcriptome data, and to construct and work with phylogenetic trees. “Ride” DNA Subway now at http://dnasubway.org!

Since its Internet release in March, DNA Subway has attracted over 3,800 visitors, 212 of whom have registered for a user account. Most users chose to ride DNA Subway’s Red Line to annotate DNA sequences.

On April 23-24, iPlant’s EOT group convened the first of 10 faculty workshops to prepare educators to introduce their students to plant genomics using DNA Subway and other online resources. The workshop, conducted at Spelman College in Atlanta, drew 25 faculty participants from Georgia, North Carolina, and South Carolina. The group consisted mostly of biologists, along with five computer scientists and bioinformaticians. Holding workshops at minority-serving institutions is part of our strategy to reach significant numbers of faculty members under-represented in the sciences, who composed 25% of the Spelman workshop participants. Our thanks to the ASPIRE Program (Advancing Spelman’s Participation in Informatics Research and Education) for hosting the workshop.

Check the DNALC web site (http://www.dnalc.org) for a schedule of upcoming workshops on DNA Subway and educational adaptations of iPlant’s Discovery Environments.

Classification and Web Technologies: A User’s Guide

Damian Gessler, iPlant Semantic Web Architect, dgessler@iplantcollaborative.org


John Wilkins (1614-1672), originator of the metric system, co-founded The Royal Society of London, the world’s oldest scientific society. A man of no small ambitions, Wilkins sought to create a universal classification scheme of all knowledge. One hundred years later Diderot, Rousseau, Voltaire, and others attempted something equally grand in their magnum opus of the Enlightenment, the Encyclopédie. Both Wilkins’ efforts and those of French philosophes stand as testaments to a human will to create that is perhaps exceeded only by its inability to deliver. Yet some modern, albeit more focused, efforts in categorization have been resoundingly successful—think Universal Product Codes, International Standard Book Numbers, Digital Object Identifiers, and even the lowly Dewey Decimal Classification. URLs — Uniform Resource Locators, the “hyperlink” addresses you click in your web browser — may be the most influential effort yet.

Biology has its own grand testament in classification in Linnaeus’ Systema Naturae (1735). And here lies the rub. Linnaean taxonomy has come to represent both the success and the pesky frustrations of strict, categorical organization. Classifying life is no easy matter. For Biology, we seek to classify life not solely for classification's sake, but so that we may better organize what we know, allowing us to verify and refute hypotheses, and better construct our cognitive model of Nature.

Advances in web technologies are taking a surprisingly relevant turn for Biology. It may be most productive to think of them as a gradient, from the Wild Wild Web of Folksonomy, to Linked Open Data, to ontologies, to the semantic web and OWL.

Folksonomy utilizes the Law of Large Numbers to allow statistical inferencing across a free-form vocabulary. Good et al. (2009) [1] examined Folksonomy in Connotea and CiteULike and found it inferior to the more traditional text-mining indexing approach of PubMed. This was in part due to a lack of sufficient sample size for free-form tagging in science. Folksonomy has value, yet its application is restricted.

The Linked Open Data community takes us an important step beyond tagging (Figure 1). Linked Open Data begins to add the first elementary semantics to tagging. Instead of just associating content with a tag, content and tags are joined into a ‘subject → relationshipTo → object’ statement. The “relationshipTo” is called a predicate. The basis of RDF (Resource Description Framework) lies in the fact that statements are more powerful than tags.

Figure 1. The Linked Open Data Community

RDF has become the de facto technology for the semantic web in its broadest sense. Perhaps the greatest of its many applications is that RDF allows us to take an arbitrary collection of content and place it into one, big table. The subject is akin to a row in a database table; the predicate is akin to a column header; and the object is akin to the value at the row/column intersection. Fifty years of relational database research told us how to organize globs of unclassified information into neat, clean, interrelated tables; RDF tells us how to organize globs of unclassified information into one, big table.
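The row/column/value analogy can be sketched in a few lines of Python. This is only an illustration, not an RDF library and not how iPlant stores data. The subject and predicate names below are hypothetical and chosen to make the pivot easy to read.

```python
from collections import defaultdict

# Hypothetical statements, each a (subject, predicate, object) triple.
triples = [
    ("Arabidopsis_thaliana", "memberOf", "Viridiplantae"),
    ("Arabidopsis_thaliana", "hasCommonName", "thale cress"),
    ("Zea_mays", "memberOf", "Viridiplantae"),
]

def triples_to_table(statements):
    """Pivot triples into {subject: {predicate: [objects]}}.

    Subjects act as rows, predicates as column headers, and objects as the
    values at each row/column intersection. A cell holds a list because a
    subject may have several objects for the same predicate.
    """
    table = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in statements:
        table[subject][predicate].append(obj)
    return {subject: dict(cells) for subject, cells in table.items()}

table = triples_to_table(triples)
columns = sorted({predicate for _, predicate, _ in triples})

# Print the "one, big table": a header, then one row per subject.
print("subject", columns)
for subject in sorted(table):
    row = [", ".join(table[subject].get(column, [])) for column in columns]
    print(subject, row)
```

Note that Zea_mays has an empty cell under hasCommonName: in one big table, most rows leave most columns blank.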

The Linked Open Data community is already at over 13 billion statements (though not all stored literally in one big table). But there is a price to pay for consolidation. Much of the important implied contextual content is stripped away, leaving something so big and massive, so denormalized and decontextualized, that we never actually use RDF in this manner. RDF simply does not have enough semantics to enable biological data and service integration.

An important advance in the last 15 years has been to construct controlled vocabularies (a finite set of words), and to define relations between those words. Ontologies — systems of terms and their relationships, such as the Gene Ontologies — are an example. In the languages of the web, this is mostly captured by an extension to RDF called RDF Schema (RDFS), which gives us the all-important subsumption, or subclass or subset, relation. So now we may say that Viridiplantae is a subclass of Eukaryota in a manner that even a computer can understand.
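What "a manner that even a computer can understand" buys us can be shown with a small Python sketch. It assumes the subclass relation is transitive, as it is in RDFS, and uses a hand-written set of statements rather than a real RDFS reasoner. Only the Viridiplantae/Eukaryota statement comes from the text above. The other two statements and the function names are illustrative assumptions.

```python
# Hypothetical subclass statements, each a (subclass, superclass) pair.
SUBCLASS_OF = {
    ("Viridiplantae", "Eukaryota"),
    ("Embryophyta", "Viridiplantae"),
    ("Arabidopsis", "Embryophyta"),
}

def superclasses(term, subclass_of):
    """Return every class that `term` is a stated or inferred subclass of."""
    # Index the statements by subclass for quick lookup.
    parents = {}
    for child, parent in subclass_of:
        parents.setdefault(child, set()).add(parent)

    # Walk upward through the hierarchy, collecting each ancestor once.
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def is_subclass(child, ancestor, subclass_of):
    """True if `child` is a subclass of `ancestor`, directly or by inference."""
    return ancestor in superclasses(child, subclass_of)

# Never stated directly, but it follows from the three statements above.
print(is_subclass("Arabidopsis", "Eukaryota", SUBCLASS_OF))
```

The point is that no one had to state that Arabidopsis belongs to Eukaryota. The relation follows mechanically from the subclass statements.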

Yet, even RDFS is under-powered. The semantic web community is currently focused on a $100 million+ investment made over the last 20 years into a language now called OWL (Web Ontology Language). OWL is endorsed by the W3C, the voluntary sanctioning body of the World Wide Web. OWL gives us powerful semantics. It can be used as a first-order description logic — a formal logic amenable to automated reasoning — in a flavor called OWL DL. In a version called OWL Full, it can allow such expressivity that no reasoner is guaranteed to ever finish reasoning: such is the price we pay for high expressivity.

iPlant is investigating OWL to see how it can be used to enable data and service integration. We know that Biology needs something more powerful than RDFS, but with greater computational guarantees than OWL Full. At iPlant, we are currently building the constructs necessary to see how OWL DL can take us closer to using computers to better "understand" data, and thus enable the use of our vast human knowledge for the benefit of plant scientists.

[1] Good, B.M., J.T. Tennis, and M.D. Wilkinson. Social tagging in the life sciences: characterizing a new metadata resource for bioinformatics. BMC Bioinformatics 2009, vol. 10 (1) pp. 309 - 313.

Damian Gessler writes a regular column on the Semantic Web in The Leaflet. If you have questions or comments about the Semantic Web that you’d like him to consider in future columns, contact him at dgessler@iplantcollaborative.org.

 


Workshop: Computational Biology for Biology Educators


Contemporary biology is increasingly driven by the computational approaches developed to model biological processes and analyze data. This workshop will be a week-long, hands-on tutorial on teaching modeling and computational analysis in biology. Modeling and analysis concepts and tools will be introduced using examples from plant biology, biomedicine, biochemistry and systems biology. Faculty will gain experience in dynamic simulation of photosynthesis, ecophysiological modeling of plant flowering time, probabilistic models of molecular evolution, and phylogenetic tree reconstruction. The workshop will take place July 18 - 24, 2010 on the North Carolina Agricultural and Technical State University campus in Greensboro, NC. See the full workshop description for more details. This workshop is a partnership between iPlant, Shodor, the National Computational Science Institute, and North Carolina Agricultural and Technical State University.


Tuesday, May 18, 2010




Connect with iPlant at These Upcoming Meetings

American Society of Plant Biologists (ASPB) Plant Biology 2010 Conference, July 31 – August 4, Montreal, Canada.

“The iPlant Collaborative: New Tools for Innovative Plant Biology Research”, August 1, 2:30 PM. A workshop presented by Dan Stanzione, Matt Vaughn, Sheldon McKay, and Uwe Hilgert. This workshop will provide an overview of the iPlant Cyberinfrastructure (CI), details on two Grand Challenge projects, and iPlant’s education and outreach activities. iPlant is building CI that will include user-centric, configurable “Discovery Environments” (DEs) designed to address Grand Challenge problems, data repositories available to the community, new and re-factored tools optimized for growing demands, and developer toolkits to harness the power of the underlying system. iPlant is currently building CI in support of two Grand Challenges. iPlant’s Tree of Life (iPToL) project seeks to enable construction of phylogenetic trees for up to 500k species of green plants, to enable the dissemination of data associated with large trees, to visualize large trees, and to implement scalable "post-tree" analysis tools to foster integration with other sciences. The iPlant Genotype to Phenotype (iPG2P) project seeks to support an analytical process that allows one to begin with a trait of interest in a species possessing limited genetic resources and progress towards the ability to predict trait scores for known genotypes in given, non-constant environments. iPlant’s CI will also serve as the foundation for development of educational software, in which students can use the same tools and data resources that are available to professional scientists.


For More Information…

Visit us at http://www.iplantcollaborative.org or contact Steve Goff, Project Director, sgoff@iplantcollaborative.org.

The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#EF-0735191).