Greetings from the iPlant Collaborative! The summer conference season has been a busy one for iPlant, with presentations, posters, and demonstrations of the Discovery Environment, Atmosphere, and stand-alone tools at BSA, Arabidopsis, ASPB, TeraGrid, and other annual meetings. The feedback and input received at these conferences are invaluable and fuel our discussions and planning as we work to continuously improve the features, functionality, and stability of the iPlant cyberinfrastructure, and keep pace with our outreach efforts.
As always, we welcome your comments or suggestions for our newsletter; please send them to feedback@iplantcollaborative.org.
iPlant Releases a New Version of the Discovery Environment
By Eric Lyons, iPlant Sr. Scientific Developer (elyons@iplantcollaborative.org)
The iPlant Collaborative is pleased to announce the new release of the Discovery Environment (DE). Version 0.4 is the first release of the DE that features easy integration of new tools and data by any user, and includes improvements that bring to the DE a higher degree of integration between iPlant's data management system, compute resources, and the diverse set of analytical tools contributed by the biological research community. This release is the culmination of six months of work by iPlant's Core Software, Core Services, and High-Performance Computing teams in conjunction with iPlant's faculty and postdoctoral collaborators from around the country. We thank all involved for their great work on this release.
As a broad overview, the DE creates a unified system for managing, analyzing, visualizing, and sharing data. While useful by itself, the DE also provides examples for how programmers building their own set of systems to manage and analyze data can benefit from and use various iPlant technologies and resources.
Expanded Analytical Capabilities
A major challenge in biology is the large diversity of analytical tools and programs available for processing and analyzing data. The tools are often not interoperable with one another, and each requires learning its own set of commands. In addition, different tools have different compute resource requirements to run an analysis. One of the most exciting features of the new DE is that it has become much simpler for any user to add a new tool to the Discovery Environment. As a result of this new capability, you will see an explosion in the number of command-line analytical tools available. On the first day of release, this version has more than 50 additional tools not in the previous version, and this list will grow rapidly.
Figure: Improvements in the GUI for iPlant's Tool Integration Tool (TiTo).
To facilitate the integration of tools, a new system ancillary to the DE has been developed. This system, called TiTo, is the DE's Tool Integration Tool. TiTo enables researchers to integrate tools through a three-step process. The first step requires deploying the tool on iPlant's compute nodes using a simple web form to describe the tool and where it may be obtained. iPlant's Core Services team then retrieves the tool and installs it on iPlant's cluster computing resources. The second step creates a graphical user interface (GUI) for using the tool in the DE. This is the most involved part of the process and is the core feature of TiTo's intuitive web-based system. While a GUI for the DE is being created with TiTo, the nascent interface may be saved, previewed, and updated later. The third step is to write documentation on how to use the tool, what the various options do, and how to find additional information. The principal idea is to enable researchers to integrate the tools of their choosing into the DE as quickly as possible, and share them with other scientists as they see fit.
In addition to the ability to integrate tools that run on standard compute clusters, Version 0.4 also provides the ability to run tools on the Texas Advanced Computing Center's (TACC) supercomputers. This process requires working with the iPlant team located at TACC in order to make sure a given tool is well-suited for running on a supercomputer. To facilitate integrating, accessing, and running these tools, the TACC team has developed a set of programming resources to make the process straightforward. This interface is part of iPlant's Foundational API, which is a suite of RESTful services. By using the Foundational API, simple wrapper programs may be written to run tools at TACC.
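To give a feel for what a "simple wrapper program" around a RESTful service can look like, here is a minimal Python sketch. It is illustrative only: the base URL, the /jobs path, the JSON field names, and the bearer-token header are all hypothetical placeholders and are not taken from the Foundational API documentation. The sketch only builds the HTTP request and does not send it.

```python
import json
import urllib.request

# Hypothetical placeholder, NOT the real Foundational API address.
BASE_URL = "https://api.example.org/v1"


def build_job_request(tool_name, input_path, token, base_url=BASE_URL):
    """Build (but do not send) an HTTP POST asking a REST service to run a tool.

    The endpoint path and payload field names are assumptions made for
    illustration; consult the actual service documentation for real ones.
    """
    payload = {"tool": tool_name, "input": input_path}  # hypothetical fields
    return urllib.request.Request(
        url=f"{base_url}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_job_request("example_tool", "/home/user/input.dat", token="PLACEHOLDER")
print(request.get_method(), request.full_url)
```

A real wrapper would pass the request to urllib.request.urlopen and then poll the service for job status; both steps are omitted here because they depend on the actual API.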
iPlant's Data Store
The iPlant Data Store is a distributed system for the general management of data. The Data Store can be used for any type of data of any size, and can be used within the DE or behind any web-based tool. The Data Store is built upon highly redundant, high-performance storage arrays, geographically replicated between the University of Arizona and the Texas Advanced Computing Center (TACC) in Austin. The Data Store makes use of the software package iRODS (an NSF-funded project from the University of North Carolina at Chapel Hill; www.irods.org). iRODS provides a scalable and distributed system for storing data files, a framework for describing and locating data, a set of protocols for maximizing the transfer rate of very large sets of data, and a diversity of tools to access those data. In practical terms, this creates a unified system through which all of iPlant's resources store, retrieve, and manage data. In addition, there are various web-based, stand-alone, and command-line driven tools to access the Data Store. The Data Store is now the data management system for the DE. As a result, you can deposit data in your iPlant Data Store through a variety of means, and those data are immediately accessible for downstream processing and dissemination from within the DE.
Other Improvements
In addition to these fundamental changes to the DE's integration of data and analytical resources, there are numerous improvements to the DE's user interface, backend services, and overall system stability. Version 0.4 of the DE is a major milestone for iPlant. It creates an integrated environment for researchers working with tools and data, with an unprecedented degree of scalability. While extensive documentation and tutorials exist for using the DE and other iPlant resources, please feel free to contact the iPlant team with any questions you may have, for help with integrating new tools or learning how to access iPlant's Data Store, or to report any problems you find in the system. We invite you to explore the DE 0.4 as we work towards the version 1.0 milestone and continue to address the cyberinfrastructure needs of the plant research community.
iPlant-PhytoBisque Code Sprint
By Nirav Merchant, iPlant Technology Advisor (nirav@email.arizona.edu) and Martha Narro, iPlant Sr. Project Coordinator (narro@email.arizona.edu)
To enable advances in the understanding of relationships between genotypes and phenotypes, access to efficient and scalable image analysis capabilities is an essential component of the analysis platform. iPlant is fostering the development of a unified platform that allows integration of multiple tools and algorithms that facilitate analysis of high-throughput imaging data from phenotypes. PhytoBisque is an iPlant-supported collaboration that brings together the Phytomorph project led by Edgar Spalding, which is developing image analysis algorithms for plant biology (University of Wisconsin, Madison, http://www.botany.wisc.edu/phytomorph.htm) and the Bisque project (Bio-Image Semantic Query User Environment) led by B.S. Manjunath, which is developing a web-based image analysis platform (University of California, Santa Barbara, http://www.bioimage.ucsb.edu/bisque). This report follows up on the initial work on PhytoBisque that was described in an earlier issue of The Leaflet.
Figure: PhytoBisque Code Sprint participants, left to right: Seung-jin Kim, Andrew Predoehl, Kyle Simek, Steve Gregory, Nate Miller, Dmitry Fedorov, Utkarsh Gaur, Kris Kvilekval, Edwin Skidmore, and Sangeeta Kuchimanchi.
PhytoBisque provides a sophisticated web-based image analysis platform that benefits both developers of image analysis algorithms and plant biologists who have image datasets in need of analysis. PhytoBisque leverages the iPlant cyberinfrastructure (CI) by utilizing the unified iPlant Data Store, computational grid and user authentication systems. Developers can integrate their algorithms into PhytoBisque and have immediate access to image datasets made available by collaborators which can be used to test and further refine the algorithms. For developers, working in the PhytoBisque environment allows them to focus on the image analysis algorithm development while letting PhytoBisque provide the web application environment, complete with authentication, an underlying image database, rich annotation and collaboration tools. Integrating an algorithm into PhytoBisque provides a fully functional web-based application for the community members to explore with their own data sets without having to install any software. For scientists, analyzing image datasets using PhytoBisque provides easy access to a wide choice of algorithms to determine which best meets the needs of a particular problem.
iPlant hosted a 'code sprint' on June 14 - 16, 2011, in Tucson, Arizona, to bring together the PhytoBisque team and image analysis algorithm developers to streamline the process of integrating new algorithms into PhytoBisque. The group also worked closely with iPlant staff to improve the integration with iPlant CI. The team was able to integrate two new analysis algorithms and identify steps for improvement, which will be implemented in the upcoming release of PhytoBisque in September 2011. The group also worked on integrating Graphical User Interface (GUI)-based image analysis algorithms developed in MATLAB into Atmosphere (iPlant's cloud CI); more than five "early adopter" research labs are now using this for image analysis. Stay tuned for a detailed update on the upcoming release of PhytoBisque and the new infrastructure for high-throughput image analysis in our next issue.
Fulfilling iPlant's Open Source Code Commitment
By Matthew Helmke, iPlant Sr. Technical Documentation Specialist (mhelmke@iplantcollaborative.org)
From the beginning, iPlant has had a commitment to release all code that we produce with an open source software license. This commitment is enshrined in our cooperative agreement with the National Science Foundation (NSF), which states that "the Discovery Environment (DE), software tools and systems, novel data sets and the like developed under direct project funding will be open source and will be made openly available for reuse and repurposing, with attribution." iPlant's goal in open-sourcing its code is that 'enabling technology,' such as the DE, will allow anyone to perform analyses and make discoveries that may be patentable or products that are commercializable, and that it can be continuously used, improved, and extended by the community.
Since the initial release of the first iPlant tools in 2010, some iPlant code has been released to the public, but not all. Some of the code lacked the maturity or documentation needed for a useful public release. To make the process clearer, more professional, and easier to carry out, iPlant has worked to standardize its open source procedures.
The initial step involved choosing a specific software license to use that would achieve our goals. The ideal license would be familiar to the scientific community, easy to understand, and give the maximum amount of freedom to the community to use and improve. Working with the Office of Technology Transfer at iPlant's primary host institution, the University of Arizona, iPlant selected the well-known BSD License, an Open Source Initiative-approved license (http://www.opensource.org/licenses/index.html). The full text of iPlant's license can be found at http://www.iplantcollaborative.org/sites/default/files/iPLANT-LICENSE.txt. Next, iPlant developed a set of guidelines for correctly applying that license to our software.
Once the process was designed, pieces of software could be selected for release to the public repository in GitHub. In preparation for release, each piece of software requires installation and use instructions to be written to make it useful to others. This documentation has been created out of private, internal iPlant developer notes that have been sorted, queried, tested, and distilled to produce README files for every piece of released software as well as INSTALL files for those with complex installation procedures. This procedure is now in place and executed before any code is released.
To date, iPlant has released code for 10 software products. Software likely to be of immediate interest to our community includes the source code for Atmosphere, iPlant's cloud computing resource platform, and the source code for the Taxonomic Name Resolution Service (TNRS), a tool for resolving conflicting taxonomic plant species names and/or their spelling. In fact, the FishBase project has already expressed interest in using the modifications that iPlant has made to the Taxamatch elements of the TNRS code base for their project.
More code is forthcoming with each release of the DE and the open source process will continue until all of iPlant's source code is licensed and released to the public, fulfilling our commitment to NSF and to the community we serve.
To view or access iPlant's open source code, visit the open source page on our website.
The Integrated Breeding Portal: Cutting-Edge Breeding Technology Services to Help Feed the World
By Stephen Mock, iPlant Research Engineer/Scientist Associate (mock@tacc.utexas.edu)
The Generation Challenge Programme (GCP) of the Consultative Group on International Agricultural Research (CGIAR) is coordinating the development of the Integrated Breeding Portal (IBP) in tight collaboration with partners from CGIAR Centers, National Programmes and the iPlant Collaborative team at the Texas Advanced Computing Center. Initiated in September 2009, the IBP is conceived as a public, sustainable, web-based, one-stop-shop for information, analytical and decision support tools and related services to design and carry out integrated breeding projects. One of the main objectives of the IBP is to boost crop productivity and resilience for smallholders in marginal environments by exploiting the economies of scale afforded by collective access to cutting-edge breeding technologies and informatics hitherto unavailable to breeders in developing countries.
In the last decade, a 'gene revolution' led by the private sector has boosted crop productivity in developed countries by applying and combining the latest advances in molecular biology and information technology with reliable plant phenotyping. In contrast, the adoption of molecular breeding is still limited in the public sector, and hardly used at all in developing countries. Major bottlenecks in these countries include a shortage of well-trained personnel, inadequate infrastructure, and lack of information systems with adaptable analysis tools.
The IBP will deliver support services to guide and train breeders in developing countries to access and use marker technologies. Supporting these communities in the practice of molecular breeding for the most important food security crops will be critical for the adoption of modern breeding technologies in developing countries, for the development of local infrastructure to improve plant phenotyping, and for appropriate and targeted capacity building. Through these efforts the IBP will be a key part of a global strategy on food security and the alleviation of poverty.
The IBP is building on 14 initial use cases, which are breeding projects for eight crops across 32 developing countries in Africa and Asia. This ensures that the IBP's development is driven by real breeder needs and its interface is user-friendly to the community it will serve. Breeding tools and services will be progressively deployed on the IBP portal as they are completed, with the first integrated configurable workflow to be ready by June 2013.
The IBP initiative is mainly funded by the Bill & Melinda Gates Foundation, with additional financial support from the UK Department for International Development (DFID) and the European Commission.
BrachyBio!
By Mary Margaret Sprinkle, SciPlant Team Member and iPlant Special Assistant (marys@iplantcollaborative.org)
BrachyBio! is an engaging, authentic science experiment for middle and high school students. Once implemented, BrachyBio! will generate a rich database of mutant phenotypic data for scientists studying Brachypodium distachyon. Led by Tom Brutnell, Tiffany Fleming, and Camilo Rosero at the Boyce Thompson Institute for Plant Research (BTI), the project aims to engage teachers and students in genetic analysis and to introduce them to concepts related to bioenergy. For Brachy scientists, it's a way of crowdsourcing genetic analysis. To broaden the project's reach, BTI and the iPlant Collaborative have built a web-based gateway for reporting, querying, and sharing data from BrachyBio! experiments. According to Brutnell, "The goal is two-fold. It is to engage students in authentic research and it is a way of providing the genetics community with a valuable resource. We are really challenging the way we approach genetics with this project. It is a student-led initiative that will have immediate impact for the scientific community."
Figure: Mutant of the Month, a BrachyBio! feature not found anywhere else.
Students who participate in the BrachyBio! project will perform phenotypic screens on chemically mutagenized populations of Brachypodium that have never been screened or characterized. Thus, all results are novel and of potential value to the scientific community. Students will enter data using a web-based data entry portal developed by Dave Parizek and Jill Yarmchuk at iPlant.
Using the portal, teachers will create user accounts for their students, view observations and run reports for multiple classrooms, and enter the environmental conditions under which plants are grown. Students can log in using a uniquely created user ID, enter observations, and upload photos of mutants. Researchers can sort the data by a variety of filters including family and phenotype, run reports and view graphs, and download the search results in multiple file formats. Web forms are now available for ordering new seed, requesting information, and providing feedback. To broaden student appeal and community engagement, the site features a "Mutant of the Month" profile, an extensive mutant library, and a blog. Brachy scientists will contribute to a guest blog on current research topics and challenges of their research.
These improvements reflect a major upgrade to the project. In the pilot version, students recorded information on sheets, which were collected by teachers and entered manually on a spreadsheet. The spreadsheet was then sent to BTI and uploaded to the database. Researchers were not able to query the data sets. All this handling delayed dissemination and compromised data integrity. With the new interface, students will be able to directly upload datasets that can be queried by the scientific community.
The beauty of the overall design of the new BrachyBio! site is that it can be easily transitioned to other plant species and even other organisms. Nirav Merchant, Technology Advisor for iPlant, has this to say: "We are developing a flexible platform which facilitates citizen science projects by streamlining the user and data management tasks for educators and researchers. Our emphasis is to provide ease of access to the data being produced." BrachyBio! is an excellent model for education, outreach, and community involvement, and it represents an increasingly prevalent way of conducting research.
Summer Conference Highlights: What a Difference a Year Makes!
iEvoBio'11, June 21 – 22, Norman, OK. iPToL Engagement Team Analyst Naim Matasci reports: "Overall, this conference was a big success for iPlant; I got very positive feedback for both iPToL and TNRS presentations. The main message I wanted to convey, that iPlant is more than just the Discovery Environment, got through. The fact that we are rolling out services and tools had a positive impact and going from a few tools to more than two dozen in less than three days impressed many people, thanks TiTo! And TNRS was a huge success too and generated a lot of interest – Jamie Estill pointed out that, for once, it's the zoologists who are envious of the plant science resources! TNRS' appeal to the wider community (queries about applying the underlying algorithms to fish, butterflies, and porifera) will favor its widespread adoption and long-term sustainability."
International Conference on Arabidopsis Research, June 22 – 25, Madison, WI. iPlant Special Assistant Vicki Bryan reports: "Overall, there was a lot of interest in iPlant's tools, particularly Atmosphere, which people immediately recognize the value of for their research."
Botany 2011 (BSA), July 9 – 13, St. Louis, MO. iPlant Sr. Project Coordinator Martha Narro reports: "The response to TNRS (Taxonomic Name Resolution Service) was great! We're already using the feedback we received to improve TNRS. I realized that iPlant should contact herbarium collections managers to get the word out, so they can integrate TNRS into their web services. There was also interest in iPlant's GIS/Biogeographic and data management resources now in development, which helps us prioritize features and tools as we go forward."
TeraGrid/XSEDE11, July 18 – 21, Salt Lake City, UT. Core Software Team Leader Andy Lenard reports: "I was pleased to see that XSEDE is making their services and compute resources more accessible by providing RESTful interfaces to them. That was the highlight of the meeting for me." Sriramu Singaram, iPlant Core Services Support Analyst, reports: "It was exciting to discover the myriad projects that were collaborating to leverage TeraGrid's HPC resources for boosting scientific research. The level of enthusiasm was literally palpable at the conference. iPlant's Atmosphere cloud service aroused keen interest among several researchers and the FutureGrid community as well."
Plant Biology 2011 (ASPB), August 6 – 10, Minneapolis, MN. iPlant Assistant Director Eric Lyons reports: "The response to the iPlant Collaborative at ASPB 2011 was phenomenal. When I joined the project about a year ago, I attended meetings presenting iPlant's vision and sharing prototypes of what we hoped to accomplish. Most of my subsequent conversations were spent defending iPlant, explaining why the scale of this project was necessary, and trying to convey the benefits of the tools we would provide. At ASPB this year, there was a marked difference in the community's response to iPlant. We now have products for them to use that address a diverse set of data management and analysis needs. The community understands what we are doing, why it is needed, and they are very excited to begin using the technology and infrastructure developed by iPlant. Scientists are running into major limitations with the amount of storage and computing resources they need, and iPlant has solutions. Importantly, several different segments of the community are excited about iPlant's technology, from senior PIs to assistant professors to post-docs and graduate students. I am particularly excited by the enthusiasm of scientists in the early stages of their careers as these are the people who are creating new research agendas. As iPlant's resources help them, their research will help inform the next set of technologies iPlant will develop. Ultimately, it is seeing iPlant's resources used by people to get their research done as quickly and as easily as possible that is particularly rewarding. Nothing iPlant creates has meaning unless it is used by the community."
A New Model for Semantic Search
By Damian Gessler, iPlant Semantic Web Architect (dgessler@iplantcollaborative.org)
Google the word "gene" and you will get links to web pages on biological genes. But you will also get a link to Gene Simmons, the famous rock star, and—as of this writing—a Los Angeles Times article published "today" mentioning Gene Smith, The Ohio State University athletic director. Given the lack of context on the sole token "gene", Google does a surprisingly good job of delivering what it anticipates you want.
Semantic web technologies seek to address this and other challenges by allowing data and service providers to "mark-up" their offerings with a more formal semantics (semantics: from the Greek sēmainō "to mean"; sēmantikos "significant"). More semantics means more context, with the hope that eventually sufficient context can be determined to free search engines from the constraints of lexical token matching and web page popularity voting. With a rich semantics, one could express how data are related, how they are related to various services, and vice-versa. However, the proliferation of independent data and service offerings means that currently, the aggregation of these resources is more of a Tower of Babel than an integrated network. An example is Linked Open Data, where over 25 billion data statements are available for search and extraction, yet the meaning of any resultant data set remains highly idiosyncratic.
This is a challenge for iPlant, but we have a solution. iPlant is using SSWAP (Simple Semantic Web Architecture and Protocol) to establish a means whereby anyone can host a semantic web service that is amenable to discovery, invocation, and response handling for arbitrary data. SSWAP is an independent, NSF-funded technology specifically designed to work in this environment (the author is the Principal Investigator for SSWAP). The idea behind SSWAP is to address the impedance mismatch between the idiosyncrasies of various data and services by allowing the use of independent, shared, and publicly available ontologies in a semantic web services framework.
The critical architectural characteristic of SSWAP is that a service's description is both necessary and sufficient for its discovery, and that with the addition of valid input data, it is necessary and sufficient for its invocation and response. The key to solving the search challenge is to recognize that once a service has told you the type of data it accepts, it must, under the rules of a formal semantics, accept any subclass of that data. Similarly, a service that returns a type of data necessarily returns superclasses of the type (e.g., data of the Plant Ontology class meristematic cell is necessarily of the superclass plant cell because all meristematic cells are plant cells). While all these relations may not be known explicitly, they can be derived by a reasoner suitably empowered to canvass the web and resolve ontology terms. This is the approach we are using to create a new World Wide Web semantic search model. The model allows one to find all services that are of the type of service sought, that are (logically) guaranteed to operate on the data given, and that return data at least as specific as the requested type.
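The matching rule described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not SSWAP's implementation: the ontology is a hard-coded parent table rather than terms resolved from the web, only the "meristematic cell is a plant cell" relation comes from the text (the generic "cell" superclass is assumed for the example), the three services and their names are hypothetical, and matching on service type is left out so that only the data-type rules are shown.

```python
# Toy ontology: each class maps to its direct superclass.
# Only "meristematic cell" -> "plant cell" comes from the article;
# the "cell" superclass is an assumption added for illustration.
PARENT = {
    "meristematic cell": "plant cell",
    "plant cell": "cell",
}


def ancestors(term):
    """Return the term followed by all of its superclasses."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_subclass(term, other):
    """True if `term` is the same class as `other` or a subclass of it."""
    return other in ancestors(term)


# Hypothetical service descriptions: what each accepts and returns.
SERVICES = [
    {"name": "describe_any_cell", "accepts": "cell", "returns": "cell"},
    {"name": "find_meristem", "accepts": "plant cell", "returns": "meristematic cell"},
    {"name": "meristem_context", "accepts": "meristematic cell", "returns": "plant cell"},
]


def match(services, have, want):
    """Find services guaranteed to accept `have` and to return at least `want`.

    A service accepts our data if the data's type is a subclass of the
    accepted type; its output satisfies us if the returned type is a
    subclass of (at least as specific as) the requested type.
    """
    return [
        s["name"]
        for s in services
        if is_subclass(have, s["accepts"]) and is_subclass(s["returns"], want)
    ]


print(match(SERVICES, "plant cell", "plant cell"))
print(match(SERVICES, "meristematic cell", "cell"))
```

With plant cell data and a request for plant cell results, only find_meristem qualifies: describe_any_cell returns something too general, and meristem_context cannot be guaranteed to accept an arbitrary plant cell. With meristematic cell data and a request for any cell, all three services qualify.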
In 2010 iPlant developed two SSWAP Application Programming Interfaces (APIs) to allow data providers to use the protocol. This year, iPlant is using the APIs to develop the platform to support discovery and invocation of resources using the semantic search characteristics of the protocol. This work will be released later this year. To accomplish this, we use on-demand, transaction-time reasoning to determine the subclass and superclass relations. This means that data and service providers and consumers do not need to know the explicit relations between the data they have and the services they want; they do not need to agree to use the same ontologies. As long as there is some logical relation between terms that a reasoner can infer, it will use those deductions to construct the matchmaking between data and services: just query with the data types you have and the semantic reasoning engine will discover those services that either accept that type of data—or if you wish—return that type of data. In practice, we limit the reasoner's scope of relations to account for network latency and other real-world factors. Thus we do not guarantee zero false-negative and false-positive rates. But the result is a practical platform where semantic search occurs truly on the semantics of the data and services, in a manner aimed at service discovery and invocation, for the ultimate goal of data and service integration.
As we roll out this technology in 2011 and early 2012, we will be engaging the community to aid in semanticizing its offerings. For more information on iPlant's use of SSWAP, see our web page at http://www.iplantcollaborative.org/discover/semantic-web.
Damian Gessler writes a regular column on the Semantic Web in The Leaflet. If you have questions or comments about the Semantic Web that you'd like him to consider in future columns, please contact him at dgessler@iplantcollaborative.org.
iPlant Post-Doctoral Profile: Barbara Banbury
Barbara Banbury is an iPlant post-doctoral researcher working in Brian O'Meara's lab at the University of Tennessee in Knoxville, TN. Originally from Detroit, MI, Barbara attended the University of Kansas for her undergraduate training, the University of Missouri-Rolla for her master's degree, and Washington State University for her Ph.D. Barbara's dissertation research focused on using phylogenies to study the evolution of traits. More specifically, she focused on how patterns of biodiversity are shaped by morphological diversification and rates of lineage accumulation. Working on several animal empirical systems (frogs and fish), she examined both simple and complex traits (where several traits work together to create an emergent property).
Figure: iPlant post-doc Barb Banbury.
Barbara joined iPlant's Trait Evolution working group in June 2010, as the group's goals closely align with her own research interests and background. Since joining iPlant, her research has focused on evaluating existing comparative methods and developing new methods. One very exciting new method she is currently working on in collaboration with Brian O'Meara is called TreEvo, which uses approximate Bayesian computation (ABC) for comparative methods. TreEvo, written in R, will allow users to explore new models of trait evolution that have never been available before (see http://www.brianomeara.info and http://barbbanbury.info for more information). Barb plans to make TreEvo available to the wider community soon, when she integrates it into a future release of the Discovery Environment.
In June, Barbara visited the University of Arizona to be part of an internal workshop to test iPlant's new tool integration software (TiTo). While in Tucson, she learned how to use TiTo to integrate new comparative method software into the Discovery Environment. "TiTo is a great step forward, because it gives researchers the ability to use or implement their own tools and utilize the high-performance computational services that iPlant offers," she said.
Barbara's interactions with iPlant have strengthened her future career direction. She has formed new collaborations with researchers across the country, and several interactions have led to collaborative research papers. "The more of these types of interactions I have, the better a researcher I will become," said Barb.