Data Assembly and Integration
Challenge: Disparate molecular sequence datasets are currently organized in multiple formats in multiple locations. Only a small fraction of the data needed to assemble very large phylogenetic trees is in widely available databases such as GenBank. Data that could be used to construct trees are often not broadly accessible or need pre-processing before they can be included in phylogenetic analysis.
Leveraging many years of collaboration in the plant phylogenetics community, the Data Assembly and Integration Working Group organizes workshops and outreach activities to bring together data providers to discuss strategies for assembling large-scale sequence data sets for plants to be used by the tree reconstruction team. An equally important part of this process is ensuring orderly integration and interoperability of the data.
Some of the key activities of this group include:
- My-Plant.org is a scientific networking website designed to bring together plant scientists to discuss and organize information about plant taxa, with the ultimate goal of bringing data into iPToL. My-Plant uses a phylogenetic tree metaphor to organize species and scientists into related “clades”, each of which has a volunteer manager who moderates activities for that clade.
- Data intake. The Data Assembly and Integration Working Group is implementing data intake pipelines using compute resources in order to standardize and scale the assembly of sequence data into character matrices suitable for large-scale phylogenetic analysis. The PHLAWD intake pipeline, developed by Stephen Smith, has been implemented, and work is underway to implement a generalized sequence intake pipeline developed by Gordon Burleigh.
- Perpetually updating tree. The task of assembling the tree of life for all green plants is an incremental one. Although trees of more than 50,000 plant species are now a reality, new data continue to come in, so scientists such as Alexis Stamatakis and Casey Dunn are planning for an alignment and phylogenetic tree that are updated on an ongoing basis.
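
The data intake step above, turning per-locus sequence data into a single character matrix, can be illustrated with a minimal sketch. This is not the PHLAWD pipeline or the generalized pipeline mentioned above. The function name `build_supermatrix`, the `?` missing-data symbol, and the toy taxa and sequences are all assumptions made for illustration. The sketch assumes each locus has already been aligned, concatenates the loci, and pads taxa that lack a locus with missing-data characters.

```python
from typing import Dict, List, Tuple


def build_supermatrix(
    alignments: List[Dict[str, str]], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Concatenate per-locus alignments into one taxon-by-character matrix.

    Hypothetical helper for illustration only; each alignment maps a taxon
    name to its aligned sequence for one locus.
    """
    # Collect every taxon seen in any locus, sorted for a stable row order.
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    start = 0
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must share one aligned length")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa without data for this locus get a block of missing characters.
            rows[taxon].append(aln.get(taxon, missing * width))
        # Record the half-open column range [start, end) covered by this locus.
        partitions.append((start, start + width))
        start += width
    matrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return matrix, partitions


# Toy input: two already-aligned loci with made-up sequences and taxon names.
locus_a = {"taxon_A": "ATGC", "taxon_B": "ATGA"}
locus_b = {"taxon_A": "GGT", "taxon_C": "GCT"}
matrix, partitions = build_supermatrix([locus_a, locus_b])
for taxon, row in matrix.items():
    print(taxon, row)
```

Recording the column range of each locus keeps the loci distinguishable after concatenation, which is useful if a downstream tree-building step treats loci as separate partitions.
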
Working Group Members
| Name | Role | Institution | |
|---|---|---|---|
| Douglas Soltis | Working Group Co-Lead | University of Florida | |
| Pamela Soltis | Working Group Co-Lead | University of Florida | |
| Michael Donoghue | Collaborator | Yale University | |
| Val Tannen | Collaborator | University of Pennsylvania | |
| Gordon Burleigh | Collaborator | University of Florida | |
| Casey Dunn | Collaborator | Brown University | |
| Sheldon McKay | Scientific Lead | iPlant Collaborative, Cold Spring Harbor Laboratory | |
| Steve Mock | Team Lead, My-Plant | iPlant Collaborative, Texas Advanced Computing Center | |
| Matthew Hanlon | Developer, My-Plant | iPlant Collaborative, Texas Advanced Computing Center | |
| John Cazes | Developer | iPlant Collaborative, Texas Advanced Computing Center | |
