Statistical Inference

Challenge: Phenotypic variation is largely quantitative, polygenic, and controlled by the interaction of genes and environment. Both genomics and phenotyping systems are generating more data than can be interrogated on local systems or by non-expert laboratories.

The Statistical Inference Working Group identified and prioritized the general classes of statistical genetics methods that iPlant will support: General Linear Models, Mixed Models, Machine Learning, and Bayesian approaches. General Linear Models (GLMs) are being addressed first because they are pertinent to the widest cross-section of plant biologists. The iPlant team has developed a multiple-SNP forward regression version of general linear modeling and improved the performance of single-SNP forward regression on graphics processing units (GPUs). The multiple-GPU version of the code will be optimized specifically for the GPU features of the Texas Advanced Computing Center (TACC) computing cluster. Future work from the Statistical Inference group will include ways to view and explore the large (2.5 million points) multidimensional data sets emerging from genetic association studies, and ways to make the results of such analyses more accessible to the general research community.
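To make the forward regression idea concrete, here is a minimal CPU-only sketch in plain Python. It is not the iPlant code: the function names, the F-statistic stopping rule, the default thresholds, and the toy genotype data are all assumptions made for illustration. Starting from an intercept-only model, each step scores every remaining SNP by how much it would reduce the residual sum of squares, adds the best one, and stops once the gain is no longer large enough. Because each candidate SNP is scored independently of the others, that inner loop is the natural place for GPU parallelism.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(v, basis):
    """Remove from v its projection onto each orthonormal vector in basis."""
    out = list(v)
    for q in basis:
        c = dot(q, out)
        out = [o - c * qi for o, qi in zip(out, q)]
    return out


def forward_regression(genotypes, phenotype, max_snps=5, f_threshold=10.0):
    """Greedy forward selection of SNPs (hypothetical helper, not iPlant's API).

    genotypes: list of SNP vectors, one value per individual (e.g. 0/1/2).
    Returns the indices of the selected SNPs in the order they were added.
    """
    n = len(phenotype)
    # The model starts with the intercept only, stored as an orthonormal basis.
    basis = [[1.0 / math.sqrt(n)] * n]
    y_res = residualize(phenotype, basis)
    selected = []
    while len(selected) < max_snps:
        rss = dot(y_res, y_res)
        best = None  # (rss_drop, snp_index, residualized genotype, its sum of squares)
        for j, g in enumerate(genotypes):  # independent per SNP
            if j in selected:
                continue
            g_res = residualize(g, basis)
            ss = dot(g_res, g_res)
            if ss < 1e-12:  # SNP is collinear with the current model
                continue
            drop = dot(g_res, y_res) ** 2 / ss
            if best is None or drop > best[0]:
                best = (drop, j, g_res, ss)
        if best is None:
            break
        drop, j, g_res, ss = best
        df_resid = n - len(basis) - 1  # residual degrees of freedom after adding the SNP
        if df_resid <= 0:
            break
        f_stat = drop / (max(rss - drop, 1e-12) / df_resid)
        if f_stat < f_threshold:  # assumed stopping rule
            break
        norm = math.sqrt(ss)
        q = [v / norm for v in g_res]
        basis.append(q)
        y_res = residualize(y_res, [q])
        selected.append(j)
    return selected


# Made-up toy data: 4 SNPs scored on 8 individuals.
snps = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [0, 0, 1, 1, 2, 2, 0, 1],
    [2, 1, 0, 2, 1, 0, 1, 1],
    [0, 2, 0, 2, 0, 2, 1, 1],
]
noise = [0.1, -0.1, 0.1, -0.1, -0.1, 0.1, 0.1, -0.1]
# The simulated trait depends on SNP 1 (large effect) and SNP 3 (small effect).
trait = [3 * a + b + e for a, b, e in zip(snps[1], snps[3], noise)]

selected = forward_regression(snps, trait, max_snps=2)
print(selected)
```

Orthogonalizing each candidate against the SNPs already in the model gives the same reduction in residual sum of squares as refitting the full regression, but without solving a new least-squares problem for every candidate.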

Working Group Members

Name               Role                    Institution
Dan Kliebenstein   Working Group Co-Lead   University of California, Davis
Ed Buckler         Working Group Co-Lead   Cornell University
Barb Stranger      Collaborator            Harvard University
Chris Myers        Collaborator            Cornell University
Liya Wang          Collaborator            iPlant Collaborative, Cold Spring Harbor Laboratory
Bindu Joseph       PostDoc                 University of California, Davis
Peter Bradbury     Collaborator            Cornell University
Jean-Luc Jannink   Collaborator            Cornell University
Weijia Xu          Collaborator            iPlant Collaborative, The Texas Advanced Computing Center