
The Carolina Center for Exploratory Genetic Analysis
The Carolina Center for Exploratory Genetic Analysis (CCEGA) is developing an interdisciplinary cyberinfrastructure to identify the complex genetic traits that underlie human diseases, bringing together data from clinical studies, population studies and model systems. Funded by the National Institutes of Health, CCEGA is a collaboration among RENCI; the UNC departments of Biostatistics, Genetics, Epidemiology, and Computer Science; the schools of Pharmacy and Information and Library Science; and the Health Sciences Library.
CCEGA believes the next breakthroughs in our understanding of biology and disease will be made possible by the integrated analysis of genetic data and its expression as phenotypes. CCEGA's work centers on enabling this kind of multidisciplinary, multi-investigator research. The center involves three complementary groups of scientists at the University of North Carolina at Chapel Hill: (a) experimental geneticists, (b) quantitative experts in statistics and biostatistics, and (c) computer scientists with expertise in algorithm development, software construction, and high-performance computing.
Phase one of CCEGA focuses on building a community of investigators and deploying a prototype infrastructure for analyzing relationships among genotypes and phenotypes in three contexts:
The RENCI Contribution
To accommodate the diverse, multi-investigator databases necessary to answer these complex questions, RENCI is working with scientists to develop a prototype, extensible data model and provide access to data via a portal constructed using the Open Grid Computing Environment toolkit. The newest methods of integrated data analysis will be incorporated into a portal-based workflow. These include new techniques in linkage analysis (oligogenic analysis, multivariate linkage analysis, epistasis, and genotype by environment interaction), subspace clustering, and association analysis (quantitative trait and nucleotide analysis).
RENCI and its scientific partners are also exploring new visualization techniques for examining and interacting with large data sets, and high-performance computing for implementing computationally intensive analysis techniques. To reduce the barriers between data providers and data analyzers, CCEGA and RENCI conduct intensive, specialized workshops, colloquia and intramural meetings.
Funding
National Institutes of Health/National Center for Research Resources, Grant Number 5-P20-RR020751-01-02
Publications
Fred A. Wright, Hanwen Huang, Xiaojun Guan, Kevin Gamiel, Clark Jeffries, William T. Barry, Fernando Pardo-Manuel, Patrick F. Sullivan, Kirk C. Wilhelmsen, and Fei Zou. Simulating Association Studies: a Data-based Resampling Method for Candidate Regions or Whole Genome Scans (accepted for publication in Bioinformatics), 2007.
Presentations
Introduction and Context Dan Reed Chancellor's Eminent Professor Vice-Chancellor for Information Technology and CIO Director, Renaissance Computing Institute (RENCI)
Workshop Format Kirk Wilhelmsen, Department of Genetics
Addiction Family Study Kirk Wilhelmsen, Department of Genetics
Strong Heart Kari North, Epidemiology
Diabetes, Fusion Karen Mohlke, Department of Genetics
CATIE (Clinical Antipsychotic Trial of Intervention Effectiveness), Schizophrenia Pat Sullivan, Department of Genetics
Cystic Fibrosis Mike Knowles, Department of Medicine
Cancer Epidemiology Bob Millikan, Epidemiology
Head and Neck Epidemiology Andy Olshan, Epidemiology
Renal Disease Gene Expression Ron Falk, Department of Medicine
ELSI/Prospective Studies Jim Evans, Department of Genetics
Introduction NIH Site Visit, May 4, 2005
Linkage analysis / family-based association studies Kari North, Epidemiology
Model system for evaluation of data mining techniques Susan Paulsen, Computer Science
Subspace clustering methods Wei Wang, Computer Science
Visualization of high-dimensional data Leonard McMillan, Computer Science
Complex phenotypes: schizophrenia and ventricle morphology Guido Gerig, Psychiatry and Computer Science
Realistic simulation of genotypes Fred Wright, Biostatistics
Genetics viewpoint Pat Sullivan, Genetics
Introduction Dan Reed Chancellor's Eminent Professor Vice-Chancellor for Information Technology and CIO Director, Renaissance Computing Institute (RENCI)
Project Overview Kirk Wilhelmsen, Department of Genetics
ELSI Working Group Jim Evans, Department of Genetics
Informatics Working Group Brad Hemminger, Information and Library Science
Analysis Working Group Jan Prins, Department of Computer Science
NIH Roadmap Program Greg Farber, NIH
Introduction Kirk Wilhelmsen, Department of Genetics
Data Modeling, Informatics Working Group Brad Hemminger, School of Information and Library Science
Realistic Simulation of Genotypes Fred Wright, William Barry, Department of Biostatistics
Random Forest on a Culled Set of SNPs Susan Paulsen, Jan Prins, Department of Computer Science
Preliminary Statistical Analysis of Bakeoff Data Fei Zou, Seunggeun Lee, Department of Biostatistics
Analysis of Simulated Genetic Data Based on Goodness of Fit Chi-square Test Alex Tropsha, Alexander Golbraikh, School of Pharmacy, Steve Marron, Department of Statistics
Bakeoff Summary Fred Wright, William Barry, Department of Biostatistics
Partners
Working Groups
Three working groups meet weekly for discussions and presentations on specific topics.
| PCaP-Epidemiology Specimen Tracking System | Roger Akers, Feb. 17, 2005 |
| Lab Data Management Systems | Kirk Wilhelmsen, Feb. 24, 2005 |
| Demo of Lab Data and Clinical Data Management Systems | Kirk Wilhelmsen, Mar. 3, 2005 |
| Generalized Model (Modeling Genetics and Proteomics studies) | Brad Hemminger, Mar. 10, 2005 |
| Review Draft Model | Brad Hemminger, Mar. 24, 2005 (meeting minutes). |
| Review Draft Model | Brad Hemminger, Mar. 31, 2005 (meeting minutes). |
| Review Draft Model | All, Apr. 7, 2005 (meeting minutes). |
| BSP (BioSpecimen Project) Facility | Peter DeSaix, Paul Brown, May 19, 2005 (meeting minutes). |
| Knowles lab and their databases related to cystic fibrosis (CF) | Mike Knowles, Hemant Kelkar, Annie Xu and David Fargo, May 26, 2005 (meeting minutes). |
| Identify genes that influence cardiovascular disease | Kari North, Jun 2, 2005 (meeting minutes). |
| Data management issues of Melanoma project | Dennis Simpson, July 7, 2005 (meeting minutes). |
| Compare and merge database schemas of UNC labs | Offsite meeting at the Friday Center, Nov 8, 2005 |
| Review initial draft of the schema for the common data model | March 24, 2006 (meeting minutes). |
| Linkage Analysis | Ethan Lange, Feb. 3, 2005 |
| Family Based Association Studies | Kirk Wilhelmsen, Feb. 8, 2005 (summary). |
| A Model of Genetic Data | Fred Wright, Feb. 22, 2005 (summary). |
| A Model of Genetic Data | Fred Wright, Mar. 8, 2005 (summary). |
| Survey of Data Mining Techniques for Genotype-Phenotype Association Studies | Pat Sullivan (slides) and Susan Paulsen (slides), Mar. 22, 2005 (summary). |
| Survey of Data Mining Techniques for Genotype-Phenotype Association Studies | Pat Sullivan (discussion), Mar. 29, 2005 |
| Quantitative Genotype Phenotype Relationships (QGPR): Can we learn from Quantitative Structure Activity Relationships (QSAR) modeling? | Alex Tropsha (slides), Apr. 5, 2005 (summary). |
| Application of neural networks to find functions of selected, weighted combinations of measurable parameters | Clark Jeffries (slides), Apr. 12, 2005 (summary). |
| Microarray Data and Analysis | Charles Perou (slides), Apr. 19, 2005 (summary). |
| Classification Accuracy Criteria as Target Functions in QSAR | Alexander Golbraikh (slides), Apr. 26, 2005 |
| Genotyping, (slides), (slides and audio) | Bob Millikan, Apr. 19, 2005 |
| XML, (slides), (slides and audio) | Barrie Hayes, Apr. 19, 2005 |
Daniel A. Reed, Renaissance Computing Institute
Dan_Reed@unc.edu 919-966-1585
Daniel A. Reed, Ph.D., is the Chancellor's Eminent Professor and the founding director of the Renaissance Computing Institute. He also serves as the Vice-Chancellor for Information Technology for the University of North Carolina at Chapel Hill. His research interests are in high-performance computing, computational grids, scientific collaboration and computer systems.
Terry Magnuson, Department of Genetics
terry_magnuson@med.unc.edu 919-843-6475
Terry Magnuson, Ph.D., is the Sarah Graham Kenan Professor and founding chair of the Department of Genetics at UNC. He also heads the Carolina Center for Genome Sciences (CCGS). The CCGS includes experimental, social and analytical genomics divisions, with the latter unit including specialists in basic and applied biomedical computing.
Bradley Hemminger, School of Information and Library Science
bmh@ils.unc.edu 919-966-2998
Bradley Hemminger, Ph.D., is an Assistant Professor in the School of Information and Library Science. His interests are medical and bio-informatics, computer-human interfaces, digital libraries and open archives and information visualization.
James Evans, Department of Genetics
jpevans@med.unc.edu 919-966-2276
James Evans, M.D., is an Associate Professor in the Department of Genetics and an Associate Director of the CCGS. He is a board-certified medical geneticist with a special interest in cancer genetics. He is heading a UNC development project to integrate the collection of genetic information and materials throughout the UNC campus.
Andrew Nobel, Department of Statistics
nobel@email.unc.edu 919-962-1352
Andrew Nobel Ph.D. is an Associate Professor of Statistics and an adjunct faculty member in the Computer Science Department. He will lead the statistical analysis of subspace clustering techniques and the validity of their results. Andrew has been collaborating with Chuck Perou's laboratory on the analysis of gene expression data and also has research programs in pattern recognition and machine learning.
Kari North, Department of Epidemiology
kari_north@unc.edu 919-966-2148
Kari North, Ph.D., is an Assistant Professor in the Department of Epidemiology and a member of the CCGS. She is a statistical geneticist with extensive experience in genetic epidemiology and linkage analysis. She has practical experience in the design of large genetic studies and will contribute that expertise to this project as a collaborator.
Fernando Manuel Pardo, Department of Genetics
fernando_pardo-manuel@med.unc.edu 919-843-5403
Fernando Manuel Pardo, Ph.D., is an Assistant Professor in the Department of Genetics and the CCGS. His interests include quantitative genetic analysis in mice. He has expertise in meiotic segregation and the genetic diversity of mice.
Karen Mohlke, Department of Genetics
karen_mohlke@med.unc.edu 919-966-2913
Karen Mohlke, Ph.D., is an Assistant Professor in the Department of Genetics and the CCGS. She is interested in the genetic analysis of complex traits. She has been working, and will continue to work, on the positional cloning of loci for diabetes.
Susan Paulsen, Department of Computer Science
paulsen@cs.unc.edu
Charles Perou, Department of Genetics
chuck_perou@med.unc.edu 919-843-5740
Charles Perou Ph.D. is an Assistant Professor in the Department of Genetics and the CCGS. He is interested in transcriptional profiling and the genetic epidemiology of cancer.
Jan Prins, Department of Computer Science
prins@cs.unc.edu 919-962-1913
Jan Prins Ph.D. is a Professor of Computer Science. He has directed or co-directed several large projects to integrate high performance computing techniques into computational science. He will lead the development of high-performance interactive implementations of subspace clustering methods.
Patrick Sullivan, Department of Genetics
patrick_sullivan@med.unc.edu 919-966-3358
Patrick Sullivan, M.D., is a Professor in the Departments of Genetics and Psychiatry and the CCGS. His interest is principally in behavioral genetics. He is actively involved in large collaborative projects on schizophrenia, smoking dependence and chronic fatigue syndrome.
David Threadgill, Department of Genetics
david_threadgill@med.unc.edu 919-843-6472
David Threadgill Ph.D. is an Assistant Professor in the Department of Genetics and the CCGS. His principal interest is in quantitative genetic analysis of mice and transcriptional profiling.
Alexander Tropsha, School of Pharmacy
alex_tropsha@unc.edu 919-966-2955
Alexander Tropsha Ph.D. is a Professor in the Division of Medicinal Chemistry and Products in the School of Pharmacy and an Associate Director of the CCGS. His principal interests are in biomolecular informatics.
K.T.L. Vaughan, Health Sciences Library
ktlv@email.unc.edu 919-966-8011
K.T. Vaughan, M.S.L.S., is an Assistant Librarian in the Health Sciences Library. Her research interests include the integration of medical and bio-informatics into clinical, research, and teaching practices and the use of library services by interdisciplinary communities of practice. As the Librarian for Bioinformatics and Pharmacy, K.T. coordinates Library services for faculty, staff, students, and the general public in areas of genetics, pharmaceutics, and basic biomedicine.
Fred Wright, Department of Biostatistics
fwright@bios.unc.edu 919-843-3655
Fred Wright Ph.D. is an Associate Professor in the Department of Biostatistics and the CCGS. His principal interests are statistical genetics and the development of analytic methods.
Wei Wang, Department of Computer Science
wangwei@email.unc.edu 919-962-1744
Wei Wang, Ph.D., is an Assistant Professor of Computer Science. She is an expert in data mining and has developed key algorithms for subspace clustering, as well as for mining sequence, spatial and structured data. She is a member of the CCGS and collaborates on several driving biological problems.
Kirk Wilhelmsen, Department of Genetics
kirk_wilhelmsen@med.unc.edu 919-966-1373
Kirk Wilhelmsen, M.D., Ph.D., is an Associate Professor in the Departments of Genetics and Neurology, the CCGS and the Bowles Center for Alcohol Studies. His interest is principally in behavioral genetics. He has directed several large-scale family studies related to addiction, has directed a high-throughput genotyping laboratory and has collaborated on the genetic analysis for all the studies in which he has participated.
Fei Zou, Department of Biostatistics
fzou@bios.unc.edu 919-843-4822
Fei Zou, Ph.D. is an Assistant Professor in the Department of Biostatistics and the CCGS. Her interest is in methods of linkage analysis of quantitative traits and association analysis.
Introduction
HAP-SAMPLE is a web application for simulating SNP genotypes for case-control and affected-child trio studies by resampling from Phase I/II HapMap SNP data. The user provides a list of SNPs to be "genotyped," along with a disease model file that describes causal SNPs and their effect sizes. The simulation tool is appropriate for candidate regions or whole-genome scans.
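The resampling idea described above can be illustrated with a short Python sketch. This is not HAP-SAMPLE's actual algorithm or interface, which is not detailed here. It assumes a hypothetical pool of phased reference haplotypes coded as 0/1 alleles, a single causal SNP, and a multiplicative risk model; the function names, parameters and toy data are all invented for illustration.

```python
import random


def simulate_case_control(haplotypes, causal_snp, relative_risk, baseline_risk,
                          n_cases, n_controls, seed=0, max_draws=1_000_000):
    """Resample haplotype pairs and sort them into cases and controls.

    Illustrative assumptions only: names, parameters and the multiplicative
    risk model are hypothetical, not HAP-SAMPLE's real interface or method.
    """
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    rng = random.Random(seed)
    cases, controls = [], []
    draws = 0
    while len(cases) < n_cases or len(controls) < n_controls:
        draws += 1
        if draws > max_draws:
            raise RuntimeError("could not fill both groups; check the model")
        # Draw two haplotypes with replacement from the reference pool.
        h1, h2 = rng.choice(haplotypes), rng.choice(haplotypes)
        # Genotype = count of the coded allele (0, 1 or 2) at each SNP.
        genotype = [a + b for a, b in zip(h1, h2)]
        # Multiplicative model: risk scales by relative_risk per causal allele.
        risk = min(1.0, baseline_risk * relative_risk ** genotype[causal_snp])
        affected = rng.random() < risk
        if affected and len(cases) < n_cases:
            cases.append(genotype)
        elif not affected and len(controls) < n_controls:
            controls.append(genotype)
    return cases, controls


def allele_frequency(genotypes, snp):
    """Frequency of the coded allele at one SNP across diploid genotypes."""
    return sum(g[snp] for g in genotypes) / (2 * len(genotypes))


if __name__ == "__main__":
    # Toy pool of four haplotypes over three SNPs (made-up data).
    pool = [(0, 0, 1), (1, 0, 0), (1, 1, 0), (0, 1, 1)]
    cases, controls = simulate_case_control(
        pool, causal_snp=0, relative_risk=3.0, baseline_risk=0.05,
        n_cases=200, n_controls=200)
    print("case allele frequency:   ", allele_frequency(cases, 0))
    print("control allele frequency:", allele_frequency(controls, 0))
```

Because whole haplotypes are resampled rather than individual SNPs, the correlation between neighboring SNPs in the reference pool carries over into the simulated genotypes. Under this toy model the causal allele should appear more often among cases than among controls.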
Acknowledgements
This project is supported by Grant 5-P20-RR020751-01-02 from the National Center for Research Resources of the National Institutes of Health as part of the Carolina Center for Exploratory Genetic Analysis. Other sources of support include the Carolina Environmental Research Center (EPA RD-83272001), NIGMS R01 GM074175, and CF Foundation Zou05P0. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH or the National Center for Research Resources.