Oct 31, 2008

Three lessons learned from Dr. Cox’s lecture on how to conduct a successful GWAS

The genome-wide association study (GWAS) is an increasingly popular approach for identifying genetic factors that influence common, complex diseases. It has also established the scientific basis of many consumer genomics tests. I am live-blogging from the Consumer Genomics Workshop at Northwestern University.

  • Maximizing the power of GWAS

Various approaches have been proposed to maximize the power of a GWAS.

According to Dr. Cox, a staged design (for example, 300 samples of 100,000 SNPs at stage I and 2,000 samples of 1,000 SNPs at stage II) is less popular now that genotyping costs have fallen.

Instead, it is more popular to use a public database of controls, which can significantly increase the power of the association study and decrease the overall project cost. An example of such a control database is Illumina’s iControlDB.
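To get a feel for why extra controls help, here is a rough sketch of the approximate power of a simple allelic (two-proportion) test. This is my own illustration, not something shown in the lecture: the function name, the allele frequencies, the sample sizes, and the significance threshold are all hypothetical, and the normal approximation ignores many real-world complications such as population stratification.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, freq_cases, freq_controls, alpha=5e-8):
    """Approximate power of a two-sided, two-proportion z-test on allele counts.

    All inputs are illustrative assumptions. alpha defaults to a commonly
    used genome-wide significance threshold.
    """
    nd = NormalDist()
    # Each diploid sample contributes two alleles.
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls
    # Standard error of the difference in allele frequencies (unpooled).
    se = sqrt(
        freq_cases * (1 - freq_cases) / alleles_cases
        + freq_controls * (1 - freq_controls) / alleles_controls
    )
    z_crit = -nd.inv_cdf(alpha / 2)
    # Probability that the test statistic exceeds the critical value.
    return nd.cdf(abs(freq_cases - freq_controls) / se - z_crit)


# Hypothetical comparison: same 1,000 cases, with and without extra public controls.
own_controls_only = allelic_test_power(1000, 1000, 0.30, 0.25)
with_public_controls = allelic_test_power(1000, 5000, 0.30, 0.25)
```

Under these assumptions, adding controls shrinks the standard error of the frequency difference, so the second number comes out larger than the first.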


  • QC of the allele calling is critical

Bad samples can bias the genotype calling, which produces spurious results with very high apparent significance (thousands of significant SNPs after FDR correction), as evidenced by the Q-Q plot.

Many allele-calling algorithms are based on clustering the fluorescent intensities. As such, bad samples (outliers) can distort the clusters and cause the algorithms to assign genotypes incorrectly.

I used to think that with thousands of samples involved, a couple of (or even dozens of) bad samples should not be a big concern (i.e., the robustness of statistical modeling!), but according to Dr. Cox I was wrong.
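The Q-Q check can be sketched in a few lines. The code below is only an illustration under my own assumptions (the function names are made up, and the input is simply a list of per-SNP p-values from 1-df tests): it computes the points of a Q-Q plot and the genomic inflation factor, which should be close to 1 when there is no systematic bias.

```python
from math import log10
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, most significant first.

    Assumes every p-value is strictly greater than 0.
    """
    observed = sorted(p_values)
    n = len(observed)
    # Expected quantiles of a uniform distribution under the null.
    return [(-log10((i + 0.5) / n), -log10(p)) for i, p in enumerate(observed)]


def genomic_inflation(p_values):
    """Return the median observed 1-df chi-square divided by its null median.

    Assumes p-values in (0, 1] from two-sided 1-df tests.
    """
    nd = NormalDist()
    # A two-sided p-value p corresponds to chi-square = z**2 with z = inv_cdf(p / 2).
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

If the observed points rise far above the diagonal across the whole range, or the inflation factor is well above 1, that is a hint to go back and check sample quality before believing the hits.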

  • Experiment design

Dr. Cox mentioned that batch/plate artifacts have been observed in multiple studies. For instance, some plates contained only cases and others contained only controls. This reminds me of statistical experimental design. We learned the same lesson in SELDI-TOF proteomics and microarrays.

Such a batch/plate effect can be tested by looking at the allele frequency of each plate: if one plate gives dramatically different results from the others, it warrants further investigation.
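As a minimal sketch of that per-plate check, the code below compares each plate's allele frequency for one SNP against the pooled frequency of all other plates using a two-proportion z-score. This is my own illustration rather than Dr. Cox's procedure: the function name, the counts, and the |z| > 3 cutoff are hypothetical, and in a real study the check would run over many SNPs with an appropriate multiple-testing correction.

```python
from math import sqrt


def flag_outlier_plates(plate_counts, z_cutoff=3.0):
    """Flag plates whose allele frequency differs sharply from the rest.

    plate_counts maps plate name -> (minor allele count, total allele count).
    z_cutoff is an arbitrary illustrative threshold.
    """
    total_minor = sum(minor for minor, _ in plate_counts.values())
    total_alleles = sum(total for _, total in plate_counts.values())
    flagged = []
    for plate, (minor, total) in plate_counts.items():
        rest_minor = total_minor - minor
        rest_total = total_alleles - total
        if total == 0 or rest_total == 0:
            continue  # nothing to compare against
        pooled = total_minor / total_alleles
        se = sqrt(pooled * (1 - pooled) * (1 / total + 1 / rest_total))
        if se == 0:
            continue  # allele is monomorphic across all plates
        z = (minor / total - rest_minor / rest_total) / se
        if abs(z) > z_cutoff:
            flagged.append(plate)
    return flagged


# Hypothetical counts: 96 samples (192 alleles) per plate.
example_counts = {
    "plate1": (58, 192),
    "plate2": (61, 192),
    "plate3": (55, 192),
    "plate4": (110, 192),
}
suspicious = flag_outlier_plates(example_counts)
```

With these made-up numbers only plate4 stands out. A plate flagged this way is not necessarily bad, but it is worth checking whether it held only cases, only controls, or was processed differently.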


Nancy J. Cox, PhD, is a professor of medicine and human genetics and chief of the Section of Genetic Medicine at The University of Chicago. Her research program is focused on the development of methods to identify and characterize the genetic component of common, complex diseases and related traits. Diseases currently under study in the Cox computational lab include diabetes and diabetic complications, asthma and related traits, stuttering, specific language impairment, mesothelioma, breast cancer, Tourette Syndrome and autism. Her development of methods for genome-wide association studies has provided new insights into the genetic component of common human diseases.