Oct 31, 2008

Three lessons learned from Dr. Cox’s lecture on how to conduct a successful GWAS

The genome-wide association study (GWAS) is an increasingly popular approach for identifying genetic factors influencing common, complex diseases. It has also established the scientific basis of many consumer genomics tests. I am live-blogging from the Consumer Genomics Workshop at Northwestern University.

  • Maximizing the power of GWAS

Various approaches have been proposed to maximize the power of a GWAS.

According to Dr. Cox, the staged design (for example, 300 samples typed on 100,000 SNPs at stage I and 2,000 samples typed on 1,000 SNPs at stage II) is less popular now because genotyping has become cheaper.
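To see why the staged design used to be attractive, here is a back-of-the-envelope count of genotype calls for the example above. The single-stage comparison (typing all 2,300 samples on the full 100,000-SNP panel) is my own assumption for illustration, not a figure from the lecture.

```python
def genotype_count(samples: int, snps: int) -> int:
    """Number of individual genotype calls needed for one genotyping stage."""
    return samples * snps


# Stage sizes quoted in the example above.
stage1 = genotype_count(300, 100_000)
stage2 = genotype_count(2_000, 1_000)
two_stage_total = stage1 + stage2

# Assumption: the single-stage alternative types every sample on the full panel.
single_stage_total = genotype_count(300 + 2_000, 100_000)

print(two_stage_total)     # 32000000
print(single_stage_total)  # 230000000
```

Under these assumptions the staged design needs roughly one seventh of the genotype calls, which matters much less once the price per call drops.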

Instead, it is now more common to use a public database of controls, which can significantly increase the power of the association study and decrease the overall project cost. An example of such a control database is Illumina’s iControlDB.
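A rough sketch of why extra controls help: the snippet below approximates the power of a simple two-sided allelic z-test using a normal approximation. The allele frequencies, sample sizes, and the significance threshold of 5e-8 are illustrative assumptions of mine, not numbers from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def allelic_power(p_case, p_control, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic z-test (normal approximation)."""
    # Each person contributes two alleles, hence the factor of 2.
    se = sqrt(p_case * (1 - p_case) / (2 * n_cases)
              + p_control * (1 - p_control) / (2 * n_controls))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    effect = abs(p_case - p_control) / se
    # The contribution of the opposite tail is negligible and is ignored here.
    return NormalDist().cdf(effect - z_crit)


# Hypothetical SNP: allele frequency 0.30 in cases, 0.25 in controls.
own_controls = allelic_power(0.30, 0.25, n_cases=1_000, n_controls=1_000)
with_public_controls = allelic_power(0.30, 0.25, n_cases=1_000, n_controls=5_000)
print(own_controls < with_public_controls)  # True
```

The number of cases is unchanged in both calls; only the control group grows, which is the situation a shared control database creates.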

  • QC of the allele calling is critical

Bad samples can bias the genotype calling, producing spurious results with very high apparent significance (thousands of significant SNPs after FDR correction), as evidenced by the Q-Q plot.
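As a minimal sketch of what the Q-Q plot check looks at, the code below pairs observed and expected -log10 p-values and computes the genomic inflation factor (the median test statistic divided by its expected median under the null). The function names are hypothetical, and the p-values are synthetic; no real study data is used.

```python
from math import log10
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, ready to plot as a Q-Q plot."""
    n = len(p_values)
    observed = sorted(-log10(p) for p in p_values)
    expected = sorted(-log10((i + 0.5) / n) for i in range(n))
    return list(zip(expected, observed))


def inflation_factor(p_values):
    """Genomic inflation factor: about 1 when p-values look uniform."""
    # Convert each two-sided p-value back to a 1-df chi-square statistic.
    chi2 = [NormalDist().inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic example: evenly spread p-values versus artificially small ones.
well_behaved = [(i + 0.5) / 1000 for i in range(1000)]
inflated = [p ** 2 for p in well_behaved]

print(round(inflation_factor(well_behaved), 2))
print(inflation_factor(inflated) > 1.5)
```

A value well above 1, or a Q-Q curve that leaves the diagonal early, suggests a systematic problem such as bad genotype calls rather than thousands of true associations.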

Many allele-calling algorithms are based on the clustering of the fluorescent intensities. As such, bad samples (outliers) can cause the algorithms to make confusing and wrong assignments.

I used to think that with thousands of samples involved, a couple of (even dozens of) bad samples should not be a big concern (i.e., the robustness of statistical modeling!), but I was wrong, according to Dr. Cox.

  • Experiment design

Dr. Cox mentioned that batch/plate artifacts have been observed in multiple studies, for instance when some plates contain only cases and others contain only controls. This reminds me of statistical experimental design. We learned the same lesson in SELDI-TOF proteomics and microarrays.

Such a batch/plate effect can be tested by looking at the allele frequencies on each plate: if one plate gives dramatically different results, it warrants further investigation.
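One simple way to screen for such a plate effect is sketched below: for a single SNP, compare each plate's allele frequency with the frequency pooled over all other plates and flag large z-scores. The data layout, the threshold of 4, and the plate names are assumptions made for this illustration, not a method described in the lecture.

```python
from math import sqrt


def flag_plates(plate_counts, z_threshold=4.0):
    """Flag plates whose allele frequency differs sharply from the other plates.

    plate_counts maps a plate id to (alt_allele_count, total_allele_count)
    for one SNP.
    """
    total_alt = sum(alt for alt, _ in plate_counts.values())
    total_n = sum(n for _, n in plate_counts.values())
    flagged = []
    for plate, (alt, n) in plate_counts.items():
        # Frequency pooled over every plate except the one under test.
        rest_alt, rest_n = total_alt - alt, total_n - n
        p_rest = rest_alt / rest_n
        # Standard error based on the frequency in the remaining plates.
        se = sqrt(p_rest * (1 - p_rest) * (1 / n + 1 / rest_n))
        z = (alt / n - p_rest) / se if se > 0 else 0.0
        if abs(z) > z_threshold:
            flagged.append(plate)
    return flagged


# Made-up counts: 96 samples (192 alleles) per plate.
counts = {"P1": (58, 192), "P2": (60, 192), "P3": (55, 192), "P4": (110, 192)}
suspicious = flag_plates(counts)
print(suspicious)  # ['P4']
```

In practice one would repeat this over many SNPs, since a real plate artifact tends to show up across the panel rather than at a single marker.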

Nancy J. Cox, PhD is a professor of medicine and human genetics and chief of the Section of Genetic Medicine at The University of Chicago. Her research program is focused on development of methods to identify and characterize the genetic component to common, complex diseases and related traits. Diseases currently under study in the Cox computational lab include diabetes and diabetic complications, asthma and related traits, stuttering, specific language impairment, mesothelioma, breast cancer, Tourette Syndrome and autism. Her development of methods for genome-wide association studies has provided new insights into the genetic component of common human diseases.

Aug 25, 2008

Art, Genomics and Daily Life

A couple of months ago, I started investigating the interaction between visual arts and genomics. The goal is to stimulate public debate on the possibilities and impacts of genomics on everybody's daily life.

In collaboration with my colleagues, I have created two digital frames in the first series of works. They were displayed at the ISMB conference in Toronto, 2008 and recognized by the ISCB Visual Reflections On Science award. They are currently on display at the ECCB conference in Europe.

Work #1:

Portrait of James D. Watson in his own Word, 2008

Jared Flatow, Brian Chamberlain, and Simon Lin


According to Wikipedia, "a portrait is a painting, photograph, sculpture, or other artistic representation of a person". Instead of simply using color pigments, we use unique portions of Dr. James Watson's DNA sequence to portray him. Dr. Watson co-discovered the structure of DNA and helped to establish the Human Genome Project. DNA, as the primary genetic material, defines one's molecular signature. Dr. Watson's DNA was fully sequenced and made public in 2007 by The Baylor College of Medicine Genome Sequencing Center, 454 Life Sciences Technology, and The Rothberg Institute. We used the SNPs, which define the small differences in DNA from person to person, to uniquely represent Dr. Watson. To do this, we took the variant allele base pairs from Dr. Watson's genome (Cold Spring Harbor Laboratory distribution, 6/6/2007) that had a sequence observation count greater than 12, and generated a portrait capturing his phenotype.
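The filtering step can be sketched roughly as follows. The record layout (a dictionary with `alt` and `obs_count` keys) and the sample values are hypothetical; the actual file format of the genome distribution is not described here.

```python
def select_variant_alleles(variants, min_count=12):
    """Keep the variant allele of every SNP observed more than min_count times."""
    return [v["alt"] for v in variants if v["obs_count"] > min_count]


# Hypothetical records, for illustration only.
example_variants = [
    {"pos": 101, "alt": "A", "obs_count": 15},
    {"pos": 202, "alt": "G", "obs_count": 12},  # not strictly greater than 12
    {"pos": 303, "alt": "T", "obs_count": 30},
]
selected = select_variant_alleles(example_variants)
print(selected)  # ['A', 'T']
```

The resulting sequence of bases is the raw material; how those letters are laid out to form the picture is a separate, artistic step not covered by this sketch.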

Work #2:

DNA and Community, 2008

Simon Lin and Jared Flatow

Artists constantly explore the interactions between science and society. We looked into the public understanding of DNA in the Web 2.0 era by retrieving Creative Commons (CC)-licensed photos from the Flickr (photo-sharing) website. We retrieved 899 images using the topics of DNA and myself on April 6, 2008. We rearranged these images using a mosaic algorithm to reveal the hidden message of "DNA and Community". Traditional art uses oil and brush; we are using Python and the internet to experiment with new building blocks of CC-licensed photos. By integrating the photos through the lens of 899 individuals, we are investigating how people share their life stories (Flickr) and how people share their creative responsibility (CC license). It is interesting to note that our work is also licensed under CC and thus has 899 lines of acknowledgements.
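For readers curious about what a mosaic algorithm does, here is a minimal sketch of the general idea, assuming each photo and each cell of the target image has already been reduced to an average (r, g, b) color. It is not the code used for the artwork, and unlike a real mosaic it lets the same photo be reused freely.

```python
def color_distance(a, b):
    """Squared Euclidean distance between two (r, g, b) tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_mosaic(target_cells, tile_colors):
    """For every cell of the target grid, pick the photo with the closest average color."""
    return [
        [min(tile_colors, key=lambda name: color_distance(tile_colors[name], cell))
         for cell in row]
        for row in target_cells
    ]


# Hypothetical photos and a 2x2 target image.
tile_colors = {
    "photo_a": (250, 250, 250),
    "photo_b": (10, 10, 10),
    "photo_c": (200, 30, 30),
}
target_cells = [
    [(0, 0, 0), (255, 255, 255)],
    [(220, 40, 40), (20, 20, 20)],
]
mosaic = build_mosaic(target_cells, tile_colors)
print(mosaic)  # [['photo_b', 'photo_a'], ['photo_c', 'photo_b']]
```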

May 14, 2008

23andme is Shaking up Clinical Research

It is coming.

Yes, 23andme is challenging the traditional way we conduct clinical trials. In a press release today, 23andme announced a partnership with the Parkinson's Institute to discover genetic and environmental factors of Parkinson's disease.

Six months ago, my colleagues and I sent out a grant proposal arguing the potential efficiencies of combining consumer genomics with clinical trials. Although I was frustrated to learn a week ago that our proposal was not funded, I am very happy to see the press release today from 23andme, which essentially validated our proposal.

On a separate note, the Wall Street Journal reports today on a shortage of participation in cancer clinical trials, especially among minority groups.

Bottlenecks of Traditional Clinical Trials | 23andme's Innovative Solutions
Patient recruiting | Social network
Cost of SNP scans | May do cost-sharing with participants

Similar to the early days of sequencing and bioinformatics development, I would expect to see industry, rather than academia, driving the innovative applications.

There is "2.0" hype around everything recently, including the "Research 2.0" in the press release. Even though there are a lot of details to be worked out, I am still very positive about it.

Apr 24, 2008

With GINA, now you can order a genetic test with peace of mind

Today should be a historic day for consumer genomics. With overwhelming support, the Senate passed the Genetic Information Nondiscrimination Act (S. 358) by a vote of 95-0.

The Genetic Information Nondiscrimination Act (GINA) paves the way for the responsible use of genetic information. Now consumers can have peace of mind when they order a genetic test to manage their health proactively: their genetic information would not be used against them.

GINA protects against discrimination with respect to health insurance and employment by the following:

· Prohibiting group health plans and issuers offering coverage on the group or individual market from basing eligibility determinations or adjusting premiums or contributions on the basis of genetic information. They cannot request, require or purchase the results of genetic tests, or disclose genetic information.

· Prohibiting issuers of Medigap policies from adjusting pricing or conditioning eligibility on the basis of genetic information. They cannot request, require or purchase the results of genetic tests, or disclose genetic information.

· Prohibiting employers from firing, refusing to hire, or otherwise discriminating with respect to compensation, terms, conditions or privileges of employment. Employers may not request, require or purchase genetic information, and may not disclose genetic information. Similar provisions apply to employment agencies and labor organizations.

Mar 7, 2008

The first two persons who paid $350,000 for their genome

The New York Times reported this week on the first two Knome customers who paid $350,000 to get their full genome sequenced.

-- The first guy, Dan Stoicescu, is an executive of a Romanian pharmaceutical company.
-- The second guy is a Chinese executive who signed up through Knome's business partner.

The rapid advance of genetic technology will result in a quick drop in the price tag. Soon, you will not need to be rich to see your genome! Even with the current technology, you can get your SNPs (a selection of your genome) done for under $1,000!

Feb 17, 2008

How Web 3.0 and Lifestyle 3.0 Converge at DNA Testing

In parallel with the evolution of “web 1.0”, “web 2.0” and “web 3.0”, Richard Dale formulates the concept of “lifestyle 1.0”, “lifestyle 2.0” and “lifestyle 3.0”. It is so refreshing: a must read!

Here I re-analyze Richard’s concepts from the perspective of self-consciousness. My argument is that the development of the Internet and i-lifestyle is purely driven by human psychology.

  • Lifestyle 1.0 -- a.k.a. Self awareness: It is an explicit understanding that one exists. Furthermore, it includes the concept that one exists as an individual, separate from other people, with private thoughts. For instance, Joe created a personal webpage in 2001.
  • Lifestyle 2.0 -- a.k.a. Self disclosure: At this stage, one will both consciously and unconsciously reveal more about oneself to others. For instance, Joe joined Facebook in 2006 to share his photos from a trip with friends.
  • Lifestyle 3.0 -- a.k.a. Self perception: At the 3.0 stage, one develops attitudes by observing one's own behavior and genetic makeup and concluding what attitudes must have caused them. For instance, Joe ordered a genetic test in 2007 and put the results into his personal health record (PHR).

As the analysis goes on, we find that the ultimate driver of web 3.0 is not the semantic web, but genomics testing!

Feb 13, 2008

Genetic Testing on NBC News: How to Interpret the Results?

Bioinformatics scientists have long been helping biomedical researchers to interpret their genomics data. But who should/will/could help consumers to interpret their results? NBC News aired its investigation yesterday.

Having recognized the same problem since last year, I have been advocating the study of consumer bioinformatics. Bioinformatics has traditionally been defined as scientific research that uses computers to handle genomics information.

Genomics testing is no longer for research use only; since November 2007, it has been just a few mouse clicks away for millions of consumers. Similarly, bioinformatics is no longer a scientific discipline just for research projects and scientists.

Over time, we noticed that consumers have shown increasing interest in bioinformatics. For instance, Hugh Rienhoff has started a wiki, “mydaughtersdna.org”, to investigate the interpretation of his daughter’s genome. An active community has formed around this wiki. Also, amateur scientists are starting to have consumer access to supercomputing resources previously available only to heavily funded researchers. Creative computing resources, such as the “Elastic Compute Cloud” by Amazon, allow anyone to run parallel MPI-BLAST for less than $1.

We have coined the term “consumer bioinformatics” to emphasize the urgent need for bioinformatics in the consumer domain. Bioinformatics is already a challenging problem in the research domain; we believe consumer bioinformatics will further invigorate the development of bioinformatics as a discipline and its applications.

Feb 12, 2008

Retail Genomics and web 2.0: Marketing via YouTube, Facebook and Blogs

In a previous post, I discussed the use of YouTube in consumer genomics marketing.

Today, I ran into a blog called "The Spittoon", owned by 23andme, that serves the same purpose.

Similarly, a company called the DNA Diagnostics Center (DDC) is on Facebook.

On one side, they are geeky; on the other side, they reach consumers well using the principle of soft marketing.

Jan 24, 2008

Another Player Joined the Personal Genomics Craze

Announced January 24, 2008, SeqWright, a Houston, TX company, joined the recent craze of personal genomics by offering a DNA test to consumers at $998.

Another player joins the "23etAl" craze!

My friend at BioTeam -- Michael Cariaso, who is also the author of the well-known SNPedia -- coined the word "23etAl". I like Michael's creation!

"23etAl" describes companies such as 23andme, Navigenics, deCODEme, Knome and the like, which are trying to provide consumers with genotyping services.
The information provided by SeqWright is cursory. My instinct tells me that they are using Affymetrix SNP arrays. If so, it will be a direct competitor to Navigenics.

A distinctive feature of the SeqWright offering is the copy number variation analysis. As far as I know, it is the first to provide this information to consumers, although we use it regularly in research as a by-product of running SNP arrays.

A quick update: Blogger Daniel MacArthur (of the Genetic Future blog) also created an interesting word play, "me two": 23andMe and deCODEme.

Jan 21, 2008

The Google Approach to Large Genomics Data Sets

Getting terabytes of genomics data? Yes, easily! -- via Next Generation Sequencing (NGS), microarray, mass spectrometry, consumer genotyping ... you name it.

The bioinformatics community has been working on this problem for years. A few milestones: 1) Recognized the importance of meta-data (data about data, i.e., the running conditions to acquire the scientific data). 2) Utilized XML and Ontology to communicate.

However, it is still a great challenge. So, what did Google come up with?

In summary, here is the Google paradigm for large scientific data:
  • Premises
a) The growth of scientific data (size) outpaces the growth of Internet bandwidth.
b) The consumption of the data (in terms of user-comprehensible results) is largely asymmetric in terms of size, compared to the raw data.

  • Solution:
a) UPLOAD: Ship the data to the computational engine via FedEx or UPS.
b) ANALYZE: Data will be co-located with the computational engine (at the Google empire??)
c) DELIVER: The analyzed results or query results (usually much smaller) will be delivered to the consumer via the Internet.
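A quick calculation shows why shipping disks can beat the network. The data size, the 10 Mbps uplink, and the two-day courier time below are assumptions chosen for illustration, not figures from Google.

```python
SECONDS_PER_DAY = 86_400


def upload_days(size_terabytes, megabits_per_second):
    """Days needed to push the data over a link of the given sustained speed."""
    bits = size_terabytes * 1e12 * 8           # decimal terabytes to bits
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / SECONDS_PER_DAY


# Assumed scenario: 1 TB of raw data, 10 Mbps uplink, 2-day courier delivery.
network_days = upload_days(1, 10)
courier_days = 2
print(round(network_days, 2))       # 9.26
print(network_days > courier_days)  # True
```

The courier time stays roughly constant as the data grows, while the upload time scales linearly with size, which is the asymmetry the first premise describes.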

Will it work? I think so.

Jan 5, 2008

Consumer Education through Video Clips

WSJ has discussed the use of YouTube in consumer advertising.

There is another use of YouTube: to educate consumers on exotic products, such as genomic testing.

deCode Genetics, a biotech company (NASDAQ: DCGN), has experimented with this on YouTube.

The results? A bit discouraging.

332 views so far (as of 1/5/08) after it was posted on November 17, 2007. I will keep tracking it...

Date NumberOfViews
11/17/2007 0
1/5/2008 332
3/9/2008 698

As a separate note, here is a collection of 10 YouTube videos on genetic conditions:

Jan 3, 2008

Glossary of Consumer Genomics


Gene:

  • a gene is the basic functional unit of heredity. Genes are made up of DNA.


Genetics:

  • the study of a single gene and its effect on human health. For example, a mutation of the CFTR gene causes cystic fibrosis.


Genomics:

  • the study of all human genes, including their interaction with environmental factors such as smoking and diet.

Single Nucleotide Polymorphism (SNP):

  • a single base-pair variation in the DNA that might differ from person to person. SNPs can be used to profile a person’s genomic information. Currently, up to 2 million SNPs can be measured on a single microarray.


Microarray:

  • a technology to measure SNPs or mRNAs.


  • most diseases are caused by the interaction of the genome and the environment.

Genomic Testing:

  • the measurement of a person’s SNPs using microarray or other technologies. Sometimes, it is also called genetic testing in the mass media.

What is Consumer Genomics

Anne Wojcicki & Linda Avey (founders of 23andme) interview on PBS

About Me and this Consumer Genomics Blog

I was trained as a physician. Eleven years ago, I started to conduct medical research. In my day job, I am a bioinformatics scientist. I have been tracking direct-to-consumer genomic products since 2005. In early 2007, I started using "consumer genomics" and "consumer bioinformatics" in an academic grant to describe the paradigm shift of genomics and bioinformatics from the lab to consumers.

Now my academic research interest also includes consumer education, access, and interpretation of genomic testing results -- an activity I call "consumer bioinformatics".

I will address the consumer, science, and business aspects of direct-to-consumer genomics in this blog.

I am going to discuss the following issues:
  • The importance of data portability in consumer genomics
  • The consumer aspect of consumer genomics
  • and much more ...