Washington University part of major effort to sequence 1,000 human genomes

Washington University School of Medicine in St. Louis will play a leading role in an international collaboration to sequence the genomes of 1,000 individuals. The ambitious 1000 Genomes Project will create the most detailed picture to date of human genetic variation and likely will identify many genetic factors underlying common diseases.

Drawing on the expertise of research teams in the United States, China and England, the project will develop a new map of the human genome that will provide a close-up view of medically relevant DNA variations at a resolution unmatched by current technology. As with other major human genome reference projects, data from the 1000 Genomes Project will be made swiftly available to the worldwide scientific community through free public databases.

“A project like this would have been unimaginable only a few years ago,” says Elaine Mardis, Ph.D., co-director of the University’s Genome Sequencing Center, and one of the project’s lead investigators. “We now have the ability to examine in intimate detail variations in the genetic code that differ from person to person.”

At the genetic level, any two humans are more than 99 percent alike. However, it is important to understand the small fraction of genetic material that varies among people because it can help explain differences in individuals’ risk of disease, response to drugs or reaction to environmental factors. Common variation in the human genome is organized into local neighborhoods called haplotypes, which usually are inherited as intact blocks of information.

Recently developed catalogs of human genetic variation, such as the HapMap, have proven valuable in human genetic research. Using the HapMap and related resources, researchers already have discovered more than 100 regions of the genome containing genetic variations that contribute to common diseases such as diabetes, coronary artery disease, prostate and breast cancer, rheumatoid arthritis, inflammatory bowel disease and age-related macular degeneration.

However, because existing maps lack fine detail, researchers often must follow those studies with costly and time-consuming DNA sequencing to pinpoint the precise variations involved. The new map would enable researchers to zero in more quickly on disease-related genetic alterations, speeding efforts to use genetic information to develop new strategies for diagnosing, treating and preventing common diseases.

“Our best chance of knowing why some people remain healthy well into their 90s and others develop illnesses at an early age is to understand the numerous genetic variations that exist within humans,” says Richard K. Wilson, Ph.D., director of the University’s Genome Sequencing Center. “This project will accelerate efforts to pinpoint the many genetic factors that underlie human health and disease.”

The scientific goals of the 1000 Genomes Project are to obtain a catalog of variations that occur at a frequency of 1 percent or greater in the human population across most of the genome, and down to 0.5 percent or lower within genes. This will likely entail sequencing the genomes of at least 1,000 people.
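As a rough illustration of how a frequency target relates to sample size, the sketch below treats each of a person's two copies of the genome as an independent draw from the population. It then asks how many copies of a variant at a given frequency a sample should contain. This is a simplified binomial model assumed for illustration. It ignores sequencing errors, coverage depth and population structure, and it is not the project's own design calculation. The function names and the "seen at least twice" threshold are hypothetical.

```python
import math

def expected_copies(freq: float, people: int) -> float:
    """Expected number of copies of a variant among 2 * people chromosomes."""
    return 2 * people * freq

def prob_at_least(freq: float, people: int, min_copies: int = 1) -> float:
    """Binomial probability that a variant with population frequency `freq`
    appears at least `min_copies` times among 2 * people chromosomes.
    Simplified model: ignores sequencing error, coverage and population structure."""
    n = 2 * people
    below = sum(
        math.comb(n, k) * freq**k * (1 - freq) ** (n - k)
        for k in range(min_copies)
    )
    return 1.0 - below

# Compare the two frequency targets mentioned in the text for 1,000 people.
for freq in (0.01, 0.005):
    print(f"frequency {freq:.1%}: ~{expected_copies(freq, 1000):.0f} copies expected "
          f"in 1,000 people; P(seen at least twice) = {prob_at_least(freq, 1000, 2):.6f}")
```

Under this idealized model, a 1 percent variant is expected about 20 times among 1,000 people and is almost certain to appear more than once. This gives one intuition for a sample of that size. The real design also had to account for low-coverage data and error rates.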

Going a major step beyond the HapMap, the 1000 Genomes Project will map not only the single-letter differences in people’s DNA, called single nucleotide polymorphisms (SNPs), but also will produce a high-resolution map of larger differences in genome structure called structural variants. Structural variants are rearrangements, deletions or amplifications of segments of the human genome. The importance of these variants has become increasingly clear with surveys completed in the past 18 months that show these differences in genome structure may play a role in susceptibility to certain conditions, such as mental retardation and autism.

The project will receive major funding from the Wellcome Trust Sanger Institute in England, the Beijing Genomics Institute in China, and the National Human Genome Research Institute, part of the National Institutes of Health. In addition to Washington University, both the Sanger Institute and the Beijing Genomics Institute will contribute sequencing data to the project, as will the Broad Institute of MIT and Harvard and the Human Genome Sequencing Center at the Baylor College of Medicine in Houston. The consortium may add other participants over time.

The project depends on large-scale implementation of several new sequencing platforms. Using standard DNA sequencing technologies, the effort would likely cost more than $500 million. However, leaders of the 1000 Genomes Project expect the costs to be far lower – in the range of $30 million to $50 million – because of the project’s pioneering efforts to use new sequencing technologies in the most efficient and cost-effective manner.

In the first phase of the 1000 Genomes Project, lasting about a year, researchers will conduct three pilot studies. The results of the pilots will help determine how to produce the project’s detailed map of human genetic variation most efficiently, accurately and cost-effectively.

The first pilot will involve sequencing the genomes of two trios (both parents and an adult child) at deep coverage, an average of 20 times per genome. This will provide a comprehensive dataset from six people that will help researchers determine how to identify variants using the new sequencing platforms and will serve as a baseline for comparison with other parts of the effort.

Another pilot will involve sequencing the genomes of 180 people at low coverage, about two times each. This will test the ability to use low-coverage data from new sequencing platforms to identify sequence variants and to put them in their genomic context.
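The coverage figures in the first two pilots can be made concrete with a little arithmetic. The sketch below assumes a haploid human genome of about 3.1 billion bases. It uses the idealized Poisson (Lander-Waterman) approximation, in which the expected fraction of bases read at least once at average coverage c is 1 - e^(-c). Real data fall short of this because of repeats, errors and uneven coverage. The genome size and helper names are assumptions for illustration, not figures from the project.

```python
import math

# Assumption: approximate haploid human genome size (~3.1 billion bases).
GENOME_SIZE_BASES = 3.1e9

def total_bases(people: int, coverage: float, genome_size: float = GENOME_SIZE_BASES) -> float:
    """Raw sequence needed when every genome is read `coverage` times on average."""
    return people * coverage * genome_size

def fraction_covered(coverage: float) -> float:
    """Idealized Poisson (Lander-Waterman) estimate of the fraction of bases
    read at least once at a given average coverage."""
    return 1.0 - math.exp(-coverage)

# (number of people, average coverage) for the two whole-genome pilots in the text
pilots = {
    "pilot 1 (two trios, deep)": (6, 20),
    "pilot 2 (low coverage)": (180, 2),
}

for name, (people, coverage) in pilots.items():
    print(f"{name}: {total_bases(people, coverage):.3g} bases, "
          f"~{fraction_covered(coverage):.1%} of each genome read at least once")
```

Under these assumptions, deep sequencing of the six trio members reads essentially every base of each genome. At about two-fold coverage, roughly one base in seven is expected to be missed in any one person. This is one reason the low-coverage approach needs testing before it is used at scale.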

The third pilot will involve sequencing the coding regions, called exons, of about 1,000 genes in about 1,000 people. This pilot aims to explore how best to obtain an even more detailed catalog of variation in the approximately 2 percent of the genome that consists of protein-coding genes.

During its two-year production phase, the 1000 Genomes Project will deliver sequence data at an average rate of about 8.2 billion bases per day, the equivalent of more than two human genomes every 24 hours. The volume of data – and the interpretation of those data – will pose a major challenge for leading experts in the fields of bioinformatics and statistical genetics.
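The comparison in the preceding paragraph can be checked with simple division. This sketch assumes a haploid genome of about 3.1 billion bases, as the comparison implicitly does. Measured against a diploid genome of about 6.2 billion bases, the same output is only about 1.3 genome-equivalents per day. The figure counts raw bases, not genomes sequenced to any particular depth of coverage. The genome sizes are assumptions; only the 8.2 billion bases per day comes from the text.

```python
# Figure from the text: average production-phase output.
BASES_PER_DAY = 8.2e9

# Assumption: approximate human genome sizes.
HAPLOID_GENOME_BASES = 3.1e9                      # one copy of each chromosome
DIPLOID_GENOME_BASES = 2 * HAPLOID_GENOME_BASES   # both parental copies

def genome_equivalents(bases: float, genome_size: float) -> float:
    """How many genomes' worth of raw sequence a given number of bases represents."""
    return bases / genome_size

haploid_per_day = genome_equivalents(BASES_PER_DAY, HAPLOID_GENOME_BASES)
diploid_per_day = genome_equivalents(BASES_PER_DAY, DIPLOID_GENOME_BASES)

print(f"{haploid_per_day:.2f} haploid genome-equivalents per day")
print(f"{diploid_per_day:.2f} diploid genome-equivalents per day")
```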

The 1000 Genomes Project will use samples from volunteer donors who gave informed consent for their DNA to be analyzed and placed in public databases. The National Human Genome Research Institute (NHGRI) established extensive and careful ethical procedures for previous projects, such as the HapMap.

Among the populations whose DNA will be sequenced in the 1000 Genomes Project are: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Han Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States.

Washington University School of Medicine’s 2,100 employed and volunteer faculty physicians also are the medical staff of Barnes-Jewish and St. Louis Children’s hospitals. The School of Medicine is one of the leading medical research, teaching and patient care institutions in the nation, currently ranked fourth in the nation by U.S. News & World Report. Through its affiliations with Barnes-Jewish and St. Louis Children’s hospitals, the School of Medicine is linked to BJC HealthCare.