Federally funded massive RAM computer speeds genome assembly
HOUSTON -- (September 22, 2010) -- A new massive-RAM computing resource for genome assembly, consisting of two nodes (computers) that each have 1 terabyte of RAM (random access memory), will enable genomic scientists to assemble complicated genomes faster and with fewer errors, said scientists at the Baylor College of Medicine Human Genome Sequencing Center.
A federal grant of $262,000 from stimulus funds available through the American Recovery and Reinvestment Act of 2009 was awarded to BCM by the National Center for Research Resources and will pay for the Massive RAM Genomic Analysis Cluster, said Dr. Jeffrey Reid, assistant professor in the BCM Human Genome Sequencing Center, who led the effort to obtain the computers. This new resource will take the sequencing center into the next generation of genome assembly computation, a necessary step to handle recent advances in DNA sequencing technology.
Next generation methods
"This funding enables a higher level of computing," said Reid. That capacity is critical for the "next-generation" sequencing methods that have greatly accelerated genome research. These recent improvements to DNA sequencing technology generate inexpensive "short reads": DNA sequences of small pieces of the genome. Assembling these short reads into long sequences, and separating correct reads from those that contain sequencing errors, is much easier when a reference genome is available. For example, the first complete human genome sequence serves as a reference for sequencing other human genomes.
But analyzing a new genome, such as the baboon's, for which no reference genome exists means doing the assembly without a guide. Instead of simply comparing each new read to a reference, researchers must use various computational strategies to determine which short fragments should be joined end to end to produce the long stretches of genetic material that define that genome.
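To make "joining fragments end to end" concrete, here is a minimal sketch of greedy overlap assembly, a textbook toy technique and not the method used at the BCM center. It assumes error-free reads and a single target sequence; the function names and the example reads are hypothetical. At each step it merges the pair of reads whose suffix and prefix overlap the most.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        # Look for the next place where b's first min_len characters occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If everything from there to the end of a starts b, that is a suffix-prefix overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Hypothetical toy assembler: repeatedly merge the best-overlapping pair of reads."""
    reads = list(reads)
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return reads  # no overlaps left; remaining strings are the contigs
        i, j = best_pair
        merged = reads[i] + reads[j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]


# Four overlapping "short reads" from a made-up 19-base sequence.
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

The greedy choice is what makes this easy to follow and also what makes it fragile: in a real genome, repeated stretches produce equally good overlaps in the wrong places, much like reassembling Shakespeare from words that recur throughout the text.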
"What if you took the complete works of Shakespeare and cut them into sentences and then put it back together? That's difficult," said Reid. "Now imagine if you cut them into words and had to reassemble them. That's orders of magnitude more difficult." The techniques that are most effective at assembling such new genomes from short reads are those that require a large amount of RAM, said Reid. Such techniques hold great promise for studying diseases that can significantly change the structure of a genome, such as cancer.
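The article does not name the RAM-intensive techniques. One widely used family, offered here as an assumption rather than as BCM's method, is de Bruijn graph assembly, which chops every read into overlapping substrings of length k (k-mers) and holds all of them in memory at once to build a graph. The sketch below uses hypothetical function names and toy reads. It shows one reason memory grows quickly: a single sequencing error can create up to k new k-mers that do not exist in the true genome.

```python
def distinct_kmers(reads: list[str], k: int) -> set[str]:
    """Every distinct substring of length k found in any read."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Hypothetical toy graph: each distinct k-mer becomes an edge from its
    first (k-1) letters to its last (k-1) letters. The entire graph is kept in RAM."""
    graph: dict[str, set[str]] = {}
    for kmer in distinct_kmers(reads, k):
        graph.setdefault(kmer[:-1], set()).add(kmer[1:])
    return graph


clean = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
# Same reads, but one base in the second read is miscalled (T -> A at position 5).
noisy = ["ATTAGACCTG", "AGACCAGCCG", "CCTGCCGGAA", "GCCGGAATAC"]

print(len(distinct_kmers(clean, 5)))  # 15
print(len(distinct_kmers(noisy, 5)))  # 20: one error added 5 spurious k-mers
```

Production assemblers store k-mers far more compactly than Python strings and sets, but the scaling is the same: a larger genome, deeper sequencing and more errors all mean more distinct k-mers that must fit in memory together.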
Expanded computing capacity
Dr. Richard Gibbs, director of the BCM Human Genome Sequencing Center, plans to capitalize on the expanded computing capacity in his search for disease genes, particularly those related to cancer. "Analyzing genomic information from tumors as well as from normal tissues requires formidable computing power, such as that represented in this computing cluster," said Gibbs. "This should be possible to do across millions of bases in hundreds of genomes in a reasonable amount of time. Taken together, the Massive RAM Genomic Analysis Cluster would enable existing analysis to proceed faster, improve the quality of analysis currently undertaken and allow us to embark on novel, transformational analysis development and discovery."
Reid is leading the effort to develop these novel methods of sequence analysis in collaboration with other BCM researchers. Dr. Rui Chen, assistant professor in the sequencing center, will use the massive computing power in his work on the genetics of early-onset retinal diseases. Dr. Joseph Petrosino, assistant professor of molecular virology and microbiology at BCM, will also use the computing power in BCM's work on the international effort to sequence the human microbiome, that is, to determine the genomes of the bacteria that colonize the bodies of healthy humans.
Dr. Jeffrey Rogers, an associate professor of molecular and human genetics at BCM and a member of the Baylor Human Genome Sequencing Center, is a leader in primate genomics and will work with researchers across the country to sequence various primate genomes. This knowledge will improve understanding of primate biology and human disease.
"This is a transformative time for genomics and primate biology, which is only enabled by the combination of next-generation DNA sequencing and high performance computing resources such as the Massive RAM Genomic Analysis Cluster," said Rogers. "This is a very exciting time for genomics."