By Shawn Contant, Anurag Muthyam, Michael Shannon, and Ryan Woodburn.


    Genomics is the study of the genome, the set of genes and chromosomes found in every organism. A genome is made of long strands of DNA (deoxyribonucleic acid) stored inside the cell, and the order of the chemical bases along the DNA encodes that organism's genetic information. These molecular structures give organisms most of the physical attributes they have – such as the color of a human's eye, the length of a fly's wing, whether a pea will be wrinkly or smooth, and so on. Computers have made it possible to analyze and understand genomes in an unprecedented way, expanding our knowledge of them. Genomics programs, such as the Human Genome Project, have greatly increased our knowledge of our genes, benefiting us in various ways and changing what we believe about the organisms around us.

    Computational genomics takes genomics one step further by applying computer analysis to the effort of understanding biology. This sounds simple, but several specific fields fall under it: bio-sequence analysis, gene expression data analysis, pattern recognition, and so on. Computational genomics can thus be seen as an umbrella term that covers a wide array of branches.

DNA and Sequencing

    DNA (deoxyribonucleic acid) is considered the blueprint of an organism. Inside the nucleus of a cell, DNA duplicates itself so that the organism can create new cells. It is a very long molecule made of two strands twisted around each other, a shape called a double helix. Think of a ladder: each rung joins two pieces of information, one on each side. Each piece of information is called a nucleotide, and it always pairs with one specific partner nucleotide on the opposite strand. To make proteins, the cell copies one strand of a stretch of DNA into a single-stranded molecule called RNA and releases it into the cell, where the information can be read to direct the production of new proteins. The nucleotides on the RNA are read in threes to determine which amino acid is to be added to the protein being built. The scale of this system is easy to appreciate, because so many different proteins can be built from these amino acids. We have managed to decode DNA sequences and determine which amino acids they refer to. Reading the order of nucleotides in this way is called DNA sequencing; it is closely related to genetic mapping and is another large part of genomics.
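
    The pairing and reading-in-threes described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the sample DNA string is invented, and the codon table lists just a handful of the 64 entries in the standard genetic code.

```python
# Partial RNA codon table (assumption: only a few of the 64 standard codons are listed).
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GCU": "Ala", "GCC": "Ala",
    "AAA": "Lys", "AAG": "Lys",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

# Each DNA nucleotide pairs with exactly one partner on the opposite strand.
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(dna):
    """Return the partner nucleotides for a DNA strand, position by position."""
    return "".join(DNA_PAIR[base] for base in dna)


def transcribe(coding_strand):
    """Return the RNA copy: same letters as the coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(rna):
    """Read the RNA three nucleotides at a time and look up each amino acid."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "???")  # "???" = not in our partial table
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


dna = "ATGTTTGGCAAATAA"  # invented example sequence
rna = transcribe(dna)
protein = translate(rna)
print(rna, protein)
```

    A real tool would use the complete codon table and handle reading frames and both strands; the sketch only shows the lookup idea.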

Human Genome Project

    One of the largest combinations of computer science and genomics, if not the largest, is the Human Genome Project. The Human Genome Project (or HGP) started in October 1990 and set out to completely map the human genome. This included discovering the roughly 20,000 to 25,000 human genes so that they can be used for further research. The second goal of the project was to determine the sequence of all 3,000,000,000 bases (or subunits) in the human genome. The HGP’s main goal of mapping the human genome was confirmed complete in April 2003. However, research on mapping the chromosomes continued after that date.

    The Human Genome Project’s success is due in large part to the combination of computer science and genomics. By 1994, only 4 years after it started, the HGP had already reached its 5-year goal. In 1995 the HGP announced the first in a long line of high-resolution maps of the human chromosomes. In 1999 the first human chromosome was completely sequenced. The last chromosome that the HGP sequenced was chromosome 1, in May 2006.

    Human chromosomes range in size from about 50,000,000 to 250,000,000 bases, so to start sequencing, the HGP first broke them into much shorter pieces. Each piece was then used to generate a set of even smaller fragments that differ in length by only one base. As you can imagine, in order to keep track of all these pieces, it was crucial to have a database that could hold all this information yet allow it to be easily accessed. This is where computer science meets genomics.
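
    To see why fragments that differ in length by a single base are useful, consider the toy Python sketch below. It assumes, as in chain-termination sequencing, that the last base of each fragment can be identified; the function names and the short template string are invented for illustration and are not taken from the HGP's software.

```python
import random


def nested_fragments(template):
    """Simulate the lab step: one observation per fragment length 1..n.

    Each observation is (fragment length, identity of the fragment's last base).
    """
    return [(n, template[n - 1]) for n in range(1, len(template) + 1)]


def read_sequence(observations):
    """Sort fragments from shortest to longest and read off each last base."""
    return "".join(base for _, base in sorted(observations))


template = "GATTACAGGC"  # invented example piece of DNA
observations = nested_fragments(template)

random.seed(0)
random.shuffle(observations)  # the fragments arrive in no particular order

recovered = read_sequence(observations)
print(recovered)
```

    Sorting by length puts the fragments back in order, so the sequence can be read one base at a time. Real pieces are far longer, and assembling many pieces into a chromosome is a separate and much harder computation.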

    A few decades ago we knew little about individual genes. Since then, researchers have made breathtaking discoveries about the human genes involved in diseases such as cystic fibrosis, chorea, and many more. Sequencing the biochemical bases of DNA is therefore a grand accomplishment, and it is becoming central to biotechnology. It is also generating new knowledge about the fundamentals of biological processes and has increased our ability to work with genes, which opens up numerous opportunities. Gene discovery remains rather slow, but it is effective and is used in various fields, and biologists have come to recognize the importance and value of computing for mapping and sequencing.

    The HGP also sought to address the ethical issues that arise with this understanding of the human genome. Ethics and morality are things often forgotten when discoveries are made. The HGP looked at matters of fair use of genetic information; privacy and confidentiality; possible discrimination for having or lacking certain genes; and so on.

“Junk” DNA

    Another recent development that came out of genomics was the discovery that “junk DNA,” the part of the human genome that seemed to have no function, actually plays a role. It was originally thought that some 98% of the genome was inactive, perhaps just an evolutionary carry-over that had been rendered useless, like the appendix attached to the large intestine. The Human Genome Project had gone through the genes and, to the disappointment of geneticists, concluded that a mere two percent had any function.

    The Encyclopedia of DNA Elements (ENCODE) project, launched in 2003, showed that about 80% of the human genome is active. The small 2% portion is responsible for making the proteins that let our cells function and that produce many of the physical attributes that give humans their diversity. Since the rest of the genome didn't do that, it was thought of as “an apparent biological wasteland.” Instead, the parts formerly considered junk are actually filled with “gene switches,” which control when genes turn on and off and how much of a protein they make. They are versatile, able to tell an undeveloped cell in an embryo what it should turn into and to change protein output at different points in life. Within these gene switches are the sources of many diseases, and as computers sift through this area of the genome that was once thought valueless, advances in science and medicine can be made.

The Computational Aspect

    For a project on as grand a scale as the Human Genome Project or ENCODE, establishing a sufficiently capable computer system is a challenge. Genes hold a great deal of information, especially in complex animals like humans. One part of the HGP's mission was to make its findings available over the Internet and to the private sector. The problem of analyzing and handling all this data was recognized early on. A part of the sequence analysis engine is displayed below.

Image Courtesy of Oak Ridge National Laboratory, U.S. Dept. of Energy

    At the heart of the system is the process manager, which determines what tasks need to be done at any point in time. Even with powerful hardware, the combined workload is too much for a single computer, so these processes are carried out over a network of machines, which may themselves be physically separated by long distances.
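
    The idea of a process manager handing tasks to several machines can be sketched in miniature with Python's standard library. This is not the Oak Ridge system: a thread pool stands in for the network of machines, and the two analysis tasks and every name below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence):
    """Fraction of bases that are G or C (a hypothetical analysis task)."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def count_bases(sequence):
    """Number of bases in the sequence (another hypothetical task)."""
    return len(sequence)


def run_process_manager(tasks, max_workers=4):
    """Hand each pending task to a free worker and collect the results.

    `tasks` maps a task name to a (function, argument) pair. The worker
    threads are a stand-in for separate machines on a network.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(func, arg)
                   for name, (func, arg) in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


tasks = {
    "gc_content": (gc_content, "GGCCAATT"),
    "length": (count_bases, "GGCCAATT"),
}
results = run_process_manager(tasks)
print(results)
```

    A production system would also have to deal with task dependencies, failures, and moving data between sites, none of which this sketch attempts.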

    The data that results from deciphering and analyzing the genome can be structured and applied in many ways. Output was sent to more than 100 databases around the world, each focusing on a certain aspect of genomics – DNA and protein sequences, genome mapping, protein structure, and so on. These databases needed to be accessible from wherever the data was required, and the information held in them then had to be pieced together to make something meaningful.

Genetic Comparison

    One of the ways scientists figure out what genes do is by using comparative genomics. The supercomputers used in genomics projects may be able to find genes, but they can't tell us what the information means. It turns out that some genes in different species do about the same thing. Just as organs in biologically related species perform similar functions, so does the DNA of genetically related organisms. In the most recently mapped genomes of prokaryotes (simple single-celled organisms), the functions of some 80 to 90% of genes are determined using comparative methods. Unfortunately, this method does not fare well on complex genomes such as the human genome, where the functions of a majority of genes can't be reliably predicted using comparisons. Presently, the best candidate for resolving this issue is the genome of a primate, which has a genetic make-up very close to that of a human.
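
    The comparative approach can be illustrated with a small Python sketch that guesses a gene's function from its most similar already-annotated gene. Everything here is an assumption made for illustration: the sequences and function labels are invented, the sequences are taken to be already aligned and of equal length, and the 80% cut-off is arbitrary. Real tools align sequences before scoring them.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def predict_function(query, annotated_genes, threshold=80.0):
    """Return the function label of the best match, or None if nothing is close enough."""
    best_label, best_score = None, 0.0
    for label, sequence in annotated_genes.items():
        score = percent_identity(query, sequence)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Invented genes from a hypothetical, already-studied species.
annotated_genes = {
    "hypothetical kinase": "ATGGCCAAGT",
    "hypothetical transporter": "ATGTTTCCCG",
}

close_match = predict_function("ATGGCCAAGA", annotated_genes)
no_match = predict_function("TTTTTTTTTT", annotated_genes)
print(close_match, no_match)
```

    The second query shows the limitation described above: when no known gene is similar enough, the comparison gives no answer.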


    A colossal amount of work remains for experimental and computational genome projects, which use basic tactics such as genome comparison to study biological adaptation and the evolutionary classification of genes and to explain them in practical terms. Thanks to the application of computer science and technology to the endeavor of mapping genomes, our understanding of ourselves and the living world around us has dramatically increased. As computers improve and research continues, we can expect to see more leaps in knowledge which will ultimately benefit humanity.

YouTube Videos Related to Genomics

Dr. Cenk Sahinalp: Computational Genomics
Genomics Computing Revolutionizing Healthcare
What is DNA?
What is Genomics - Full Length

Works Cited

"Computational genomics." ScienceDaily. ScienceDaily. n.d. Web. 22 Nov. 2012.

"Human Genome Project Information." Human Genome Project Information. Oak Ridge National Laboratory. n.d. Web. 22 Nov. 2012.

Koonin, Eugene V. "Computational genomics." Current Biology. Cell Press, 6 Mar. 2001. Web. 22 Nov. 2012.

"Major Events in the U.S. Human Genome Project and Related Projects." Human Genome Project Information. Oak Ridge National Laboratory. n.d. Web. 22 Nov. 2012.

Park, Alice. "Junk DNA — Not So Useless After All." Time Health and Family. Time Magazine, 6 Sept. 2012. Web. 22 Nov. 2012.

Uberbacher, Ed. "Computing the Genome." Oak Ridge National Laboratory. Oak Ridge National Laboratory. n.d. Web. 22 Nov. 2012.

"Welcome to DNA Sequencing." DNA Sequencing. DNA Sequencing. n.d. Web. 22 Nov. 2012.

"What is DNA?" Genetics Home Reference. National Institutes of Health. n.d. Web. 22 Nov. 2012.

"What is Genomics?" News-Medical.Net. News-Medical.Net. n.d. Web. 22 Nov. 2012.