Early Insights from the Human DNA Sequence

Genomics and Its Impact on Science and Society: The Human Genome Project and Beyond

What We've Learned Thus Far
The first panoramic views of the human genetic landscape have revealed a wealth of information and some early surprises. Much remains to be deciphered in this vast trove of data; as the consortium of HGP scientists concluded in their seminal paper, “. . .the more we learn about the human genome, the more there is to explore.” A few highlights from the first publications analyzing the sequence follow.

  • The human genome contains 3.2 billion chemical nucleotide bases (A, C, T, and G).
  • The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
  • The functions are unknown for more than 50% of discovered genes.
  • The human genome sequence is almost (99.9%) exactly the same in all people.
  • About 2% of the genome encodes instructions for the synthesis of proteins.
  • Repeat sequences that do not code for proteins make up at least 50% of the human genome.
  • Repeat sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, thereby creating entirely new genes or modifying and reshuffling existing genes.
  • The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).
  • Over 40% of the predicted human proteins share similarity with fruit-fly or worm proteins.
  • Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between.
  • Chromosome 1 (the largest human chromosome) has the most genes (2968), and the Y chromosome has the fewest (231).
  • Genes have been pinpointed, and particular sequences in those genes have been associated with numerous diseases and disorders, including breast cancer, muscle disease, deafness, and blindness.
  • Scientists have identified about 3 million locations where single-base DNA differences occur in humans. This information promises to revolutionize the processes of finding DNA sequences associated with such common diseases as cardiovascular disease, diabetes, arthritis, and cancers.
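The percentages in the list above can be turned into rough base counts. The following is a minimal back-of-the-envelope sketch, assuming the 3.2 billion base genome size and the rounded percentages quoted above; the function name `bases_for_fraction` is hypothetical and chosen only for illustration.

```python
# Rough arithmetic on the figures quoted in the list above.
# All inputs are the rounded values from the text, so the outputs
# are order-of-magnitude estimates, not measurements.

GENOME_BASES = 3_200_000_000  # 3.2 billion bases (from the text)


def bases_for_fraction(fraction, genome_bases=GENOME_BASES):
    """Return the approximate number of bases covered by a fraction of the genome."""
    return round(genome_bases * fraction)


# 99.9% identical between people leaves about 0.1% that differs.
differing = bases_for_fraction(0.001)

# About 2% of the genome encodes instructions for proteins.
coding = bases_for_fraction(0.02)

# Repeat sequences make up at least 50% of the genome.
repeats = bases_for_fraction(0.50)

print(f"Bases differing between two people (0.1%): {differing:,}")
print(f"Protein-coding bases (2%): {coding:,}")
print(f"Repeat-sequence bases (50%): {repeats:,}")
```

The 0.1% figure works out to a few million bases, which is the same order of magnitude as the roughly 3 million single-base difference locations mentioned above, although the two numbers need not measure the same thing.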
Organism                              Genome Size (Bases)   Estimated Genes
Human (Homo sapiens)                  3.2 billion           30,000 to 40,000
Laboratory mouse (M. musculus)        2.6 billion           30,000
Mustard weed (A. thaliana)            100 million           25,000
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)    9700                  9
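One way to read the table above is in terms of gene density, that is, estimated genes per million bases. The sketch below is illustrative only: it uses the rounded values from the table, takes the lower human estimate of 30,000 genes as an assumption, and the names `ORGANISMS` and `genes_per_megabase` are hypothetical.

```python
# Gene density (estimated genes per million bases) from the table above.
# Values are the rounded figures in the table; for human, the lower
# estimate (30,000) is used here as an assumption.

ORGANISMS = {
    "Human": (3_200_000_000, 30_000),
    "Laboratory mouse": (2_600_000_000, 30_000),
    "Mustard weed": (100_000_000, 25_000),
    "Roundworm": (97_000_000, 19_000),
    "Fruit fly": (137_000_000, 13_000),
    "Yeast": (12_100_000, 6_000),
    "Bacterium (E. coli)": (4_600_000, 3_200),
    "HIV": (9_700, 9),
}


def genes_per_megabase(genome_bases, genes):
    """Return estimated genes per one million bases of genome."""
    return genes / (genome_bases / 1_000_000)


density = {
    name: genes_per_megabase(size, genes)
    for name, (size, genes) in ORGANISMS.items()
}

# Print from the sparsest genome to the densest.
for name, value in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name:22s} {value:8.1f} genes per million bases")
```

On these figures the human and mouse genomes come out far sparser in genes than the smaller genomes, in line with the vast expanses of noncoding DNA described in the list.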

The estimated number of human genes is only one-third as great as previously thought, although the numbers may be revised as more computational and experimental analyses are performed.

Scientists suggest that the genetic key to human complexity lies not in gene number but in how gene parts are used to build different products in a process called alternative splicing. Other underlying reasons for greater complexity are the thousands of chemical modifications made to proteins and the repertoire of regulatory mechanisms controlling these processes.
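The idea that one gene can yield several products can be illustrated with a toy model. The sketch below is a deliberate simplification, not a description of how cells regulate splicing: it assumes a hypothetical gene with four exons named `E1` to `E4`, treats two of them as optional, and simply enumerates every transcript that keeps the remaining exons in their original order. In real genes, only some combinations are actually produced.

```python
from itertools import combinations


def splice_variants(exons, optional):
    """List every exon combination that keeps exon order and all required exons.

    `exons` is the ordered list of exons in a hypothetical gene;
    `optional` names the exons that may be skipped.
    """
    optional_in_order = [exon for exon in exons if exon in set(optional)]
    variants = []
    # Skip 0, 1, 2, ... of the optional exons in every possible way.
    for count in range(len(optional_in_order) + 1):
        for skipped in combinations(optional_in_order, count):
            skipped_set = set(skipped)
            variants.append(tuple(e for e in exons if e not in skipped_set))
    return variants


# Hypothetical gene: E1 and E4 are always kept, E2 and E3 may be skipped.
exons = ["E1", "E2", "E3", "E4"]
variants = splice_variants(exons, optional=["E2", "E3"])

for variant in variants:
    print("-".join(variant))
```

With k optional exons this toy model gives 2^k possible transcripts, which hints at how a modest number of genes could still encode a much larger set of products.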

The online presentation of this publication is a special feature of the Human Genome Project Information Web site.

Last edited Fri Sep 30 20:53:34 2005.