DNA - Human Genome


Deoxyribonucleic acid (DNA) is the chemical inside the nucleus of cells that carries the genetic instructions for making living organisms. A DNA molecule consists of two strands that wrap around each other to resemble a twisted ladder. The sides are made of sugar and phosphate molecules. The “rungs” are made of nitrogen-containing chemicals called bases. Each strand is built from repeating units called nucleotides, and each nucleotide is composed of one sugar molecule, one phosphate molecule, and one base. Four different bases are present in DNA: adenine (A), thymine (T), cytosine (C), and guanine (G). The particular order of the bases arranged along the sugar-phosphate backbone is called the DNA sequence; the sequence specifies the exact genetic instructions required to create a particular organism with its own unique traits.

The two strands of the DNA molecule are held together at their bases by weak bonds. The four bases pair in a set manner: adenine (A) pairs with thymine (T), while cytosine (C) pairs with guanine (G). These pairs of bases are known as Base Pairs (bp).

These Base Pairs (bp) are the basis of Y-chromosome testing.

The Y-Chromosome

Human sex is determined by the X and Y chromosomes. A female has two X-Chromosomes and a male has an X and a Y-Chromosome. When a child is conceived, it gets one sex chromosome from its mother and one from its father. The chromosome from the mother will always be an X, but the chromosome from the father may be either an X or a Y. If the child gets the X, she will be a girl; if the child gets the Y, he will be a boy.

This Y-Chromosome has certain unique features:

* The presence of a Y-Chromosome causes maleness. This little chromosome, about 2% of a father's genetic contribution to his sons, programs the early embryo to develop as a male.

* It is transmitted from fathers only to their sons.

* Most of the Y-Chromosome is inherited as an integral unit. It is passed essentially unchanged, apart from occasional mutations, from father to sons, then to their sons, and so on. It does not exchange segments with the X-Chromosome that came from the mother. This makes it the only nuclear chromosome that escapes the continual reshuffling of parental genes during the production of sex cells.

It is these unique features that make the Y-Chromosome useful to genealogists.

Testing the Y-Chromosome

The Y-Chromosome has definable segments of DNA with known genetic characteristics. These segments are known as Markers. These markers occur at an identifiable physical location on a chromosome known as a Locus. Each marker is designated by a number (known as DYS#), according to international conventions. You will often find the terms Marker and Locus used interchangeably, but technically the Marker is what is tested and the Locus is where the marker is located on the chromosome.

Although there are several types of markers used in DNA studies, the Y-Chromosome test described here uses only one type. The marker used is called a Short Tandem Repeat (STR). STRs are short sequences of DNA (usually 2, 3, 4, or 5 base pairs long) that are repeated numerous times in a head-to-tail manner. For example, the 16 base pair sequence "gatagatagatagata" represents 4 repeats of the sequence "gata". The number of repeats found at a marker is referred to as its allele. Variation in the number of repeats at each marker is what allows individuals to be distinguished.
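To make the repeat-counting idea concrete, here is a minimal Python sketch. It assumes the sequence is already known as text and the repeat starts at a known position. The helper names `count_repeats` and `marker_differences` are hypothetical. The DYS marker names are real designations, but the allele values in the two haplotypes are invented for illustration.

```python
def count_repeats(sequence: str, motif: str, start: int = 0) -> int:
    """Count consecutive head-to-tail copies of `motif` beginning at `start`."""
    if not motif:
        raise ValueError("motif must not be empty")
    seq, unit = sequence.lower(), motif.lower()
    count, pos = 0, start
    # Advance one motif length at a time for as long as the motif keeps matching.
    while seq.startswith(unit, pos):
        count += 1
        pos += len(unit)
    return count


def marker_differences(a: dict, b: dict) -> int:
    """Number of markers tested in both haplotypes whose allele values differ."""
    shared = a.keys() & b.keys()
    return sum(1 for marker in shared if a[marker] != b[marker])


print(count_repeats("gatagatagatagata", "gata"))  # 4

# Hypothetical haplotypes: DYS# -> allele (repeat count). Values are invented.
man_1 = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
man_2 = {"DYS19": 14, "DYS390": 23, "DYS391": 11, "DYS393": 13}
print(marker_differences(man_1, man_2))  # 1
```

`marker_differences` compares only the markers present in both dictionaries. That way, two tests covering different marker sets can still be compared on the markers they share.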

DNA (deoxyribonucleic acid) was discovered in the late 1800s, but its role as the material of heredity was not established until the mid-twentieth century. It occupies a central role in the cell as the store of all the information required to duplicate and maintain the organism. All information necessary to maintain and propagate life is contained within a linear array of four simple bases: adenine, guanine, thymine, and cytosine.

DNA was first described as a monotonously uniform helix, generally called B-DNA. However, we now know that DNA can adopt many different shapes and conformations, and many of these alternative shapes have biological importance. Thus DNA is not simply an informational repository from which information flows through RNA into proteins. Rather, structural information exists within the specific sequence patterns of the bases. This structural information dictates how DNA interacts with proteins to carry out DNA replication, transcription into RNA, and repair of errors or damage to the DNA.

The Components of DNA

DNA is composed of purine (adenine and guanine) and pyrimidine (cytosine and thymine) bases, each connected through a deoxyribose sugar to a phosphate backbone. Many variations are possible in the chemical structure of the bases and the sugar, and in the structural relationship of the base to the sugar; these variations result in differences in helical shape and form. The most common DNA helix, B-DNA, is a double helix of two DNA strands with about 10.5 base pairs per helical turn.

Bases and Base Pairs

The four bases found in DNA are shown in Figures 1 and 2. The purines and pyrimidines are the informational molecules of the genetic blueprint for the cell. The two sides of the helix are held together by hydrogen bonds between base pairs. Hydrogen bonds are weak attractions between a hydrogen atom on one side and an oxygen or nitrogen atom on the other. Hydrogen atoms of amino groups serve as the hydrogen bond donor while the carbonyl oxygens and ring nitrogens serve as hydrogen bond acceptors. The specific location of hydrogen bond donor and acceptor groups gives the bases their specificity for hydrogen bonding in unique pairs. Thymine (T) pairs with adenine (A) through two hydrogen bonds, and cytosine (C) pairs with guanine (G) through three hydrogen bonds (Figure 2). T does not normally pair with G, nor does C normally pair with A.
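The pairing rules and hydrogen-bond counts in this paragraph can be expressed as two lookup tables. This is a minimal Python sketch under stated assumptions. The function name `duplex_hydrogen_bonds` is hypothetical. For simplicity, the opposite strand is written position by position against the first strand, not in its own 5′ to 3′ order.

```python
# Pairing partners and hydrogen-bond counts from the text:
# A·T pairs share two hydrogen bonds, C·G pairs share three.
PARTNER = {"A": "T", "T": "A", "C": "G", "G": "C"}
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def duplex_hydrogen_bonds(strand: str, opposite: str) -> int:
    """Total hydrogen bonds between two strands aligned position by position.

    Raises ValueError if the lengths differ or if any position is not a
    normal pair (for example T opposite G, or C opposite A).
    """
    if len(strand) != len(opposite):
        raise ValueError("strands must be the same length")
    total = 0
    for i, (base, other) in enumerate(zip(strand.upper(), opposite.upper())):
        if PARTNER.get(base) != other:
            raise ValueError(f"{base} does not pair with {other} at position {i}")
        total += H_BONDS[base]
    return total


print(duplex_hydrogen_bonds("ATGC", "TACG"))  # 2 + 2 + 3 + 3 = 10
```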

Deoxyribose Sugar

In DNA the bases are connected to a β-D-2-deoxyribose sugar. This sugar carries a hydrogen atom at the 2′ ("two prime") position, rather than the hydroxyl group found in ribose. The sugar is a very dynamic part of the DNA molecule. Unlike the nucleotide bases, which are planar and rigid, the sugar ring is easily bent and twisted into various conformations, which occur in different structural forms of DNA. In canonical B-DNA, the accepted and most common form of DNA, the sugar conformation is known as C2′-endo.

Nucleosides and Nucleotides

The term "nucleoside" refers to a base and sugar. "Nucleotide," on the other hand, refers to the base, sugar, and phosphate group (Figure 1). A bond called the glycosidic bond holds the base to the sugar, and the 3′-5′ ("three prime-five prime") phosphodiester bond holds the individual nucleotides together. Nucleotides are joined from the 3′ carbon of the sugar in one nucleotide to the 5′ carbon of the sugar of the adjacent nucleotide. The 3′ and 5′ ends are chemically very distinct and have different reactive properties. During DNA replication, new nucleotides are added only to the 3′-OH end of a DNA strand. This fact has important implications for replication.

The Structure of Double-Stranded DNA

As mentioned above, the two individual strands are held together by hydrogen bonds between individual T·A and C·G base pairs. In DNA, the distance between the atoms involved is 2.8 to 2.95 angstroms (1 angstrom = 10⁻¹⁰ meters). While individually weak, the large number of hydrogen bonds along a DNA chain provides sufficient stability to hold the two strands together.

The stabilization of duplex (double-stranded) DNA also depends on base stacking. The planar, rigid bases stack on top of one another, much like a stack of coins. Since the two purine·pyrimidine pairs (A·T and C·G) have the same width, the bases stack in a rather uniform fashion. Stacking near the center of the helix affords protection from chemical and environmental attack. Both hydrophobic interactions and van der Waals forces hold bases together in stacking interactions. About half the stability of the DNA helix comes from hydrogen bonding, while base stacking provides much of the rest.

Double-stranded DNA in its canonical B-form is a right-handed helix formed by two individual DNA strands aligned in an antiparallel fashion (a right-handed helix, when viewed on end, twists clockwise going away from the viewer). Antiparallel DNA has the two strands organized in opposite polarity, with one strand oriented in the 5′-3′ direction and the other oriented in the 3′-5′ direction.
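Because the strands are antiparallel, writing a strand's partner in the conventional 5′ to 3′ direction takes two steps: complement every base, then reverse the order. Here is a minimal Python sketch. The name `reverse_complement` follows common bioinformatics usage and is used here as an illustrative helper, not as a specific library's API.

```python
# Translation table mapping each base to its complementary base.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand, also written 5'->3'.

    The strands are antiparallel, so the partner is read in the opposite
    direction: complement every base, then reverse the order.
    """
    return strand_5_to_3.translate(COMPLEMENT)[::-1]


top = "ATGCTT"                  # 5'-ATGCTT-3'
print(reverse_complement(top))  # AAGCAT, i.e. 5'-AAGCAT-3'
```

Applying the function twice returns the original strand, as expected for two complementary strands.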

In the right-handed B-DNA double helix, the stacked base pairs are separated by about 3.4 angstroms, with 10.5 base pairs forming one helical turn (360°), which is 35.7 angstroms in length. Two successive base pairs are therefore rotated about 34.3° with respect to each other. The width of the helix is 20 angstroms. An idealized model of the double helix is shown in Figure 3. As can be seen, the organization of the bases creates a major groove and a minor groove.
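The helix dimensions in this paragraph are linked by simple arithmetic. Dividing the length of one turn by the number of base pairs per turn gives the rise per base pair. Dividing 360° by the same number gives the twist between successive pairs. The short Python check below uses only the 10.5 bp per turn and 35.7 angstrom figures from the text:

```python
BP_PER_TURN = 10.5     # base pairs per helical turn (B-DNA)
PITCH_ANGSTROM = 35.7  # length of one full 360-degree turn, in angstroms

rise = PITCH_ANGSTROM / BP_PER_TURN  # axial distance between stacked base pairs
twist = 360 / BP_PER_TURN            # rotation between successive base pairs

print(f"rise  = {rise:.2f} angstroms per base pair")  # rise  = 3.40 angstroms per base pair
print(f"twist = {twist:.1f} degrees per base pair")   # twist = 34.3 degrees per base pair
```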

Adenine and thymine are said to be complementary, as are cytosine and guanine. Complementary means "matching opposite." The shapes and charges of adenine and thymine complement each other, so that they attract one another and link up (as do cytosine and guanine). Indeed, one entire strand of duplex DNA is complementary to the opposing strand. During replication, the two strands unwind, and each serves as a template for the formation of a new complementary strand, so that replication ends with two exact double-stranded copies.
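Template-directed copying can be sketched in a few lines of Python. The model is idealized and rests on stated assumptions: copying is error-free, and the strands are written position by position against each other. The `replicate` helper is hypothetical. Each daughter duplex keeps one parental strand and pairs it with a newly built complement, so both daughters come out identical to the parent.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Build the matching opposite strand, position by position."""
    return "".join(PAIR[base] for base in strand.upper())


def replicate(duplex):
    """Idealized, error-free replication of a (top, bottom) duplex.

    Each parental strand serves as the template for one new strand, so each
    daughter duplex keeps one old strand and gains one newly built strand.
    """
    old_top, old_bottom = duplex
    return [(old_top, complement(old_top)),
            (complement(old_bottom), old_bottom)]


parent = ("ATGCCG", "TACGGC")
print(replicate(parent))  # [('ATGCCG', 'TACGGC'), ('ATGCCG', 'TACGGC')]
```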

Alternative DNA Conformations

While the vast majority of DNA exists in the canonical B-DNA form, DNA can adopt an amazing array of alternative structures. These arise from particular sequence arrangements and, in many cases, from energy stored in the double helix by DNA supercoiling. Supercoiling is a high-energy state in which the double helix becomes twisted around itself. Some of the alternative DNA conformations that have been identified are shown in Figure 4.

Unwound DNA

Since A·T base pairs contain two hydrogen bonds and C·G base pairs contain three, A+T-rich tracts are less thermally stable than C+G-rich tracts. Under denaturing conditions (heat or alkali), DNA begins to "melt" (separate) into unwound regions, and the A+T-rich sequences melt first. In addition, A+T-rich regions can unwind, and remain unwound under conditions normally found in the cell, in the presence of superhelical energy. This is a high-energy state of DNA resulting from supercoiling, which is the natural form of DNA in the chromosomes of most organisms. Such sites often provide places for DNA replication proteins to enter DNA and begin the process of chromosome duplication.
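A·T pairs carry fewer hydrogen bonds, so one simple way to flag regions likely to melt first is to scan a sequence for the window with the highest A+T fraction. This Python sketch is only a rough proxy, and the window width is a hypothetical parameter. It does not model stacking, salt, or supercoiling, so it does not predict actual melting temperatures. The function names are illustrative.

```python
def at_fraction(seq: str) -> float:
    """Fraction of bases in seq that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def most_at_rich_window(seq: str, width: int):
    """Return (start, A+T fraction) of the first most A+T-rich window."""
    if not 0 < width <= len(seq):
        raise ValueError("width must be between 1 and len(seq)")
    best_start, best_frac = 0, -1.0
    # Slide the window one base at a time and keep the best-scoring one.
    for start in range(len(seq) - width + 1):
        frac = at_fraction(seq[start:start + width])
        if frac > best_frac:
            best_start, best_frac = start, frac
    return best_start, best_frac


print(most_at_rich_window("GCGCGGCCATATTAAATGCGCCG", 8))  # (8, 1.0)
```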

Cruciform Structures

DNA sequences are said to be palindromic when they contain inverted repeat symmetry, as in the sequence GGAATTAATTCC, reading from the 5′ to the 3′ end. Palindromic sequences can form intramolecular hydrogen bonds (within a single strand) rather than the normal intermolecular hydrogen bonds (between the two complementary strands). To form a cruciform ("cross-shaped") structure, the DNA must first form a small unwound region. Base pairs must then begin to form within each individual strand, producing the four-armed cruciform.
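A sequence with inverted repeat symmetry is identical to its own reverse complement, which is why each strand can fold back and pair with itself. Here is a minimal Python check using the example from the text. The helper name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_inverted_repeat(seq: str) -> bool:
    """True if seq (read 5'->3') equals its own reverse complement."""
    seq = seq.upper()
    return seq == seq.translate(COMPLEMENT)[::-1]


print(is_inverted_repeat("GGAATTAATTCC"))  # True  (example from the text)
print(is_inverted_repeat("GGAATTAATTGG"))  # False
```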

Slipped-Strand DNA

Slipped-strand DNA structures can form within direct repeat DNA sequences, such as (CTG)n·(CAG)n and (CGG)n·(CCG)n (where "n" denotes a variable number of repetitions). They form when the strands are first unwound (denaturation) and then come back together (renaturation) out of alignment with each other. Expansion of such triplet repeats is a feature of certain neurological diseases.
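Runs of a direct repeat such as (CTG)n can be located with a regular expression. In this Python sketch, the five-copy minimum is an arbitrary illustrative threshold, not a clinical cutoff for any disease. The function name is hypothetical. Only one strand is scanned; on the partner strand, the same run reads as (CAG)n.

```python
import re


def repeat_runs(seq: str, unit: str, min_copies: int = 5):
    """Return (start, copies) for each run of at least min_copies tandem copies of unit."""
    if not unit:
        raise ValueError("unit must not be empty")
    unit = unit.upper()
    # For unit "CTG" and min_copies 5 this builds the pattern (?:CTG){5,}
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit))
            for m in pattern.finditer(seq.upper())]


seq = "AAG" + "CTG" * 7 + "TTA" + "CTG" * 2
print(repeat_runs(seq, "CTG"))  # [(3, 7)]  (the 2-copy run is below the threshold)
```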

Intermolecular Triplex DNA

Three-stranded, or triplex, DNA can form within tracts of polypurine·polypyrimidine (Pu·Py) sequence, such as (GAA)n·(TTC)n. Purines, with their two-ring structures, have the potential to form hydrogen bonds with a second base, even while base paired in the canonical A·T and G·C configurations. This second type of base pair is called a Hoogsteen base pair, and it can form in the major groove (the top of the base pair representations in Figure 2). Pyrimidines can only pair with a single other base, and thus a long Pu·Py tract must be present for triplex DNA formation. The important factor for triplex DNA formation is the presence of an extended purine tract in a single DNA strand. The third-strand base-pairing code is as follows: A can pair with A or T; G can pair with a protonated C (C+) or G.
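Since the key requirement is an extended purine tract on one strand, a first screening step is to find the longest run of A and G. This Python sketch also encodes the third-strand pairing code from this paragraph as a lookup table. It only identifies candidate tracts. Whether a triplex actually forms depends on conditions the code does not model; C+ pairing, for instance, requires protonated cytosine. The function name is hypothetical.

```python
import re

# Third-strand pairing code from the text: a purine in the duplex (key)
# can pair with either of these third-strand bases (values).
THIRD_STRAND = {"A": ("A", "T"), "G": ("C+", "G")}


def longest_purine_tract(strand: str):
    """Return (start, tract) for the longest run of purines (A or G), or (-1, "")."""
    runs = [(m.start(), m.group()) for m in re.finditer(r"[AG]+", strand.upper())]
    if not runs:
        return -1, ""
    return max(runs, key=lambda run: len(run[1]))


start, tract = longest_purine_tract("CTGAAGAAGAAGAAGTC")
print(start, tract)                          # 2 GAAGAAGAAGAAG
print([THIRD_STRAND[b] for b in tract[:3]])  # [('C+', 'G'), ('A', 'T'), ('A', 'T')]
```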

Intramolecular Triplex DNA

When a Pu·Py tract has mirror repeat symmetry (5′ GAAGAG-GAGAAG 3′), an intramolecular triplex can form, in which half of the Pu·Py tract unwinds and one strand wraps into the major groove, forming a triplex. The structure in Figure 4 shows the pyrimidine strand (CTT) pairing with the purine strand (GAA) of a canonical DNA duplex. In an intramolecular triplex, one strand of the unwound region remains unpaired, as shown.
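The example tract in this paragraph is a mirror repeat: it reads the same forward and backward on one strand. This differs from the inverted repeat (palindrome) symmetry that cruciforms need. The small Python sketch below contrasts the two checks, with the center dash stripped before comparison. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_mirror_repeat(seq: str) -> bool:
    """True if the strand reads the same forward and backward (ignoring '-')."""
    seq = seq.upper().replace("-", "")
    return seq == seq[::-1]


def is_inverted_repeat(seq: str) -> bool:
    """True if the strand equals its own reverse complement (ignoring '-')."""
    seq = seq.upper().replace("-", "")
    return seq == seq.translate(COMPLEMENT)[::-1]


site = "GAAGAG-GAGAAG"           # example from the text; '-' marks the center
print(is_mirror_repeat(site))    # True
print(is_inverted_repeat(site))  # False
print(set(site.replace("-", "")) <= set("AG"))  # True: a pure purine strand
```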

Quadruplex DNA

DNA sequences containing runs of G·C base pairs can form quadruplex, or four-stranded DNA, in which the four DNA strands are held together by Hoogsteen hydrogen bonds between all four guanines. The four guanines are aligned in a plane, and the successive rings of guanines are stacked one upon another.

Left-Handed Z-DNA

Alternating runs of (CG)n·(CG)n or (TG)n·(CA)n dinucleotides in DNA can adopt a left-handed helix called Z-DNA under superhelical tension or in high salt (more than 3 M NaCl, where M denotes moles per liter). In this form, the two DNA strands become wrapped in a left-handed helix, which is the opposite sense to that of canonical B-DNA. This can occur within a small region of a larger right-handed B-DNA molecule, creating two junctions at the B-Z transition regions.
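Candidate Z-forming stretches can be located by searching for the alternating dinucleotide runs named in the text. In this Python sketch, each motif is searched in both phases (for example CG and GC), so that a run is found whichever base it starts on. That choice, the function name, and the absence of any minimum length are assumptions. Finding such a run does not show that Z-DNA will form, which, as the text notes, requires superhelical tension or high salt.

```python
import re

# Alternating dinucleotides named in the text, in both phases:
# (CG)n, plus (TG)n and its partner-strand form (CA)n.
Z_MOTIFS = ("CG", "GC", "TG", "GT", "CA", "AC")


def longest_z_candidate(seq: str) -> str:
    """Return the longest run of any one alternating motif ('' if none)."""
    seq = seq.upper()
    best = ""
    for motif in Z_MOTIFS:
        for m in re.finditer(f"(?:{motif})+", seq):
            if len(m.group()) > len(best):
                best = m.group()
    return best


print(longest_z_candidate("AAT" + "CG" * 6 + "TTA"))  # CGCGCGCGCGCG
```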

Curved DNA

DNA containing tracts of (A)3-4·(T)3-4 (that is, runs of three or four bases of A in one strand and a similar run of T in the other) spaced at 10-base pair intervals can adopt a curved helix structure.
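Phased A-tracts can be screened for by locating runs of three or four A's and checking the spacing between them. In this Python sketch, several choices are assumptions: the tolerance of plus or minus one base (allowing for roughly 10.5 bp per turn), the requirement that a tract be exactly 3 or 4 A's long, and the function names. Only A-runs on the given strand are scanned, so a tract written as TTTT on this strand would be missed.

```python
import re

# A run of exactly 3 or 4 A's that is not part of a longer run of A's.
A_TRACT = re.compile(r"(?<!A)A{3,4}(?!A)")


def a_tract_starts(seq: str):
    """Start positions of qualifying A-tracts on the given strand."""
    return [m.start() for m in A_TRACT.finditer(seq.upper())]


def is_phased(starts, period: int = 10, tolerance: int = 1) -> bool:
    """True if there are at least two tracts and every gap between
    successive tract starts is within `tolerance` of `period` bases."""
    if len(starts) < 2:
        return False
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return all(abs(gap - period) <= tolerance for gap in gaps)


seq = ("AAAA" + "GCTGCC") * 4  # hypothetical: a 4-A tract every 10 bases
starts = a_tract_starts(seq)
print(starts)             # [0, 10, 20, 30]
print(is_phased(starts))  # True
```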

In summary, DNA can exist as a very stable, right-handed double helix that keeps the genetic information secure. Certain DNA sequences can also adopt alternative conformations, some of which are important regulatory signals involved in gene expression or in replication of the DNA.


Sinden, Richard R. DNA Structure and Function. San Diego: Academic Press, 1994.

Human Genome

The human genome is the genome of Homo sapiens, which is composed of 24 distinct chromosomes (22 autosomes plus X and Y) with a total of approximately 3 billion DNA base pairs containing an estimated 20,000–25,000 genes. [1] The Human Genome Project has produced a reference sequence of the euchromatic human genome, which is used worldwide in biomedical sciences. The human genome is much more gene-sparse than was initially predicted at the outset of the Human Genome Project. Only about 1.5% of its total length serves as protein-coding exons. The rest is made up of RNA genes, regulatory sequences, introns, and so-called "junk" DNA, a controversial label.


There are 24 distinct human chromosomes: 22 autosomal chromosomes, plus the sex-determining X and Y chromosomes. Chromosomes 1–22 are numbered roughly in order of decreasing size. Somatic cells usually have one copy of chromosomes 1–22 from each parent, plus an X chromosome from the mother, and either an X or Y chromosome from the father, for a total of 46.


There are an estimated 20,000–25,000 human protein-coding genes. Surprisingly, the number of human genes seems to be less than a factor of two greater than that of many much simpler organisms, such as the roundworm and the fruit fly. However, human cells make extensive use of alternative splicing to produce several different proteins from a single gene, and the human proteome is thought to be much larger than those of the aforementioned organisms.

Most human genes have multiple exons, and human introns are frequently much longer than the flanking exons.

Human genes are distributed unevenly across the chromosomes. Each chromosome contains various gene-rich and gene-poor regions, which seem to be correlated with chromosome bands and GC-content. The significance of these nonrandom patterns of gene density is not well understood.

In addition to protein coding genes, the human genome contains thousands of RNA genes, including tRNA, ribosomal RNA, microRNA, and other non-coding RNA genes.

Regulatory sequences

The human genome has many different regulatory sequences which are crucial to controlling gene expression. These are typically short sequences that appear near or within genes. A systematic understanding of these regulatory sequences and how they together act as a gene regulatory network is only beginning to emerge from computational, high-throughput expression and comparative genomics studies.

Identification of regulatory sequences relies in part on evolutionary conservation. The evolutionary branch between human and mouse, for example, occurred 70–90 million years ago.[3] Non-coding sequences that computer comparisons show to be conserved across such long spans are therefore likely to be important in roles such as gene regulation. [4]

Another comparative genomic approach to locating regulatory sequences in humans is sequencing the genome of the puffer fish. These vertebrates have essentially the same genes and regulatory sequences as humans, but only about one-eighth as much "junk" DNA. The compact genome of the puffer fish makes it much easier to locate regulatory sequences.[5]

Other DNA

Protein-coding sequences (specifically, coding exons) comprise less than 1.5% of the human genome.[2] Aside from genes and known regulatory sequences, the human genome contains vast regions of DNA whose function, if any, remains unknown. These regions in fact comprise the vast majority, by some estimates 97%, of the human genome size. Much of this consists of:

Repeat elements:

* Tandem repeats
  o Satellite DNA
  o Minisatellite
  o Microsatellite
* Interspersed repeats
* Retrotransposons
  o LTR
    + Ty1-copia
    + Ty3-gypsy
  o Non-LTR
* DNA Transposons


However, there is also a large amount of sequence that does not fall under any known classification.

Much of this sequence may be an evolutionary artifact that serves no present-day purpose, and these regions are sometimes collectively referred to as "junk" DNA. There is, however, growing evidence that many sequences within these regions are likely to function in ways that are not fully understood. Recent experiments using microarrays have revealed that a substantial fraction of non-genic DNA is in fact transcribed into RNA,[6] which raises the possibility that the resulting transcripts have some unknown function. Also, the evolutionary conservation across mammalian genomes of much more sequence than can be explained by protein-coding regions indicates that many, and perhaps most, functional elements in the genome remain unknown.[7] The investigation of the vast quantity of sequence in the human genome whose function remains unknown is currently a major avenue of scientific inquiry. [8]


Most studies of human genetic variation have focused on single nucleotide polymorphisms (SNPs), which are substitutions of individual bases along a chromosome. Most analyses estimate that SNPs occur on average somewhere between once every 100 and once every 1,000 base pairs in the euchromatic human genome, although they do not occur at a uniform density. Hence the popular statement that "we are all, regardless of race, genetically 99.9% the same", [9] although most geneticists would qualify it somewhat. For example, a much larger fraction of the genome is now thought to be involved in copy number variation. [10] A large-scale collaborative effort to catalog SNP variations in the human genome is being undertaken by the International HapMap Project.
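The 99.9% figure follows from simple arithmetic if SNP spacing is treated as the rate at which two genomes differ, which is the simplification behind the popular statement. Below is a short Python sketch using the approximate 3 billion base pair genome size given in this article. The helper name is hypothetical.

```python
GENOME_BP = 3_000_000_000  # approximate genome size in base pairs


def snp_summary(spacing_bp: int):
    """Return (expected SNP count, percent identity) for one SNP every spacing_bp bases."""
    snps = GENOME_BP // spacing_bp
    identity = 100 * (1 - 1 / spacing_bp)
    return snps, identity


for spacing in (100, 1_000):
    snps, identity = snp_summary(spacing)
    print(f"1 SNP per {spacing:,} bp: ~{snps:,} SNPs, {identity:.1f}% identical")
# 1 SNP per 100 bp: ~30,000,000 SNPs, 99.0% identical
# 1 SNP per 1,000 bp: ~3,000,000 SNPs, 99.9% identical
```

This calculation ignores copy number variation, which the text notes affects a much larger fraction of the genome.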

The genomic loci and length of certain types of small repetitive sequences are highly variable from person to person, which is the basis of DNA fingerprinting and DNA paternity testing technologies. The heterochromatic portions of the human genome, which total several hundred million base pairs, are also thought to be quite variable within the human population (they are so repetitive and so long that they cannot be accurately sequenced with current technology). These regions contain few genes, and it is unclear whether any significant phenotypic effect results from typical variation in repeats or heterochromatin.

Most gross genomic mutations in germ cells probably result in inviable embryos; however, a number of human diseases are related to large-scale genomic abnormalities. Down syndrome, Turner syndrome, and a number of other diseases result from nondisjunction of entire chromosomes. Cancer cells frequently have aneuploidy of chromosomes and chromosome arms, although a cause-and-effect relationship between aneuploidy and cancer has not been established.

Genetic disorders

Genetic disorders are caused by abnormal expression of one or more genes, resulting in a clinical phenotype. A disorder may be caused by a gene mutation, an abnormal number of chromosomes, or a triplet repeat expansion. When defective genes are inherited from the parents, the condition is known as a hereditary disease. There are around 4,000 known genetic disorders, cystic fibrosis being among the most common.

Genetic disorders are often studied by means of population genetics. Treatment is performed by a geneticist-physician trained in clinical genetics. The results of the Human Genome Project are likely to increase the availability of genetic testing for gene-related disorders and, eventually, to improve treatment. Parents can be screened for hereditary conditions and counselled on the consequences, the probability that a condition will be inherited, and how to avoid or ameliorate it in their offspring.

One major gross effect on human phenotypes derives from gene dosage, whose effects play a role in disorders caused by duplication, omission, or disruption of chromosomes. For example, people with Down syndrome (trisomy 21) experience high rates of Alzheimer's disease. This effect is thought to be related to overexpression of the Alzheimer's-related amyloid precursor protein, whose gene is located on chromosome 21. By contrast, people with Down syndrome experience lower rates of breast cancer, possibly due to the overexpression of a tumor-suppressor gene.


Comparative genomics studies of mammalian genomes suggest that approximately 5% of the human genome has been conserved by evolution since the divergence of those species approximately 200 million years ago, and that this conserved fraction contains the vast majority of genes.[7][8] Intriguingly, genes and known regulatory sequences probably comprise less than 2% of the genome, which suggests that there may be more unknown functional sequence than known functional sequence. A smaller but still large fraction of human genes seems to be shared among most known vertebrates.

The chimpanzee genome is 95% identical to the human genome. On average, a typical human protein-coding gene differs from its chimpanzee ortholog by only two amino acid substitutions; nearly one third of human genes have exactly the same protein translation as their chimpanzee orthologs. A major difference between the two genomes is human chromosome 2, which is equivalent to a fusion product of chimpanzee chromosomes 12 and 13 (now usually designated 2A and 2B).

Humans have undergone an extraordinary loss of olfactory receptor genes during our recent evolution, which explains our relatively crude sense of smell compared to most other mammals. Evolutionary evidence suggests that the emergence of color vision in humans and several other primate species has diminished the need for the sense of smell.

Mitochondrial genome

The human mitochondrial genome, while usually not included when referring to the "human genome", is of tremendous interest to geneticists, since it undoubtedly plays a role in mitochondrial disease. It also sheds light on human evolution. For example, analysis of variation in the human mitochondrial genome has led to the postulation of a recent common ancestor for all humans on the maternal line of descent, often called Mitochondrial Eve.

Mitochondrial DNA (mtDNA) varies more rapidly than nuclear DNA, in part because it has less effective mechanisms for correcting copying errors. This roughly 20-fold higher mutation rate allows mtDNA to be used for more accurate tracing of maternal ancestry. Studies of mtDNA in populations have allowed ancient migration paths to be traced, such as the migration of Native Americans from Siberia or Polynesians from southeastern Asia. It has also been used to show that there is no trace of Neanderthal mtDNA in the present-day European gene pool.[15]


A variety of features of the human genome that transcend its primary DNA sequence, such as chromatin packaging, histone modifications and DNA methylation, are important in regulating gene expression, genome replication and other cellular processes.[16][17] These "epigenetic" features are thought to be involved in cancer and other abnormalities, and some may be heritable across generations.
