What is DNA?
DNA stands for deoxyribonucleic acid. DNA is the genetic material and holds the instructions for all life on the planet. The genome of every living thing on the planet is made of DNA. You will probably have heard of the DNA double helix. The chemical structure of DNA is linear, with two antiparallel strands that pair together through the interaction of chemical bases. The two strands twist around each other to make the characteristic double helix structure that was famously solved by James Watson and Francis Crick in 1953 with the help of data produced by Rosalind Franklin and Maurice Wilkins. The building blocks of DNA are called nucleotides, which chemically are made of a deoxyribose sugar group, a phosphate group (which together form the sugar-phosphate backbone of DNA) and one of 4 nucleobases (or just bases for short). These bases come in 4 varieties: adenine (A), cytosine (C), guanine (G) and thymine (T). They are chemically different and pair together in the centre of the DNA double helix, forming what we call "base-pairs". A always base-pairs with T and G always base-pairs with C. In this way, if we know the order or "sequence" of bases in one strand of a DNA molecule, we automatically know what the sequence will be on the second strand.
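The pairing rule is simple enough to write down as a few lines of Python. This is only an illustrative sketch: the function names and the example sequence are made up for this page, not taken from any particular software.

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the partner base for each base, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand):
    """Return the second strand read in its own direction.

    The two strands are antiparallel, so the partner strand is read
    in the opposite direction to the first one.
    """
    return complement(strand)[::-1]


example = "ATGGATGCT"  # made-up sequence, for illustration only
print(complement(example))          # TACCTACGA
print(reverse_complement(example))  # AGCATCCAT
```

Applying reverse_complement twice gives back the original sequence, which is another way of saying that each strand fully determines the other.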
What is a genome?
A genome can be described as an organism's complete set of genetic instructions. Hence the instructions for making a human are written into the human genome in the language of DNA bases. I like to think of the human genome as a library of information about how to make a human. Each of the trillions of cells that make up the human body has this library at its centre, housed in the cell's nucleus. The human genome is around 3 billion base-pairs long and divided into 23 chromosomes. If you stretched the human genome out into one long string of DNA it would be 103 cm long. There are 2 copies of the human genome in every cell, one inherited from mum and the other from dad. So there are over 2 metres of DNA in every human cell, all wound up and condensed into a microscopic nucleus. It is estimated that if all of the DNA in a human body was stretched out into one long string it would reach from the earth to the sun and back 4 times. So there is an awful lot of DNA inside you and every other living thing on the planet!
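If you want to check the length figures yourself, the arithmetic is short. The sketch below assumes the usual textbook spacing of about 0.34 nanometres between neighbouring base-pairs (a value not given above) and the rounded figure of 3 billion base-pairs, so it lands close to, rather than exactly on, the 103 cm quoted here.

```python
BASE_PAIRS_PER_GENOME = 3_000_000_000  # "around 3 billion" from the text
RISE_PER_BASE_PAIR_NM = 0.34           # assumed textbook spacing between base-pairs


def genome_length_m(base_pairs, rise_nm=RISE_PER_BASE_PAIR_NM):
    """Length in metres of a DNA molecule with the given number of base-pairs."""
    return base_pairs * rise_nm * 1e-9  # nanometres to metres


one_copy = genome_length_m(BASE_PAIRS_PER_GENOME)
per_cell = 2 * one_copy  # two copies of the genome in every cell

print(f"one copy: {one_copy:.2f} m, per cell: {per_cell:.2f} m")
```

With these rounded inputs one copy comes out at roughly a metre and the two copies in a cell at just over 2 metres.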
What is a gene?
So if a genome is a library, the genes are the books in the library. Genes are small sections of DNA, usually in the range of hundreds to several thousands of base-pairs long. They are many and varied and most of them contain the instructions for making proteins. Proteins come in a wide variety of shapes and sizes and make up most of the structural and functional components of cells and living things. It is the proteins that do all the important jobs of the cell. The DNA just contains the instructions: the genes are the instruction manuals. So how do you get from a gene sequence to a protein? Well, the cell has a way of copying or "transcribing" the DNA into the chemically similar molecule RNA (ribonucleic acid). This process, called transcription, occurs in the nucleus. The RNA is then transported into the cytoplasm of the cell where a process called translation occurs. Ribosomes are protein-making factories that translate the language of DNA/RNA into the chemically very different language of proteins. While most genes are what we call "protein coding", some do not code for proteins but code only for the intermediate RNA molecules, which can also play various roles in the cell.
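As a rough sketch of transcription, here it is in Python. It assumes the sequence you start with is the "coding" strand of the gene, so the RNA has the same sequence with the base uracil (U) in place of thymine (T); in the cell the enzyme actually reads the opposite strand to produce this result. The function name and example sequence are made up for illustration.

```python
def transcribe(coding_strand):
    """Return the RNA matching a DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


gene_fragment = "ATGGATGCTAATTAA"  # made-up sequence, for illustration only
print(transcribe(gene_fragment))   # AUGGAUGCUAAUUAA
```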
What is the genetic code?
The genetic code tells us how to get from the DNA/RNA language into the protein language. While DNA and RNA both have 4 building blocks, which only differ in that RNA contains an extra oxygen atom in its sugar group (ribose rather than deoxyribose) and the base uracil (U) instead of thymine (T), the building blocks of proteins are very different. Proteins are also polymer molecules but are made up of strings of amino acids rather than nucleotides. These amino acids come in 20 varieties which all have different chemical properties. It is the order or sequence of these amino acids that gives proteins their unique structures and functions. To get from a 4 base code into a 20 amino acid code requires a 3 base sequence of RNA to be translated into a single amino acid. With 4 possible bases at 3 positions there are 4 x 4 x 4 = 64 possible sequence combinations. These 64 triplet sequences are called codons and each one codes for a particular amino acid. You can see the 64 possible codons and the amino acids that they code for in the table below. This is the "genetic code" or "codon table". This code is remarkably well conserved across all forms of life from the tiniest bacteria to ourselves, hinting at the common evolutionary origin of all life on earth. While the majority of codons code for amino acids, 3 are known as stop codons - when the ribosome comes across one of these it will terminate protein synthesis. Also, as there are 64 possible codons and only 20 amino acids, there is what we call "redundancy" in the genetic code - for most amino acids there is more than one codon that codes for them. The codon ATG (AUG in the RNA) is also important - it is the only codon for the amino acid methionine (Met or M) and is also the initiator or start codon - all protein translation starts at an ATG codon. You will notice in the table below that each of the 20 amino acids has a 3-letter and 1-letter code. It is the 1-letter code that is important for understanding the encoded messages in the Dan's DNA "Sangerism" designs.
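To show how the 1-letter code turns a DNA sequence into a readable message, here is a small Python sketch of the standard codon table and a simple translation routine. It works on the DNA spelling of the codons (T rather than U) and makes some simplifying assumptions: it starts at the first ATG it finds, reads three bases at a time and stops at the first stop codon. The names and the example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code written compactly: codons are listed in T, C, A, G
# order at each of the 3 positions, and "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first ATG to the first stop codon (1-letter codes)."""
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the ribosome terminates here
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))              # 64
print(translate("ATGGATGCTAATTAA"))  # MDAN
```

In the example, ATG gives M (the start), GAT gives D, GCT gives A, AAT gives N and TAA is a stop codon, so the hidden message reads "MDAN".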
What is DNA sequencing?
As you can imagine, the order or "sequence" of bases in a DNA molecule is very important. The 3 billion or so bases in the human genome are strung together in a very particular way to make all the 20,000 or so protein coding genes and all the other important regulatory regions that all work together to create a functioning human. If we know the sequence of one strand of a DNA molecule, not only do we automatically know the sequence of the other strand, we can predict the sequence of RNA molecules that will be transcribed from it and also the sequence of proteins that will be translated from the RNA.
Methods that can determine the order or sequence of DNA bases are immensely powerful and since the first methods were developed in the 1970s our understanding of the molecular basis of life has been revolutionised. We now know the full DNA sequence of the human genome and the genomes of many other organisms. This has had profound effects on our understanding of biology and medicine and has led to world-changing discoveries and inventions too numerous to mention.
In the 1970s Fred Sanger invented the "chain termination" or "dideoxy" method of DNA sequencing. This method is commonly called "Sanger" sequencing, after its inventor, to differentiate it from more modern methods of DNA sequencing that are collectively referred to as "next-generation sequencing". Despite these very powerful modern methods, which can sequence whole genomes in a day or two, the original Sanger method is still in common use today. After refinements and automation of his original method, it was Sanger's sequencing method that was used to map and sequence the first human genome, a monumental international effort that took most of the 1990s to complete.
How does Sanger DNA sequencing work?
So how does DNA sequencing work? Well, to understand it you need to know a bit about how DNA is synthesised in cells. An enzyme called DNA polymerase uses the information on one strand of DNA to make a complementary copy that is identical to the second strand of DNA. This process occurs in all cells before they divide to make new identical copies of the genome that are passed on to each daughter cell. In DNA sequencing we perform a modified version of this process in a tube. We add in the piece of DNA we want to know the sequence of (known as template DNA), add a small piece of DNA that we know the sequence of (called a primer), add in the nucleotide building blocks of DNA (A, C, G and T) and the DNA polymerase enzyme. In addition to all these ingredients we add in some modified building blocks known as "dideoxy-nucleotides". These are the all-important ingredients that make the whole thing work. These modified nucleotides are chemically almost identical to the normal A, C, G and T. The only difference is that they lack one more oxygen atom than the normal (deoxy) nucleotides, hence "dideoxy". This oxygen atom is essential for adding the next nucleotide to the growing chain of DNA. When a dideoxy-nucleotide is added by the DNA polymerase, the DNA chain can no longer be extended and it is terminated. Hopefully you can see now why this method is also known as "dideoxy" or "chain termination" sequencing.
After running a DNA sequencing reaction you end up with millions of DNA molecules of many different lengths, all terminated with these dideoxy-nucleotides. By adding different coloured fluorescent dyes to the dideoxy-nucleotides we can visualise them. We separate the DNA molecules according to size using a process known as electrophoresis. The small ones travel fast and the larger ones more slowly through a capillary tube. When they get to the end of the tube a laser and detector are used to measure the fluorescence attached to the DNA molecules. Those that end in A glow green, those that end in T glow red, those that end in C glow blue and those that end in G glow yellow. When the DNA sequencing machine measures the fluorescence it plots a graph known as a chromatogram or sequence trace. The chromatogram consists of a series of evenly spaced peaks made up of 4 colours, each one corresponding to the colour of the fluorescent dye that was attached to the modified nucleotide terminating each DNA molecule. The one exception is G, which is shown as black peaks rather than yellow, simply because yellow does not show up so well when printed on white paper. So by reading the coloured peaks from left to right in the chromatogram we have determined the sequence of our DNA molecule.
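The logic of chain termination can be mimicked in a few lines of Python. This is a toy model, not how a sequencing machine's software works: it ignores the primer, assumes exactly one terminated fragment of every possible length, and uses made-up function names and a made-up template. The dye colours are the ones described above.

```python
import random

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
DYE = {"A": "green", "T": "red", "C": "blue", "G": "yellow"}  # colours from the text


def terminated_fragments(template):
    """Return one chain-terminated copy for every possible stopping point.

    The new strand is complementary to the template and is built in the
    opposite direction, so it is the reverse complement of the template.
    """
    new_strand = "".join(PAIRS[base] for base in reversed(template))
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]


def read_trace(fragments):
    """Sort fragments by size (electrophoresis) and read each final base."""
    by_size = sorted(fragments, key=len)  # the smallest fragments arrive first
    bases = [fragment[-1] for fragment in by_size]
    return "".join(bases), [DYE[base] for base in bases]


fragments = terminated_fragments("ATGGATGCT")  # made-up template
random.Random(0).shuffle(fragments)            # the tube is a jumbled mixture
sequence, colours = read_trace(fragments)
print(sequence)     # AGCATCCAT
print(colours[:3])  # ['green', 'yellow', 'blue']
```

Sorting by size is what puts the jumbled fragments back in order, and the colour of the last base on each fragment is what the detector reads.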
What is genetics?
Genetics is the study of genes, genetic variation and heredity in living things. As DNA is the genetic information and holds the instructions for making living things, it is all-important in the study of genetics. For example, eye colour and hair colour are physical traits we can see that are determined by variations in the DNA sequence of specific genes. These variations can cause either more or less of a particular protein to be made, or the particular protein that is made may have different properties. The results of these genetic variations are reflected in the traits we see. Because DNA is passed from parents to child through the generations these traits often run in families, although some are more complex than others and can be controlled by many different genes.
A simple example is blue eye colour, which is controlled by a single genetic variation on human chromosome 15. Blue eye colour is known as a recessive trait because you need 2 copies of the genetic variation to have blue eyes, one on the chromosome 15 you inherited from your mother and one on the chromosome 15 you inherited from your father. If you have just one copy of the blue eye colour variant, the second copy on your other chromosome 15 will dominate and you will have brown eyes. Brown eye colour, therefore, is a dominant trait. One of my first SangerArt pieces was inspired by the blue eye colour variant and visually illustrates how the DNA sequence relates to the physical trait of blue eyes. You can see it here.
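To see what "recessive" means for a family, here is a small Python sketch of the simple one-variant model described above. The labels are made up for illustration: "B" stands for the brown version and "b" for the blue eye colour variant, and each parent is assumed to pass on either of their two copies with equal chance.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count the equally likely combinations a child can inherit."""
    return Counter("".join(sorted(pair)) for pair in product(parent1, parent2))


def eye_colour(genotype):
    """Blue needs 2 copies of the variant; a single B copy gives brown."""
    return "blue" if genotype == "bb" else "brown"


# Two brown-eyed parents who each carry one copy of the blue variant.
outcomes = cross("Bb", "Bb")
print(dict(outcomes))  # {'BB': 1, 'Bb': 2, 'bb': 1}

blue = sum(n for genotype, n in outcomes.items() if eye_colour(genotype) == "blue")
print(blue / sum(outcomes.values()))  # 0.25
```

So in this simplified model two brown-eyed carriers have a 1 in 4 chance of having a blue-eyed child.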
What is a genetic disease?
In the same way that physical traits are genetically determined, so are many diseases. Most genetic diseases result from alterations to the DNA that are inherited through the generations. They can be as simple as a single DNA base replaced with another base that just so happens to be located in the middle of a protein coding gene. This single "spelling mistake" changes the sequence of a codon that codes for one amino acid to the codon for a different amino acid. If the amino acid in that protein is essential for its function then changing it can have devastating consequences, leading to a faulty, non-functioning protein that can no longer do its job.
An example of a genetic disease that I have spent much of my life researching is haemochromatosis. Most cases of haemochromatosis are caused by a single base change on chromosome 6 that is located in the HFE gene. The HFE gene encodes a protein that regulates the balance of iron in the body. When iron levels get too high, the HFE protein senses this and sends a signal to the intestines to tell them to stop absorbing iron. A single base change (G>A) in codon 282 of the HFE gene results in the replacement of the amino acid cysteine (C) with tyrosine (Y) in the HFE protein. This protein, which now contains the "p.C282Y" mutation, is faulty and can no longer regulate iron balance. Essentially the body thinks it is iron deficient and allows iron absorption to continue unchecked, resulting in iron overload and the toxic effects of too much iron on many of the body's organ systems. One of the Dan's DNA designs, created for people with this most common genetic condition, depicts the sequence change that causes the p.C282Y mutation in the HFE gene. You can see it here. One of my SangerArtworks called "The Celtic Curse" is also inspired by the HFE gene, the p.C282Y mutation and haemochromatosis, a condition which is common among people with Celtic ancestry. You can see it here.
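You can check for yourself that a single G>A change is enough to turn cysteine into tyrosine. The Python sketch below builds the standard codon table and tries the change on both cysteine codons (TGT and TGC), since the text above does not say which of the two sits at codon 282. In either codon the only G is the middle base, so that is the one that changes. The helper name is made up for illustration.

```python
from itertools import product

# Standard genetic code, codons listed in T, C, A, G order; "*" is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitute(codon, position, new_base):
    """Return the codon with one base replaced (position counts from 0)."""
    return codon[:position] + new_base + codon[position + 1:]


changes = {}
for codon in ("TGT", "TGC"):          # the two codons for cysteine (C)
    mutant = substitute(codon, 1, "A")  # G>A at the middle base
    changes[codon] = (mutant, CODON_TABLE[codon], CODON_TABLE[mutant])
    print(codon, CODON_TABLE[codon], "->", mutant, CODON_TABLE[mutant])
```

Both cases give C to Y, which is exactly what the name "C282Y" records: cysteine (C) at position 282 replaced by tyrosine (Y).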
Haemochromatosis is an example of a recessive disease because you require 2 copies of the faulty gene to get it. Other examples of recessive diseases include cystic fibrosis and sickle cell disease. Dominant diseases only require one copy of the faulty gene to manifest, an example being polycystic kidney disease. Another special group of genetic diseases is those that are due to changes on the sex chromosomes (X or Y). X chromosome-linked disorders are more common in males than females, simply because males have only one X chromosome and nothing to balance it out with. Hence, X-linked recessive diseases such as haemophilia are seen far more frequently in males. Haemophilia affected many male descendants of Queen Victoria and was known as the "Royal disease". My SangerArtwork "Royal Blood and DNA" is inspired by this story and the research which pinpointed the exact DNA change that caused it. You can see it here.
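The reason X-linked recessive diseases show up mostly in males can be sketched in Python too. This is a simplified model with made-up labels: "X" is a normal X chromosome, "x" is an X chromosome carrying the disease variant and "Y" is the Y chromosome. Each child is assumed to get one chromosome from each parent with equal chance.

```python
from itertools import product


def children(mother, father):
    """List the 4 equally likely sex chromosome combinations for a child."""
    return list(product(mother, father))


def is_male(child):
    return "Y" in child


def is_affected(child):
    """Affected only if every X chromosome the child has carries the variant."""
    x_copies = [chromosome for chromosome in child if chromosome != "Y"]
    return all(chromosome == "x" for chromosome in x_copies)


# A carrier mother (one normal X, one variant x) and an unaffected father.
kids = children(("X", "x"), ("X", "Y"))
sons = [kid for kid in kids if is_male(kid)]
daughters = [kid for kid in kids if not is_male(kid)]

print(sum(is_affected(kid) for kid in sons), "of", len(sons), "sons affected")
print(sum(is_affected(kid) for kid in daughters), "of", len(daughters), "daughters affected")
```

In this model half of the sons are affected, while the daughters are protected by the normal X they receive from their father, although half of them are carriers like their mother.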