A Sweet and Simple Explanation of CentiMorgans and SNPs
Originally published in THE BURGENLAND BUNCH NEWS – No. 284
February 28, 2018, © 2018 by The Burgenland Bunch
CentiMorgans (cMs) and SNPs (“snips”) are terms DNA testing companies use to indicate how closely two matches are related. Generally, the more closely related two people are, the more cMs and SNPs they share. It seems like a simple enough concept, but scouring the internet for a reasonably basic explanation turns up a lot of jargon that only leads to more questions.
According to the experts at the International Society of Genetic Genealogy (ISOGG),
A centiMorgan (cM) or map unit (m.u.) is a unit of recombinant frequency which is used to measure genetic distance. It is often used to imply distance along a chromosome, and takes into account how often recombination occurs in a region.1
While that may be an accurate definition, ISOGG assumes the reader already has an understanding of genetic recombination. Maybe the definition for SNPs will be a little clearer…
A single-nucleotide polymorphism (SNP, pronounced snip) is a DNA sequence variation occurring when a single nucleotide adenine (A), thymine (T), cytosine (C), or guanine (G) in the genome (or other shared sequence) differs between members of a species or paired chromosomes in an individual.2
Although this is technically accurate and useful information for genetic researchers, it can be argued that the average genealogist does not need to obtain immediate mastery of this complex topic. It may be unorthodox, but understanding cMs and SNPs can be a whole lot easier if we look at genetics as something more familiar and delicious, like a chocolate bar.
Like chromosomes, chocolate bars have segments. Below, we have a pair of these figurative chocolate chromosomes, each with seven segments. The centiMorgans are the six “break” points between the segments.
Each chromosome has a different number of centiMorgans, and the exact figure varies slightly depending on the testing company. Chromosome 1, for example, has about 280 cM, while chromosome 21 has only about 70 cM.3 In our chocolate analogy, that means chromosome 1 has about 280 “break” points where DNA segments can split and rearrange from one generation to the next.
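The chocolate picture counts every cM as a possible break point. A complementary way to read the same numbers, using the standard definition of map distance, is that a chromosome's length in cM divided by 100 gives the average number of crossovers (actual breaks) it undergoes in one generation. The minimal Python sketch below applies that to the two approximate lengths quoted above. The function names are illustrative only, and the second function assumes crossovers happen independently of one another (Haldane's model), which real chromosomes only approximate.

```python
import math

# Approximate lengths quoted in the text; exact values differ by testing company.
CHROMOSOME_CM = {"chromosome 1": 280, "chromosome 21": 70}


def expected_crossovers(length_cm):
    """A map distance of 100 cM (1 Morgan) means one crossover per generation
    on average, so a chromosome of L cM averages L / 100 crossovers."""
    return length_cm / 100


def prob_at_least_one_crossover(length_cm):
    """Chance of one or more crossovers, assuming crossovers occur
    independently (Haldane's model, which ignores crossover interference)."""
    return 1 - math.exp(-length_cm / 100)


for name, cm in CHROMOSOME_CM.items():
    print(f"{name}: {expected_crossovers(cm):.1f} expected crossovers per generation, "
          f"P(at least one) = {prob_at_least_one_crossover(cm):.2f}")
```

So although a 280 cM chromosome has many places where it could break, in any one generation it typically breaks only two or three times.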
If this piece of chocolate were to fall on the ground, it would likely break at one or more of the recessed spots between the squares. Let’s imagine that this chocolate bar isn’t perfectly uniform and some of the break points are weaker than others. When dropped, the bar would be more likely to break at the weaker spots, just as certain regions of a chromosome are more prone to recombination.
A child inherits 23 pairs of chromosomes, with each parent contributing one entire chromosome to every pair. Let’s say the broken chocolate chromosome above came from your mother. It is made up of segments from your mother’s ancestors. The other chromosome in the pair, from your father, may have “broken” in entirely different places. Though some spots are more likely to break, or recombine, than others, the process is ultimately random.
Although you received half of your DNA from each of your parents, the percentages begin to vary as the distance between you and an ancestor increases. Each grandparent contributes approximately 25% of your DNA on average, but it is common to share more with one of your four grandparents than with the other three.
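To see why a grandparent's share wanders around 25% rather than sitting exactly on it, here is a small Python simulation. It is a sketch under simplifying assumptions, not how any testing company models inheritance: crossovers are placed at random with an average of one per 100 cM (Haldane's model), and the “genome” is a made-up set of 22 chromosomes whose lengths are spread evenly between the roughly 70 cM and 280 cM figures mentioned earlier. All function names are hypothetical.

```python
import random


def transmitted_chromosome(length_cm, rng):
    """Simulate the copy of one chromosome a mother passes to a child.

    Her two copies came from her father ("grandfather") and her mother
    ("grandmother"); each crossover switches from one copy to the other.
    Assumes Haldane's model: crossovers fall independently, one per 100 cM
    on average, so the gaps between them are exponentially distributed.
    Returns (start_cm, end_cm, source) tuples covering the whole chromosome."""
    source = rng.choice(["grandfather", "grandmother"])  # copy the chromosome starts on
    segments, start = [], 0.0
    while True:
        next_break = start + rng.expovariate(1 / 100)  # cM to the next crossover
        if next_break >= length_cm:
            segments.append((start, length_cm, source))
            return segments
        segments.append((start, next_break, source))
        source = "grandmother" if source == "grandfather" else "grandfather"
        start = next_break


def share_from(segments, source):
    """Fraction of a chromosome's cM length that came from one grandparent."""
    total = segments[-1][1]
    return sum(end - start for start, end, s in segments if s == source) / total


# Crude toy genome: 22 chromosomes spaced evenly between the ~70 cM and ~280 cM
# lengths mentioned earlier (real chromosome lengths are not evenly spaced).
TOY_GENOME_CM = [70 + (280 - 70) * i / 21 for i in range(22)]


def maternal_grandfather_share(rng, genome=TOY_GENOME_CM):
    """Fraction of a child's whole genome from the maternal grandfather: half the
    genome comes from the mother, and part of that half from her father."""
    from_grandfather = sum(
        share_from(transmitted_chromosome(cm, rng), "grandfather") * cm for cm in genome
    )
    return 0.5 * from_grandfather / sum(genome)


rng = random.Random(284)  # fixed seed so the run is repeatable
for child in range(1, 6):
    print(f"child {child}: {maternal_grandfather_share(rng):.1%} from the maternal grandfather")
```

Each simulated child is a separate draw from the same mother, so the printed shares also illustrate how siblings can carry different amounts of DNA from the same grandparent.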
The image above shows the (not yet broken) segment inherited from your mother. The tinted squares show where each part came from, and in this example the segment contains more DNA from your mother’s paternal grandfather than from her other grandparents. In this way, siblings may inherit quite different sections of DNA from their grandparents, great-grandparents, and so on.
Unlike the break points on a chocolate bar, which divide equally sized squares, centiMorgans do not fall between identically sized bits of genetic code. As previously stated, a cM is not a unit of length but a unit of recombinant frequency. To better illustrate this concept, I present the very strange-looking pairs of chocolate bar segments below.
Each chromosome in a pair has the same number of centiMorgans (ignoring the X and Y chromosomes for now, since they play by their own rules), and the image above represents two different pairs of 7 cM chromosome segments. With only cM values to compare, both pairs have the same genetic significance, even though one is clearly longer. To reiterate, in this analogy each cM is a spot where a chromosome segment can split and recombine. This is why we count the spaces between the chocolate sections and not the sections themselves.
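Because a cM measures recombination rather than physical length, a segment’s cM value is read off a genetic map: a table pairing positions along the DNA (in base pairs) with positions in cM. The Python sketch below uses a tiny, invented three-point map to show two segments that are both 7 cM long even though one spans seven times as many base pairs. Real genetic maps are far denser, and the map, function names, and numbers here are purely illustrative.

```python
from bisect import bisect_right

# Hypothetical genetic map for one made-up chromosome:
# (position in base pairs, position in cM). Real maps have thousands of points.
TOY_MAP = [(0, 0.0), (1_000_000, 7.0), (8_000_000, 14.0)]


def cm_at(bp, genetic_map=TOY_MAP):
    """Linearly interpolate the cM position of a base-pair position."""
    positions = [p for p, _ in genetic_map]
    if not positions[0] <= bp <= positions[-1]:
        raise ValueError("position outside the map")
    i = bisect_right(positions, bp) - 1
    if i == len(genetic_map) - 1:  # exactly at the last map point
        return genetic_map[-1][1]
    (bp0, cm0), (bp1, cm1) = genetic_map[i], genetic_map[i + 1]
    return cm0 + (cm1 - cm0) * (bp - bp0) / (bp1 - bp0)


def segment_cm(start_bp, end_bp, genetic_map=TOY_MAP):
    """Genetic length of a segment: difference of the cM positions at its ends."""
    return cm_at(end_bp, genetic_map) - cm_at(start_bp, genetic_map)


segments = {"short": (0, 1_000_000), "long": (1_000_000, 8_000_000)}
for name, (start, end) in segments.items():
    print(f"{name}: {(end - start) / 1e6:.0f} million base pairs, {segment_cm(start, end):.1f} cM")
```

Both segments come out at 7.0 cM; the long one simply sits in a stretch of the map where recombination is rarer.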
Either of these could represent a 7 cM segment you share “in common with” one of your DNA matches. It’s even possible for the shorter segment to contain more genetic data. This is where SNPs come into play.
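A strand of DNA can be written as a string of the letters A, C, G, and T. These letters stand for the individual building blocks of our DNA, called nucleotides. The code we each inherit from our parents is a copy of segments of their code, with a few exceptions: very occasionally a letter is miscopied, and the change is passed down to later generations. Over many generations these changes accumulate, so that when different people’s DNA is compared, a position where the letter varies turns up, on average, about once every 300 nucleotides. Such a position is called a SNP.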
Below are two nearly identical lines of code. The top line represents the code from a parent and the bottom line represents the code inherited by their child. Look closely and you’ll see that the letter G in the second strand is the only difference between these two lines of code.
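Spotting the single changed letter can be automated with a simple position-by-position comparison. In this Python sketch the two sequences are made up, since the lines in the figure aren’t reproduced as text, and `differing_positions` is a hypothetical helper name.

```python
def differing_positions(parent_seq, child_seq):
    """Return (position, parent_letter, child_letter) for every spot where two
    equal-length DNA strings differ. Positions are counted from 1."""
    if len(parent_seq) != len(child_seq):
        raise ValueError("sequences must be the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(parent_seq, child_seq), start=1)
        if a != b
    ]


# Made-up sequences standing in for the figure's two lines.
parent = "ATCCTAGCTAAC"
child = "ATCCTGGCTAAC"
print(differing_positions(parent, child))  # → [(6, 'A', 'G')]
```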
That G is a sort of genetic “typo”4, a nucleotide that has mutated. A mutated nucleotide like this is aptly called a single nucleotide polymorphism, or SNP. Although the word “mutation” sounds bad, it’s actually a normal part of human evolution, and it is these mutations that make it possible to use DNA testing to identify relatives. When two people share enough of these “typos” along the same unbroken stretches of DNA, it can be assumed they received these bits of code from a common ancestor and are therefore related.
To illustrate SNPs, I’ll throw some sprinkles on this chocolate segment.
As previously stated, SNPs occur about once every 300 nucleotides, but that is only an average. Some regions of certain chromosomes have far fewer SNPs than others. Like the sprinkles on this chocolate bar, they are not evenly spaced. Although the second chocolate segment from the left may be the biggest, it has the fewest sprinkles, or the smallest number of SNPs.
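The uneven spacing of SNPs (the sprinkles) can be shown by counting how many fall inside each segment. The SNP positions and segment boundaries in this Python sketch are invented to mirror the chocolate bar, with the longest segment, second from the left, holding the fewest SNPs; they are not real data, and `snps_per_segment` is a hypothetical name.

```python
from bisect import bisect_left


def snps_per_segment(snp_positions, boundaries):
    """Count SNPs in each segment [boundaries[k], boundaries[k + 1]).

    snp_positions must be sorted; all positions are in base pairs."""
    return [
        bisect_left(snp_positions, end) - bisect_left(snp_positions, start)
        for start, end in zip(boundaries, boundaries[1:])
    ]


# Hypothetical data: four segments of unequal length, unevenly spaced SNPs.
boundaries = [0, 1_000, 6_000, 7_000, 9_000]
snp_positions = [120, 340, 610, 880, 2_500, 6_050, 6_400, 6_900, 7_300, 8_100, 8_600]
for k, count in enumerate(snps_per_segment(snp_positions, boundaries), start=1):
    length = boundaries[k] - boundaries[k - 1]
    print(f"segment {k}: {length} base pairs, {count} SNPs")
```

Here the 5,000-base-pair second segment holds a single SNP, while the 1,000-base-pair first segment holds four.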
In DNA testing, the SNPs in your code are compared with the SNPs of other users in the company’s database. Each testing company has a threshold, or minimum criteria, that another user must meet to be shown as your match. Usually, this is a combination of a minimum number of matching SNPs and a minimum cM count. One company explains its threshold this way:
Our simulations have concluded that we can confidently detect related individuals if they have at least one continuous region of matching SNPs (Single Nucleotide Polymorphisms) that is longer than our minimum threshold of 7cM (centiMorgans) long and at least 700 SNPs.5
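A threshold like the one quoted above boils down to a simple check on each shared segment. This Python sketch hard-codes the 7 cM and 700 SNP figures from the quote. The data class and function names are hypothetical, and it treats both numbers as inclusive minimums, a point the quoted wording leaves slightly open.

```python
from dataclasses import dataclass


@dataclass
class SharedSegment:
    """One continuous stretch of matching DNA between two testers (hypothetical type)."""
    chromosome: str
    length_cm: float
    snp_count: int


# Figures from the quoted policy, treated here as inclusive minimums.
MIN_SEGMENT_CM = 7.0
MIN_SEGMENT_SNPS = 700


def meets_segment_threshold(segments):
    """True if at least one continuous shared segment clears both minimums."""
    return any(
        s.length_cm >= MIN_SEGMENT_CM and s.snp_count >= MIN_SEGMENT_SNPS
        for s in segments
    )


print(meets_segment_threshold([SharedSegment("5", 8.2, 650)]))  # False: too few SNPs
print(meets_segment_threshold([SharedSegment("5", 8.2, 650),
                               SharedSegment("12", 7.4, 910)]))  # True: chromosome 12 qualifies
```

Both conditions must hold for the same segment: a long segment with too few SNPs does not count on its own.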
Family Tree DNA’s criteria focus less on SNPs and more on cMs:
A match is declared if two people share a segment of 9 cM or more, regardless of the number of total shared cM. However, if there’s not a block that’s 9 cM or greater, the minimum of 20 shared cM with a longest block of 7.69 cM applies … Criteria for X-chromosome matches: 1 cM and 500 SNPs for both males and females; matches must already meet the autosomal DNA matching criteria.6
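Family Tree DNA’s quoted rule can also be expressed as a short check. This Python sketch is one reading of that wording; in particular, it interprets “a longest block of 7.69 cM” as a minimum. The function names are hypothetical, and nothing here reflects Family Tree DNA’s actual software.

```python
def meets_autosomal_rule(segment_cms):
    """segment_cms: lengths in cM of the autosomal segments two people share.

    One reading of the quoted rule: any block of 9 cM or more is a match;
    otherwise the pair needs at least 20 cM in total and a longest block
    of at least 7.69 cM."""
    if not segment_cms:
        return False
    longest = max(segment_cms)
    if longest >= 9:
        return True
    return sum(segment_cms) >= 20 and longest >= 7.69


def meets_x_rule(x_segments, autosomal_match):
    """x_segments: (cM, SNP count) pairs on the X chromosome.

    An X segment needs at least 1 cM and 500 SNPs, and X matches only count
    for pairs that already meet the autosomal criteria."""
    return autosomal_match and any(cm >= 1 and snps >= 500 for cm, snps in x_segments)


print(meets_autosomal_rule([9.3]))            # True: one block of 9 cM or more
print(meets_autosomal_rule([7.8, 6.5, 6.0]))  # True: about 20.3 cM total, longest 7.8 cM
print(meets_autosomal_rule([7.0, 7.0, 7.0]))  # False: 21 cM total, but longest below 7.69 cM
```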
Hopefully this far-from-clinical analogy helps to demystify centiMorgans and SNPs. For the beginner, understanding the very basic concepts is really all that’s needed to get started. I recommend visiting the International Society of Genetic Genealogy or any of these ISOGG-approved blogs7 for more advanced DNA resources.