A Sweet and Simple Explanation of CentiMorgans and SNPs

by Jane Horvath

Originally published in THE BURGENLAND BUNCH NEWS – No. 284
February 28, 2018, © 2018 by The Burgenland Bunch

 

CentiMorgans (cMs) and SNPs (“snips”) are terms used by DNA testing companies to show the relationship proximity between matches. Generally, the more closely related two people are, the more cMs and SNPs they share. It seems like a simple enough concept, but scouring the internet for a reasonably basic explanation turns up a lot of jargon that leads to even more questions.

According to the experts from The International Society of Genetic Genealogy,

A centiMorgan (cM) or map unit (m.u.) is a unit of recombinant frequency which is used to measure genetic distance. It is often used to imply distance along a chromosome, and takes into account how often recombination occurs in a region.1

While that may be an accurate definition, ISOGG assumes the reader already has an understanding of genetic recombination. Maybe the definition for SNPs will be a little clearer…

A single-nucleotide polymorphism (SNP, pronounced snip) is a DNA sequence variation occurring when a single nucleotide adenine (A), thymine (T), cytosine (C), or guanine (G) in the genome (or other shared sequence) differs between members of a species or paired chromosomes in an individual.2

Although this is technically accurate and useful information for genetic researchers, it can be argued that the average genealogist does not need to obtain immediate mastery of this complex topic.

It may be unorthodox, but understanding cMs and SNPs can be a whole lot easier if we look at genetics as something more familiar and delicious, like a chocolate bar.

Like chromosomes, chocolate bars have segments. Below, we have a pair of these figurative chocolate chromosomes, each with seven segments. The centiMorgans are the six “break” points between the segments.

Each chromosome has a different number of centiMorgans, and this number varies slightly depending on the testing company. Chromosome 1, for example, has about 280 cM, while chromosome 21 has only about 70 cM.3 This means there are about 280 “break” points on chromosome 1 where DNA segments can split and rearrange with each generation.
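To put rough numbers on those break points, here is a minimal sketch. It assumes the standard genetics convention that 1 cM corresponds to about a 1% chance of recombination per generation, so a chromosome's cM total divided by 100 approximates how many breaks to expect. The function name is made up for illustration, and the cM figures are the approximate ones quoted above.

```python
def expected_crossovers(genetic_length_cm: float) -> float:
    """Approximate number of recombination events per generation.

    Assumes 1 cM is roughly a 1% chance of a break in one generation,
    so 100 cM averages out to about one break.
    """
    return genetic_length_cm / 100.0


# Approximate chromosome sizes quoted in the article
chromosomes_cm = {"chromosome 1": 280, "chromosome 21": 70}

for name, cm in chromosomes_cm.items():
    breaks = expected_crossovers(cm)
    print(f"{name}: ~{cm} cM, about {breaks:.1f} breaks per generation")
```

In chocolate terms: chromosome 1 has about 280 places where it could snap, but on a typical drop it only breaks in two or three of them.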

If this piece of chocolate were to fall on the ground, it would likely break at one or more of the recessed spots between the squares. Let’s imagine that this chocolate bar isn’t perfectly uniform and some of the break points are weaker than others. When dropped, the weaker spots would be more prone to breaking, just as certain parts of DNA segments are more prone to recombination.

In genetic recombination, each parent contributes one entire chromosome to each of the 23 pairs a child inherits. Let’s say the broken chocolate chromosome above came from your mother. This chocolate segment is made up of segments from your mother’s ancestors. The other chromosome in the pair, from your father, may have “broken” in another place entirely.

Though some spots are more likely to break, or recombine, than others, the process is ultimately random.

Although you received half of your DNA from each of your parents, the percent distribution begins to vary as the distance between you and the ancestor increases. Grandparents each contribute approximately 25% to your DNA, but it is common to share more with one of the four grandparents than the other three.
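The averages behind those percentages can be sketched in a few lines. This is only the expected share, assuming the amount halves with each generation back. As noted above, the real amount from any one grandparent or great-grandparent will wander from the average. The function name is hypothetical.

```python
def expected_share(generations_back: int) -> float:
    """Average fraction of your DNA from one ancestor.

    generations_back: 1 = parent, 2 = grandparent, 3 = great-grandparent.
    This is an average only; recombination makes actual shares vary.
    """
    return 0.5 ** generations_back


for label, generations in [("parent", 1), ("grandparent", 2), ("great-grandparent", 3)]:
    print(f"{label}: about {expected_share(generations):.1%} on average")
```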

DNA Recombination

In the image above, this is the (not yet broken) segment inherited from your mother. The tinted squares show their origin and, in this example, the segment contains more DNA from your mother’s paternal grandfather than from her other grandparents. In this way, siblings may inherit entirely different sections of DNA from their grandparents and great-grandparents, and so on.

Unlike break points on chocolate bars, which divide equally-sized squares, centiMorgans do not occur between identically-sized bits of genetic code. As previously stated, a cM is not a unit of length, but a unit of recombinant frequency.

To better illustrate this concept, I present the very strange-looking pairs of chocolate bar segments.

genetic recombination centiMorgans

Each chromosome in a pair has the same number of centiMorgans (ignoring the X and Y chromosomes for now; they play by their own rules) and the image above represents two different pairs of 7 cM chromosome segments. With only cM values to compare, both of these have the same genetic significance, even though one is clearly longer. And to reiterate, a cM is a spot on a chromosome segment that is likely to split and recombine. This is why we count the spaces between the chocolate sections and not the sections themselves.

A segment from either of these pairs could refer to a 7 cM segment showing “in-common-with” one of your DNA matches. It’s even possible for the smaller segment to have more genetic data. This is where SNPs come into play.

A strand of DNA is represented by combinations of the letters A, C, G, and T. These letters represent the individual building blocks of our DNA, called nucleotides. The code we each inherit from our parents is a replica of segments of our parents’ code, with a few exceptions. About once every 300 nucleotides, one of the letters will change. This change is called an SNP.

Below are two nearly-identical lines of code. The top line represents the code from a parent and the bottom line represents the code inherited by their child.

Look closely and you’ll see the letter G in the second strand is the only difference between these two lines of code.

ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGC

ACAAGATGCCATTGTCCCCGGGCCTCCTGCTGCTGCTGC

That G is a sort of genetic “typo”4 or a nucleotide that has mutated. A nucleotide that has mutated is aptly called a single nucleotide polymorphism, or SNP. Although the word “mutation” sounds bad, it’s actually a normal part of human evolution. It is from these mutations that we are able to use DNA testing to identify relatives. When two people share enough of these “typos,” it can be assumed they received these bits of code from a common ancestor and are therefore related.
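If squinting at the two strands isn’t your thing, the comparison is easy to hand to a computer. The sketch below walks both lines of code letter by letter and reports where they disagree. The function name is made up for illustration, and real testing companies work from genotyping chips rather than comparing raw strings like this.

```python
def find_differences(parent: str, child: str) -> list[tuple[int, str, str]]:
    """Return (position, parent_letter, child_letter) for each mismatch.

    Positions are 1-based, counted from the left end of the strand.
    """
    if len(parent) != len(child):
        raise ValueError("strands must be the same length to compare")
    return [
        (position, p, c)
        for position, (p, c) in enumerate(zip(parent, child), start=1)
        if p != c
    ]


# The two example strands from the article
parent_strand = "ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGC"
child_strand = "ACAAGATGCCATTGTCCCCGGGCCTCCTGCTGCTGCTGC"

differences = find_differences(parent_strand, child_strand)
print(differences)  # [(20, 'C', 'G')]
```

The single result says that the 20th letter is a C in the parent’s strand and a G in the child’s.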

To illustrate SNPs, I’ll throw some sprinkles on this chocolate segment.

SNPs per centiMorgan

As previously stated, SNPs occur about once every 300 nucleotides, but that is only an average. There are portions of certain chromosomes that have far fewer mutations than others. Like the sprinkles on this chocolate bar, they are not evenly spaced. Although the second chocolate segment from the left may be the biggest, it has the fewest sprinkles, or the smallest number of SNPs.

In DNA testing, the SNPs in your code are compared to the SNPs of other users in the company’s database. Each testing company has a threshold, or minimum criteria, that must be met before another user is shown as your match. Usually, this is a combination of the number of matching SNPs and a minimum cM count.

Per 23andMe:

Our simulations have concluded that we can confidently detect related individuals if they have at least one continuous region of matching SNPs (Single Nucleotide Polymorphisms) that is longer than our minimum threshold of 7cM (centiMorgans) long and at least 700 SNPs.5
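As a rough illustration, the rule in that quote can be written as a simple check on a single shared segment. This is a sketch of the quoted numbers only, not 23andMe’s actual matching code. The function name is hypothetical, and treating a segment of exactly 7 cM as passing is an assumption.

```python
MIN_CM = 7.0     # minimum segment size from the quote
MIN_SNPS = 700   # minimum matching SNPs from the quote


def meets_quoted_threshold(segment_cm: float, snp_count: int) -> bool:
    """True if one shared segment satisfies both quoted minimums.

    Assumption: a segment of exactly 7 cM counts as meeting the minimum.
    """
    return segment_cm >= MIN_CM and snp_count >= MIN_SNPS


print(meets_quoted_threshold(12.7, 2544))  # big enough, plenty of SNPs
print(meets_quoted_threshold(24.4, 631))   # big, but too few SNPs
print(meets_quoted_threshold(5.0, 1112))   # lots of SNPs, but too small
```

Both conditions must hold at once: a large chunk of chocolate with too few sprinkles fails, and so does a heavily sprinkled crumb.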

Family Tree DNA’s criteria focus less on SNPs and more on cMs:

A match is declared if two people share a segment of 9 cM or more, regardless of the number of total shared cM. However, if there’s not a block that’s 9 cM or greater, the minimum of 20 shared cM with a longest block of 7.69 cM applies … Criteria for X-chromosome matches: 1 cM and 500 SNPs for both males and females; matches must already meet the autosomal DNA matching criteria.6
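The autosomal part of that rule has two paths to a match, which is easier to see written out. The sketch below follows the quoted numbers only. It is not Family Tree DNA’s code, the function name is hypothetical, and it assumes every segment in the list counts toward the shared total. The X-chromosome criteria are left out.

```python
def is_match_by_quoted_rule(segments_cm: list[float]) -> bool:
    """Apply the quoted autosomal rule to a list of shared segment sizes (cM).

    Path 1: any single segment of 9 cM or more.
    Path 2: at least 20 cM shared in total, with a longest block of 7.69 cM or more.
    Assumption: every listed segment counts toward the total.
    """
    if not segments_cm:
        return False
    longest = max(segments_cm)
    total = sum(segments_cm)
    if longest >= 9.0:
        return True
    return total >= 20.0 and longest >= 7.69


print(is_match_by_quoted_rule([9.5]))            # True: one block of 9 cM or more
print(is_match_by_quoted_rule([8.0, 7.0, 6.0]))  # True: 21 cM total, longest 8 cM
print(is_match_by_quoted_rule([8.0, 5.0]))       # False: only 13 cM total
```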

Hopefully this far-from-clinical analogy helps to demystify centiMorgans and SNPs. For the beginner, understanding the very basic concepts is really all that’s needed to get started. I recommend visiting the International Society of Genetic Genealogy or any of these ISOGG-approved blogs7 for more advanced DNA resources.

  1. https://isogg.org/wiki/CentiMorgan
  2. https://isogg.org/wiki/Single-nucleotide_polymorphism
  3. https://isogg.org/wiki/CentiMorgan
  4. https://www.23andme.com/gen101/snps/
  5. https://customercare.23andme.com/hc/en-us/articles/212170958-DNA-Relatives-Detecting-Relatives-and-Predicting-Relationships
  6. https://isogg.org/wiki/Autosomal_DNA_match_thresholds
  7. https://isogg.org/wiki/Genetic_genealogy_blogs

8 Comments

  1. This is vividly helpful. I’ve noticed elsewhere that there’s a lot of cM and SNP data being posted and shared, with not a whole lot of interpretation accompanying it. On one Irish genealogy site, I have discovered an individual whose GEDMatch kit shows matches with me on 20 of 24 chromosomes. But all less than 5 cM. So what am I to make of that? I’d be interested to know if there are any guidelines for interpretation. Meanwhile, thank you for this useful introduction to cM and SNP metrics. The references are helpful, too. Best regards.

    • Hello Peter,

      Thank you for your comments. I agree, SNP and cM correlation doesn’t get enough attention. In your example, it’s very likely that individual is related to you in multiple ways, via numerous sets of shared ancestors. I see this with most of my matches from my own endogamous ancestral groups. I see this whenever assisting someone of Jewish or Amish descent. Really, this is common with anyone descended from people who married within a limited gene pool for many generations.

      I hope to write an article elaborating on this topic. For now, I’ll offer a suggestion/example based on my own research.

      You mentioned GEDMatch, so I’ll use an example from Genesis. When I run a 1:1 comparison, I raise the SNP minimum to 500 or 1000 and lower the cM min to 5. I’ll explain.

      This is B.N., the first DNA match I ever placed in my tree, years ago. She’s my 2c3r. Simply put, we’re 2nd cousins, but I’m 3 generations younger than her.

      Chr  Start Pos’n  End Pos’n  Centimorgans (cM)  SNPs
      2    105XXXXXX    111XXXXXX  5.0                1,112
      17   10XXXXXX     13XXXXXX   6.9                851
      17   39XXXXXX     53XXXXXX   12.7               2,544
      20   15XXXXXX     23XXXXXX   12.7               2,306

      The cM count is ~34, but the SNPs are quite dense across those small segments (two segments with >2000 unique markers).

      However, I have plenty of ~34 cM matches who are far too distant to place in my family tree.

      Compare B.N. to this next match, P.L. I have no idea how I’m related to P.L., other than that our shared matches all lead back to 1700s Kentucky (and before that: Ireland). That part of my ancestry is more of a wreath than a tree.

      Chr  Start Pos’n  End Pos’n  Centimorgans (cM)  SNPs
      1    242XXXXXX    249XXXXXX  11.9               362
      2    231XXXXXX    238XXXXXX  12.6               451
      12   125XXXXXX    132XXXXXX  24.4               631

      I did not adjust the SNP minimum for P.L., so I guess Genesis sees larger segments and reports her as a match with a closer generational distance than B.N. (B.N: 4.3 vs P.L: 4.1), but that’s not likely the case.

      Sure, the SNPs exist across larger portions of the chromosome, but there are far fewer mutations along the strand. This could be a 6th cousin, or a cousin via several different branches of my tree. Basically, B.N. and P.L. are the same size candy bar, but B.N. has a lot more sprinkles.

      I’d recommend sticking to the 5cM minimum and looking for segments that have >1000 SNPs, if you’re trying to find 3rd-5th cousins or closer.
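The “sprinkles per candy bar” comparison can be sketched with the segment figures from the two tables above. The code works out SNPs per cM for each match, then applies the suggested filter of at least 5 cM and more than 1000 SNPs. The function names are made up for illustration, and this is not how GEDMatch itself scores matches.

```python
# (chromosome, cM, SNPs) for each shared segment, copied from the tables above
bn_segments = [(2, 5.0, 1112), (17, 6.9, 851), (17, 12.7, 2544), (20, 12.7, 2306)]
pl_segments = [(1, 11.9, 362), (2, 12.6, 451), (12, 24.4, 631)]


def snp_density(segments):
    """Total shared SNPs divided by total shared cM (sprinkles per candy bar)."""
    total_cm = sum(cm for _, cm, _ in segments)
    total_snps = sum(snps for _, _, snps in segments)
    return total_snps / total_cm


def dense_segments(segments, min_cm=5.0, min_snps=1000):
    """Keep segments of at least min_cm that carry more than min_snps SNPs."""
    return [seg for seg in segments if seg[1] >= min_cm and seg[2] > min_snps]


for name, segments in [("B.N.", bn_segments), ("P.L.", pl_segments)]:
    kept = dense_segments(segments)
    print(f"{name}: {snp_density(segments):.0f} SNPs per cM, "
          f"{len(kept)} segment(s) pass the filter")
```

With these figures, B.N. comes out at roughly six times the SNP density of P.L., and none of P.L.’s segments pass the filter.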

      I hope this helped answer your question. Thanks again!

      – Jane

  2. This makes centimorgans and SNPs so much easier to understand! I’ve been trying to interpret and understand my own DNA results, and your article is hands down the best one I’ve come across. I appreciate the ease of reading and useful illustrative points. Thanks! 🙂

  3. I have a question around this. Quite often there are matches with reasonably high cM (> 7), but low in SNPs (< 500), and conversely, many quite low in cMs but very high in SNPs. What, if any, is the significance of these?

    • Hello Paul,

      Great question! Using my candy bar analogy, sharing significant cM with few SNPs is much like a large chunk of chocolate with very few sprinkles. Meaning, the test shows you have a decent-sized segment of DNA, but there aren’t a lot of shared mutations on that segment. A 2013 study conducted by Penn State University noticed that some sections of chromosomes were less prone to recombination than others. As this relates to candy, some of the “break points” on the chocolate bar appear to be stronger than others. These areas are considered “cold.” Areas that are prone to break more frequently are “hot.” This is why you may have one single 20 cM, low-SNP match with a very distant relative. I see this often with 7th-8th cousins. See my post on hot and cold segments for more details.

  4. I wish there was a more expressive way to say thanks besides merely saying THANK YOU! This is one of the easiest and best descriptions of these 2 (very important 2) terminologies constantly found in genetics and DNA matching – and 2 of the hardest to wrap my mind around until I read this article! THANK YOU, THANK YOU, THANK YOU!!! 😁😁😁 I think it may have something to do with my love of chocolate bars, but whatever it is…I am so grateful!
