Some Basic Information On Paternal Lineage or Y-DNA, Tests, Results and Their Interpretation

First, let's explode some myths and misunderstandings. Y-DNA testing, as done for genealogical purposes, carries no information about your personal identity or health status. Therefore, genealogical Y-DNA data is useful neither for legal actions nor for discriminating against you on the basis of your health. The Y-DNA tested for genealogical purposes has no known effect on physiology, anatomy or medical condition. It is often called "junk DNA" for this reason: it doesn't code for any known feature or function. By definition, genealogical Y-DNA testing is intended specifically to identify relationships through shared DNA profiles. Too many people share your exact genealogical Y-DNA profile for it to be useful for unique identification. For the same reason, genealogical DNA data can't be used to prove paternity: a Y-DNA profile is not unique, and would probably be identical to that of every man in the individual's paternal line. However, a clear mismatch might have potential in demonstrating non-paternity. There are much more effective, legally accepted tests for paternity, medical condition or other such personal information. Genealogical Y-DNA testing will only allow you to determine that you and some other individual share a common ancestor or that you are a member of some ancient group of people. It cannot tell you who the common ancestor was, where he lived or when. Comparing well-researched, conventional, genealogical "paper trails" is necessary to make this determination. In short, genealogical Y-DNA testing is only one more tool that may help you extend your genealogical research. It is not a quick solution to your genealogical impasses.

OK, now some basic information about chromosomes and Y-DNA. The Y-DNA molecule (DNA stands for deoxyribonucleic acid) can be thought of as a really long ladder that has been twisted into a spiral. The linking elements in the Y-DNA molecule, comparable to the rungs of our ladder, are composed of four nucleotide bases, guanine, adenine, thymine and cytosine, abbreviated G, A, T and C. These four bases occur together in pairs (G-C and T-A). Each linking element (or rung) is one of these pairs. A chromosome is simply a package containing one very long DNA molecule. There are two chromosomes that are of major interest to genealogists, the Y-Chromosome and the Mitochondrial Chromosome, carrying Y-DNA and mtDNA, respectively. Most people use Y-DNA and mtDNA synonymously with Y-Chromosome and Mitochondrial Chromosome. We speak of them in the singular but there are, of course, many, many identical copies of each chromosome type.

The Y-Chromosome is passed only from a father to his male offspring. In fact, it is the Y-Chromosome that determines that the child will be male. Because of the father-to-son mode of transmission, the Y-Chromosome has followed the father's surname line for, perhaps, hundreds of years and the paternal genetic line for, possibly, thousands. At any one tested location on the molecule, our Y-DNA only undergoes a change (called a mutation) about every 500th father-son transmission on average. Transmissions equal generations in a family line. Your (or, if you are female, your male relative's) Y-DNA profile was initiated by a mutation at one transmission from some father to his son, probably many years ago, and has been passed down from that man through perhaps hundreds of father-son transmissions until passed to you (or your male relative) from your (his) father. It is this relative, but not complete, stability and limited transmission that allows us to garner genealogical information from our Y-DNA.

But let's get back to our twisted ladder analogy to the Y-DNA molecule. There are sections along the ladder where all rungs are identical over some distance. The section of the ladder where this occurs is called the DYS (DNA Y-Chromosome Segment) or, simply, a marker because it 'marks' the region of interest on the ladder (Y-DNA molecule). These regions are identified by alphanumeric codes (e.g., 393 or Y-GATA). This identifying designation is called the DYS# and refers to a specific marker. The number of repeating, identical rungs in this region/DYS#/marker is referred to as the number of "repeats" at that marker/DYS# (pretty rational, right?). When you submit a sample - usually from cells wiped or washed from your cheeks - and your Y-DNA test results are reported, they will take the form of a list of DYS#s or markers, each associated with the number of repeating elements (rungs) at that location on the molecule (ladder). This will probably look something like the partial "report" below. For example, at DYS# (marker) 393, there are 13 identical elements (or rungs). Your test result, the listing of markers with associated numbers of repeated elements, is called your haplotype.

DYS#      393   390   19/394   391   385a   385b   etc.
Repeats    13    23       13    11     16     18   etc.

When a mutation occurs, one of the base sequences is changed so that this sequence (rung) is no longer identical to all the other sequences (or rungs) at that DYS#/marker. Thus, the number of repeats, or repeated elements (identical rungs), at that DYS# or marker has now changed. For example, in the case above, a mutation at DYS# 393 might cause the number of repeats to become 12 or 14 instead of 13 as shown. This mutation signals the initiation of a new branch in the genetic line that will continue indefinitely unless there are no sons born in the future (called "daughtering out") or, of course, there is another mutation at a later time. But for now, this new number of repeating elements at this specific DYS# will be passed from father to son unchanged for many generations. So, if your Y-DNA results match those of another person, you know that both of you descend from the same individual who received a mutation from his father and started this new branch in the genetic line.
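
As a rough illustration of the report and the mutation just described, a haplotype can be sketched in Python as a mapping from marker to repeat count. The names `haplotype` and `mutate` are hypothetical, invented here for illustration; the values are those from the partial report above, and the sketch assumes a mutation simply adds or removes one repeat at one marker.

```python
# Partial haplotype from the sample report: marker (DYS#) -> number of repeats.
haplotype = {"393": 13, "390": 23, "19/394": 13, "391": 11, "385a": 16, "385b": 18}


def mutate(profile, marker, step):
    """Return a copy of the profile with one marker changed by `step` repeats.

    Hypothetical helper: models a mutation as a gain (+1) or loss (-1)
    of a single repeat at one marker, as in the DYS# 393 example.
    """
    changed = dict(profile)  # copy, so the father's profile is left untouched
    changed[marker] = changed[marker] + step
    return changed


# The son receives 12 repeats at DYS# 393 instead of his father's 13.
son = mutate(haplotype, "393", -1)
print("father:", haplotype["393"], "son:", son["393"])
```

Every other marker in `son` is identical to the father's, which is why the two profiles would still match on five of the six markers shown.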

Since mutations do occur, albeit rarely and randomly, we can make statistical estimates of the time since the mutation occurred. There have been a number of studies in which modern populations of father-son combinations have been tested, the numbers of mutations counted and the numbers of transmissions between mutations determined for this population of father-son pairs. The average over all of these studies suggests that a mutation occurs about every 500 father-son transmissions - which corresponds to a mutation rate of 0.002 (1/500). Of course, this is the average of all mutations at all markers of interest. The mutation rate at some markers is higher, sometimes much higher, than at others. Realistically, there is no way to determine how far back a particular mutation occurred without a conventional family tree and testing of each limb.
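
The arithmetic behind that rate can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is mine, not the studies': every father-son transmission is treated as an independent event with the same fixed mutation rate. The function names are hypothetical.

```python
def transmissions_per_mutation(rate):
    """Average number of father-son transmissions between mutations."""
    return 1.0 / rate


def prob_marker_changed(rate, transmissions):
    """Chance that at least one mutation occurred over a run of transmissions.

    Assumes each transmission is independent with the same rate
    (a simplification made for this sketch).
    """
    return 1.0 - (1.0 - rate) ** transmissions


print(transmissions_per_mutation(0.002))   # about 500 transmissions
print(prob_marker_changed(0.002, 500))     # roughly 0.63
```

Note that "one mutation every 500 transmissions on average" does not mean a mutation is certain after 500 transmissions; under this assumption the chance is only about 63%.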

So now you have swabbed your cheeks, had your Y-DNA tested, and gotten your report back . . . so what does it mean and what do you do next? There are at least two ways that these data can be used for your edification. First, there is now a rapidly growing field that seeks to use Y-DNA to identify ancient - maybe even original - groups of our ancestors and the migration patterns taken as they peopled the earth. The second - and infinitely more important for genealogical applications - use of your results comes when you compare your test results, or haplotype, to those of others to determine if you share a common ancestor with another person.

The first usage, determining ancient ancestral groups and migrations, makes the assumption that a mutation that first appeared at some location will tend to concentrate at that location. On the other hand, those carrying the mutation who do migrate will be spread progressively more thinly over the rest of the earth as they migrate farther and farther from the origin. Thus, in general, the greatest concentration of a particular mutation will occur at the location where it originated. Location, in this case, means a very general area such as a continent, subcontinent, country or group of countries. Therefore, localized (relatively) concentrations of people carrying a certain mutation suggest that the mutation probably originated at that locale. These large groups of people, carrying similar specific mutations, are called haplogroups. Basically, a haplogroup is defined as a group of people having, in common, specific mutations at specific sites. In this case, the mutation and site would correspond to a single "rung" on our ladder - called a "Single Nucleotide Polymorphism" or SNP for short. Mutations at some SNPs occur so infrequently that they can be considered a "one time only event". Those people carrying these SNP mutations are said to belong to the haplogroup defined by the value at that SNP. The ancient migrations of these peoples, as their haplogroups change when new mutations occur, have been determined and suggest origins for our ancient ancestors. If we find that we belong to some haplogroup - say R1b - we can look at migration maps such as the one below (from the National Geographic Society Genographic Project) and see the migratory history of this haplogroup from its inception about 35,000 to 79,000 years ago. The original "Eurasian Adam" passed a mutation (M168) to a son, defining the "original" Y-DNA molecule. A later mutation at M89 defined a new limb on this tree and another haplogroup which has since been named "F".
Then a mutation at M9 defined the new branch named the K haplogroup, followed by one at M45 defining P, then one at M173 defining R1 and, finally, the mutation at M343 which produced the group we know as R1b. R1b split from R1 while our ancestors were migrating into central and southern Europe about 30,000 to 35,000 years BC. Maps of migratory routes for all known Y-DNA haplogroups can be found on the web. There are links on our Links Page that will show some of these.
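
The chain of mutations leading to R1b can be written down as a simple ordered list. The sketch below uses only the SNPs and haplogroup names mentioned in the text; the names `LINEAGE` and `snps_leading_to` are hypothetical, and the real haplogroup tree has many more branches than this single line of descent.

```python
# (SNP, haplogroup it defines), oldest first, as listed in the text.
LINEAGE = [
    ("M168", "Eurasian Adam"),
    ("M89", "F"),
    ("M9", "K"),
    ("M45", "P"),
    ("M173", "R1"),
    ("M343", "R1b"),
]


def snps_leading_to(haplogroup):
    """List every SNP mutation accumulated on the way to a haplogroup."""
    path = []
    for snp, group in LINEAGE:
        path.append(snp)
        if group == haplogroup:
            return path
    raise KeyError(f"haplogroup {haplogroup!r} is not on this line")


print(snps_leading_to("R1b"))
```

A man in R1b carries all six of these mutations, while a man in K carries only the first three, which is what lets a haplogroup rule out a recent shared paternal line.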

Soooooo, you now know the ancient group of people from which you descend and the region of the world in which they originated. Great! This information, and a couple of dollars, will get you a cup of coffee most places. However, it does allow you to eliminate anyone who belongs to a different haplogroup from consideration as possible relatives in your genealogical lines. This is true even for people with your surname. The Hope surname project currently shows five different haplogroups represented in our participants. The knowledge of your haplogroup, thus eliminating all those from other haplogroups, can save you a lot of time, money, effort and frustration in your genealogical pursuits.

The second - and much more important for genealogical applications - use of your results comes when you compare your test results, or haplotype, to those of others. If your haplotype matches that of another person, it means that the two of you share a common ancestor - the "Most Recent Common Ancestor" or MRCA. However, it does not tell you who this ancestor was, or when and where he lived. If the person with whom you match has the same surname (Hope, for example), the MRCA is almost certainly within the genealogical time period - that is, the time during which surnames have been commonly used, roughly the last 1000 years. In order to estimate the maximum time to the MRCA, we have to consider several parameters: (1) the number of markers tested and the number that match, (2) the mutation rate assumed for the markers tested and (3) the level of certainty we will accept - remember this is a statistical statement. In general, more markers tested and matched, higher mutation rates assumed and lower probabilities accepted will yield shorter time estimates to the MRCA. It is useful to remember that the estimated number of generations to the MRCA is a maximum. The MRCA may well be more recent than this estimated maximum.
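
Comparing two haplotypes amounts to counting the markers on which the repeat values agree. Here is a minimal sketch, assuming each haplotype is stored as a marker-to-repeats mapping and that only markers tested for both people are compared. The function name and the second person's values are made up for illustration.

```python
def compare_haplotypes(first, second):
    """Return (matches, markers_compared, mismatched_markers).

    Only markers present in both results are compared; a marker
    "matches" when the repeat counts are exactly equal.
    """
    shared = sorted(set(first) & set(second))
    mismatched = [m for m in shared if first[m] != second[m]]
    return len(shared) - len(mismatched), len(shared), mismatched


# First profile is the partial report shown earlier; the second is invented.
mine = {"393": 13, "390": 23, "19/394": 13, "391": 11, "385a": 16, "385b": 18}
theirs = {"393": 13, "390": 24, "19/394": 13, "391": 11, "385a": 16, "385b": 18}

matches, compared, differing = compare_haplotypes(mine, theirs)
print(f"{matches}/{compared} match; differing markers: {differing}")
```

The `matches/compared` pair is the same "11/12" or "36/37" style of figure used when estimating the time to the MRCA.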

The first two charts below illustrate the effects of manipulations of some of these parameters on the estimated maximum number of generations to the MRCA. For example, looking at the first chart, we can be 90% certain that our MRCA is no further back than 48 generations if we match on 12 of 12 markers and assume a 0.002 mutation rate (the "industry" standard). This drops to 16 generations if we match on 37/37 markers and to ten generations if we match on 59/59 markers. If we assume a mutation rate of 0.004 instead of 0.002, the times to the MRCA (TMRCAs) are halved, to 24, 8 and 5 generations respectively. The data in the second chart assumes that we did not match on one marker. Comparing this chart to the first one shows that only a single mismatch increases the time to the MRCA by about 70%.

90% Probability
Entries are Generations to the MRCA

Markers           Mutation Rate Selected
Tested/Matched    0.002    0.003    0.004
12/12                48       32       24
37/37                16       10        8
59/59                10        7        5
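
For the curious, the entries in the chart above can be reproduced with a very simple model. This is my own assumption about the arithmetic, not a description of how any testing company computes it: all markers mutate independently at the same rate, both lines of descent are the same length, and the chance of seeing no mutation at all on n markers after g generations is about exp(-2 x n x rate x g). The function name is hypothetical.

```python
import math


def generations_to_mrca(markers, rate, certainty):
    """Generations within which the MRCA falls, at the given certainty.

    Assumes a perfect match on all `markers`, one shared mutation rate,
    and two lines of descent of equal length (hence the factor of 2).
    Solves 1 - exp(-2 * markers * rate * g) = certainty for g.
    """
    g = -math.log(1.0 - certainty) / (2 * markers * rate)
    return round(g)


for markers in (12, 37, 59):
    row = [generations_to_mrca(markers, rate, 0.90) for rate in (0.002, 0.003, 0.004)]
    print(f"{markers}/{markers}", row)
```

Rounded to the nearest generation, this sketch gives the same nine values as the chart, which at least suggests the chart follows a model of this general kind.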

90% Probability With A One Marker Mismatch
Entries are Generations to the MRCA

Markers           Mutation Rate Selected
Tested/Matched    0.002    0.003    0.004
11/12                84       56       42
36/37                27       18       14
58/59                17       12        9


The third chart, below, illustrates the effect of accepting lower levels of certainty. As we reduce our percent certainty from 90% certain of the time estimate to 50% certain, the time to the MRCA is also reduced. However, remember that the percentage chosen determines how sure we can be that the time period actually covers our common ancestor.


Effect of Percent Certainty Accepted
Mutation Rate = 0.002
Entries are Generations to the MRCA

Markers           Percent Certainty Selected
Tested/Matched      90%      70%      50%
12/12                48       25       15
37/37                16        8        5
59/59                10        5        3
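
The trade-off in the chart above can also be read the other way around: pick a number of generations and ask how sure we can be that the MRCA lived within it. The sketch below does this under an assumed model of my own (a perfect match, independent markers sharing one mutation rate, two lines of descent of equal length); it is not necessarily the calculation behind the chart, and the function name is hypothetical.

```python
import math


def prob_mrca_within(markers, rate, generations):
    """Chance that the MRCA lived within `generations`, given a perfect match.

    Assumed model: P = 1 - exp(-2 * markers * rate * generations).
    """
    return 1.0 - math.exp(-2 * markers * rate * generations)


# The 12/12 row of the chart, at a mutation rate of 0.002.
for generations in (48, 25, 15):
    p = prob_mrca_within(12, 0.002, generations)
    print(f"{generations} generations: {p:.0%}")
```

Under these assumptions, 48, 25 and 15 generations come out at roughly 90%, 70% and 51%, close to the column headings of the chart.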


The mutation rate used in the illustrations is the average over all markers. However, FTDNA has developed a comparison technique which takes advantage of the individual mutation rates of the markers tested. This technique, called FTDNA Time Predictor (FTDNATip), gives a much better estimate of the time to the MRCA (TMRCA). The percent certainty that you will accept is a personal decision, but a lower percentage means we can be less sure that our common ancestor did, indeed, exist within the generations indicated. Clearly though, the estimated time to the MRCA increases dramatically with mismatches (e.g., a 36/37 marker match). Therefore, if the goal is to narrow the estimated time to the MRCA and to ensure that the MRCA is within the genealogical time period, testing the maximum number of markers available - thus maximizing matches - is the best route. However, it must be emphasized that the number of generations to the MRCA is what it is. None of these manipulations changes this distance; they only affect our estimate of it.

Essentially, what we should "take home" from this information is that, when we match with someone, the best we can hope for is a most recent common ancestor within a sufficiently small number of generations that we can identify him. Also, we would hope that the person with whom we match will have his genealogical line defined back to Adam so we can "piggy-back" on the knowledge. This sounds a little selfish, but it is useful to remember that we may also match with someone who has less information than we do, so we get to help someone as well.

This treatment presents this subject as I currently understand it. I too am relatively new at this and may well have misunderstood some of the data I have tried to digest as I attempt to become more competent in the area. If you find any errors in this treatise or find that you have questions that it doesn't answer, please contact me. Thanks!