Re: New variants spreadsheet

Jared Smith
 

Please send all of them (with an indication of or grouping by YFull's
quality level). I'm curious how their quality levels align with those
from the other sources I used - Mike W.'s analysis, Alex's Big-Y
analysis, and my own.

I don't really include much grading information in the spreadsheet
beyond FTDNA's read data (PASS or REJECTED, though they have an
extremely high threshold for "PASS") and the Region and STR columns
(any entry in these columns suggests lower quality).

I didn't think it necessary to create a complex quality analysis
metric in the spreadsheet seeing as this is what Mike, Alex, and YFull
do best, and I can always reference their work in instances where
things are questionable.

Jared

On Mon, Feb 20, 2017 at 4:56 PM, Joel Hartley <joel@...> wrote:
Thanks, Jared,

That should be a great resource. I'll send some of my SNPs your way once I
sort them out. YFull has them as:

Best Quality (5)
Acceptable (6)
Ambiguous (20)

I'm guessing you just want the 1st 2 categories?

Joel



On 2/20/2017 6:44 PM, Jared Smith wrote:

I have uploaded a new spreadsheet to
http://dna.smithplanet.com/media/Z16357-Variants.xlsx

This likely has limited utility for anyone other than me, but I
thought I'd share it. This file is used for analyzing the Y-DNA mutation
variants (SNPs, insertions/deletions, etc.) that we Z16357 people
have. It's a very large spreadsheet with complex calculations, so even
minor changes like sorting can take a long time to recalculate.

The Variants tab includes all 68,355 unique variants that we have.
These were collected from Big-Y VCF files.
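
For anyone who wants to pull variants out of a VCF file themselves, here is a minimal Python sketch. It assumes the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER) and assumes that the PASS/REJECTED value sits in the FILTER column. The function name read_variants is made up for illustration, and this is not how the spreadsheet itself is built.

```python
def read_variants(vcf_lines):
    """Collect variant calls from VCF text lines.

    Returns {(chrom, pos, ref, alt): filter_value}.
    Assumes standard VCF column order; the FILTER column is assumed
    to carry the PASS / REJECTED read-quality flag.
    """
    variants = {}
    for line in vcf_lines:
        # Skip blank lines plus meta lines (##...) and the header (#CHROM...).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # malformed or truncated row
        chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
        if alt == ".":
            continue  # no alternate allele, so not a variant call
        variants[(chrom, int(pos), ref, alt)] = flt
    return variants
```

Passing an open file handle works as well as a list of lines, since the function only iterates over its argument.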

You can use the Lookup tab to query specific DNA position numbers to
see the value each of us has at that position.

The Shared Variants tab shows all known variants ***AT OR BELOW
Z16357*** that at least 2 of us have. This allows easy analysis of the
consistency of SNPs and determination of their position on our
branches. A "+" indicates a positive test for that variant. A "***"
indicates the variant was identified, but the test quality is
questionable. A blank box indicates EITHER a negative result OR no
test coverage (be careful - you can't assume too much from a blank box
without analyzing the BED file for read coverage).
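
To make the blank-box caveat concrete, here is a rough Python sketch of a coverage check against a BED file. It assumes the usual BED convention (tab- or space-separated chrom, start, end, with a 0-based start and an exclusive end) and 1-based variant positions as in a VCF file. The function names are hypothetical, and "likely negative" is only as good as the coverage data behind it.

```python
def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}.

    Assumes standard BED coordinates: start is 0-based, end is exclusive.
    """
    regions = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments, and header lines
        chrom, start, end = line.split()[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def is_covered(regions, chrom, pos):
    """True if the 1-based position falls inside any interval on chrom."""
    return any(start < pos <= end for start, end in regions.get(chrom, []))


def interpret_blank(regions, chrom, pos):
    """Interpret a blank cell: covered means likely negative, else unknown."""
    return "likely negative" if is_covered(regions, chrom, pos) else "no coverage"
```

A linear scan like this is fine for checking a handful of positions. For checking thousands, sorting the intervals and using a binary search would be the obvious next step.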

The Unique Variants tab lists most of the variants found in only one
of us. I'd be happy to add any new ones from YFull, if any of
you who have tested there would like to e-mail them to me. Note that
some Insertions/Deletions (these are kinda like hiccups in your DNA)
show "Count" as 0 because Big Tree calculates the position info for
INDELs a bit differently than the VCF file. These are retained for
reference.

The primary function of this spreadsheet is to make it easy to add VCF
data to the Variants tab for new Big-Y testers, then immediately see
which existing SNPs from our branch they have and which Unique Variants
are no longer unique and need to be moved to Shared Variants.
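
The same bookkeeping can be sketched outside the spreadsheet in a few lines of Python. This is only an illustration of the idea, not the spreadsheet's formulas: it assumes each tester's variants are held as a set of identifiers (positions, say), it ignores the "at or below Z16357" restriction, and the names classify and add_tester are made up.

```python
from collections import Counter


def classify(variants_by_tester):
    """Split all variants into shared (2+ testers) and unique (1 tester).

    variants_by_tester maps a tester name to a set of variant identifiers.
    """
    counts = Counter(v for vs in variants_by_tester.values() for v in vs)
    shared = {v for v, n in counts.items() if n >= 2}
    unique = {v for v, n in counts.items() if n == 1}
    return shared, unique


def add_tester(variants_by_tester, name, new_variants):
    """Add a new tester and report what changes.

    Returns (updated mapping, existing shared variants the new tester has,
    formerly unique variants that now need to move to shared).
    """
    shared_before, unique_before = classify(variants_by_tester)
    new_variants = set(new_variants)
    has_existing_shared = shared_before & new_variants
    promoted = unique_before & new_variants
    updated = dict(variants_by_tester)
    updated[name] = new_variants
    return updated, has_existing_shared, promoted
```

For example, with tester A holding {1, 2, 3} and tester B holding {2, 3, 4}, adding a tester C with {3, 4, 5} reports that C has existing shared variant 3 and that variant 4 is no longer unique.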

Jared



