Can we sequence the Y Chromosome

KerryOdair replied

11-22-2014, 09:42 PM
Statistics for the next version (2.28) of the YFull Y-Tree (coming soon):
440 SNPs, 83 subclades
by haplogroups:
A0: 1 SNP
A1b1: 2 SNPs
E: 11 SNPs, 2 subclades
G: 14 SNPs, 1 subclade
I1: 52 SNPs, 15 subclades
I2: 64 SNPs, 10 subclades
J1: 4 SNPs, 3 subclades
J2: 48 SNPs, 1 subclade
T: 4 SNPs, 3 subclades
N: 8 SNPs, 9 subclades
O: 6 SNPs, 1 subclade
Q: 87 SNPs, 8 subclades
R1a: 44 SNPs, 12 subclades
R1b: 73 SNPs, 13 subclades
R-M479: 27 SNPs, 4 subclades
others: 5 SNPs, 1 subclade
KerryOdair replied

11-11-2014, 09:18 AM
Statistics for the next version (2.27) of the Tree (coming soon...):
1192 SNPs, 205 subclades
by haplogroups:
A0: 106 SNPs
A1b1: 32 SNPs, 2 subclades
E: 70 SNPs, 9 subclades
G: 89 SNPs, 1 subclade
H: 1 SNP
I1: 122 SNPs, 44 subclades
I2: 273 SNPs, 29 subclades
J: 36 SNPs, 8 subclades
T: 49 SNPs, 6 subclades
N: 18 SNPs, 5 subclades
O: 185 SNPs, 27 subclades
Q: 38 SNPs, 7 subclades
R1a: 99 SNPs, 40 subclades
R1b: 72 SNPs, 27 subclades
others: 2 SNPs
KerryOdair replied

10-11-2014, 04:27 AM
Update from Maximus Centurion
YFull.com: Y-Chr Sequence Interpretation Service

YTree 2.25 (due date 15-20 October) coming soon...
1724 SNPs, 138 subclades:
C: 54 SNPs
E: 226 SNPs, 6 subclades
G: 83 SNPs
H: 3 SNPs
I1: 134 SNPs, 28 subclades
I2: 193 SNPs, 25 subclades
J: 84 SNPs, 1 subclade
L: 53 SNPs, 1 subclade
N: 163 SNPs, 16 subclades
O: 1 SNP
Q: 225 SNPs, 16 subclades
R1a: 313 SNPs, 24 subclades
R1b: 92 SNPs, 21 subclades
T: 92 SNPs, 4 subclades
KerryOdair replied

09-02-2014, 09:44 AM
YTree v2.24 (in the process of calculating now...)
Statistics: 571 SNPs, 127 subclades.
By haplogroups:
C: 2 SNPs
E: 31 SNPs, 9 subclades
G: 3 SNPs, 2 subclades
H: 4 SNPs
I1: 73 SNPs, 15 subclades
I2: 143 SNPs, 28 subclades
J: 62 SNPs, 17 subclades
L: 51 SNPs
N: 28 SNPs, 4 subclades
Q: 57 SNPs, 6 subclades
R1a: 40 SNPs, 24 subclades
R1b: 79 SNPs, 22 subclades
KerryOdair replied

08-28-2014, 09:44 AM
A very nice presentation by Greg Magoon of Full Genomes:

'Next-Gen' Y Chromosome Sequencing: Progress and Promise
2014 International Genetic Genealogy Conference
Washington, D.C. August 16, 2014
Greg Magoon

https://docs.google.com/file/d/0B8eigUEXUAvlMkFweDdHVzdMN2c/edit

KerryOdair replied

07-26-2014, 08:05 PM
Full Genomes launches Y Prime - a new Y chromosome sequencing product
The following press release has been written by Full Genomes Corporation.

Full Genomes Corporation (FGC) is announcing today the introduction of a new Y chromosome sequencing product, dubbed Y Prime. The Y Prime test leverages recent technology advances to economically sequence large portions of a male's Y chromosome, enabling advanced, high-resolution tracing of direct paternal line ancestry.

FGC has worked with industry leaders to develop a new Y chromosome capture approach and has combined it with Illumina "next-gen" sequencing. The resulting data will be processed with the latest alignment algorithms to improve read mapping. The overall result is a cutting-edge product with Y chromosome coverage breadth that is close to that of FGC's original comprehensive Y sequencing product (now termed Y Elite), at a much lower cost. Additionally, the new product is priced lower than the leading competitor, while retaining a significant advantage in terms of quality and comprehensiveness.

More information at link

http://cruwys.blogspot.co.uk/2014/07/full-genomes-launches-y-prime-new-y.html
KerryOdair replied

07-14-2014, 01:26 AM
Statistics for YTree version 2.22 (coming soon):
NEW: 485 SNPs, 23 subclades
C: 195 SNPs
I1: 24 SNPs, 3 subclades
I2: 106 SNPs, 5 subclades
J: 6 SNPs
N: 14 SNPs, 6 subclades
Q: 62 SNPs
R1a: 37 SNPs, 7 subclades
R1b: 14 SNPs, 2 subclades
R-M479: 27 SNPs
KerryOdair replied

06-22-2014, 06:53 AM
Below are new statistics from the YFull folks. These are new results coming from Y sequence testing at FullGenomes and from the BigY test at FamilyTreeDNA. We are beginning to see the full impact of the data coming in, and this is just the beginning. This is a huge leap in our body of knowledge of the Y.

Statistics for the next version (2.21) of our Y-Tree (coming soon):
To be added: 11987 SNPs, 246 new subclades
Technical requirements: .FASTQ or .BAM file; minimum coverage 25X; minimum read length 100 bp
To be added, by haplogroup:
I1: 406 SNPs, 9 new subclades
I2: 843 SNPs, 19 new subclades
J: 1763 SNPs, 16 new subclades
N: 753 SNPs, 34 new subclades
O: 1191 SNPs, 23 new subclades
Q: 402 SNPs, 10 new subclades
R1a: 281 SNPs, 28 new subclades
R1b: 595 SNPs, 75 new subclades
R-M479: 55 SNPs, 2 new subclades

Last edited by KerryOdair; 06-22-2014, 07:16 AM.
KerryOdair replied

05-02-2014, 04:54 PM
In addition to my FullGenomes testing, I have added the following folders with my own autosomal test results to my Google Drive. I have also placed there an experimental E-M35 portion of the tree, which will be updated as new discoveries are made by the E-M35 Haplozone group.

1. Genographic 2.0 from National Geographic

2. 23andMe Version 2.0 testing

3. Family Finder autosomal testing from FamilyTreeDNA

Link to Google Drive:

https://drive.google.com/#folders/0B1SPAtbDBxjEOGQ3MmU2NjEtMTU2NS00N2JhLTgzMDQtYzQ2NWNiMzExMDU2


Last edited by KerryOdair; 05-02-2014, 05:06 PM.
KerryOdair replied

03-04-2014, 11:31 AM
Test Results

There is a folder that contains my results files from testing at FullGenomes. This is probably what you will want to look at. It shows all the output files supplied from the testing, so if you are curious about what this data looks like, it is available to view. If you have a Google login you should be able to view and download these files.

The folder is on Google Drive. It also holds a .bam file of ChrY data that is 2.2 gigabytes. My complete file was 6.6 gigabytes, which also contained ChrMt data and STR information; that was just too big to upload to Google Drive with my current system. A word of warning should you want to download the 2.2 gigabyte ChrY file: Google Drive has a 2.0 gigabyte maximum on file downloads. Some people have hit this restriction and others have not, depending on browser and OS type and version. If you first move the .bam file to your own Google Drive, the download seems to work every time.

I have also given my .bam file to www.yfull.com for interpretation and study. I am also working with www.Yseq.com to create primers for my own private SNPs, or for a family and clan panel.

There are now two Y sequence tests available in the marketplace. The tsunami of newly discovered SNPs has begun. This was my hope when I started this thread. I am glad to see that we are on the doorstep of these new discoveries based on this kind of DNA testing.

Link to my personal FullGenomes results

https://drive.google.com/#folders/0B1SPAtbDBxjEOGQ3MmU2NjEtMTU2NS00N2JhLTgzMDQtYzQ2NWNiMzExMDU2

KerryOdair replied

02-11-2014, 03:10 PM
This is an update on my FullGenomes data with dating of SNPs. I would like to again thank Steve Fix for helping me with my data.

Over the past month I have been going over my E-M215 database and making a few minor corrections and improvements. None of these materially change the analysis I sent you last November, but in this process I have also integrated SNP dating using the Poznik “reliable regions”, which I think you might find interesting. In the corresponding spreadsheet I maintain the comparison using the average mutation rate of 1x10^-9/y. The results are easily extended if you wish to use Poznik’s mutation rate, Francalacci’s, etc.

Even though it includes the Poznik regions, the updated analysis did not change my estimate of the effective coverage of the FullGenomes sequencing, which remains around 16 Mb. The biggest change I saw using Poznik was that the number of “individual” SNPs fell with respect to the 1KGP Ph 1 “reliable regions” estimate. This has the effect of moving the overall age estimates closer to the StrictMask method. Using Poznik, the age of your “individual” segment (TMRCA with HG01497) is 2 ky, which is consistent with what I concluded in my November analysis.

Steve Fix
28 Jan 2014
Attached Files

Dating FullGenomes 2014.png (121.0 KB, 39 views)
Last edited by KerryOdair; 02-11-2014, 03:13 PM.
KerryOdair replied

11-22-2013, 01:20 PM
I would like to take this time to thank Steve Fix, who did an analysis of my data from FullGenomes. He has been extremely helpful in adding to my understanding of my results. Below are his comments and a thumbnail showing dating of major SNPs in the E-M35 group.

Kerry

I have completed my analysis of your FullGenomes test results. This report extends and updates what I had previously sent you on Nov 4th and is based on the SNP report provided to you by FullGenomes contained in the file
1068A_045DV_KerryODair.haplogroupCompare20131001_final2.
In this report they identified 3690 SNP type variants and called 2925 of these positive with varying degrees of reliability. Of these they classified 141 as “private”.

As outlined in my previous report, I note that you are most closely related to the 1KGP sample HG01497 from the CLM (Colombian in Medellin) data set, and as part of my analysis I have done bottom-up dating of your results using a mutation rate of 1x10^-9/y. I have done this using two methods: one based upon the “reliable regions” of Y, similar to what Wei, Francalacci and others have used; the other based on the 1KGP StrictMask. In general I have found the “reliable regions” method to be more volatile and less repeatable when comparing the 1KGP results to the Complete Genomics data. This analysis does not change that observation. Of the 3690 SNP variants in your results, 1187 fell within the reliable regions. Of those I can associate 573 with the E haplogroup, and 568 of those were called positive. Similarly, for the StrictMask I associate 172 SNP variants in your results with E, and all of those were called positive. There is an issue, however, as to whether all of the SNP variants listed should be classified as SNPs. I identified 30 positively called variants as associated with INDELs, including 2 which passed the StrictMask and 6 from the reliable regions. These were mostly from the individual SNP classified set (see my analysis summary spreadsheet), so you will need to take a closer look at your alignment file (.bam) to better understand and resolve these calls. If these are found to be SNPs then my age estimates will increase by the appropriate amount (320 y per StrictMask SNP and 112 y per reliable region SNP). The locations of these variants can be found in my analysis summary spreadsheet.
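
The per-SNP increments quoted above are consistent with the usual bottom-up relation: with a mutation rate mu per base per year and L callable bases, one new SNP is expected about every 1/(mu x L) years. The Python sketch below assumes the 320 y and 112 y figures were derived this way. The function names are illustrative only, and the region sizes it reports are back-calculated from those figures, not taken from the analysis spreadsheet.

```python
MUTATION_RATE = 1e-9  # per base per year, the rate quoted in the analysis


def years_per_snp(callable_bases, rate=MUTATION_RATE):
    """Expected number of years for one SNP to arise in a region of this size."""
    return 1.0 / (rate * callable_bases)


def implied_callable_bases(years_per_snp_value, rate=MUTATION_RATE):
    """Invert the relation: region size implied by a years-per-SNP figure."""
    return 1.0 / (rate * years_per_snp_value)


def age_from_snp_count(snp_count, callable_bases, rate=MUTATION_RATE):
    """Bottom-up age estimate: number of SNPs times the expected years per SNP."""
    return snp_count * years_per_snp(callable_bases, rate)


if __name__ == "__main__":
    # Back-calculate the region size implied by each quoted increment.
    for label, increment in (("StrictMask", 320), ("reliable regions", 112)):
        size_mb = implied_callable_bases(increment) / 1e6
        print(f"{label}: {increment} y/SNP implies roughly {size_mb:.1f} Mb callable")
```

Under that assumption the StrictMask increment corresponds to roughly 3.1 Mb of callable sequence and the reliable-regions increment to roughly 8.9 Mb. A different mutation rate (Poznik's or Francalacci's, for example) can be passed as `rate` to rescale the estimates.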

The analysis summary spreadsheet is attached and contains the estimates for the segments and branches of haplogroup E based upon your results. I have also included a segment-by-segment ratio of the number of SNPs found with respect to the total that I could classify for the two methods. In addition I have made an estimate of the implied coverage achieved in your results based on the two methods and the SNP total. The average is shown to be 16 Mb, which is about what one might have expected from today’s sequencing technology. This of course assumes a mutation rate which does not vary within the male-specific region. In this analysis I did not include the SNPs associated with the individual segment, as the total of 386 is not compatible with those observed in the more reliable regions. One would conclude that many of these are not real, even though they were screened and come from locations which have produced reliable results elsewhere. Using the quality assessment of FullGenomes, the 386 individual SNPs have the following breakdown:
Quality        Total   StrictMask   Reliable Regions
high quality      11            2                 11
* quality         17            2                  4
** quality       278            0                 14
*** quality       80            1                  1
This would indicate that the majority of quality calls are from the previously identified reliable regions of Y. This further tends to confirm why using the reliable regions is required for bottom-up dating, and why I prefer the more restrictive StrictMask even though some people would argue that it is too strict. In looking at these results from FullGenomes I am not convinced that these criteria should be broadened.

As I have mentioned previously, these estimates appear consistent with what I have seen in the Complete Genomics sequenced samples and continue to be compatible with those of Karafet (2008). Yours is the first FullGenomes result I have looked at, as well as the first V12, but these are consistent with what I would expect. I expected the V12 dates, as I have observed with the V22 dates, to be slightly higher than the V13 and V65 dates. As more results become available over the next few months I will have more to say on this issue.

In summary, it would appear that your TMRCA with HG01497 is somewhere around 1.6 to 3.3 ky ago. Since I find the StrictMask dating more consistent, I prefer the 2 ky estimate, assuming the 1x10^-9/y mutation rate.

Steve Fix
20 Nov 2013
Attached Files

Dating FullGenomes 2013.png (107.4 KB, 38 views)
KerryOdair replied

11-13-2013, 10:31 AM
This is a link to the latest Y-DNA SNP testing chart for testing companies.

http://www.isogg.org/wiki/Y-DNA_SNP_testing_chart

I would also like to state that I have received my results from FullGenomes testing. I am very satisfied with my results and have received all files as promised by the company. The files supplied are in the chart information. These include 9 data files and my bam file.

Through this testing my terminal SNP has been identified based on the current Y tree, the 1000 Genomes Project and various other sources. My terminal SNP looks to be about 2100 to 2600 years old. Beyond my terminal SNP they have also identified 25 new mutations unique to me at this point, along with 4 unique indels. Based on current studies of the SNP mutation rate, or even assuming one SNP mutation every 75 to 90 years, these new mutations should get me into a genealogical time frame within paper records.
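
As a rough illustration of that arithmetic, the Python sketch below multiplies the count of private SNPs by an assumed average interval between SNPs. It assumes mutations accumulate at a steady average rate along a single paternal line, which is a simplification, and the function name is hypothetical.

```python
def private_snp_age_range(snp_count, min_years_per_snp, max_years_per_snp):
    """Return the (low, high) span in years covered by snp_count SNPs."""
    # Accept the interval bounds in either order.
    if min_years_per_snp > max_years_per_snp:
        min_years_per_snp, max_years_per_snp = max_years_per_snp, min_years_per_snp
    return snp_count * min_years_per_snp, snp_count * max_years_per_snp


if __name__ == "__main__":
    low, high = private_snp_age_range(25, 75, 90)
    print(f"25 private SNPs span roughly {low} to {high} years")
    # prints: 25 private SNPs span roughly 1875 to 2250 years
```

With 25 private SNPs and one SNP every 75 to 90 years, the span comes out at about 1875 to 2250 years, which is broadly in line with the 2100 to 2600 year estimate for the terminal SNP.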

There is a funded study by a citizen scientist to get samples from the A00 group of people in Cameroon. The hope is to identify one of these haplotypes and have the Y fully sequenced. This will hopefully give us the ancestral state of these early SNPs, which will help in positioning new SNPs on the tree.

Exciting times for us in the genetic genealogy world.

Last edited by KerryOdair; 11-13-2013, 10:55 AM.
syfo replied

08-19-2013, 06:28 AM
Efficient identification of Y chromosome sequences in the human and Drosophila genomes

Notwithstanding their biological importance Y chromosomes remain poorly known in most species. A major obstacle to their study is the identification of Y chromosome sequences: due to its high content of repetitive DNA, in most genome projects the Y chromosome sequence is fragmented into a large number of small, unmapped scaffolds. Identification of Y-linked genes among these fragments has yielded important insights about the origin and evolution of Y chromosomes, but the process is labor intensive, restricting studies to a small number of species. Apart from these fragmentary assemblies, in a few mammalian species the euchromatic sequence of the Y is essentially complete, owing to painstaking BAC mapping and sequencing. Here we use female short read sequencing and k-mer comparison to identify Y-linked sequences in two very different genomes, Drosophila virilis and human. Using this method, essentially all D. virilis scaffolds were unambiguously classified as Y-linked or not Y-linked. We found 800 new scaffolds (totaling 8.5 Mbp), and four new genes in the Y chromosome of D. virilis, including JYalpha, a gene involved in hybrid male sterility. Our results also strongly support the preponderance of gene gains over gene losses in the evolution of the Drosophila Y. In the intensively studied human genome, used here as a positive control, we recovered all previously known genes or gene families, plus a small amount (283 kb) of new, unfinished sequence. Despite some ambiguity caused by misassembled segmental duplications, the vast majority of the human sequence could be reliably identified as Y or not Y-linked. Hence this method works in large and complex genomes and can be applied to any species with sex-chromosomes.
KerryOdair replied

06-10-2013, 02:53 PM
Originally posted by Joann

Kerry, do you expect to post your full Y sequence, and if so, where?

Hello Joann,

Yes, I do expect to post my full Y sequence and Geno 2.0 results publicly. I will also offer them to any academic endeavor for further study of the Y or of E-M35 phylogeny. Where the data will reside is yet to be determined. I may have an area with FullGenomes as a repository for it. Assuming my sample has good DNA for replication, I should receive results in the last week of July or the first week of August.

We have a nice quality assurance situation with my sample. In the E-M35 group we already have 52 tests completed for Genographic 2.0. The 52 tests involve different subclades in E-M35. We already have 5 tests for V12, which happens to be my subclade. 19 tests have already been compared with each other and with the 1000 Genomes samples. New SNPs are being discovered and cataloged by individuals in our group and submitted for new SNP identification numbers. With this information we have a SNP analysis file along with an E-M35 tree that changes as new discoveries are made. There are 13,000 SNPs in the Geno 2.0 test.

This information is going to be used by FullGenomes as quality assurance for my full Y sequence, which will be matched against 29,000 SNPs. This is over twice the number in the Geno 2.0 test.

This is the beginning of some exciting times, and maybe the beginning of a SNP molecular clock specific to E-M35.
Attached Files

Sample E-M35 Tree V12.png (122.2 KB, 43 views)

Sample E-M35 Analysis .png (140.2 KB, 46 views)
Last edited by KerryOdair; 06-10-2013, 03:44 PM.