Can we sequence the Y Chromosome


  • Can we sequence the Y Chromosome

    Hello,

    I am a newbie in this forum who has had extensive testing for STRs and SNPs, and chip-based testing by 23andMe. I have been following developments in third-generation sequencing machines such as PacBio, Ion Torrent, and others over the past couple of years.

    Are we getting closer to being able to actually sequence the Y chromosome? One technical question: can we easily isolate the Y chromosome for sequencing? I had hoped that Ion Torrent might be a machine that could do this, until they were bought out by Life.

    With everyone talking about the $1000 genome at 3 billion bp, why can't we sequence the Y chromosome at 80 million bp at a reasonable cost?

    Sequencing the Y chromosome would seem to require long reads, and multiple reads of each region for accuracy. I do not have a science background, so I am looking for an overview of this subject.
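
As a rough illustration of the arithmetic behind this question, here is a minimal Python sketch that scales the $1000 whole-genome figure down to the 80 million bp quoted above. The function name `proportional_cost` is hypothetical, and the calculation assumes cost scales linearly with the number of bases, which ignores the separate cost of isolating or enriching the Y chromosome before sequencing.

```python
# Naive proportional scaling of a whole-genome price to the Y chromosome.
# All figures are the ones quoted in the post; linear scaling is an assumption.
GENOME_BP = 3_000_000_000      # "3 billion BP" whole genome
Y_BP = 80_000_000              # "80 million BP" figure used in the post
GENOME_PRICE_USD = 1000.0      # the "$1000 genome"


def proportional_cost(target_bp, genome_bp=GENOME_BP, genome_price=GENOME_PRICE_USD):
    """Return the price of target_bp if cost were strictly proportional to bases."""
    return genome_price * target_bp / genome_bp


y_fraction = Y_BP / GENOME_BP
y_cost = proportional_cost(Y_BP)

print(f"Y share of the genome: {y_fraction:.1%}")
print(f"Proportional cost:     ${y_cost:.2f}")
```

Under these assumptions the Y is under 3% of the genome, so the sequencing itself is not the obstacle; the open question is how to target only the Y without paying for the other 97%.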

    Regards,
    Kerry O'Dair
    FullGenomes Kit 045DV YFull Terminal SNP Y2846 FTDNA Kit 52277 M35>V12>CTS693>CTS3346>Y2877>CTS6667>CTS8411>Y2846 MTdna U4b1a3a

  • #2
    some directions

    Hi Kerry,
    Welcome to SEQanswers! While your questions target the very interesting intersection between cytogenetics and next-gen sequencing, I cannot answer any of them directly. However, I encourage you to do more literature research yourself on this topic, taking advantage of public databases. For example, at the USPTO patent search site, under the advanced search for Y chromosome (enter abst/"Y chromosome" as your search term), you will find fewer than 80 published patents and applications, most dealing with physical methods for sperm cell separation on the basis of the Y chromosome. Then you can check Google Scholar for chromosome fractionation and see where the most current research in that literature leads you. Perhaps you will be able to locate a commercial laboratory or research program currently engaged in this task.



    • #3
      First off, what's wrong with the sequence we already have? Y is not a total wasteland.

      One group has reported specifically capturing a specific mammalian chromosome by flow cytometry & getting sequences highly enriched for the targeted chromosome.



      • #4
        Originally posted by Joann
        Hi Kerry,
        Welcome to SEQanswers! While your questions target the very interesting intersection between cytogenetics and next-gen sequencing, I cannot answer any of them directly. However, I encourage you to do more literature research yourself on this topic, taking advantage of public databases. For example, at the USPTO patent search site, under the advanced search for Y chromosome (enter abst/"Y chromosome" as your search term), you will find fewer than 80 published patents and applications, most dealing with physical methods for sperm cell separation on the basis of the Y chromosome. Then you can check Google Scholar for chromosome fractionation and see where the most current research in that literature leads you. Perhaps you will be able to locate a commercial laboratory or research program currently engaged in this task.
        Thanks for the information on the USPTO patent search site and Google Scholar. My goal is to find a commercial laboratory that might take on this task. I am hoping exposure in this forum might lead to companies interested in such an endeavor. Your site is very valuable in this vein. Your bioinformatics section listing software and your Google map of next-generation sequencing service providers have been very helpful in looking into this matter. Obviously, the next step after finding a company to sequence the data is finding viable tools to interpret it.

        We have already been involved in a commercial investigation of the Y chromosome. Eight people had 100 kbp sequenced in a region of the Y with a high incidence of SNP discovery. We had mixed results finding new SNPs. This particular company is not yet ready to bring a commercial Y-sequencing product to market. We were the alpha test for this company on Y chromosome sequence testing. However, this company has contributed many new SNPs through these tests. Any L-series SNPs are a result of this testing. Still, we are just seeing the tip of the iceberg with this testing.

        With sequencing we would see all SNPs, CNVs, insertions and deletions, and transposons. The goal is to develop a complete Y phylogenetic tree. Another goal would be to develop a good SNP-based molecular clock. STR mutation-rate estimates based on father-son studies are few, and STRs mutate more rapidly than SNPs. Complete Y chromosome testing would be the holy grail of ancient paternal ancestry.

        Ancient ancestry has a very strong following in the genealogical community. Most sequencing is geared to medical and agricultural activities at the moment. In our project we have gone beyond the well-studied work done by Cruciani on the E-M35 haplogroup. Our haplogroup represents about 4% of the population. We currently have close to 1,800 members in this project, which represents about one third of the total number who have actually been tested in this group. The R1b1 haplogroup will never be fully understood without sequencing, and it represents a much larger percentage of the population. Thanks again for your great forum.

        Regards,
        Kerry O’Dair


        • #5
          Have you looked at chrY in the 1000 Genomes data? Perhaps there are some useful variants there; quite a lot of data is online. How many new SNPs for Y showed up in the Bushman or Asian whole genome sequences?

          For the sort of high-throughput approach you are describing, one of the targeted sequencing technologies could make sense. For example, you might try designing a RainDance or OLink primer library to amplify the Y. Or perhaps a SureSelect/NimbleGen approach.

          A big question would be what cost per sample are you really willing to take on and how many samples? That would really affect the choice of technology.



          • #6
            Originally posted by krobison
            First off, what's wrong with the sequence we already have? Y is not a total wasteland.
            I do not consider the currently known sequences a wasteland. I use a very good Y mapper supplied by a vendor today, which I think uses all known sequences.

            http://ymap.ftdna.com/

            Originally posted by krobison
            One group has reported specifically capturing a specific mammalian chromosome by flow cytometry & getting sequences highly enriched for the targeted chromosome.
            Thank you for this lead, I will check into it further.


            • #7
              Originally posted by krobison
              Have you looked at chrY in the 1000 Genomes data? Perhaps there are some useful variants there; quite a lot of data is online. How many new SNPs for Y showed up in the Bushman or Asian whole genome sequences?
              Here are some comments on the 1000 Genomes Project Y chromosome data. These comments are from people with greater skill sets than my own. I do not think we are going to find the detail we are looking for. There does not appear to be enough variety across the Y tree in the samples they used.

              1000 Genomes Project: Y Chromosome SNPs

              Luke Jostins, Qasim Ayub, Yali Xue, Chris Tyler-Smith

              Abstract

              • Y chromosome SNPs were called from the 1000 Genomes data, and numerous filters were applied
              • A total of 2870 sites were called as variable in the 77 samples, of which around 75% are novel
              • 30 sites that passed all filters were re-sequenced using capillary sequencing, giving an estimated false positive rate of 3.3%
              • Known HapMap variants and variants from the Y haplogroup tree were used to estimate the sensitivity. This gave 22% for singletons and doubletons and 63% for variants with a non-reference allele count of three or greater
              • We use the sensitivity to estimate a polymorphism rate of 1 variant per 2350 bp
              • HapMap genotypes and Pilot 1/Pilot 2 concordance was used to estimate a per-genotype error rate for Q10 non-reference bases of under 1%
              http://bit.ly/djsOaP

              Vince Tilroe

              Many established SNPs may have been screened out by their quality "filters" (proximity to other SNPs, for example).

              They had 77 men and a total of 188 million bp attributed to the Y, which means that on average only 2,400,000 bp per Y was sequenced: that's just 10%, roughly.

              And these genomes have VERY low sequencing coverage (just 1.94x on average), which is one reason the Y-SNP outcomes are so weak. While the HapMap samples covered a decent array of haplogroups, just 2870 usable Y-SNPs found is not a lot given the total amount of sequencing done here.

              It's interesting that even with this sparse data they point out the youthfulness of R1b1b2 ("New insights into recent human evolution can also be gained from the branch lengths; for example, the short internal branch lengths within the haplogroup R1b relative to the other haplogroups suggest a recent expansion of this European haplogroup.").

              Yet it is also apparent from Figure 4 that if we are going to get deep insights into intra-haplogroup structure (not to mention a truly precise SNP-based molecular clock), we are going to need much better Y-chromosome sequencing than was done here.

              Vicent Vizachero

              Many established SNPs may have been screened out by their quality "filters" (proximity to other SNPs, for example).

              But I think the more likely explanation is the low quality of the Y-chromosome sequencing. They only got 188 million bp out of 77 guys, which amounts to just 2.4 million bp on average. That's only 10% of the full Y, and only maybe 20% of the "sequenceable" Y. And some of the men were only sequenced at low depth (<3x), so sequencing errors could also have an impact.

              Vicent Vizachero
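
The per-sample figure quoted in the comments above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 188 million bp total and the 77 samples come from the comments, while the helper `implied_length` is a hypothetical name introduced here to show what chromosome length each quoted percentage would imply.

```python
# Check the "2.4 million bp on average" figure and see what total length
# the quoted "10%" and "20%" fractions would imply. Inputs are from the comments.
TOTAL_Y_BP = 188_000_000   # bp attributed to the Y across all samples
N_SAMPLES = 77             # number of men sequenced

mean_bp_per_sample = TOTAL_Y_BP / N_SAMPLES


def implied_length(covered_bp, fraction):
    """Return the total length for which covered_bp makes up the given fraction."""
    return covered_bp / fraction


print(f"Mean Y bp per sample:     {mean_bp_per_sample:,.0f}")
print(f"Length implied by '10%':  {implied_length(mean_bp_per_sample, 0.10):,.0f}")
print(f"Length implied by '20%':  {implied_length(mean_bp_per_sample, 0.20):,.0f}")
```

The mean works out to roughly 2.44 million bp per sample, matching the figure quoted. The implied denominators (about 24 million and 12 million bp) are much smaller than the 80 million bp used earlier in the thread, so the percentages are presumably measured against a narrower definition of the Y than the whole chromosome.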

              Originally posted by krobison
              For the sort of high-throughput approach you are describing, one of the targeted sequencing technologies could make sense. For example, you might try designing a RainDance or OLink primer library to amplify the Y. Or, perhaps a SureSelect/Nimblegen approach.
              Thanks for the leads here. I will pass them along.

              Originally posted by krobison
              A big question would be what cost per sample are you really willing to take on and how many samples? That would really affect the choice of technology.
              Yes, that is the big question. Where are we today, cost-wise, in doing this kind of sequencing with potential third-generation machines? There have been over 200,000 STR tests to date, and I have heard numbers as high as 35k total for 23andMe testing, so there is a marketplace. As for a price point, I think people would pay up to $600 for that kind of Y sequence test. The real question is whether that is realistic in today's market.
              Last edited by KerryOdair; 09-18-2010, 08:18 AM.


              • #8
                I had hoped Complete Genomics would be an excellent candidate for this type of application. Below is an example of the frustration of looking into this kind of commercial test; the reply was sent to me in June of 2009. The Y chromosome has no known medical implications that I am aware of, so regulatory compliance seems to be a non-issue in my mind, not to mention that we should have rights over our own DNA. There is no need for the consulting services that you run into in the 23andMe world as far as regulators are concerned. I also contacted Ion Torrent, but since they have been bought out by Life Tech, it is unclear to me whether this will speed up or hinder development of the platform. Life Tech may also raise the price of the instrument after the buyout. As for privacy, companies have already handled this issue with an exclusion if you so desire. However, most people seeking this kind of testing are more than willing to share their personal information for matches. PacBio still remains an unknown at this point in time. Existing second-generation machines could possibly do the job right now, but I lack the expertise on these machines to know whether that is the case.

                There is much to be learned from a detailed Y tree. From the academic world the flow of information is slow, and the peer review process is a lumbering elephant trying to keep up with dramatic change.


                Dear Mr. O'Dair,

                Thanks for your interest in Complete Genomics. While your project sounds very interesting, Complete Genomics does not plan on sequencing partial portions of the genome. Our current plans are to focus exclusively on providing sequencing of the complete human genome - all 6B bases.

                Complete Genomics is also providing our sequencing service for research purposes only to bio-pharma companies and genome centers/research organizations. The main reason we aren't sequencing genomes for individuals is because that type of service requires legal consents, consulting services, and privacy and regulatory compliance processes such as CLIA, etc. that Complete Genomics does not have. We have no plans in 2009 to obtain such certification and hence will not be able to sell directly to individuals.

                Regards,

                Jennifer Turcotte

                Complete Genomics, Inc.

                VP of Marketing


                • #9
                  Interesting link for Registry of sequenced genomes.

                  http://www.worldpgr.com/
                  Last edited by KerryOdair; 09-23-2010, 11:37 AM.


                  • #10
                    Originally posted by KerryOdair
                    Interesting link for Registry of sequenced genomes.

                    http://www.worldpgr.com/
                    Disclaimer:
                    The World Personal Genome Registry is a website created by Illumina for the individual genome sequencing space that allows the community to keep track of the current status of personal whole-genome sequencing. We plan to transfer this registry to an appropriate standards body...



                    • #11
                      Originally posted by nilshomer
                      Disclaimer:
                      The World Personal Genome Registry is a website created by Illumina for the individual genome sequencing space that allows the community to keep track of the current status of personal whole-genome sequencing. We plan to transfer this registry to an appropriate standards body...
                      I appreciate the information. I was aware that Illumina is the caretaker of this information at the moment. I would suggest that an appropriate standards body might be the International Society of Genetic Genealogy.

                      http://www.isogg.org/


                      • #12
                        Complete Genomics has stated very clearly that they are in the game only to do complete human genome sequencing. The Y has limited medical relevance (the obvious example is azoospermia due to deletions on Y) and in any case they've decided to stay out of the retail genomics game. Right now, the assumption is that you would need to assume regulatory issues unless you could really prove otherwise -- a headache they wish to stay away from.

                        Companies such as Ion Torrent (now part of LifeTech) are in the game to make machines, not to do much in the way of specific sequencing. PacBio would be the same issue. PacBio+RainDance or Fluidigm may make a very interesting combo for your application (probably the other targeted sequencing schemes as well), but I still think you'll have trouble getting the cost to where you need it.

                        I think the quick answer is that at the moment there isn't a good solution at the cost you are looking for. If you could batch a large number of samples, then RainDance might be a reasonable option to do the targeting. In any case, I think you will find it challenging to go below $500 total cost per sample with a service provider, and that may even be a bit low-ball.

                        If I were in your shoes, particularly if not with a lot of funds, I'd focus on mining the available genomes & 1000 genome data to get a much richer set of SNPs. If you look around here, there is another thread on how to access consolidated SNP info from 1K genomes. These could be converted into one of the typical cheap SNP-typing formats & then you could get a richer set to type lots of genomes on cheap array/PCR platforms.
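
To make the mining suggestion above a little more concrete, here is a minimal Python sketch that pulls passing single-base Y-chromosome variants out of VCF-formatted text, the tab-separated variant format in which 1000 Genomes calls are distributed. The function name `y_snps` and the sample records in `SAMPLE_VCF` are invented for illustration and are not real variants; a real file would be much larger and would normally be read from disk.

```python
# Sketch: extract passing Y-chromosome SNPs from VCF-formatted lines.
# Columns follow the VCF layout: CHROM POS ID REF ALT QUAL FILTER INFO ...
# The records below are made-up examples, not real 1000 Genomes data.
SAMPLE_VCF = [
    "##fileformat=VCFv4.0",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Y\t1000\t.\tA\tG\t50\tPASS\t.",        # kept: passing SNP on Y
    "Y\t2000\t.\tAT\tA\t50\tPASS\t.",       # skipped: deletion, not a SNP
    "Y\t3000\t.\tC\tT\t10\tLowQual\t.",     # skipped: failed a filter
    "1\t4000\t.\tG\tA\t50\tPASS\t.",        # skipped: not on the Y
    "chrY\t5000\t.\tT\tC,G\t50\t.\t.",      # kept: two alternate alleles
]


def y_snps(vcf_lines):
    """Yield (position, ref, alt) for each passing single-base variant on the Y."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed record
        chrom, pos, _id, ref, alt, _qual, flt = fields[:7]
        if chrom not in ("Y", "chrY"):
            continue
        if flt not in ("PASS", "."):
            continue
        for allele in alt.split(","):
            if len(ref) == 1 and len(allele) == 1 and allele in "ACGT":
                yield int(pos), ref, allele


for pos, ref, alt in y_snps(SAMPLE_VCF):
    print(f"Y:{pos} {ref}>{alt}")
```

A list produced this way could then be handed to whichever SNP-typing platform is chosen; the filtering rules here (PASS or unfiltered, single-base alleles only) are one reasonable choice, not the only one.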

                        In any case, you probably should find an academic somewhere to collaborate with, as all of these companies will probably be more comfortable doing that than with a private citizen.



                        • #13
                          Future Repositories and Standards Bodies: Guidance

                          In addition to the International Society of Genetic Genealogy, I would like to suggest NARA (US National Archives and Records Administration) as a potential source of guidance and standards development for a decentralized or centralized genealogical repository of DNA sequences. This is because the emergent majority of digital archive users at US NARA and other traditional archival records facilities are genealogists and family historians. (Millions of users yearly!)

                          Margaret O’Neill Adams. Analyzing archives and finding facts: use and users of digital data records. Archival Science, 2007, Vol. 7, No. 1, 21-36.

                          Angela M. Labrador and Elizabeth S. Chilton. Relocating Meaning in Heritage Archives: A Call for Participatory Heritage Databases. Computer Applications in Archaeology, Annual Meeting Proceedings, 2009.
                          http://www.caa2009.org/articles/Labr...86_c%20(1).pdf



                          • #14
                            Here is a teenager with more ambitious aspirations than my small look at the 80 million bp of the Y chromosome. Great article and a must-read.

                            CUPERTINO, Calif.—In many ways, Anne West is a typical 17-year-old California teenager. She wears her hair long. She likes to hang out with her friends. She went to the prom.

                            She is also analyzing her family's genome.

                            Having been diagnosed with a pulmonary embolism in 2003, Anne's father John decided last year to get the family's genes sequenced. The process involves an advanced technology that spews out the six billion letters that represent the makeup of a person's genetic code. But after putting up $160,000 to get the four-member family tested, the Wests realized something: sifting through the reams of data was tougher than they ever imagined.

                            http://online.wsj.com/article/SB1000...l?mod=ITP_TEST


                            • #15
                              Originally posted by KerryOdair
                              Here is a teenager with more ambitious aspirations than my small look at the 80 million bp of the Y chromosome. Great article and a must-read.

                              CUPERTINO, Calif.—In many ways, Anne West is a typical 17-year-old California teenager. She wears her hair long. She likes to hang out with her friends. She went to the prom.

                              She is also analyzing her family's genome.

                              Having been diagnosed with a pulmonary embolism in 2003, Anne's father John decided last year to get the family's genes sequenced. The process involves an advanced technology that spews out the six billion letters that represent the makeup of a person's genetic code. But after putting up $160,000 to get the four-member family tested, the Wests realized something: sifting through the reams of data was tougher than they ever imagined.

                              http://online.wsj.com/article/SB1000...l?mod=ITP_TEST
                              This is very cool and inspiring. I wonder if we, collectively, could volunteer to help with this dataset. We have enough experts in assembly, annotation, SNP discovery, etc., that four human genomes (better yet, a family) could be a very interesting data set.

                              I'll have to think on it a little more, but I would love to collaborate with her and her family to open up the dataset to SEQanswers users. Perhaps I could secure or fund a donation of some compute resources for the project.

                              Thanks for posting it Kerry!

                              edit: Looks like we'd be a little late to the party...

                              She is now at work on a paper based in part on her family's data, with researchers from a Seattle institute. Last month, she was a speaker on a panel at a personal-genomics conference held at Cold Spring Harbor, New York, a scientific mecca.
