  • ddoopus
    Junior Member
    • Sep 2014
    • 6

    Available Personal Genomes

    Hi,

    We are resequencing segmental duplications of high sequence identity, and there is very little overlap with dbSNP in these regions. Are there any personal genomes available, other than Venter's, that were sequenced with clones or a long-read technology that can reliably identify variation in these regions? It would be great to have a couple of points of comparison in addition to Venter's genome, as a sanity check that SNP concordance holds both genome-wide and within these regions. Any input would be greatly appreciated.

    Thanks!
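
    A minimal sketch of the sanity check described above, assuming each call set has already been loaded as a set of (chrom, pos, ref, alt) tuples and the segmental duplications are given as 1-based inclusive (chrom, start, end) intervals. The function names and the Jaccard-style concordance measure are illustrative choices, not something specified in this thread.

```python
def in_regions(chrom, pos, regions):
    """True if (chrom, pos) falls in any 1-based inclusive (chrom, start, end) interval."""
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def concordance(calls_a, calls_b):
    """Shared calls divided by all calls seen in either set (Jaccard index)."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else float("nan")


def stratified_concordance(calls_a, calls_b, regions):
    """Concordance computed separately inside and outside the given regions."""
    a_in = {v for v in calls_a if in_regions(v[0], v[1], regions)}
    b_in = {v for v in calls_b if in_regions(v[0], v[1], regions)}
    return {
        "inside": concordance(a_in, b_in),
        "outside": concordance(calls_a - a_in, calls_b - b_in),
    }
```

    A large gap between the "inside" and "outside" values would point at the duplicated regions themselves rather than at a general disagreement between the two call sets. The linear scan in in_regions is fine for a toy example; a real region list would want an interval index.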
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Here are some Complete Genomics genomes from PGP: https://my.pgp-hms.org/public_geneti...&commit=Search


    • lh3
      Senior Member
      • Feb 2008
      • 686

      #3
      The best so far is the CHM1 PacBio assembly, but I don't know if it has been publicly released yet. NA12878 also has a PacBio assembly and public Moleculo data (from the 1000g FTP). These will be useful for investigating hard regions.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        CHM1 PacBio data has been released: http://blog.pacificbiosciences.com/2...erage-for.html


        • ddoopus
          Junior Member
          • Sep 2014
          • 6

          #5
          I can't find the PacBio assembly of NA12878. Do you know where it is
          available?

          From what I can tell, all of the genomes from PGP are sequenced with
          Complete Genomics, which I thought had a relatively short read length. The
          personal genome VCFs for hg19 are on UCSC. I don't understand how they
          are able to call variants in repeat regions that standard whole-genome
          Illumina 100bp reads cannot disambiguate. Are these variants possibly
          the result of liftover errors from hg18 to hg19 for segmental duplications
          that were collapsed in the older version? Can these variants be trusted
          at all?

          I found the PacBio assembly for CHM1, but I can only see the raw reads at the link
          given, so it doesn't help much. I found the supplementary material
          for the paper on bioRxiv:



          and CHM1_to_GRCh37_lite_snvs.site_filtered.pass.vcf is the only file that
          looks relevant, but the het:hom ratio of that VCF is 0.04, which looks
          suspect. Is there another resource available that does not show this
          issue?

          Any other suggestions would be greatly appreciated. It would be great to
          have hg19 VCFs with variant calls in these regions that can be trusted.

          Thanks!


          • lh3
            Senior Member
            • Feb 2008
            • 686

            #6
            PacBio assembly of CHM1 is here:



            It is different from the version I was looking at, but I believe it should be equally good. The NA12878 PacBio assembly has not been released yet.

            CHM1 is a haploid sample. A very low het:hom ratio is expected.
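
            For reference, a rough sketch of how a het:hom ratio like the 0.04 quoted above could be computed from a VCF. It assumes the file carries a FORMAT column with a GT field for at least one sample; whether the CHM1 VCF mentioned in this thread is laid out that way is not confirmed here, and the function name is hypothetical.

```python
def het_hom_ratio(vcf_lines, sample_index=0):
    """Return (het, hom_alt, het/hom_alt) for one sample from an iterable of VCF lines."""
    het = hom_alt = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # sites-only record without genotype columns
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt = fields[9 + sample_index].split(":")[fmt.index("GT")]
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype
        if len(set(alleles)) > 1:
            het += 1  # two different alleles, e.g. 0/1 or 1/2
        elif alleles[0] != "0":
            hom_alt += 1  # homozygous (or haploid) non-reference call
    ratio = het / hom_alt if hom_alt else float("nan")
    return het, hom_alt, ratio
```

            With a truly haploid sample, any heterozygous call is an artifact (mapping errors, collapsed duplications, and so on), so a ratio close to zero is the expected outcome rather than a red flag.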

            EDIT: I should add that I am extremely impressed by the CHM1 assembly done by Jason Chin.
            Last edited by lh3; 09-25-2014, 11:29 AM.


            • ddoopus
              Junior Member
              • Sep 2014
              • 6

              #7
              Ah, thanks for the clarification. It is actually mentioned directly in their bioRxiv paper, but I overlooked it.

              Thanks!


              • lh3
                Senior Member
                • Feb 2008
                • 686

                #8
                I overlooked it, too... An author gave me the link yesterday.


                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #9
                  Originally posted by ddoopus
                  From what I can tell, all of the genomes from PGP are sequenced with Complete Genomics, which I thought had a relatively short read length. The personal genome VCFs for hg19 are on UCSC. I don't understand how they are able to call variants in repeat regions that standard whole-genome Illumina 100bp reads cannot disambiguate. Are these variants possibly the result of liftover errors from hg18 to hg19 for segmental duplications that were collapsed in the older version? Can these variants be trusted at all?
                  I have a lot of experience with Complete Genomics data (but a bad memory, so the details are slightly fuzzy). Their reads are super-short. IIRC each "read" consists of 2x10bp fragments and 2x15bp fragments, or something like that, with unknown, normally distributed distances between the pieces, except that ~50% of the time the distance is one specific value, like 2bp. So you get reads like:
                  10bp sequenced, 0-2bp unsequenced, 15bp sequenced, ~10bp unsequenced, 15bp sequenced, 0-2bp unsequenced, 10bp sequenced.
                  ...roughly. I think some of the "readlets" were 5bp. Anyway, they are nothing like reads from other platforms.
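
                  To make that layout concrete, here is a toy sketch that cuts one such gapped "read" out of a reference string. The readlet lengths and gap ranges are taken from the rough description above, which is explicitly from memory, and are not the platform's actual specification. The 8-12bp range for the "~10bp" gap and the uniform gap sampling are simplifying assumptions, and all names are hypothetical.

```python
import random

# Readlet lengths and the gaps between them, per the rough layout above.
READLET_LENGTHS = (10, 15, 15, 10)
GAP_RANGES = ((0, 2), (8, 12), (0, 2))  # middle gap "~10bp" assumed to be 8-12bp


def sample_gapped_read(reference, start, rng):
    """Return a list of (position, sequence) readlets starting at `start`."""
    readlets = []
    pos = start
    for i, length in enumerate(READLET_LENGTHS):
        readlets.append((pos, reference[pos:pos + length]))
        pos += length
        if i < len(GAP_RANGES):
            low, high = GAP_RANGES[i]
            pos += rng.randint(low, high)  # unsequenced bases of unknown length
    return readlets


if __name__ == "__main__":
    reference = "ACGT" * 50
    for pos, seq in sample_gapped_read(reference, 5, random.Random(0)):
        print(pos, seq)
```

                  Only 50 bases are actually sequenced, spread over a span of roughly 60-65bp with uncertain spacing, which illustrates why such reads give so little leverage in long repeats.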

                  As a result, you cannot do de novo assembly with them, and I would never trust them in long repetitive regions. In my testing, they are quite accurate for calling SNPs (using CG's calls) but abysmal at indels, with almost no concordance with indels called from 2x100bp Illumina data, or with indels that could possibly have been inherited when analyzing sequenced parents+child trios. And FYI, they call indels not directly from the reads, but by de novo reassembling the area around each suspected indel using the reads that map across it.
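
                  The trio check mentioned above can be sketched as follows: a child's genotype "could possibly have been inherited" if one allele can be drawn from each parent. This is a generic Mendelian-consistency test for diploid autosomal sites, not the exact procedure used in the testing described here; genotypes are assumed to be pairs of allele codes such as (0, 1), and the function names are hypothetical.

```python
from itertools import product


def mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be formed from one maternal and one paternal allele."""
    return any(sorted((m, f)) == sorted(child) for m, f in product(mother, father))


def consistent_fraction(trio_calls):
    """Fraction of (child, mother, father) genotype triples that are Mendelian-consistent."""
    if not trio_calls:
        return float("nan")
    ok = sum(mendelian_consistent(c, m, f) for c, m, f in trio_calls)
    return ok / len(trio_calls)
```

                  True de novo mutations are rare, so a call set in which a large share of the child's indels fail this test is far more likely to reflect calling errors than biology.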

                  I would not include CG genomes if you are studying 'difficult' parts of the genome that are low-complexity, repetitive, or highly variable, or if you are interested in indels.
                  Last edited by Brian Bushnell; 09-25-2014, 05:58 PM.
