  • PacBio sequence error correction

    Hi all,

    I have some PacBio long-read data, about 10x coverage of a 120 Mb genome. I already have a reference genome, but it is incomplete and contains many gaps. What I am trying to do is error-correct my PacBio reads and assemble the genome. Later on I will add more Illumina data to try to close the gaps.

    My question about the error correction is: can I use the incomplete reference genome to error-correct my PacBio data? My plan is to convert the genome FASTA into the frg format that pacBioToCA requires, then feed my PacBio data and the genome frg data to the correction pipeline to produce error-corrected reads. My concern is: will pacBioToCA accept relatively long genome scaffolds as the high-identity sequence for correcting my PacBio data?

    Suggestions and help are greatly appreciated.

    Stuart

  • #2
    I have not been able to figure out how to use the incomplete reference genome for error correction. It looks like FastaToCA converts a fastq file to an frg file so that it can be used as the high-identity sequence for error correction. However, the incomplete genome assembly is a FASTA file, and no quality score files are available for it. How can I get around this?

    Many thanks!

    Stuart
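
One possible workaround for the missing quality scores, sketched here only as an illustration: write each scaffold out as a FASTQ record with a constant placeholder quality. The function names and the Phred value of 40 are assumptions made for this sketch, not part of pacBioToCA or Celera Assembler, and the sketch does not establish that pacBioToCA will accept long scaffolds as correction input. Note also that scaffolds usually contain runs of N at gaps, which would receive the same made-up quality as real bases.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Keep only the first word of the header as the record name.
            name, chunks = (header[0] if header else "unnamed"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(fasta_handle, fastq_handle, phred=40):
    """Write FASTA records as FASTQ with a constant placeholder quality.

    The quality value is invented (hypothetical default of 40); it says
    nothing about the real accuracy of the assembly.
    """
    qual_char = chr(phred + 33)  # Phred+33 (Sanger) encoding
    count = 0
    for name, seq in read_fasta(fasta_handle):
        fastq_handle.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
        count += 1
    return count


# Small in-memory demonstration with a made-up scaffold.
demo_in = io.StringIO(">scaffold_1 example\nACGTNN\nACGT\n")
demo_out = io.StringIO()
fasta_to_fastq(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

With real files, the two handles would be opened with `open("assembly.fasta")` and `open("assembly.fastq", "w")` (hypothetical file names). A later reply in this thread recommends using the raw Illumina reads rather than the draft assembly for correction, which avoids the problem entirely.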



    • #3
      Perhaps use the PBJelly pipeline to fill the gaps? Also, with an appropriate pipeline (Quiver: https://github.com/PacificBiosciences/GenomicConsensus) you may not need a separate error-correction step to call an accurate consensus.

      cheers,
      -mark



      • #4
        Thanks for the tips, Mark! It looks like it will take me a while to figure this out. However, it is interesting to hear that I might not need to do error correction on the PacBio data, given that it has a 15% error rate.

        Stuart



        • #5
          Some more tips: if you want to use pacBioToCA, the approach would be to use the raw Illumina data as input to the correction step, not the draft assembly. The advantage of going back to the raw data is you may be able to correct assembly errors. The disadvantage is it takes longer to run.

          If you want to keep the assembly as is, you can install SMRT Analysis and use AHA (a hybrid assembler) to scaffold it, provided your genome is smaller than about 200 Mb. For larger genomes, or to really focus on gap filling, you can use PBJelly.

          Finally, the "no error correction" suggestion refers to the new algorithm HGAP: http://www.pacbiodevnet.com/hgap. You'll need more PacBio coverage to go that route. The benefit is that you may be able to close more gaps and get a final result that's potentially as accurate as Sanger finishing.



          • #6
            Thanks for your tips, jbingham! I am in the process of generating short-read Illumina data for the error correction. I don't think I have enough coverage to try the new algorithm: when I looked into the PacBio data more carefully, it only gives 3-4x coverage. The vast majority of the reads are shorter than 500 bp or 1000 bp, and the longest read is 13 kb. I will post my progress later.

            Thanks again to Winsettz and jbingham for helping out here!

            Stuart
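
For anyone wanting to repeat the coverage check described above, here is a minimal sketch of how read-length statistics and approximate coverage could be computed from a FASTA file of reads. The function names and the 1000 bp "short read" cutoff are assumptions chosen for illustration; the 120 Mb genome size is the figure given in the first post. Coverage here is simply total bases divided by genome size, which ignores unmappable or low-quality reads.

```python
import io
from statistics import median


def read_lengths(handle):
    """Yield the length of each sequence in an open FASTA file handle."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize_reads(lengths, genome_size, short_cutoff=1000):
    """Return basic read statistics and approximate fold coverage."""
    lengths = list(lengths)
    total = sum(lengths)
    n = len(lengths)
    n_short = sum(1 for x in lengths if x < short_cutoff)
    return {
        "reads": n,
        "total_bases": total,
        "coverage": total / genome_size,
        "median_length": median(lengths) if n else 0,
        "longest": max(lengths) if n else 0,
        "fraction_short": n_short / n if n else 0.0,
    }


# Demonstration on a tiny made-up FASTA held in memory.
demo = io.StringIO(">read1\nACGTACGT\n>read2\nACG\n")
stats = summarize_reads(read_lengths(demo), genome_size=120_000_000)
for key, value in stats.items():
    print(key, value)
```

On real data the handle would come from `open("reads.fasta")` (a hypothetical file name), and `genome_size` should be set to the best available estimate for the organism.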
