  • #16
    Different assemblers respond to errors and trimming in different ways. For example, if you read the SGA paper, the authors recommend not trimming reads. ALLPATHS-LG does not trim reads either, if I remember correctly. Other assemblers, such as SPAdes, may be less sensitive to trimming because they trim reads by default. Also, I have read somewhere (I could be wrong) that the SOAPdenovo developers recommend not correcting reads if you have enough RAM, whereas SGA, ALLPATHS-LG, etc. always include error correction as a necessary step. At the end of the day, which trimming/error-correction approach to use is assembler dependent. If it were me, I would just use the tools/pipelines recommended by the developers. If I had time, I would combine different strategies/correctors and see what I got. The result is probably data dependent.

    K-mer based error correctors typically use short k-mers, and I think that is fine. With shorter k-mers, we more often collapse segmental duplications/repeats and cannot correct errors that occur right at the sites distinguishing the repeat copies. However, only a small fraction of errors are uncorrectable because of repeats. If such errors can be corrected with long k-mers, assemblers can usually handle them well. I would not worry too much about the k-mer length in error correction, unless it is too short.
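To make the repeat argument concrete, here is a minimal Python sketch of the "solid vs. weak k-mer" idea that k-mer spectrum correctors build on. It is not the algorithm of any particular corrector: it counts k-mers on the forward strand only (real tools usually count canonical k-mers), uses an arbitrary count threshold (`min_count`), and runs on a made-up two-copy repeat. With k=5, an error at the base distinguishing the two copies only produces k-mers that also occur in the other copy, so nothing looks wrong; with k=9, the k-mers that reach into the unique flank expose it.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only, for simplicity) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(read, counts, k, min_count):
    """Return start positions of k-mers seen fewer than min_count times.

    A k-mer spectrum corrector treats such "weak" k-mers as likely to contain
    a sequencing error; k-mers at or above the threshold are "solid".
    """
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < min_count]


# Toy "genome" (hypothetical): two copies of a repeat core that differ at one
# base (C vs T), embedded in different flanking sequence.
region_a = "TTTTT" + "ACGTACTGCAT" + "GGGGG"
region_b = "CCCCC" + "ACGTATTGCAT" + "AAAAA"
# A read from region A with a C->T error exactly at the distinguishing site.
bad_read = "TTTTT" + "ACGTATTGCAT" + "GGGGG"
reads = [region_a] * 10 + [region_b] * 10 + [bad_read]

for k in (5, 9):
    counts = count_kmers(reads, k)
    print(f"k={k}: weak k-mer starts {weak_positions(bad_read, counts, k, min_count=2)}")
# k=5: weak k-mer starts []
# k=9: weak k-mer starts [2, 3, 4, 8, 9, 10]
```

With the short k-mer the erroneous read looks entirely "solid" because every k-mer covering the error also belongs to the other repeat copy; the longer k-mer can tell the copies apart.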

    • #17
      Originally posted by lh3
      [...] At the end of the day, which trimming/error-correction approach to use is assembler dependent. If it were me, I would just use the tools/pipelines recommended by the developers. [...]

      I've been trying to follow the pipeline used for the giant panda assembly, since it used SOAPdenovo and only short Illumina reads (I think the group that developed SOAPdenovo is the same one that published the giant panda assembly). Unfortunately, SOAPdenovo's documentation isn't the most thorough. According to their Nature paper (supplementary information), they did trim low-quality bases at the 3' end before error correction, though they don't specify the threshold.

      After looking at the parameters in SOAPec, it seems they might have trimmed at Q<2, because that is the only trimming threshold SOAPec offers during error correction.

      I'm going to first try letting SOAPec do the trimming and error correction on the full raw data. Right now all my data are paired-end, and I don't think SOAPec can handle both paired-end and single-end data while still keeping pairs together in the output.
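One generic way to keep pairs together when a trimming or correction step can discard individual reads is to process both mates in lockstep and send orphaned mates to a separate single-end set. The sketch below is an assumption about how such a step could look, not a description of SOAPec's behaviour; `split_pairs` and the length-based `keep` rule are hypothetical.

```python
def split_pairs(pairs, keep):
    """Route read pairs after a per-read filter.

    pairs: iterable of (read1, read2) tuples in the original pairing order.
    keep:  predicate returning True if a read survives trimming/correction.
    Returns (paired, singles): pairs where both mates survived (order kept),
    and orphaned mates whose partner was discarded.
    """
    paired, singles = [], []
    for r1, r2 in pairs:
        ok1, ok2 = keep(r1), keep(r2)
        if ok1 and ok2:
            paired.append((r1, r2))
        elif ok1:
            singles.append(r1)
        elif ok2:
            singles.append(r2)
        # If neither mate survives, the pair is dropped entirely.
    return paired, singles


# Hypothetical rule: a trimmed read must keep at least MIN_LEN bases.
MIN_LEN = 5
pairs = [("ACGTACGT", "TTGCA"), ("ACG", "GGCCTTAA"), ("AC", "GT")]
paired, singles = split_pairs(pairs, keep=lambda read: len(read) >= MIN_LEN)
print(paired)   # [('ACGTACGT', 'TTGCA')]
print(singles)  # ['GGCCTTAA']
```

Because both outputs preserve input order, the paired set stays synchronized and the orphans can be handed to the assembler as single-end data.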

      If that doesn't work too well, I'll try trimming first with something like Q<10 before error correction.
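A fixed-threshold 3'-end quality trim (for example Q<2 or Q<10, as discussed above) can be sketched as follows. This assumes Phred+33 quality encoding and simply removes trailing bases whose quality is below the threshold; `trim_3prime` is a hypothetical helper, not SOAPec's implementation, and other trimmers use different rules (such as running-sum or sliding-window methods).

```python
def trim_3prime(seq, qual, threshold, offset=33):
    """Trim bases from the 3' end while their Phred quality is below threshold.

    qual is a FASTQ quality string; offset=33 assumes Phred+33 (Sanger /
    Illumina 1.8+) encoding. Use offset=64 for older Phred+64 files.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


seq, qual = "ACGTACGT", "IIIII5#!"  # Phred 40,40,40,40,40,20,2,0
print(trim_3prime(seq, qual, 2))    # ('ACGTACG', 'IIIII5#')
print(trim_3prime(seq, qual, 10))   # ('ACGTAC', 'IIIII5')
```

The same read loses one base at Q<2 but two bases at Q<10, which illustrates how much the choice of threshold can matter before error correction.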

      • #18
        Originally posted by jwag
        I'm going to first try letting SOAPec do the trimming and error correction from the full raw data. Right now I have all paired-end data, and I don't think SOAPec can handle both paired-end and single end data while still keeping pairs together in the output.

        You just have to do two separate runs with the same output from KmerFreq.

        • #19
          Originally posted by Wallysb01
          You just have to do two separate runs with the same output from KmerFreq.

          Ah, that makes sense. So I can initially put all my data into KmerFreq so that it counts every occurrence of each k-mer, and then run any of my data against that same frequency table (even in smaller chunks). That will definitely save me some time. Thanks for the tip.
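The "count once, correct in separate runs" workflow can be illustrated with a toy k-mer spectrum corrector in Python. Everything here is hypothetical: `count_kmers` stands in for a counting step such as KmerFreq, and `correct_read` is a deliberately simple single-substitution corrector, not SOAPec's algorithm. The point is only that one frequency table, built from all the data, can be reused to correct the paired-end and single-end sets in separate passes.

```python
from collections import Counter
from itertools import chain


def count_kmers(reads, k):
    """Count forward-strand k-mers over all reads (stand-in for a counting step)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def all_solid(read, counts, k, min_count):
    """True if every k-mer in the read occurs at least min_count times."""
    return all(counts[read[i:i + k]] >= min_count
               for i in range(len(read) - k + 1))


def correct_read(read, counts, k, min_count):
    """Toy corrector: keep the read if all its k-mers are solid; otherwise
    return the first single-base substitution that makes every k-mer solid,
    or the original read if no single substitution does."""
    if all_solid(read, counts, k, min_count):
        return read
    for i, original in enumerate(read):
        for base in "ACGT":
            if base == original:
                continue
            candidate = read[:i] + base + read[i + 1:]
            if all_solid(candidate, counts, k, min_count):
                return candidate
    return read


# Hypothetical toy data: one true sequence, one error in each read set.
paired_reads = ["ACGTTGCAAC"] * 5 + ["ACGTTCCAAC"]  # G->C at 0-based position 5
single_reads = ["ACGTTGCAAC"] * 3 + ["ACGTTGCATC"]  # A->T at 0-based position 8

# Count k-mers once over *all* reads...
K, MIN_COUNT = 4, 2
shared_counts = count_kmers(chain(paired_reads, single_reads), K)

# ...then correct each set in its own run against the same table.
fixed_paired = [correct_read(r, shared_counts, K, MIN_COUNT) for r in paired_reads]
fixed_single = [correct_read(r, shared_counts, K, MIN_COUNT) for r in single_reads]
print(set(fixed_paired), set(fixed_single))  # {'ACGTTGCAAC'} {'ACGTTGCAAC'}
```

Because the table is built from every read, a k-mer's support does not depend on which chunk a read is corrected in, so splitting the correction into several runs does not change the result in this sketch.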