  • #16
    Different assemblers respond to errors and trimming in different ways. For example, if you read the SGA paper, the authors recommend not trimming reads. ALLPATHS-LG does not trim reads either, as I remember. Other assemblers, such as SPAdes, may be less sensitive to trimming as they trim reads by default. Also, I have read somewhere (could be wrong) that the SOAPdenovo developers recommend not correcting reads if you have enough RAM, but SGA, ALLPATHS-LG, etc. always include error correction as a necessary step. At the end of the day, which trimming/error-correction approach to use is assembler dependent. If it were me, I would just use the tools/pipelines recommended by the developers. If I had time, I would combine different strategies/correctors and see what I get. The result is probably data dependent.

    K-mer-based error correctors typically use short k-mers. I think that is fine. With shorter k-mers, we more often collapse segmental duplications/repeats and will not be able to correct errors when they occur right at the sites differentiating repeats. However, only a small fraction of errors are uncorrectable because of repeats. If such errors can be corrected with long k-mers, assemblers can usually handle them well. I would not worry too much about the k-mer length in error correction, unless it is too short.
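To make the repeat point above concrete, here is a minimal Python sketch. It is not how any particular corrector works; the toy sequences, the `min_count` cutoff, and the function names are all made up for illustration. Two repeat copies differ at one base, and a read from copy A carries an error that turns that base into copy B's base. With a short k every k-mer over the error also exists in copy B, so nothing looks suspicious; with a longer k some k-mers reach into the unique flank and become rare.

```python
from collections import Counter

def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def count_kmers(reads, k):
    """Count every k-mer across a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

def weak_kmers(read, counts, k, min_count):
    """k-mers of read seen fewer than min_count times (candidate errors)."""
    return [km for km in kmers(read, k) if counts[km] < min_count]

# Hypothetical toy data: one repeat (R1 + site + R2) in two flanking contexts.
R1, R2 = "ACGTTGCA", "GGATCCTA"
copy_a = "TTTTT" + R1 + "A" + R2 + "CCCCC"   # copy A has 'A' at the site
copy_b = "GAGAG" + R1 + "C" + R2 + "AGAGA"   # copy B has 'C' at the site
trusted_reads = [copy_a] * 5 + [copy_b] * 5  # pretend 5x coverage of each

# A read from copy A with an error at the differentiating site (A -> C).
bad_read = "TTTTT" + R1 + "C" + R2 + "CCCCC"

weak_short = weak_kmers(bad_read, count_kmers(trusted_reads, 5), 5, min_count=2)
weak_long = weak_kmers(bad_read, count_kmers(trusted_reads, 12), 12, min_count=2)
print(len(weak_short), len(weak_long))
```

In this toy case the short k finds no weak k-mers, so the error is invisible, while the longer k flags the ones that span both the site and copy A's flank.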


    • #17
      Originally posted by lh3
      Different assemblers respond to errors and trimming in different ways. For example, if you read the SGA paper, the authors recommend not trimming reads. ALLPATHS-LG does not trim reads either, as I remember. Other assemblers, such as SPAdes, may be less sensitive to trimming as they trim reads by default. Also, I have read somewhere (could be wrong) that the SOAPdenovo developers recommend not correcting reads if you have enough RAM, but SGA, ALLPATHS-LG, etc. always include error correction as a necessary step. At the end of the day, which trimming/error-correction approach to use is assembler dependent. If it were me, I would just use the tools/pipelines recommended by the developers. If I had time, I would combine different strategies/correctors and see what I get. The result is probably data dependent.
      I've been trying to follow the pipeline of the folks who assembled the Giant Panda genome, as they used SOAPdenovo and exclusively short Illumina reads (I think the group that developed SOAPdenovo is the same one that put out the Giant Panda assembly). Unfortunately, the SOAPdenovo documentation isn't the most thorough. According to their Nature paper (supplemental section), they trimmed low-quality bases at the 3' end before error correction, though they don't specify their threshold.

      After looking at the parameters in SOAPec, it seems that they might have trimmed for Q<2 because that is the only threshold available for trimming during error correction in that program.

      I'm going to first try letting SOAPec do the trimming and error correction on the full raw data. Right now I have all paired-end data, and I don't think SOAPec can handle both paired-end and single-end data while still keeping pairs together in the output.

      If that doesn't work too well, I'll try trimming first with something like Q<10 before error correction.
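For reference, the kind of 3' trimming discussed above can be sketched in a few lines of Python. This is only a minimal illustration, not what SOAPec or any other tool does internally: it assumes Phred+33 quality strings, simply strips trailing bases below the cutoff, and the function names and the `min_len` pair filter are hypothetical.

```python
def trim_3prime(seq, qual, min_q=10, offset=33):
    """Strip trailing bases whose Phred quality is below min_q.

    Assumes Phred+33 encoding by default; stops at the first base
    (scanning from the 3' end) that meets the threshold.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

def trim_pair(mate1, mate2, min_q=10, min_len=4):
    """Trim both mates; keep the pair only if both stay >= min_len.

    Each mate is a (sequence, quality) tuple. Dropping the whole pair
    keeps the two output files in sync (hypothetical policy).
    """
    t1 = trim_3prime(*mate1, min_q=min_q)
    t2 = trim_3prime(*mate2, min_q=min_q)
    if len(t1[0]) < min_len or len(t2[0]) < min_len:
        return None
    return t1, t2

# 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed_q10 = trim_3prime("ACGTAC", "IIII##", min_q=10)
trimmed_q2 = trim_3prime("ACGTAC", "IIII##", min_q=2)
pair = trim_pair(("ACGTAC", "IIII##"), ("TTGACA", "I#####"), min_q=10)
print(trimmed_q10, trimmed_q2, pair)
```

With a cutoff of Q<10 the two Q2 bases are removed, while a cutoff of Q<2 leaves the read untouched, which is why a Q<2 threshold trims very little.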


      • #18
        Originally posted by jwag
        I'm going to first try letting SOAPec do the trimming and error correction on the full raw data. Right now I have all paired-end data, and I don't think SOAPec can handle both paired-end and single-end data while still keeping pairs together in the output.
        You just have to do two separate runs with the same output from KmerFreq.


        • #19
          Originally posted by Wallysb01
          You just have to do two separate runs with the same output from KmerFreq.
          Ah, that makes sense. So I can put all my data into KmerFreq at the start, so that it counts every k-mer, and then correct any subset of my data against that same frequency distribution (even in smaller chunks). That will definitely save me some time. Thanks for the tip.
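The "count once, correct in separate runs" idea can be sketched in Python as below. This is only an illustration of the workflow, not the KmerFreq or SOAPec implementation; the read lists, `k`, the `min_count` cutoff, and the function names are all hypothetical.

```python
from collections import Counter

def build_kmer_table(reads, k):
    """Count k-mers once over ALL reads (the step KmerFreq plays here)."""
    table = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            table[read[i:i + k]] += 1
    return table

def flag_suspect_reads(chunk, table, k, min_count):
    """Return reads in chunk containing any k-mer rarer than min_count.

    The table is shared, so each chunk can be processed in its own run.
    """
    suspects = []
    for read in chunk:
        windows = (read[i:i + k] for i in range(len(read) - k + 1))
        if any(table[km] < min_count for km in windows):
            suspects.append(read)
    return suspects

# Hypothetical toy data standing in for paired-end and single-end files.
paired_reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT"]
single_reads = ["ACGTACGT", "ACGTTCGT"]  # second read carries one error

# One frequency table from everything, then two independent runs.
table = build_kmer_table(paired_reads + single_reads, k=4)
suspect_paired = flag_suspect_reads(paired_reads, table, 4, min_count=2)
suspect_single = flag_suspect_reads(single_reads, table, 4, min_count=2)
print(suspect_paired, suspect_single)
```

Because both runs consult the same table, splitting the data into chunks does not change which reads get flagged.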
