  • Katerina Cvikova
    Junior Member
    • Apr 2014
    • 6

    Data preprocessing

    Hi,
    Do you use any rule for trimming, removing, or keeping the lowest-quality reads for DNA and RNA assembly/alignment? How do you decide whether to trim, remove, or keep them: according to the type of analysis, coverage, base quality...? Thanks a lot.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    In general you can do without explicit quality filtering, unless you know that your data contains a significant portion of low-quality reads.

    If you are aligning to a known reference you can get away with using data that may be as low as Q10. De novo work is likely the only case where quality-score-based filtering is warranted. For that application you would want to filter out data at Q25 or lower.
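
    To put those thresholds in perspective, here is a minimal Python sketch of what a Phred score means and how a per-read filter might use it. The function names, the Phred+33 offset, and the choice to judge a read by its average error probability are assumptions made for illustration; the post does not say whether the cutoffs apply per base or per read.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Inverse conversion: error probability back to a Phred score."""
    return -10 * math.log10(p)

def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]

def read_quality(quality_string, offset=33):
    """Average the per-base error probabilities, then express that as Phred."""
    quals = decode_qualities(quality_string, offset)
    mean_p = sum(phred_to_error_prob(q) for q in quals) / len(quals)
    return error_prob_to_phred(mean_p)

def passes_filter(quality_string, min_q, offset=33):
    """Hypothetical filter: keep the read if its overall quality is at least min_q."""
    return read_quality(quality_string, offset) >= min_q

# Q10 is a 10% error rate per base; Q25 is roughly 0.3%.
for q in (10, 25):
    print(q, phred_to_error_prob(q))
```

    Averaging error probabilities rather than raw Q-scores is a design choice here: a few very bad bases pull the result down more strongly than they would in a plain mean of the scores.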


    • Dario1984
      Senior Member
      • Jun 2011
      • 166

      #3
      It's always best to do adapter trimming, even if the aligner you use later can do soft-clipping, such as STAR. The aligner would run moderately faster if it didn't have to attempt to map the adapter sequences to the genome and then soft-clip them when no alignment was found. You can look at the overall quality profile of the reads in a sample using a tool like FastQC. It's worth applying quality filtering, even if very few reads in the dataset are removed by it. There could be a rare circumstance where it makes an important difference.
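
      For a rough idea of what such a quality profile summarizes, the Python sketch below computes the mean quality at each read position from FASTQ text. It is not FastQC's implementation; the function name is hypothetical, and it assumes plain four-line FASTQ records with Phred+33 encoding.

```python
from io import StringIO

def per_position_mean_quality(fastq_handle, offset=33):
    """Mean Phred score at each read position (four-line FASTQ records assumed)."""
    totals = []  # sum of scores seen at each position
    counts = []  # number of reads long enough to reach each position
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]

# Two toy reads: one of 4 bases at Q40 ('I'), one of 2 bases at Q20 ('5').
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n55\n")
print(per_position_mean_quality(example))
```

      With real data you would pass an open file handle instead of the StringIO object used for the toy example.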


      • deep639
        Member
        • Dec 2013
        • 10

        #4
        It's always good to trim the adapters and do quality trimming before running alignment. Quality trimming is usually done at the Q20 level. Programs like trim_galore can auto-detect the adapter sequence that needs to be trimmed based on the input reads.
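
        As an illustration of what 3' quality trimming at a given cutoff can look like, here is a Python sketch of the running-sum approach described in the Cutadapt documentation (the same idea BWA uses). It is a sketch under assumptions, not the code of trim_galore or any other tool; the function names and the Phred+33 offset are chosen for the example.

```python
def quality_trim_index(quals, cutoff):
    """Number of bases to keep from the 5' end after 3' quality trimming.

    Walk from the 3' end, accumulating (quality - cutoff). The read is cut
    at the position where that running sum is lowest, and the walk stops
    once the sum becomes positive.
    """
    running = 0
    lowest = 0
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += quals[i] - cutoff
        if running > 0:
            break
        if running < lowest:
            lowest = running
            keep = i
    return keep

def trim_read(seq, quality_string, cutoff=20, offset=33):
    """Trim a read and its quality string together (Phred+33 assumed)."""
    quals = [ord(ch) - offset for ch in quality_string]
    keep = quality_trim_index(quals, cutoff)
    return seq[:keep], quality_string[:keep]

# Example scores with a cutoff of 10: the first four bases are kept.
scores = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(quality_trim_index(scores, 10))
```

        Because the sum is allowed to dip and recover, a single good base inside a poor-quality tail (the 11 in the example) does not stop the trimming.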


        • Brian Bushnell
          Super Moderator
          • Jan 2014
          • 2709

          #5
          I disagree with the part about quality-trimming. There is at least one published study indicating trimming to high levels like Q20 is generally detrimental to alignment, which agrees with my observations. However, it depends on the aligner. For something like bowtie1 which can only tolerate 3 mismatches, long reads >100bp might indeed need very aggressive quality-trimming... but then, you shouldn't use bowtie1 with long reads.

          I generally suggest the range of Q5-Q12 for quality-trimming using HiSeq/MiSeq Illumina data that has the full range of quality scores. Illumina is moving toward binned and inaccurate Q-scores on its latest platforms, though, so the utility of quality-trimming is going to be reduced.


          • deep639
            Member
            • Dec 2013
            • 10

            #6
            I think trim_galore sets its quality-trimming threshold to 20 by default. I have never tried to change the setting; usually only a small percentage of reads are thrown out because of quality and adapter trimming.


            • Brian Bushnell
              Super Moderator
              • Jan 2014
              • 2709

              #7
              The problem is not so much the fraction of data that is discarded, but rather the bias: Illumina read quality is affected by sequence content, so a high quality-trimming or quality-filtering threshold can disparately impact certain genomic regions. This is particularly important for quantitative analyses. And regardless of bias, longer reads give more accurate mapping. The confidence of a 250bp alignment, and the ability to place it correctly despite inexact repeats in the genome, is much higher than for a 150bp read, even if the last 100bp of the 250bp read are only Q17 and thus would be expected to contain 2 mismatches. For variant-calling, q-trimming can be done AFTER alignment to allow the most accurate mapping possible.
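
              The "2 mismatches" figure follows directly from the Phred definition, where the error probability is 10^(-Q/10). A small Python check (the function names are just for illustration):

```python
def error_probability(q):
    """Per-base error probability for Phred score q."""
    return 10 ** (-q / 10)

def expected_mismatches(quals):
    """Expected number of wrong base calls: the sum of per-base error probabilities."""
    return sum(error_probability(q) for q in quals)

# Last 100 bp of a 250 bp read, all at Q17.
tail = [17] * 100
print(error_probability(17))      # about 0.02 per base
print(expected_mismatches(tail))  # about 2 over 100 bases
```

              The same function accepts a mixed list of scores, so it can be applied to the decoded quality string of a real read.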
