  • bharat_iyengar
    Member
    • Dec 2012
    • 20

    Quality filtering

    I am new to the field, but from what I have gathered it is essential to filter out low-quality sequences (based on their Phred scores).

    However, I generally observe that I lose ~70-80% of the reads from some publicly available datasets when I apply a filter that removes reads with a mean quality score < 20 (Phred+33 / Illumina 1.8+) as well as reads containing any base with a score < 10.
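
For reference, the filter described above can be sketched as follows. This is a minimal illustration, assuming Phred+33 (Illumina 1.8+) quality strings; the function names are hypothetical and not taken from any particular tool. It also shows why the per-base rule can be costly: a read whose last two bases are Q2 still has a mean well above 20, yet it is rejected by the minimum-base check.

```python
# Sketch of a read-level filter: drop a read if its mean quality is below
# min_mean OR any single base is below min_base. Assumes Phred+33
# (Illumina 1.8+) encoding; function names are illustrative only.


def phred33_scores(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer Phred scores."""
    return [ord(c) - 33 for c in qual]


def passes_filter(qual: str, min_mean: float = 20.0, min_base: int = 10) -> bool:
    """True if mean quality >= min_mean and every base >= min_base."""
    scores = phred33_scores(qual)
    if not scores:  # empty read: nothing to keep
        return False
    mean_q = sum(scores) / len(scores)
    return mean_q >= min_mean and min(scores) >= min_base


def error_probability(q: float) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    good = "IIIIIIIIII"      # 'I' is ASCII 73 -> Q40 at every base
    bad_tail = "IIIIIIII##"  # '#' is ASCII 35 -> Q2 at the last two bases
    print(passes_filter(good))      # True
    print(passes_filter(bad_tail))  # False: mean is 32.4, but two bases are < Q10
    print(error_probability(20))    # about 0.01, i.e. ~1 wrong base per 100
    print(error_probability(10))    # about 0.1
```

Since a Q10 base corresponds to roughly a 1-in-10 error probability, a rule that rejects the whole read for even one such base is one plausible reason a strict per-read filter can discard most of a dataset.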

    I am not sure whether this criterion is overly stringent.

    When I asked one of the data submitters about the filtering criteria they used, he said that they didn't apply any filters.

    Can someone please tell me what the right cut-off is, one that would both minimize data loss and preserve reliability?
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    You might consider just quality trimming and keeping all reads above some minimum length (that's what I do). The ends of reads are often low quality, so trimming those off will probably help a lot here. There are a number of publicly available quality trimmers that do this (Trimmomatic, etc.). I trim off bases with quality < 20 and just toss resulting reads shorter than 16 or so (the minimum will depend on what you're doing, your aligner, and whether you have paired-end reads). Another possibility would be to also toss resulting reads with a mean quality score below 20 (after trimming, the number of such reads should be greatly reduced).
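
A minimal sketch of the trim-then-filter approach described above, assuming Phred+33 qualities. It uses a simple rule (cut bases below Q20 from both ends, then drop reads shorter than 16, with an optional post-trim mean-quality check) rather than a sliding-window method; it is only loosely analogous to Trimmomatic's LEADING, TRAILING and MINLEN steps. The function names and defaults are illustrative, and the thresholds should be tuned to your aligner and read type.

```python
# Sketch of quality trimming followed by a length (and optional mean-quality)
# filter. Assumes Phred+33 qualities. This is a simple "trim bases below min_q
# from both ends" rule, not a sliding-window trimmer; names are illustrative.
from typing import Optional, Tuple


def trim_ends(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Strip bases with quality < min_q from the 5' and 3' ends."""
    scores = [ord(c) - 33 for c in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def trim_and_filter(
    seq: str,
    qual: str,
    min_q: int = 20,
    min_len: int = 16,
    min_mean: Optional[float] = None,
) -> Optional[Tuple[str, str]]:
    """Trim, then return the trimmed (seq, qual), or None if the read is discarded."""
    seq, qual = trim_ends(seq, qual, min_q)
    if not seq or len(seq) < min_len:
        return None  # too short after trimming
    if min_mean is not None:
        mean_q = sum(ord(c) - 33 for c in qual) / len(qual)
        if mean_q < min_mean:
            return None  # optional post-trim mean-quality check
    return seq, qual


if __name__ == "__main__":
    read = "ACGT" * 5  # 20 bases
    print(trim_and_filter(read, "I" * 18 + "##"))      # kept: 18 bases remain
    print(trim_and_filter(read, "I" * 10 + "#" * 10))  # None: only 10 bases remain
```

Trimming first means a read is only discarded when too little good sequence is left, instead of being thrown out because of a short low-quality tail.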

    If you have paired-end reads, make sure whichever trimming program you use can handle them. There are a lot of posts on this forum from people with out-of-sync paired-end FASTQ files because they didn't do that.
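
To illustrate the paired-end point, here is a minimal sketch, assuming plain 4-line FASTQ records (no multi-line sequences) and using hypothetical function names. It reads R1 and R2 in lockstep and keeps or drops each pair as a unit, so the two outputs stay in the same order. Dedicated trimmers handle this for you (some also write orphaned mates to separate files); the sketch only shows the principle.

```python
# Sketch of paired-end filtering that keeps R1/R2 output in sync: mates are
# read in lockstep and each pair is kept or dropped as a unit. Assumes plain
# 4-line FASTQ records; function names are illustrative only.
import io
from itertools import islice, zip_longest
from typing import Callable, Iterator, Optional, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, seq, qual) from a 4-line-per-record FASTQ stream."""
    while True:
        lines = list(islice(handle, 4))
        if not lines:
            return
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines)
        yield header, seq, qual


def filter_pairs(
    r1: TextIO,
    r2: TextIO,
    keep: Callable[[Record], Optional[Record]],
) -> Iterator[Tuple[Record, Record]]:
    """Yield (mate1, mate2) only when BOTH mates pass `keep`."""
    for m1, m2 in zip_longest(read_fastq(r1), read_fastq(r2)):
        if m1 is None or m2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        k1, k2 = keep(m1), keep(m2)
        if k1 is not None and k2 is not None:
            yield k1, k2  # pair survives intact
        # otherwise both mates are dropped, so the outputs stay aligned


def min_q20(rec: Record) -> Optional[Record]:
    """Example `keep` rule: discard a read if any base is below Q20 (Phred+33)."""
    return rec if min(ord(c) - 33 for c in rec[2]) >= 20 else None


if __name__ == "__main__":
    r1 = io.StringIO("@p1/1\nACGT\n+\nIIII\n@p2/1\nACGT\n+\n####\n")
    r2 = io.StringIO("@p1/2\nTTTT\n+\nIIII\n@p2/2\nGGGG\n+\nIIII\n")
    for mate1, mate2 in filter_pairs(r1, r2, min_q20):
        print(mate1[0], mate2[0])  # only p1 survives; p2 is dropped from both files
```

Filtering each file separately would instead drop p2 from R1 only, leaving R2 one record longer and shifting every later pair out of register.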

    • Apexy
      Member
      • Apr 2011
      • 62

      #3
      This is quite an old thread, but you may want to consider newer perspectives when you work with future datasets. If you are dealing with RNA-Seq, take a look at this paper: http://journal.frontiersin.org/Journ...014.00017/full
