  • Trim off variable length 'N' strings at the end of the read

    Hi,

    I need to remove all 'N' strings in a fastq file. I have paired end files and there are N strings at the end of some reads of variable length (both the reads and the N strings are of variable length). I can't find any tool to do this. Trimmomatic will remove bases based on their quality score. Fastx_trimmer will keep 'x' first bases.

    Anyone has a script for this or knows of a tool? It is important that the tool deals with paired files and keeps the pairs 'alive' after trimming in both files.

    ps: I tried to install nesoni but I am uncapable to do this in the server without root permisions and an older python version.

    Thanx
    Illinu

  • #2
    Aren't the quality scores of the Ns very low? Trimming by quality will normally remove stretches of them (unless they're in the middle of the read, which happens).


    • #3
      As dpryan said, quality-trimming should suffice; Ns should have a quality of zero, so you can just set the quality-trim threshold at 1. For example, with BBTools:

      reformat.sh in1=read1.fq in2=read2.fq out1=trimmed1.fq out2=trimmed2.fq qtrim=rl trimq=1

      That will keep the pairs together. That program will also automatically convert the quality of Ns to zero, if they happen to be non-zero.
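The idea behind this approach (treat every N as quality 0, then trim low-quality ends) can be sketched in plain Python. This is only a naive per-base illustration, not BBTools' actual trimming algorithm; the function names, the Phred+33 offset, and the choice to drop a pair when one mate becomes too short are all assumptions made for the example.

```python
def effective_quality(base, qual_char, offset=33):
    """Phred score of one base, forcing N to 0 regardless of its recorded quality."""
    return 0 if base in "Nn" else ord(qual_char) - offset


def quality_trim(seq, qual, trimq=1, offset=33):
    """Trim bases scoring below trimq from both ends of a read (naive per-base rule)."""
    scores = [effective_quality(b, q, offset) for b, q in zip(seq, qual)]
    start, end = 0, len(scores)
    while end > start and scores[end - 1] < trimq:
        end -= 1
    while start < end and scores[start] < trimq:
        start += 1
    return seq[start:end], qual[start:end]


def trim_pair(read1, read2, min_len=1):
    """Trim both mates; return None if either mate ends up shorter than min_len."""
    trimmed1 = quality_trim(*read1)
    trimmed2 = quality_trim(*read2)
    if len(trimmed1[0]) < min_len or len(trimmed2[0]) < min_len:
        return None  # drop the whole pair so the two files stay in sync
    return trimmed1, trimmed2
```

Each read here is a `(sequence, quality)` tuple. With a threshold of 1, only bases whose effective quality is 0 are removed from the ends, so Ns in the middle of a read are left in place.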


      • #4
        You can trim poly-Ns with PRINSEQ. There are (at least) three options to control the trimming: one to specify the minimum length of Ns at the 3-prime end, another to specify the maximum N percentage to allow, and one to specify the maximum number of Ns to allow.

        Code:
        -trim_ns_right <integer>
                    Trim poly-N tail with a minimum length of trim_ns_right at the
                    3'-end.
        
        -ns_max_p <integer>
                    Filter sequence with more than ns_max_p percentage of Ns.
        
        -ns_max_n <integer>
                    Filter sequence with more than ns_max_n Ns.
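As a rough illustration of what these three options describe, the following Python sketch applies the same rules to a single read. It is written from the option descriptions above and is not PRINSEQ's own code; the function names and the treatment of lowercase 'n' as an N are assumptions.

```python
import re

POLY_N_TAIL = re.compile(r"[Nn]+$")


def trim_ns_right(seq, qual, min_tail):
    """Remove a 3'-end run of Ns, but only if the run is at least min_tail long."""
    match = POLY_N_TAIL.search(seq)
    if match and len(match.group()) >= min_tail:
        seq = seq[:match.start()]
        qual = qual[:len(seq)]  # keep the quality string the same length
    return seq, qual


def passes_n_filters(seq, ns_max_p=None, ns_max_n=None):
    """Return False if the read exceeds the allowed N count or N percentage."""
    n_count = seq.upper().count("N")
    if ns_max_n is not None and n_count > ns_max_n:
        return False
    if ns_max_p is not None and seq and 100.0 * n_count / len(seq) > ns_max_p:
        return False
    return True
```

Note that the two filters discard whole reads, while only the trimming step shortens them, so for paired data the mates of discarded reads have to be handled separately.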


        • #5
          SES, thank you for this tip; I think it's the approach I was looking for.
          To answer the previous posts: I checked precisely this and, surprisingly, the scores are high. I don't understand why; I was expecting them to be zero, if anything.
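A check like this can be done with a few lines of Python that tally the quality scores recorded at N positions. This is a minimal sketch, assuming Phred+33 encoding and reads supplied as `(sequence, quality)` tuples; the function name is hypothetical.

```python
from collections import Counter


def n_quality_histogram(records, offset=33):
    """Count how often each Phred score occurs at positions where the base is N."""
    counts = Counter()
    for seq, qual in records:
        for base, qual_char in zip(seq, qual):
            if base in "Nn":
                counts[ord(qual_char) - offset] += 1
    return counts
```

If most of the counts fall on scores well above zero, quality trimming alone will not remove the Ns unless the tool resets their quality first.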


          • #6
            Originally posted by SES
            You can trim poly-Ns with PRINSEQ. There are (at least) three options to control the trimming: one to specify the minimum length of Ns at the 3-prime end, another to specify the maximum N percentage to allow, and one to specify the maximum number of Ns to allow.

            Code:
            -trim_ns_right <integer>
                        Trim poly-N tail with a minimum length of trim_ns_right at the
                        3'-end.
            
            -ns_max_p <integer>
                        Filter sequence with more than ns_max_p percentage of Ns.
            
            -ns_max_n <integer>
                        Filter sequence with more than ns_max_n Ns.

            SES, now that I think about it: this will not handle paired files, right?


            • #7
              Originally posted by illinu
              To answer the previous posts: I checked precisely this and, surprisingly, the scores are high. I don't understand why; I was expecting them to be zero, if anything.

              But like I said, BBTools will automatically change the quality of N bases to 0, because it makes no sense for them to have any other quality. So they will be trimmed anyway.


              • #8
                Originally posted by illinu
                SES, now that I think about it: this will not handle paired files, right?

                Yes, I believe recent versions of PRINSEQ will handle paired-end files correctly. If you run into issues, you could use Pairfq to re-pair your reads and separate the singletons after trimming.
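Re-pairing after trimming amounts to matching reads by name and setting aside those whose mate was discarded. The sketch below shows the general idea only; it is not how Pairfq is implemented. It assumes reads are `(header, sequence, quality)` tuples, that mates share a name up to an optional `/1` or `/2` suffix, and that the second file fits in memory.

```python
def base_name(header):
    """Read name without the description field or a trailing /1 or /2 mate suffix."""
    name = header.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def repair(reads1, reads2):
    """Split two read collections into matched pairs and per-file singletons."""
    index2 = {base_name(rec[0]): rec for rec in reads2}
    pairs, singles1 = [], []
    for rec in reads1:
        mate = index2.pop(base_name(rec[0]), None)
        if mate is None:
            singles1.append(rec)  # mate was removed from the second file
        else:
            pairs.append((rec, mate))
    singles2 = list(index2.values())  # reads left without a mate in the first file
    return pairs, singles1, singles2
```

Pairs come out in the order of the first file, so writing them to two output files in that order keeps the files synchronized.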


                • #9
                  I tried the BBMap option and it worked beautifully! It took only 5 minutes of wall-clock time. The program needs no installation; it runs with Java.
