  • Trimming - looking for a complete solution

    Hi, I found a previous discussion which covers a lot of what I'd like to know, but not quite all!

    I am working with HaloPlex data. Before alignment, I need to remove the HaloPlex adapters and also clip 5bp from both ends of both forward and reverse reads. I should also not be left with any empty or orphan (i.e. unmatched) reads.

    My previous approach was to trim adapters with cutadapt, use a separate Perl script to remove the 5bp, re-run cutadapt with a 'fake' adapter sequence to drop zero-length reads, and finally run another script to drop orphans. While this works, it seems that tools like Trimmomatic or Trim Galore could achieve the same result more efficiently in a single step.

    My problem is therefore that neither tool seems to deal with both ends of the reads:

    Trimmomatic has 'CROP: Cut the read to a specified length by removing bases from the end'

    Trim Galore has --clip_R1 <int> and --clip_R2 <int> to remove <int> bp from the 5' end of read 1 and read 2.

    Unless I've misunderstood, this only deals with one end of the reads. The reason I need to clip these bases from both ends is to remove residual bases from the restriction enzyme footprint.

    TIA!
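
For reference, the clipping and orphan-removal steps described in this post could be sketched in a single pass. This is only a minimal Python sketch, not the poster's actual scripts: `FastqRecord`, `clip_both_ends` and `clip_pairs` are hypothetical names, adapter removal is assumed to have happened already (e.g. with cutadapt), and the two mates of each pair are assumed to arrive together and in the same order.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    """One FASTQ read (hypothetical container used only in this sketch)."""
    name: str
    seq: str
    qual: str


def clip_both_ends(rec: FastqRecord, n: int = 5) -> FastqRecord:
    """Remove n bases, and their quality values, from the 5' and the 3' end."""
    # Reads of length <= 2*n have nothing left, so the slice is empty.
    end = max(len(rec.seq) - n, n)
    return FastqRecord(rec.name, rec.seq[n:end], rec.qual[n:end])


def clip_pairs(
    pairs: Iterable[Tuple[FastqRecord, FastqRecord]],
    n: int = 5,
    min_len: int = 1,
) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Clip both mates and keep a pair only if BOTH mates survive.

    Dropping the whole pair is what prevents empty reads and orphans.
    """
    for r1, r2 in pairs:
        c1, c2 = clip_both_ends(r1, n), clip_both_ends(r2, n)
        if len(c1.seq) >= min_len and len(c2.seq) >= min_len:
            yield c1, c2


if __name__ == "__main__":
    good = (FastqRecord("a/1", "ACGTACGTACGTACG", "I" * 15),
            FastqRecord("a/2", "TTTTTGGGGGCCCCC", "I" * 15))
    # The first mate is only 10 bp long, so it is empty after clipping.
    bad = (FastqRecord("b/1", "ACGTACGTAC", "I" * 10),
           FastqRecord("b/2", "ACGTACGTACGTACG", "I" * 15))
    for c1, c2 in clip_pairs([good, bad]):
        print(c1.name, c1.seq, c2.name, c2.seq)
```

Because a pair is discarded as soon as either mate falls below `min_len`, no separate "drop zero-length reads" and "drop orphans" passes are needed in this sketch.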

  • #2
    Trimmomatic also has HEADCROP, which removes bases from the 5' end of the reads.

    • #3
      Sorry - there's an error in my OP - HEADCROP is the option I meant to mention. CROP is actually not much use to me, as it does the opposite of what I'd like to do (it specifies the length of sequence to keep rather than the number of bases to remove), so I am still in the situation where I can only clip from one end (5').

      Ideally (in the case of Trimmomatic) I'm looking for a 'TAILCROP' option...

      • #4
        Originally posted by girlmonkey
        Sorry - there's an error in my OP - HEADCROP is the option I meant to mention. CROP is actually not much use to me, as it's the opposite of what I'd like to do (specifying the length of sequence to be left behind as opposed to what to remove), so I still have the situation that I can only clip from one end (5').

        Ideally (in the case of Trimmomatic) I'm looking for a 'TAILCROP' option...

        I guess it depends on the stage of the trimming and adapter removal at which you need to cut the bases from the 3' end. If you can do it as the first step, then CROP would be OK, unless your reads are all different lengths.

        • #5
          Thanks for your reply. The reads are initially all the same length (150bp), but adapter trimming should come first (after which they are all different lengths) before the clipping of 5bp from the ends.
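
To illustrate why the order matters, here is a small Python sketch of the three operations as plain string slicing. It only mimics the behaviour described in this thread and is not Trimmomatic's implementation; `head_crop`, `crop` and `tail_clip` are hypothetical helper names, and `tail_clip` stands in for the wished-for 'TAILCROP'.

```python
def head_crop(seq: str, n: int) -> str:
    """HEADCROP-like: remove n bases from the 5' end."""
    return seq[n:]


def crop(seq: str, length: int) -> str:
    """CROP-like: keep at most `length` bases, cutting from the 3' end."""
    return seq[:length]


def tail_clip(seq: str, n: int) -> str:
    """Hypothetical 'TAILCROP': remove n bases from the 3' end."""
    return seq[:max(len(seq) - n, 0)]


if __name__ == "__main__":
    untrimmed = "A" * 150   # all raw reads have the same length
    trimmed = "A" * 120     # a read shortened by adapter removal

    # On fixed-length reads, CROP to 145 equals removing 5 bases from the tail.
    print(len(crop(untrimmed, 145)), len(tail_clip(untrimmed, 5)))

    # After adapter trimming, CROP to 145 leaves the shorter read untouched,
    # whereas a tail clip still removes 5 bases.
    print(len(crop(trimmed, 145)), len(tail_clip(trimmed, 5)))
```

A fixed CROP length is therefore only equivalent to clipping 5bp from the 3' end while every read is still 150bp long, which is no longer the case once adapters have been removed.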

          • #6
            We have just implemented two new options in Trim Galore (--three_prime_clip_R1 and --three_prime_clip_R2) to clip off any number of bases from the 3' ends of reads after adapter/quality trimming has finished. girlmonkey is currently testing the new version; if it works fine, the options will find their way into the next release.
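
As a rough illustration, a paired-end Trim Galore call that clips 5bp from both ends of both reads might be assembled as below. This is a sketch under assumptions: the file names are placeholders, `trim_galore_cmd` is a hypothetical helper, and the option names are spelled with a capital R as in the released Trim Galore documentation, so check `trim_galore --help` for your version. No adapter sequence is given here, because the HaloPlex adapter sequences are not stated in this thread.

```python
from typing import List


def trim_galore_cmd(r1: str, r2: str, clip: int = 5) -> List[str]:
    """Build (but do not run) a paired-end Trim Galore command line."""
    n = str(clip)
    return [
        "trim_galore", "--paired",
        "--clip_R1", n, "--clip_R2", n,                          # 5' ends
        "--three_prime_clip_R1", n, "--three_prime_clip_R2", n,  # 3' ends
        r1, r2,
    ]


if __name__ == "__main__":
    # Placeholder file names; pass the list to subprocess.run() to execute.
    cmd = trim_galore_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz")
    print(" ".join(cmd))
```

In paired-end mode Trim Galore discards a pair when a read becomes shorter than its length cutoff, which should also take care of empty reads and orphans.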

            • #7
              PRINSEQ has many options for trimming the 3' end of reads. There is '--trim_right' for trimming a specified length, '--trim_right_p' for trimming a certain percentage, '--trim_ns_right' for trimming poly-N tails, '--trim_qual_right' for trimming by a certain quality threshold, and '--trim_to_len' to specify trimming to a certain length.
