  • mmartin
    Member
    • Aug 2009
    • 73

    cutadapt: A tool that removes adapter sequences

    I'm pleased to announce the tool 'cutadapt', which we have been using in our research group for adapter removal in high-throughput sequencing data. Removing adapter sequences from reads is necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing small RNAs.

    Since special code is included to handle color space data correctly, the tool may be especially useful for people who do not use Applied Biosystems' Corona pipeline.

    cutadapt is under the MIT license.

    Please see the web page for a feature list and a link to a downloadable package:
  • Torst
    Senior Member
    • Apr 2008
    • 275

    #2
    Originally posted by mmartin
    I'm pleased to announce the tool 'cutadapt
    http://cutadapt.googlecode.com/
    It seems your code only runs under Python 2.6?

    On CentOS 5.x, which is a bit behind, I had to install the "python26" packages and change the shebang line from #!/usr/bin/python to #!/usr/bin/python26.
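
For anyone who wants to script that shebang change, here is a minimal sketch in Python. The function name rewrite_shebang is hypothetical and not part of cutadapt. It only edits text in memory, so reading and writing the script file is left to the caller.

```python
def rewrite_shebang(script_text, interpreter):
    """Return script_text with its shebang line pointing at interpreter.

    If the script has no shebang line, one is prepended.
    Hypothetical helper for illustration; not part of cutadapt.
    """
    new_line = "#!" + interpreter
    lines = script_text.split("\n")
    if lines[0].startswith("#!"):
        # Replace the existing shebang on the first line.
        lines[0] = new_line
    else:
        # No shebang present: add one in front of the script.
        lines.insert(0, new_line)
    return "\n".join(lines)
```

Calling rewrite_shebang(text, "/usr/bin/python26") on the script's contents would give the same result as the manual edit described above.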


    • mmartin
      Member
      • Aug 2009
      • 73

      #3
      Yes, Python 2.6 is needed, thanks for the pointer. It wouldn't be hard to support Python 2.5, but some 2.6 features make the transition to the Python 3 syntax easier, so I would like to stick to it. I have updated the homepage to reflect the requirement of Python 2.6.


      • HiroMishima
        Member
        • Aug 2009
        • 15

        #4
        3'-end partial match of adapters

        Hi,

        I have a question about Cutadapt version 0.3.

        Does Cutadapt cut partial sequences of adapters?

        According to the "Statistics for adapter" messages, Cutadapt seems to recognize partial matches of adapters at the 3' end. However, only fully matched adapter sequences are removed in the output files.


        • mmartin
          Member
          • Aug 2009
          • 73

          #5
          Yes, cutadapt recognizes partial adapters. That is, if your adapter is ADAPTER and your read is MYSEQUENCEADAP, then the resulting sequence is MYSEQUENCE. In fact, these are some examples of input sequences that will result in MYSEQUENCE:
          MYSEQUENCEADAPTER
          MYSEQUENCEADAP
          MYSEQUENCEADAPTERSOMETHINGELSE
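
The trimming behaviour described here can be illustrated with a short Python sketch. This is not cutadapt's actual algorithm, which also tolerates errors. It is a simplified version that assumes exact matching only, and the name trim_adapter and the min_overlap parameter are made up for this example.

```python
def trim_adapter(read, adapter, min_overlap=1):
    """Remove a 3' adapter from read using exact matching only.

    A full adapter occurrence anywhere in the read is removed together
    with everything after it. Otherwise the longest suffix of the read
    that equals a prefix of the adapter (at least min_overlap characters)
    is removed. Illustrative sketch; not cutadapt's implementation.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Partial match at the 3' end: try the longest overlap first.
    longest = min(len(read), len(adapter) - 1)
    for length in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[:len(read) - length]
    return read


for example in ("MYSEQUENCEADAPTER", "MYSEQUENCEADAP"):
    print(example, "->", trim_adapter(example, "ADAPTER"))
```

Both reads in the loop come out as MYSEQUENCE. A read that neither contains the adapter nor ends with one of its prefixes is returned unchanged.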

          Could you give an example of the problematic read you encounter and the output of cutadapt for that read?


          • HiroMishima
            Member
            • Aug 2009
            • 15

            #6
            Originally posted by mmartin
            Could you give an example of the problematic read you encounter and the output of cutadapt for that read?
            I found that I had used two -a options, and that the adapter sequences I used were almost reverse complements of each other. I probably do not need two -a options in this case. Hopefully these examples clarify the situation.

            sample.fastq:
            Code:
            @read1
            GATCCTCCTGGAGCTGGCTGATACCAGTATACCAGTGCTGATTGTTGAATTTCAGGAATTTCTCAAGCTCGGTAGC
            +
            hhhhhhhhhhahhhhhehhffhghhehdgghhheddggfhfhhgffhddhhfffhhffhfgggffddfdfffcdfb
            @read2
            CTCGAGAATTCTGGATCCTCTCTTCTGCTACCTTTGGGATTTGCTTGCTCTTGGTTCTCTAGTTCTTGTAGTGGTG
            +
            hhhhhhhhhhhhhhhhhhhhhhhhhhgghghhhhhhhhgaddeeadaa^dadaa_aaaaababca_aa__^[T^[Z
            The following result is OK:
            Code:
            $ python cutadapt -a CTCGAGAATTCTGGATCCTC sample.fastq
            
            @read1
            CTGGAGCTGGCTGATACCAGTATACCAGTGCTGATTGTTGAATTTCAGGAATTTCTCAAGCTCGGTAGC
            +
            hhhahhhhhehhffhghhehdgghhheddggfhfhhgffhddhhfffhhffhfgggffddfdfffcdfb
            @read2
            TCTTCTGCTACCTTTGGGATTTGCTTGCTCTTGGTTCTCTAGTTCTTGTAGTGGTG
            +
            hhhhhhgghghhhhhhhhgaddeeadaa^dadaa_aaaaababca_aa__^[T^[Z
            However, in the following result, read1 still contains "GATCCTC" at the 5' end:
            Code:
            $ python cutadapt -a CTCGAGAATTCTGGATCCTC -a GAGGATCCAGAATTCTCGAGTT sample.fastq
            
            @read1
            GATCCTCCTGGAGCTGGCTGATACCAGTATACCAGTGCTGATTGTTGAATTTCAGGAATTTCTCAAGCTCGGTAGC
            +
            hhhhhhhhhhahhhhhehhffhghhehdgghhheddggfhfhhgffhddhhfffhhffhfgggffddfdfffcdfb
            @read2
            TCTTCTGCTACCTTTGGGATTTGCTTGCTCTTGGTTCTCTAGTTCTTGTAGTGGTG
            +
            hhhhhhgghghhhhhhhhgaddeeadaa^dadaa_aaaaababca_aa__^[T^[Z


            • mmartin
              Member
              • Aug 2009
              • 73

              #7
              Hi, actually, you do have to use two -a options since currently reverse complements are not automatically searched for.
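
Because the reverse complement has to be supplied by hand, a small helper can generate it. The Python sketch below is only an illustration: the function name reverse_complement is not part of cutadapt, and it assumes the sequence contains only the letters A, C, G, T and N.

```python
# Watson-Crick pairs; N stays N. Other IUPAC codes are not handled.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


adapter = "CTCGAGAATTCTGGATCCTC"
# Both orientations, ready to be passed as two separate -a options.
adapters = [adapter, reverse_complement(adapter)]
print(adapters)
```

For the first adapter in the example above, this yields GAGGATCCAGAATTCTCGAG. The second adapter in that example is this sequence followed by two extra T's.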

              I managed to reproduce the problem you encountered and I have prepared a new release that hopefully fixes it. You can download v0.4 from the homepage and see whether the bug is actually fixed. Thanks for reporting this!


              • gaffa
                Member
                • Oct 2010
                • 82

                #8
                I haven't looked into the details of the program, but I wonder how straightforward it would be to use it to filter out and discard entire reads that match an adapter, rather than just removing the matching part and re-using the trimmed read?


                • mmartin
                  Member
                  • Aug 2009
                  • 73

                  #9
                  Since this isn't too hard, I just added that feature. cutadapt now has the option "--discard", which does exactly that: If an adapter is found in the read, then the read is discarded and not trimmed.
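
The difference between trimming and discarding can be sketched in a few lines of Python. This is a simplified illustration and not cutadapt's code: the function filter_reads and its discard argument are hypothetical, and only exact, full-length adapter matches are considered.

```python
def filter_reads(reads, adapter, discard=False):
    """Yield (name, sequence) pairs after adapter handling.

    reads is an iterable of (name, sequence) pairs. With discard=False,
    the adapter and everything after it are cut off. With discard=True,
    any read containing the adapter is dropped entirely.
    Illustrative sketch using exact matching; not cutadapt's implementation.
    """
    for name, seq in reads:
        pos = seq.find(adapter)
        if pos == -1:
            # No adapter found: the read passes through unchanged.
            yield name, seq
        elif not discard:
            # Adapter found and trimming requested: keep the part before it.
            yield name, seq[:pos]
        # Adapter found and discard=True: the read is skipped.


reads = [("r1", "MYSEQUENCEADAPTER"), ("r2", "OTHERSEQUENCE")]
print(list(filter_reads(reads, "ADAPTER")))
print(list(filter_reads(reads, "ADAPTER", discard=True)))
```

In the first call r1 is trimmed to MYSEQUENCE. In the second call r1 is dropped and only r2 remains.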


                  • HiroMishima
                    Member
                    • Aug 2009
                    • 15

                    #10
                    Originally posted by mmartin
                    Hi, actually, you do have to use two -a options since currently reverse complements are not automatically searched for.

                    I managed to reproduce the problem you encountered and I have prepared a new release that hopefully fixes it. You can download v0.4 from the homepage and see whether the bug is actually fixed. Thanks for reporting this!
                    Everything's perfect! cutadapt 0.5.1 worked well with two -a options.

                    I believe that cutadapt is one of the best adapter sequence trimmers, especially in terms of simplicity and speed.

                    Thanks again for the prompt update.


                    • sdavis
                      Member
                      • Jan 2010
                      • 14

                      #11
                      This looks like a very useful tool. Could I suggest that you accept gzipped fastq files as an alternative input format as a simple convenience?


                      • mmartin
                        Member
                        • Aug 2009
                        • 73

                        #12
                        Good idea. Since this was on my to-do list as well, I have just implemented this feature and released cutadapt 0.6.
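
One common way to accept gzipped input transparently is to pick the opener based on the file name. The Python 3 sketch below shows the idea. It is an assumption about how such a feature could work, not a description of what cutadapt 0.6 does internally, and the names open_reads and count_records are made up for this example.

```python
import gzip


def open_reads(path):
    """Open a FASTQ file for text reading, decompressing if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_records(path):
    """Count records, assuming the plain four-lines-per-record FASTQ layout."""
    with open_reads(path) as handle:
        return sum(1 for _ in handle) // 4
```

With this, count_records("sample.fastq") and count_records("sample.fastq.gz") go through the same code path, and only the opener differs.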


                        • bioinfosm
                          Senior Member
                          • Jan 2008
                          • 483

                          #13
                          cool, that was fast!
                          --
                          bioinfosm


                          • thinkRNA
                            Member
                            • Jan 2010
                            • 94

                            #14
                            Can you please add an option to remove all N's or C's, etc.? I think this would be helpful. Also, can you describe in detail how the error rate is calculated?


                            • gaffa
                              Member
                              • Oct 2010
                              • 82

                              #15
                              Originally posted by mmartin
                              Since this isn't too hard, I just added that feature. cutadapt now has the option "--discard", which does exactly that: If an adapter is found in the read, then the read is discarded and not trimmed.
                              Fantastic!
