  • Adapter Trimming Illumina Mate Pairs (Nextera Protocol)

    I have just received our first Illumina mate pair libraries constructed using the Nextera protocol. Unlike earlier Illumina libraries, these contain junction adapters, either singly or in duplicate, as well as read-specific external adapters.

    Before re-inventing the wheel and writing a Perl script to trim the adapters according to the guidelines documented in http://www.illumina.com/documents/pr...processing.pdf, I would like to know whether any existing trimming tool has already been modified to correctly identify and remove the Nextera adapters from these mate pair libraries. It seems more sensible to use a tool that has already been proven to do the job correctly.

    The adapters are:

    >Nextera_circularized_duplicate_junction_adapter
    CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG
    >Nextera_circularized_single_junction_adapter
    CTGTCTCTTATACACATCT
    >Nextera_circularized_single_junction_adapter_reverse_complement
    AGATGTGTATAAGAGACAG
    >Nextera_read_1_external_adapter
    GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
    >Nextera_read_2_external_adapter
    GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

    Any suggestions on which existing (or new) tools can trim adapters from Nextera-based Illumina mate pair libraries would be appreciated.
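
As a sanity check on the adapter list above, here is a minimal Python sketch (the helper name `reverse_complement` is my own, not from any tool mentioned in this thread). It checks that the third sequence is the reverse complement of the single junction adapter, and that the duplicate junction adapter is the single adapter followed by its reverse complement.

```python
# Translation table for complementing unambiguous DNA bases (plus N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Junction adapter sequences copied from the post above.
SINGLE = "CTGTCTCTTATACACATCT"
SINGLE_RC = "AGATGTGTATAAGAGACAG"
DUPLICATE = "CTGTCTCTTATACACATCTAGATGTGTATAAGAGACAG"

print(reverse_complement(SINGLE) == SINGLE_RC)       # single vs. its listed reverse complement
print(SINGLE + SINGLE_RC == DUPLICATE)               # duplicate = single + reverse complement
print(reverse_complement(DUPLICATE) == DUPLICATE)    # duplicate reads the same on both strands
```

Because the duplicate junction adapter is its own reverse complement, the same search string applies to read 1 and read 2.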

  • #2
    Hi hrarnc,

    You should be able to do this using Cutadapt. I think the appropriate commands would be something like (untested):

    cutadapt -b CTGTCTCTTATACACATCT -b AGATGTGTATAAGAGACAG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC read1.fastq > read1_trimmed.fastq
    cutadapt -b CTGTCTCTTATACACATCT -b AGATGTGTATAAGAGACAG -a GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT read2.fastq > read2_trimmed.fastq

    Cheers,
    Roy.



    • #3
      Thanks for the suggestion, Roy. Cutadapt is one of the tools we use, and yes, it could trim the adapters. However, the technote for the Nextera mate pair libraries documents two types of read pairs: those with a single adapter and those with the duplicate adapter. Depending on which scenario applies, the read pairs will be either forward-reverse or reverse-forward, so they could be separated into two output streams based on whether the adapter is present singly or in duplicate. It would therefore seem sensible to have a one-step trimmer/partitioner for the read pairs, since the adapters present determine which stream each read pair is written to. That might obviate the need to partition afterwards by mapping with bowtie, bwa, or other tools and splitting on read orientation.

      So what I was really asking is whether anyone has already written a specific tool that handles both the trimming and the partitioning of reads into the two streams (RF and FR reads) in a single step, as so far I have not found one that does this.
      Last edited by hrarnc; 04-04-2013, 12:05 PM.



      • #4
        I'm not sure that would be possible. In figure 2 in the technote, a) and e) both result in sequence reads with no adapter contamination, so it would not be possible to distinguish them without mapping. To partition the reads which contain adapter, you could perhaps use multiple rounds of cutadapt - using the option to write untrimmed reads to a separate file.



        • #5
          The following pseudocode might do it:

          For each read pair, by case:
            a) Duplicate junction adapter, so RF orientation:
               trim adapters as appropriate
               write to RF stream read pair files
            b) Adapter at one end only (duplicated):
               cull read 1 or read 2, whichever has the adapter (read 1 is the one depicted in the image)
               write the other read to the unpaired stream
            c) Single adapter at one end:
               read 1 trimmed, read 2 needs no trimming
               write to FR stream read pair files
            d) Duplicate junction adapter closer to one end than the other, so not in one read of the pair:
               read 1 trimmed, read 2 needs no trimming
               write to RF stream read pair files
            e) No adapters present:
               write untrimmed reads to FR stream read pair files

          It should distill down to partitioning based on the presence or absence of adapters on a per-read-pair basis, and should therefore be tractable as a single process. However, that would require a new tool, since there does not appear to be one already available from Illumina. Mapping would still be needed to verify that the trimming/partitioning worked.
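
A rough Python sketch of the case analysis above, under several assumptions that are mine rather than the poster's: the external adapters have already been removed, adapters are found by exact string match only (no mismatches, no partial adapters at read ends), and a read trimmed below `min_len` bases is culled. The function names, the `min_len` parameter, and the extra "discard" outcome are hypothetical. As noted elsewhere in the thread, mapping is still needed to confirm the orientation.

```python
from typing import Optional, Tuple

SINGLE = "CTGTCTCTTATACACATCT"
SINGLE_RC = "AGATGTGTATAAGAGACAG"
DUPLICATE = SINGLE + SINGLE_RC


def find_junction(seq: str) -> Tuple[Optional[str], int]:
    """Return (kind, position) of the first junction adapter hit (exact match)."""
    hits = [p for p in (seq.find(SINGLE), seq.find(SINGLE_RC)) if p != -1]
    if not hits:
        return None, len(seq)  # no adapter: keep the whole read
    kind = "duplicate" if DUPLICATE in seq else "single"
    return kind, min(hits)


def classify_pair(read1: str, read2: str, min_len: int = 20):
    """Return (stream, trimmed_read1, trimmed_read2) for one read pair."""
    kind1, pos1 = find_junction(read1)
    kind2, pos2 = find_junction(read2)
    trimmed1, trimmed2 = read1[:pos1], read2[:pos2]
    short1, short2 = len(trimmed1) < min_len, len(trimmed2) < min_len
    if short1 and short2:
        return "discard", None, None       # assumption: nothing usable left
    if short1:
        return "unpaired", None, trimmed2  # case b: cull read 1
    if short2:
        return "unpaired", trimmed1, None  # case b: cull read 2
    if "duplicate" in (kind1, kind2):
        return "RF", trimmed1, trimmed2    # cases a and d
    return "FR", trimmed1, trimmed2        # case c (single adapter) or e (none)


insert1 = "ACGTTGCA" * 3  # toy 24-base inserts without adapter sequence
insert2 = "TTGACCAT" * 3
print(classify_pair(insert1 + DUPLICATE + "GGG", insert2)[0])
print(classify_pair(insert1 + SINGLE, insert2)[0])
print(classify_pair(DUPLICATE + insert1, insert2)[0])
print(classify_pair(insert1, insert2)[0])
```

A read that ends partway through the duplicate adapter would be labelled "single" here, which is one reason a real implementation would need partial and error-tolerant matching.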

          Your suggestion of multiple rounds of cutadapt is a good one, and multiple rounds would be required anyway, as the external adapters must be removed before handling the junction adapters.

          I should also point out that the technote suggests using biopieces or AdapterRemoval, but I have not yet used either of these and will need to give them a try asap.
          Last edited by hrarnc; 04-04-2013, 02:36 PM.



          • #6
            There's no guarantee that the reads from case a would include an adapter - this would depend on the fragment size and read length. So I think you would still need to distinguish FR from RF by mapping/assembly.



            • #7
              Hi there,

              I am in the same situation. Thank you guys for your thoughts. I entirely agree about:
              - trimming the external adapters first and then the junction adapter, because cutadapt manual says "If multiple -a, -b or -g options are given, only the best matching adapter is trimmed".
              - the requirement to check whether the reads map in a "proper pair" configuration anyway, because the absence of (junction adapter) trimming could be due to either case "a" or case "e".
              - the possibility to identify distinct subsets of reads from specific cases by using the different options of cutadapt in an iterative fashion.

              More specifically, I see two ways to handle the junction adapter:

              A) the easy way
              1. Use cutadapt with the -b option
              2. Use everything from the previous step -whether trimming occurred or not- and distinguish the genuine mate-pairs (MP) from the paired-ends (PE) depending on the mapping orientation: FR for the PE and RF for the MP

              B) the tricky way
              1. Use the "-g" option of cutadapt: trimmed reads indicate PE from case "c)" => they should map in FR orientation. Use these guys as a "reliable" subset of PE reads, to estimate the fragment size distribution of your library for instance (I am referring to the fragments after circularization).
              2. Use the option "-a" of cutadapt on the untrimmed reads from the previous step: trimmed reads indicate MP from case "d)" => they should map in RF orientation.
              3. Everything else should be untrimmed reads from cases "a)" or "e)". Hopefully most of them are MP from case "a)" thanks to the biotin enrichment.
              This method should work even when reads overlap with each other (for instance if your fragment is shorter than twice the read length).

              Does that make sense?
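
To make the "tricky way" concrete, here is a hedged Python sketch that only builds and prints the two cutadapt command lines for one read file. It assumes the `-o` and `--untrimmed-output` options of cutadapt; the file names and the `cutadapt_round` helper are hypothetical, and passing both junction sequences to each round is my assumption. Nothing is executed, and since each file is processed separately, the pairs would still need to be re-synchronised afterwards.

```python
import shlex
from typing import List, Sequence

# Junction adapter and its reverse complement, as listed in the first post.
JUNCTION_ADAPTERS = ("CTGTCTCTTATACACATCT", "AGATGTGTATAAGAGACAG")


def cutadapt_round(flag: str, infile: str, trimmed: str, untrimmed: str,
                   adapters: Sequence[str] = JUNCTION_ADAPTERS) -> List[str]:
    """Build one cutadapt call that splits trimmed from untrimmed reads."""
    cmd = ["cutadapt"]
    for adapter in adapters:
        cmd += [flag, adapter]
    # Trimmed reads go to -o, reads without any adapter go to --untrimmed-output.
    cmd += ["-o", trimmed, "--untrimmed-output", untrimmed, infile]
    return cmd


rounds = [
    # Step 1: -g; trimmed reads are the candidate case c) subset.
    cutadapt_round("-g", "read1.fastq", "r1.case_c.fastq", "r1.round1_untrimmed.fastq"),
    # Step 2: -a on what step 1 left untrimmed; trimmed reads are candidate case d).
    cutadapt_round("-a", "r1.round1_untrimmed.fastq", "r1.case_d.fastq", "r1.case_a_or_e.fastq"),
]

for cmd in rounds:
    print(" ".join(shlex.quote(part) for part in cmd))
```

The untrimmed output of the second round corresponds to step 3, the reads from cases a) or e) that can only be told apart by mapping.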



              • #8
                Originally posted by syfo View Post
                Does that make sense?
                Apologies for the belated reply. We have decided to use our routine adapter remover (fastq-mcf from ea-utils) to remove the external adapters. We are then implementing an in-house Perl script to trim and partition the read pairs, since only a very small percentage of reads in our present data have imperfect adapters.

                Hopefully I will be able to let you know how it goes in a week or so.



                • #9
                  So I guess that, in the tricky way, read 2 should be trimmed using '-a' first and then '-g'?

                  In my data the junction adapter is shorter than the read length, so it can appear anywhere within the read or overlap the end of the read. Is that the same situation as c), where the adapter appears in the read, i.e. FR?

                  Thanks!



                  • #10
                    Hi all,

                    I am in the same situation as hrarnc and I'd like to restart the discussion.
                    Can anyone suggest an efficient way/script to remove adapters from Illumina mate-pair libraries?

                    Thanks for your help !



                    • #11
                      Do I commit sin ?



                      • #12
                        Hi all,

                        We eventually implemented the following pipeline:

                        1. Trim both FASTQ files with the same cutadapt command:

                        cutadapt -a CTGTCTCTTATACACATCT -a AGATGTGTATAAGAGACAG -m 20 $fastq1 2>$outdir/cutadaptR1.log > $trimmed1
                        cutadapt -a CTGTCTCTTATACACATCT -a AGATGTGTATAAGAGACAG -m 20 $fastq2 2>$outdir/cutadaptR2.log > $trimmed2

                        The -m option removes very short products (otherwise you can even get an empty sequence, which no mapper likes). However, you then need to remove the asymmetric pairs, i.e. pairs where only one read survived.

                        2. Identify the valid pairs that have a read in both files (with something like "comm -12 $id1 $id2 > $ids") and get rid of the other ones, to make sure your two FASTQ files are symmetrical, meaning that they contain the same read IDs.

                        3. Reverse complement the sequences (I know, that's ugly).

                        4. Align with bwa.

                        Quite dirty but it worked for us.

                        Anything better?
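
Steps 2 and 3 above can be sketched in Python roughly as follows. This is an illustration only, not syfo's actual script: it assumes plain 4-line FASTQ records, files small enough to hold in memory, read names that match once a trailing /1 or /2 is stripped, and that every kept read is reverse complemented. All function names are hypothetical.

```python
import io
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fastq(handle) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
        if not lines[0]:
            return
        yield lines[0], lines[1], lines[2], lines[3]


def read_id(header: str) -> str:
    """Read name without the leading '@', any description, or a /1, /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def keep_common_pairs(reads1: List[Record], reads2: List[Record]):
    """Step 2: keep only reads whose ID is present in both files."""
    common = {read_id(r[0]) for r in reads1} & {read_id(r[0]) for r in reads2}
    return ([r for r in reads1 if read_id(r[0]) in common],
            [r for r in reads2 if read_id(r[0]) in common])


def reverse_complement_record(record: Record) -> Record:
    """Step 3: reverse complement the bases and reverse the qualities."""
    header, seq, plus, qual = record
    return header, seq.translate(_COMPLEMENT)[::-1], plus, qual[::-1]


# Toy input: read "b" lost its mate during trimming.
fq1 = "@a/1\nACGTT\n+\nIIIII\n@b/1\nGGGCC\n+\nIIIII\n@c/1\nTTTAA\n+\nIIIII\n"
fq2 = "@a/2\nCCCAA\n+\nIIIII\n@c/2\nAACGT\n+\nIIIH#\n"
reads1 = list(read_fastq(io.StringIO(fq1)))
reads2 = list(read_fastq(io.StringIO(fq2)))
kept1, kept2 = keep_common_pairs(reads1, reads2)
flipped1 = [reverse_complement_record(r) for r in kept1]
flipped2 = [reverse_complement_record(r) for r in kept2]
print([read_id(r[0]) for r in kept1])
print(flipped2[1][1], flipped2[1][3])
```

Reversing the quality string together with the sequence keeps each quality value attached to its base.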



                        • #13
                          Thanks for your reply, syfo. I will test it as soon as possible.

                          Thanks !



                          • #14
                            I haven't done it yet, but based on the discussion above, I like the idea of removing the external adapters first, then going back for the internal ones - I think this will work:

                            Code:
                            cutadapt -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m 20 "$fastq1" | cutadapt -b CTGTCTCTTATACACATCT -b AGATGTGTATAAGAGACAG -m 20 - > "${fastq1}_trimmed"

                            cutadapt -a GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -m 20 "$fastq2" | cutadapt -b CTGTCTCTTATACACATCT -b AGATGTGTATAAGAGACAG -m 20 - > "${fastq2}_trimmed"
                            This would be followed by a step to ensure that both files contain the same read pairs. I align with novoalign, which supports both pair orientations.

                            Thoughts?
                            Last edited by hartmaier; 07-14-2013, 05:59 AM.



                            • #15
                              Apologies, it has been some time since I first posted the question, but here is the quick and dirty approach, which is very simple. We took the advice the original Illumina tech reps gave us some years ago on what to do with mates, which was to trim the mate pairs to 36 bases in length, and did the same with the new Nextera libraries. After that we remove duplicate mates (using a simple Perl script), then screen for adapters using fastq-mcf (-t 0.0001), though you can omit this step if you wish, since the adapters are effectively nullified by the removal of duplicate pairs. We then use the trimmed mates with SSPACE, which is what we use for scaffolding. This approach is what we have been using for libraries from the prior protocol, and it works with the new Nextera libraries (we use it with 4, 8 and 12 kb Nextera libs). It saves on complicated adapter removal protocols. The SSPACE config parameters should ensure that paired-end contamination is unlikely to result in scaffolding.

                              Disclaimer - may not work so well if you are "feeding" your mate pairs to the assembler directly instead of using with SSPACE
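
A minimal Python sketch of the first two steps of this shortcut, truncating both mates to 36 bases and then dropping duplicate pairs. The Perl script itself is not shown in the thread, so the definition of a duplicate used here (both truncated sequences identical to a pair already seen, first occurrence kept) is an assumption, as are the function names.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities
Pair = Tuple[Record, Record]


def truncate(record: Record, length: int = 36) -> Record:
    """Keep only the first `length` bases and quality values of a read."""
    header, seq, plus, qual = record
    return header, seq[:length], plus, qual[:length]


def unique_truncated_pairs(pairs: Iterable[Pair], length: int = 36) -> Iterator[Pair]:
    """Truncate both mates, then yield each distinct sequence pair once."""
    seen = set()
    for read1, read2 in pairs:
        t1, t2 = truncate(read1, length), truncate(read2, length)
        key = (t1[1], t2[1])  # assumption: duplicates share both mate sequences
        if key in seen:
            continue
        seen.add(key)
        yield t1, t2


# Toy 50-base reads: pairs p1 and p2 differ only after base 36.
qual = "I" * 50
pairs = [
    (("@p1/1", "ACGT" * 12 + "AA", "+", qual), ("@p1/2", "TTGA" * 12 + "CC", "+", qual)),
    (("@p2/1", "ACGT" * 12 + "GG", "+", qual), ("@p2/2", "TTGA" * 12 + "TT", "+", qual)),
    (("@p3/1", "GGCA" * 12 + "AA", "+", qual), ("@p3/2", "TTGA" * 12 + "CC", "+", qual)),
]
kept = list(unique_truncated_pairs(pairs))
print([r1[0] for r1, _ in kept])
print(len(kept[0][0][1]), len(kept[0][0][3]))
```

Deduplicating after truncation means that pairs differing only beyond base 36 collapse into one, which is stricter than deduplicating the full-length reads.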
