**Roy** · 04-04-2013, 04:39 AM

Hi hrarnc,

You should be able to do this using Cutadapt. I think the appropriate commands would be something like (untested):

cutadapt -b CTGTCTCTTATACACATCT -b AGATGTGTATAAGAGACAG -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC read1.fastq > read1_trimmed.fastq
cutadapt -b CTGTCTCTTATACACATCT -b AGATGTGTATAAGAGACAG -a GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT read2.fastq > read2_trimmed.fastq

Cheers,
Roy.

**hrarnc** · 04-04-2013, 12:02 PM

Thanks for the suggestion, Roy. Cutadapt is one of the tools we use, and yes, it could trim the adapters. However, the technote for the Nextera Mate Pair libraries documents two types of read pairs: those with a single adapter and those with the duplicate adapter. Depending on which scenario applies, the read pair is either forward-reverse (FR) or reverse-forward (RF), so pairs could be separated into two output streams based on whether the adapter is present singly or in duplicate. It would therefore seem sensible to have a one-step trimmer/partitioner for read pairs, since the adapters present determine which stream a pair is written to. That might remove the need to partition afterwards by mapping with bowtie, bwa or other tools and checking for FR orientation.

So what I was really asking was: has anyone already written a specific tool that handles both the trimming and the partitioning of reads into the two streams (RF and FR) in a single step? So far I have not found any that do this.

**Roy** · 04-04-2013, 12:43 PM

I'm not sure that would be possible. In figure 2 in the technote, a) and e) both result in sequence reads with no adapter contamination, so it would not be possible to distinguish them without mapping. To partition the reads which contain adapter, you could perhaps use multiple rounds of cutadapt - using the option to write untrimmed reads to a separate file.

**hrarnc** · 04-04-2013, 01:03 PM

The following pseudocode might do it:

Foreach read pair
  case in
    a): duplicate junction adapter so is RF orientation
        trim adapters as appropriate
        write to RF stream read pair files
    b): Adapter at one end only (duplicated)
        Read 1 (or 2) culled dependent on which read has the adapter - read 1 is depicted in image - cull read
        Write other read to unpaired stream
    c): Single adapter at one end
        Read 1 trimmed, read 2 no trimming required
        Write to FR stream read pair files
    d): duplicate junction adapter closer to one end than other so not in one read of the pair
        Read 1 trimmed, read 2 no trimming
        Write to RF stream read pair files
    e): No adapters present
        Write untrimmed reads to FR stream read pair files

It should distill down to partitioning based on the presence or absence of adapters on a read-pair basis, and should therefore be tractable as a single process. However, that would require a new tool, since there does not appear to be one available from Illumina. Mapping would still be needed to prove that the trimming/partitioning worked.
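
A minimal shell sketch of that presence/absence decision, for illustration only. The function names are made up, matching is exact (no mismatches or partial adapters), and it assumes the duplicated junction reads as the first junction sequence immediately followed by the second. Case b) (culling a read that is all adapter) is not handled, and pairs with no adapter are reported as unresolved rather than FR, because cases a) and e) cannot be told apart without mapping.

```shell
# Junction adapter sequences as quoted earlier in this thread.
JUNCTION_1=CTGTCTCTTATACACATCT
JUNCTION_2=AGATGTGTATAAGAGACAG

# adapter_state SEQ: print "duplicate", "single" or "none" for one read.
# Assumption: "duplicate" means JUNCTION_1 directly followed by JUNCTION_2.
adapter_state() {
    case "$1" in
        *"${JUNCTION_1}${JUNCTION_2}"*) echo duplicate ;;
        *"$JUNCTION_1"*|*"$JUNCTION_2"*) echo single ;;
        *) echo none ;;
    esac
}

# classify_pair READ1 READ2: print the stream the pair would go to.
classify_pair() {
    s1=$(adapter_state "$1")
    s2=$(adapter_state "$2")
    case "$s1,$s2" in
        duplicate,*|*,duplicate) echo RF ;;      # cases a) and d)
        single,*|*,single)       echo FR ;;      # case c)
        *)                       echo unresolved ;;  # a) or e): needs mapping
    esac
}
```

For example, `classify_pair "$read1_seq" "$read2_seq"` would be called once per pair, with the two sequence strings taken from the matching records of the two fastq files.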

Your suggestion of multiple rounds of cutadapt is a good one and would be required as the external adapters need to be first removed prior to handling the junction adapters.

I should also point out that the technote does suggest using biopieces or AdapterRemoval but as yet I have not used either of these and will need to give them a try asap.

**Roy** · 04-05-2013, 02:05 AM

There's no guarantee that the reads from case a would include an adapter - this would depend on the fragment size and read length. So I think you would still need to distinguish FR from RF by mapping/assembly.

**syfo** · 04-11-2013, 10:09 AM

Hi there,

I am in the same situation. Thank you guys for your thoughts. I entirely agree about:
- trimming the external adapters first and then the junction adapter, because cutadapt manual says "If multiple -a, -b or -g options are given, only the best matching adapter is trimmed".
- the requirement to check whether the reads map in a "proper pair" configuration anyway, because the absence of (junction adapter) trimming could be due to either case "a" or case "e".
- the possibility to identify distinct subsets of reads from specific cases by using the different options of cutadapt in an iterative fashion.

More specifically, I see two ways to proceed about the junction adapter:

A) the easy way
1. Use cutadapt with the -b option
2. Use everything from the previous step, whether trimming occurred or not, and distinguish the genuine mate-pairs (MP) from the paired-ends (PE) by mapping orientation: FR for the PE and RF for the MP

B) the tricky way
1. Use the "-g" option of cutadapt: trimmed reads indicate PE from case "c)" => they should map in FR orientation. Use these guys as a "reliable" subset of PE reads, to estimate the fragment size distribution of your library for instance (I am referring to the fragments after circularization).
2. Use the option "-a" of cutadapt on the untrimmed reads from the previous step: trimmed reads indicate MP from case "d)" => they should map in RF orientation.
3. Everything else should be untrimmed reads from cases "a)" or "e)". Hopefully most of them are MP from case "a)" thanks to the biotin enrichment.
This method should work even when reads overlap with each other (for instance if your fragment is shorter than twice the read length).
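
Steps B1 to B3 could be chained roughly as follows. This is an untested sketch: the function name, argument order and output file names are made up for illustration, and which adapter sequence to pass is left to the caller. `-g`, `-a`, `-o` and `--untrimmed-output` are existing cutadapt options. It processes one read file at a time, so the pairs still have to be re-synchronised afterwards.

```shell
# tricky_partition ADAPTER READS PREFIX
# Hypothetical helper: splits READS into three files by where ADAPTER is found.
tricky_partition() {
    adapter=$1
    reads=$2
    prefix=$3

    # B1: reads trimmed with -g (adapter at the 5' end) go to *.g_trimmed.fastq;
    # everything cutadapt left untouched is written to *.g_untrimmed.fastq.
    cutadapt -g "$adapter" -o "$prefix.g_trimmed.fastq" \
        --untrimmed-output "$prefix.g_untrimmed.fastq" "$reads" &&

    # B2: rerun with -a (adapter at the 3' end) on the untrimmed reads only.
    # B3: what is still untrimmed ends up in *.rest.fastq (cases a or e).
    cutadapt -a "$adapter" -o "$prefix.a_trimmed.fastq" \
        --untrimmed-output "$prefix.rest.fastq" "$prefix.g_untrimmed.fastq"
}
```

A call might look like `tricky_partition "$junction" "$fastq1" "$outdir/R1"`, where `$junction` is whichever junction sequence is being tested.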

Does that make sense?

**hrarnc** · 04-16-2013, 08:48 PM

Apologies for the belated reply. We have decided to use our routine adapter remover (fastq-mcf from ea-utils) to remove the external adapters. We are then implementing an in-house Perl script to trim and partition read pairs, since only a very small percentage of reads in our present data have imperfect adapters.

Hopefully I will be able to let you know how it goes in a week or so.

**christinawu2008** · 05-21-2013, 02:22 PM

So I guess that in the tricky way, read 2 should be trimmed using '-a' first and then '-g'?

My junction adapter is shorter than the read length, so it can appear anywhere in the read or only partly overlap it. Is that the same situation as c), where the adapter appears in the read, so FR?

Thanks!

**Diegodescarpates** · 05-31-2013, 12:14 AM

Hi all,

I am in the same situation as hrarnc and I'd like to restart the discussion.
Can anyone suggest an efficient way/script to remove adapters from Illumina mate-pair libraries?

Thanks for your help!

**Diegodescarpates** · 05-31-2013, 05:03 AM

Did I commit a sin?

**syfo** · 06-06-2013, 08:51 AM

Hi all,

We eventually implemented the following pipeline:

1. Trim the fastq files with the same cutadapt command

cutadapt -a CTGTCTCTTATACACATCT -a AGATGTGTATAAGAGACAG -m 20 $fastq1 2>$outdir/cutadaptR1.log > $trimmed1
cutadapt -a CTGTCTCTTATACACATCT -a AGATGTGTATAAGAGACAG -m 20 $fastq2 2>$outdir/cutadaptR2.log > $trimmed2

The -m option removes very short products (otherwise you can even get empty sequences, which mappers don't like). However, you then need to remove the asymmetric pairs, where only one of the two reads survived.

2. Identify the valid pairs that have a read in both files (with something like "comm -12 $id1 $id2 > $ids") and get rid of the others, so that your two fastq files are symmetrical, meaning that they contain the same read IDs.

3. Reverse complement the sequences (I know, that's ugly).

4. Align with bwa.
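
Steps 2 and 3 could be sketched with standard tools along these lines. This is an illustration under stated assumptions, not the exact script we ran: the function names are made up, records are assumed to be plain 4-line FASTQ, and read names are assumed to match between the two files once a trailing /1 or /2 is dropped.

```shell
# fastq_ids FILE: print the sorted read IDs of FILE (comm needs sorted input).
fastq_ids() {
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' "$1" |
        LC_ALL=C sort
}

# keep_ids IDS FILE: print the records of FILE whose ID is listed in IDS.
keep_ids() {
    awk 'NR == FNR { keep[$1] = 1; next }
         FNR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); ok = (id in keep) }
         ok' "$1" "$2"
}

# sync_pairs IN1 IN2 OUT1 OUT2: step 2, keep only reads present in both files.
sync_pairs() {
    fastq_ids "$1" > "$3.ids"
    fastq_ids "$2" > "$4.ids"
    LC_ALL=C comm -12 "$3.ids" "$4.ids" > "$3.common"
    keep_ids "$3.common" "$1" > "$3"
    keep_ids "$3.common" "$2" > "$4"
    rm -f "$3.ids" "$4.ids" "$3.common"
}

# revcomp_fastq FILE: step 3, reverse-complement each sequence and reverse
# its quality string so bases and qualities stay aligned.
revcomp_fastq() {
    awk '
    BEGIN {
        n = split("A C G T a c g t", from, " ")
        split("T G C A t g c a", to, " ")
        for (i = 1; i <= n; i++) comp[from[i]] = to[i]
    }
    function rev(s, complement,    i, ch, out) {
        out = ""
        for (i = length(s); i > 0; i--) {
            ch = substr(s, i, 1)
            if (complement && (ch in comp)) ch = comp[ch]
            out = out ch
        }
        return out
    }
    NR % 4 == 2 { print rev($0, 1); next }
    NR % 4 == 0 { print rev($0, 0); next }
    { print }' "$1"
}
```

Usage would be something like `sync_pairs "$trimmed1" "$trimmed2" "$synced1" "$synced2"` followed by `revcomp_fastq "$synced1" > "$rc1"`, where the output variable names are placeholders. Because each file keeps its original record order, the two outputs stay in sync as long as the inputs were in the same order.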

Quite dirty but it worked for us.

Anything better?

**Diegodescarpates** · 06-06-2013, 12:32 PM

Thanks for your reply syfo, I will test it as soon as possible.

Thanks!

**hartmaier** · 06-24-2013, 01:11 PM

I haven't done it yet, but based on the discussion above, I like the idea of removing the external adapters first, then going back for the internal ones - I think this will work:

Code:

cutadapt -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -m 20 $fastq1 | cutadapt -b CTGTCTCTTATACACATCT -b AGATGTGTATAAGAGACAG -m 20 - > ${fastq1}_trimmed

cutadapt -a GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -m 20 $fastq2 | cutadapt -b CTGTCTCTTATACACATCT -b AGATGTGTATAAGAGACAG -m 20 - > ${fastq2}_trimmed

This would be followed by steps to make sure both files still contain the same pairs. I align with novoalign, which can align pairs in either orientation.

Thoughts?

**hrarnc** · 07-04-2013, 03:19 PM

Apologies, it's been some time since I first posted the question, but here is the quick and dirty approach, which is very simple. We took the advice that Illumina tech reps gave us some years ago on what to do with mates, which was to trim the mate pairs to 36 bases in length, and we do the same with the new Nextera libraries. After that we remove duplicate mates (using a simple Perl script), then screen for adapters using fastq-mcf (-t 0.0001). You can omit the screening step if you wish, since the adapters are effectively nullified by the removal of duplicate pairs. We then use the trimmed mates with SSPACE, which is what we use for scaffolding. This approach is what we have been using for libraries from the prior protocol, and it works with the new Nextera libraries (we use it with 4, 8 and 12 kb Nextera libs). It saves on complicated adapter removal protocols. SSPACE config parameters should ensure that paired-end contamination is unlikely to result in scaffolding.

Disclaimer - may not work so well if you are "feeding" your mate pairs to the assembler directly instead of using with SSPACE
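
For anyone without such a script, the trim and de-duplicate steps could look roughly like the sketch below. It is not our Perl script: the function names are made up, records are assumed to be plain 4-line FASTQ with both files in the same order, and a "duplicate" is assumed to mean a pair whose read 1 and read 2 sequences are both identical to an earlier pair.

```shell
# trim_fastq FILE LEN: cut every sequence and quality string to LEN characters.
# In a 4-line record, lines 2 and 4 (the even lines) hold sequence and quality.
trim_fastq() {
    awk -v len="$2" 'NR % 2 == 0 { print substr($0, 1, len); next } { print }' "$1"
}

# dedup_pairs IN1 IN2 OUT1 OUT2: keep the first pair seen for each distinct
# combination of read 1 and read 2 sequence.
dedup_pairs() {
    : > "$3"
    : > "$4"
    paste "$1" "$2" | awk -F '\t' -v out1="$3" -v out2="$4" '
    { rec1 = rec1 $1 "\n"; rec2 = rec2 $2 "\n" }
    NR % 4 == 2 { key = $1 SUBSEP $2 }
    NR % 4 == 0 {
        if (!(key in seen)) {
            seen[key] = 1
            printf "%s", rec1 > out1
            printf "%s", rec2 > out2
        }
        rec1 = ""; rec2 = ""
    }'
}
```

With the 36-base trim described above this would be called as `trim_fastq reads_1.fastq 36 > trimmed_1.fastq` (and likewise for read 2), then `dedup_pairs trimmed_1.fastq trimmed_2.fastq dedup_1.fastq dedup_2.fastq`; the file names are placeholders.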


Adapter Trimming Illumina Mate Pairs (Nextera Protocol)
