**GenoMax** · 03-16-2014, 02:27 PM

Originally posted by arcolombo698

Hello. Thank you in advance.

Do I just use the universal adapter as the input for cutadapt?

See this guide: http://onetipperday.blogspot.com/201...torprimer.html

**arcolombo698** · 03-16-2014, 04:27 PM

Cutadapt

Hello.

Yes this website was helpful.

Does cutadapt take a FASTA adapter file that specifies which adapters to cut out? It does not appear that -b, -g, or -a can cut the 28 different sequences I need trimmed.

Thank you again in advance

**GenoMax** · 03-16-2014, 05:08 PM

Use "trim galore" which is a wrapper for cutadapt to simplify things.

**mmartin** · 03-17-2014, 02:18 AM

There is a section in the README about this:

https://github.com/marcelm/cutadapt/#illumina-truseq

In short, just run:

Code:

cutadapt -a AGATCGGAAGAGC -o trimmed.1.fastq.gz reads.1.fastq.gz
cutadapt -a AGATCGGAAGAGC -o trimmed.2.fastq.gz reads.2.fastq.gz

See the other sections in the README if you need to do more specialized things.

**vishwesh** · 03-27-2014, 08:17 AM

Hi,
I am using cutadapt to remove adapter sequences. I have 2 adapter sequences.

RNA 5' Adapter (RA5)
5' GUUCAGAGUUCUACAGUCCGACGAUC
RNA 3' Adapter (RA3)
5' TGGAATTCTCGGGTGCCAAGG

The 1st one is 5' adapter and 2nd is 3' adapter.

I am using the following command line to remove the adapter seq.

cutadapt -a TGGAATTCTCGGGTGCCAAGG -g GUUCAGAGUUCUACAGUCCGACGAUC input.fastq > output.fastq

Length Distribution I get
Mean sequence length: 32.49 ± 10.53 bp
Minimum length: 16 bp
Maximum length: 51 bp
Length range: 36 bp
Mode length: 51 bp with 2,852,626 sequences

And I found that the 5' adapter has U instead of T. Will that be fine?

I tried replacing U with T GUUCAGAGUUCUACAGUCCGACGAUC > GTTCAGAGTTCTACAGTCCGACGATC and tried removing adapter sequence.

cutadapt -a TGGAATTCTCGGGTGCCAAGG -g GTTCAGAGTTCTACAGTCCGACGATC input.fastq > output.fastq

Length Distribution I get
Mean sequence length: 31.26 ± 11.29 bp
Minimum length: 1 bp
Maximum length: 51 bp
Length range: 51 bp
Mode length: 51 bp with 2,805,271 sequences

I get a different length distribution in each case. Which one should I choose?
And first of all, is the command that I am using right?

Kindly let me know.

Thanks in advance.

Regards
Vishwesh

**mmartin** · 03-27-2014, 11:46 PM

Originally posted by vishwesh

Hi,
cutadapt -a TGGAATTCTCGGGTGCCAAGG -g GUUCAGAGUUCUACAGUCCGACGAUC input.fastq > output.fastq

Cutadapt removes only one adapter per read, so you need to run it twice, once with each adapter, or specify the option --times=2. Also, you should anchor the 5' adapter by starting it with a "^", like so: -g ^GTTCAGAG...

And I found that the 5' adapter has U instead of T. Will that be fine?

No, not in cutadapt versions up to 1.4.2. But since it's a very good idea to support this, I just added this feature to cutadapt: Starting with cutadapt 1.5, all Us will be automatically replaced with Ts in the adapter sequence.
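For cutadapt versions up to 1.4.2, the U-to-T replacement has to be done by hand before the adapter is passed on the command line. The following is only a minimal sketch of that step combined with the advice above (--times=2 and an anchored 5' adapter given with -g). The helper names `to_dna` and `build_command` are made up for illustration, and the script only assembles an argument list; it does not run cutadapt.

```python
def to_dna(adapter):
    """Return the adapter with every U (or u) replaced by T (or t)."""
    return adapter.replace("U", "T").replace("u", "t")


def build_command(adapter_3p, adapter_5p, reads, out):
    """Assemble a cutadapt argument list; nothing is executed here."""
    return [
        "cutadapt",
        "--times=2",                      # allow both adapters to be removed from one read
        "-a", to_dna(adapter_3p),         # 3' adapter
        "-g", "^" + to_dna(adapter_5p),   # 5' adapter, anchored with "^"
        "-o", out,
        reads,
    ]


cmd = build_command(
    "TGGAATTCTCGGGTGCCAAGG",
    "GUUCAGAGUUCUACAGUCCGACGAUC",
    "input.fastq",
    "output.fastq",
)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where cutadapt is installed.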

**foivos** · 04-23-2014, 04:40 AM

Hi guys

I am using cutadapt 1.3 (I will update to the new version soon), but there is an issue. After trimming the adapter it leaves some empty lines.

Is there a way not to leave empty lines? I don't want to write a script that parses again the file and fixes it.

Thank you in advance

**GenoMax** · 04-23-2014, 05:01 AM

Originally posted by foivos

Hi guys

I am using cutadapt 1.3 (I will update to the new version soon), but there is an issue. After trimming the adapter it leaves some empty lines.

Is there a way not to leave empty lines? I don't want to write a script that parses again the file and fixes it.

Thank you in advance

The following only addresses the issue of removing empty lines (I assume the results file is otherwise OK). It may be safer to write to a temp file instead of overwriting the original: http://stackoverflow.com/questions/1...om-a-unix-file

**mmartin** · 04-23-2014, 05:26 AM

Originally posted by foivos

I am using cutadapt 1.3 (I will update to the new version soon), but there is an issue. After trimming the adapter it leaves some empty lines.

Is there a way not to leave empty lines?

Are you talking about reads that have a length of zero? These will appear as empty lines in the output file. Use cutadapt's --minimum-length option and set it to 1 or some higher value to avoid getting empty reads.

Do not do what is described in the stackoverflow link because it will break your FASTQ file.
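To illustrate why deleting blank lines breaks the file: a FASTQ record is four lines, so removing only the empty sequence and quality lines leaves an orphaned header and "+". If files that already contain empty reads need repairing, the whole record has to go. Below is a rough sketch of such a filter, assuming strictly four-line records (no wrapped sequences); the function names are hypothetical, and --minimum-length remains the simpler route.

```python
def read_fastq(lines):
    """Yield (header, sequence, separator, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        try:
            seq, sep, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record: " + header.rstrip()) from None
        yield tuple(part.rstrip("\n") for part in (header, seq, sep, qual))


def drop_empty_reads(lines, min_length=1):
    """Yield the lines of every record whose sequence has at least min_length bases."""
    for record in read_fastq(lines):
        if len(record[1]) >= min_length:
            yield from record


# Three records taken from the output posted in this thread; the middle one is empty.
sample = [
    "@BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4107:2164 1:N:0:GCCAAT\n",
    "AGTACCCCATGGAC\n",
    "+\n",
    "?1?DD?BDA:C;22\n",
    "@BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4219:2179 1:N:0:GCCAAT\n",
    "\n",
    "+\n",
    "\n",
    "@BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4242:2199 1:Y:0:GCCAAT\n",
    "CATACAGGACTCTTTCGAGGCCCTC\n",
    "+\n",
    "==>A+2@<+?+?22<A+23)@C+1=\n",
]

kept = list(drop_empty_reads(sample))

# The naive approach only drops blank lines, so the line count is no longer a multiple of 4.
naive = [line for line in sample if line.strip()]
print(len(kept), len(naive))
```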

**foivos** · 04-23-2014, 06:09 AM

No I will not remove the lines as described in stackoverflow.

I will isolate the problem, as it is part of a pipeline and I will make a new post soon.

**foivos** · 04-23-2014, 06:18 AM

Here is what I get

@BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4054:2147 1:N:0:GCCAAT
TTAGGAAGAGGATAACAATTNGAAACAGTTGCTAAAACTCTATATGC
+
CCCFFFFFGHHHHJJJJJJJ#4AHGGIJIJJIJIJJJJJJJJJJJJJ
@BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4107:2164 1:N:0:GCCAAT
AGTACCCCATGGAC
+
?1?DD?BDA:C;22
@BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4138:2178 1:N:0:GCCAAT
ATCGACACTTCGAACGCACTTGCGGCCCCGGGTTCCTCCCGGGGCTACGCC
+
CCCFFFFFHHHHHJJJJJJJJJIJJGGJJ:FG-5@D>EEH<?A@/'5<;;B
@BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4219:2179 1:N:0:GCCAAT

+

@BS-DSFCONTROL03:317:C3PGTACXX:2:1101:4242:2199 1:Y:0:GCCAAT
CATACAGGACTCTTTCGAGGCCCTC
+
==>A+2@<+?+?22<A+23)@C+1=

It keeps the identifier and the "+" and removes the adapter and the sequence.

I want it to remove everything and not leave any gaps...

**sp144** · 07-31-2014, 03:34 PM

You can do that in post-processing. Just put everything on one line using sed:

sed 'N;N;N;s/\n/\t/g'

then remove lines containing \t\t+\t (an empty sequence field), and afterwards change all \t back to \n.

Marcel, is version 1.5 up yet? I can only find 1.4.2 as the latest version. If not, when do you anticipate 1.5 to be out?
Thanks!

**mmartin** · 08-05-2014, 03:49 AM

Originally posted by sp144

Marcel, is version 1.5 up yet? I can only find 1.4.2 as the latest version. If not, when do you anticipate 1.5 to be out?
Thanks!

Taking your question as a motivation: I've just released cutadapt 1.5! As always, see https://code.google.com/p/cutadapt/ for the changelog and download it from PyPI. Or, even better, just use "pip install cutadapt". Here is a copy of the changelog:

Adapter sequences can now be read from a FASTA file. For example, write -a file:adapters.fasta to read 3' adapters from adapters.fasta. This works also for -b and -g. This fixes the long-standing issue #33. Note that cutadapt isn't really optimized for trimming dozens or even hundreds of adapters!
There is now an option --mask-adapter, which can be used to not remove adapters, but to instead mask them with N characters. Thanks to Vittorio Zamboni for contributing this feature!
U characters in the adapter sequence are automatically converted to T.
Add the option -u/--cut, which can be used to unconditionally remove a number of bases from the beginning or end of each read.
When the new option --quiet is used, no report is printed after all reads have been processed.
When processing paired-end reads, cutadapt now checks whether the reads are properly paired.
To handle paired-end reads, an option --untrimmed-paired-output was added.
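As a small illustration of the first changelog item, an adapter FASTA file for use with -a file:adapters.fasta could be generated like this. This is only a sketch: the helper `adapters_to_fasta` and the record names are made up for the example, and the two sequences are the ones quoted earlier in this thread.

```python
def adapters_to_fasta(adapters):
    """Format a {name: sequence} mapping as FASTA text, one record per adapter."""
    return "".join(">{}\n{}\n".format(name, seq) for name, seq in adapters.items())


# Record names are illustrative labels chosen for this example.
adapters = {
    "RA3": "TGGAATTCTCGGGTGCCAAGG",
    "TruSeq": "AGATCGGAAGAGC",
}

fasta_text = adapters_to_fasta(adapters)
print(fasta_text, end="")

# To use it, save the text and point cutadapt at the file:
#   with open("adapters.fasta", "w") as handle:
#       handle.write(fasta_text)
#   cutadapt -a file:adapters.fasta -o trimmed.fastq input.fastq
```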

**captainentropy** · 08-14-2014, 04:23 PM

Hi mmartin,

I'm using the latest version (1.5) and I noticed the format of the info file doesn't seem to match exactly with the documentation on github (https://github.com/marcelm/cutadapt/...ster/README.md). According to it there's supposed to be 8 columns but I only get 7. Column 5 (Sequence of the read before the adapter match) seems to have been removed, yes?

It's not a big deal I don't think as I can recreate the full read by concatenating columns 5 and 6, like the page says ("The concatenation of the fields 5-6 yields the full read sequence."). Or am I missing something?

thanks!

**mmartin** · 08-18-2014, 03:10 AM

Originally posted by captainentropy

According to it there's supposed to be 8 columns but I only get 7.

There should still be eight fields, but perhaps one of the columns is empty? In that case, you'd have two consecutive tabs within a single line and it'd appear as if you only have seven fields.

Column 5 (Sequence of the read before the adapter match) seems to have been removed, yes?

The format hasn't changed, but I realize that the wording in the README is confusing: The "Sequence of the read before the adapter match" is actually the "sequence of the read to the left of the adapter match".

I've tried to clarify all this in the README now. I've also fixed a mistake in the description of how to get the original read sequence: You need to concatenate columns 5-7, not columns 5-6. Hope that helps!
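To make the empty-column effect concrete, here is a short sketch of parsing one info-file line. It assumes only what is stated above: eight tab-separated fields, with columns 5-7 holding the read to the left of the match, the match itself, and the rest. The example row and the function names are invented for illustration; rows with fewer than eight fields are rejected rather than guessed at.

```python
def split_info_line(line):
    """Split on tabs only, so that empty fields are preserved."""
    return line.rstrip("\n").split("\t")


def full_read(fields):
    """Rebuild the original read by concatenating columns 5-7 (indices 4-6)."""
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated fields")
    return "".join(fields[4:7])


# Hypothetical row: the adapter matched at the very start of the read,
# so column 5 (sequence to the left of the match) is empty.
row = "read1\t0\t0\t5\t\tACGTA\tGGGTTT\tadapter1\n"

fields = split_info_line(row)
print(len(row.split()))   # whitespace splitting collapses the empty column
print(len(fields))        # tab splitting keeps it
print(full_read(fields))
```

Tools that split on any whitespace (for example awk without -F'\t') collapse the two consecutive tabs in the same way, which is one way an eight-field line can look like it has seven.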
