What would the masking look like? I imagine lowercase vs. uppercase would be an option. Could you please add this as a feature request to the issue tracker on http://cutadapt.googlecode.com/ ?
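To illustrate the lowercase idea, here is a minimal Python sketch. It assumes the adapter start position has already been found (the hypothetical `adapter_start` argument) and lowercases everything from there to the 3' end instead of removing it. This is only an illustration of the masking concept, not how cutadapt implements anything.

```python
def mask_lowercase(read: str, adapter_start: int) -> str:
    """Uppercase the part of the read that is kept and lowercase the adapter
    and everything after it, leaving the read length unchanged.

    adapter_start is a hypothetical input: the 0-based index where a
    3' adapter match begins.
    """
    if not 0 <= adapter_start <= len(read):
        raise ValueError("adapter_start must lie within the read")
    return read[:adapter_start].upper() + read[adapter_start:].lower()


if __name__ == "__main__":
    print(mask_lowercase("ACGTACGTTCGTATGCC", 8))  # prints ACGTACGTtcgtatgcc
```

The main difference from trimming is that the read keeps its original length.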
-
What's the best way to cutadapt Illumina reads where there are 2 adaptors such as:
5' - adaptor1 - sequence - adaptor2 - 3'
I want to trim the adaptors on both sides. What I have been doing is running cutadapt twice:
a. Run cutadapt with the -a flag and adaptor2
b. Feed the above output to a second cutadapt run with the -g flag and adaptor1
Is that the best method to handle cutting both the adaptors?
Many thanks,
Phillipe
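The two-pass approach described in this post can be sketched in Python as follows. It is a simplified model under stated assumptions: only exact matches are considered (cutadapt also tolerates errors), the 3' step removes the adapter and everything after it (including a partial adapter at the very end of the read), and the 5' step removes the adapter and everything before it (including a partial adapter at the very start). The function names and the `min_overlap=3` default (chosen to mirror what we understand to be cutadapt's default minimum overlap) are illustrative assumptions.

```python
def trim_3prime(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter and everything after it (exact matching only)."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # The adapter may run off the end of the read: compare read suffixes
    # with adapter prefixes, longest first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def trim_5prime(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 5' adapter and everything before it (exact matching only)."""
    pos = read.find(adapter)
    if pos != -1:
        return read[pos + len(adapter):]
    # The adapter may be only partially present at the start of the read:
    # compare read prefixes with adapter suffixes, longest first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.startswith(adapter[-k:]):
            return read[k:]
    return read


def trim_both(read: str, adapter1: str, adapter2: str) -> str:
    """Same order as the post: 3' adapter (adaptor2) first, then 5' adapter (adaptor1)."""
    return trim_5prime(trim_3prime(read, adapter2), adapter1)


if __name__ == "__main__":
    # Made-up adapters and insert, purely for illustration.
    read = "GATCGGAAG" + "ACGTTTGCA" + "TCGTATGCC" + "AAA"
    print(trim_both(read, "GATCGGAAG", "TCGTATGCC"))  # prints ACGTTTGCA
```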
-
Originally posted by chjiao:
But I wish you could make 5' adaptor trimming work for color-space data, since it is necessary when adaptors are connected to other adaptors.
-
Originally posted by mmartin:
I have just released cutadapt 1.1, which adds this feature and is also 30% faster than before. See the release announcement at http://code.google.com/p/cutadapt/
-
cutadapt trimming length issue
Hi all,
I just used cutadapt to process some libraries I made and got a result that I don't quite understand. It appears that cutadapt trimmed more than 21 bases (the length of the adapter sequence) from many of the reads. I would think that the longest length that cutadapt could trim from the reads would be 21 bp if given a 21 bp-long adapter. Here is my output:
$ cutadapt -q 20 --quality-base=64 -a TCGTATGCCGTCTTCTGCTTG wt.fastq -o wt_adapttrim.fastq
cutadapt version 1.1
Command line parameters: -q 20 --quality-base=64 -a TCGTATGCCGTCTTCTGCTTG wt.fastq -o wt_adapttrim.fastq
Maximum error rate: 10.00%
Processed reads: 54310589
Trimmed reads: 8911163 ( 16.4%)
Total basepairs: 2111179389 (2111.2 Mbp)
Trimmed basepairs: 100971255 (101.0 Mbp) (4.78% of total)
Too short reads: 0 ( 0.0% of processed reads)
Too long reads: 0 ( 0.0% of processed reads)
Total time: 1918.18 s
Time per read: 0.04 ms
=== Adapter 1 ===
Adapter 'TCGTATGCCGTCTTCTGCTTG', length 21, was trimmed 8911163 times.
Lengths of removed sequences
length count expected
3 1088578 848603.0
4 735568 212150.7
5 672468 53037.7
6 645706 13259.4
7 570424 3314.9
8 507225 828.7
9 449549 207.2
10 387252 51.8
11 320108 12.9
12 298280 3.2
13 277658 0.8
14 277521 0.2
15 244603 0.1
16 239725 0.0
17 202783 0.0
18 205865 0.0
19 202092 0.0
20 233860 0.0
21 208748 0.0
22 136163 0.0
>=23 1006987 0.0
Any answers would be greatly appreciated!
This tool is super and very intuitive, thank you for it.
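When reading this table, it may help to note that the "expected" column appears to equal the number of processed reads times 0.25^length, i.e. roughly how many reads would end in the first *length* adapter bases purely by chance if bases were uniformly random. This is inferred from the numbers themselves (54310589 / 4^3 ≈ 848603.0), not from cutadapt's documentation, and it ignores errors. A quick Python check, with a hypothetical helper name:

```python
def expected_random_matches(processed_reads: int, length: int) -> float:
    """Assumed model: each of `length` positions matches the adapter by chance
    with probability 1/4, independently, with no errors allowed."""
    return processed_reads * 0.25 ** length


if __name__ == "__main__":
    n = 54310589  # "Processed reads" from the output above
    for length in range(3, 8):
        print(length, round(expected_random_matches(n, length), 1))
```

For lengths 3 to 7 this reproduces the expected column above (848603.0, 212150.7, 53037.7, 13259.4, 3314.9). Observed counts far above these values are what point to real adapter contamination rather than chance matches.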
-
Originally posted by kerhard:
Sorry, I wasn't reading the description of the options carefully enough. I think I now understand what the above results indicate. With the -a option and a 21 bp adapter, cutadapt removes more than 21 bp from a read when the adapter occurs earlier in the read (toward the 5' end) and is followed by more sequence, because everything after the adapter is removed as well.
If someone could verify whether this is the case, or whether I am missing some other reason, it would be much appreciated.
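This explanation can be illustrated with a small Python sketch, assuming an exact adapter match: with -a, the adapter and everything after it are removed, so the removed length is the read length minus the adapter's start position and can exceed the adapter length. `removed_length` is a hypothetical helper, not part of cutadapt.

```python
def removed_length(read: str, adapter: str) -> int:
    """Number of bases a 3' (-a style) trim would remove, assuming an exact match.

    Everything from the adapter start to the end of the read is removed, so the
    result is len(read) - start, which exceeds len(adapter) when bases follow it.
    """
    start = read.find(adapter)
    return 0 if start == -1 else len(read) - start


if __name__ == "__main__":
    adapter = "TCGTATGCCGTCTTCTGCTTG"  # 21 bp adapter from the command line above
    print(removed_length("ACGTACGT" + adapter, adapter))            # prints 21
    print(removed_length("ACGTACGT" + adapter + "AAAAA", adapter))  # prints 26
```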
-
Originally posted by mmartin:
Yes, that is correct. The column indicates the length of the removed sequence, which includes the bases after the adapter if there are any. It used to indicate the length of the matching adapter in earlier cutadapt versions, but I think that was less helpful.
Thanks for the confirmation. I'm glad I tried cutadapt, as I had assumed my libraries were free of adapter sequences, which turns out not to be true at all.
I still don't understand how some of these reads can have sequence AFTER the 3' adapter (e.g., adapters found in the middle of the read). Searching the raw read files by hand for the full adapter sequence, I notice that the sequence following the 3' adapter is often a string of A's. For example:
AGTCTADAPTERAAAAAAAAAAAA
TGCGTACGRACTADAPTERAAAAA
Any ideas what this means and how it could happen? Are these artifacts of the sequencing reaction on the Illumina machine, or of library construction?
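To tally how often the sequence after the adapter is a run of A's, as in the examples above, something like the following Python sketch could be run over the raw reads. It assumes exact adapter matches and calls a tail poly-A only if every base is A; the function names and that strict criterion are illustrative choices, not anything cutadapt reports. The reads in the usage block are made up, modeled on the examples above.

```python
from collections import Counter
from typing import Iterable


def classify_tail(read: str, adapter: str) -> str:
    """Classify what follows an exact adapter match:
    'no_adapter', 'none' (adapter at the very end), 'polyA' or 'other'."""
    start = read.find(adapter)
    if start == -1:
        return "no_adapter"
    tail = read[start + len(adapter):]
    if not tail:
        return "none"
    return "polyA" if set(tail) == {"A"} else "other"


def tally_tails(reads: Iterable[str], adapter: str) -> Counter:
    """Count tail classes over many reads."""
    return Counter(classify_tail(r, adapter) for r in reads)


if __name__ == "__main__":
    adapter = "TCGTATGCCGTCTTCTGCTTG"
    reads = [
        "AGTCT" + adapter + "AAAAAAAAAAAA",
        "TGCGTACG" + adapter + "AAAAA",
        "ACGT" + adapter + "GATTC",
        "ACGTACGT",
    ]
    print(tally_tails(reads, adapter))
```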
-
Originally posted by kerhard:
Any ideas as to what that means and how that may happen? Are these from the sequencing reactions on the Illumina machines or are these from library constructions?
On further thought, I would also expect that artifacts from the sequencing process (the basecaller calling a base although there really is none) would lead to a sequence of random nucleotides, but I’m only speculating here.
-
Can cutadapt write its results in csfasta format?
I have SOLiD data and used cutadapt to remove the adapters in colorspace, but the trimmed reads can only be written in FASTQ format. Can't I get them in csfasta format?
Here is the output of cutadapt (is it correct?):
Command line parameters: -c -e 0.12 -a 330201030313112312 -x abc: --maq -o output.fastq /home/rimpi/solid/hubert/mergecs/1A.csfasta /home/rimpi/solid/hubert/mergecs/1A_QV.qual
Maximum error rate: 12.00%
Processed reads: 35052447
Trimmed reads: 23518246 ( 67.1%)
Total basepairs: 2593881078 (2593.9 Mbp)
Trimmed basepairs: 881159025 (881.2 Mbp) (33.97% of total)
Too short reads: 0 ( 0.0% of processed reads)
Too long reads: 0 ( 0.0% of processed reads)
Total time: 3386.48 s
Time per read: 0.10 ms
=== Adapter 1 ===
Adapter '330201030313112312', length 18, was trimmed 23518246 times.
Lengths of removed sequences
length count expected
3 294756 547694.5
4 122284 136923.6
5 155337 34230.9
6 196812 8557.7
7 260390 2139.4
8 318801 534.9
9 772603 133.7
10 419888 33.4
11 140579 8.4
12 214331 2.1
13 229820 0.5
14 456111 0.1
15 120289 0.0
16 107822 0.0
17 169000 0.0
18 235124 0.0
19 206157 0.0
>=20 19098142 0.0
-
Sorry, csfasta/qual output is not supported at the moment. You will have to use a separate program to convert colorspace FASTQ to csfasta/qual (I think someone posted one in the forum). Don't use the --maq option if you do that. Which read mapper do you use that does not support colorspace FASTQ?
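As a starting point for such a conversion, here is a minimal Python sketch. It rests on assumptions that should be checked against your own file: four-line FASTQ records written without --maq, a sequence line holding the primer base followed by color digits (e.g. T0123...) rather than the double-encoded letters we believe --maq produces, and Phred+33 qualities with one value per color. If the quality string also carries a value for the primer base, the sketch drops it. The function name `fastq_to_csfasta_qual` is hypothetical.

```python
import io
from typing import TextIO


def fastq_to_csfasta_qual(fastq: TextIO, csfasta: TextIO, qual: TextIO, offset: int = 33) -> int:
    """Convert colorspace FASTQ records to csfasta + qual; returns the record count.

    Assumptions: 4-line records, sequence = primer base + color digits,
    qualities encoded as Phred + `offset`.
    """
    count = 0
    while True:
        header = fastq.readline().rstrip("\r\n")
        if not header:
            break  # end of input
        seq = fastq.readline().rstrip("\r\n")
        fastq.readline()  # '+' separator line, ignored
        quals = fastq.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        name = header[1:]
        if len(quals) == len(seq):
            quals = quals[1:]  # drop a quality value attached to the primer base, if present
        csfasta.write(f">{name}\n{seq}\n")
        qual.write(f">{name}\n" + " ".join(str(ord(c) - offset) for c in quals) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    fq = io.StringIO("@read1\nT0123\n+\n5?I+\n")
    cs, qv = io.StringIO(), io.StringIO()
    fastq_to_csfasta_qual(fq, cs, qv)
    print(cs.getvalue(), end="")
    print(qv.getvalue(), end="")
```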