Never mind, I realised that decompressing my file and then running trim_galore will bypass zcat. It then works.
Cheers
Rob
EDIT: thanks Felix. I wrote this before I saw your response - I was "never minding" my question not your response!
I have a solution which works; I will decompress before running trim_galore
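The decompress-first workaround can also be scripted. Below is a minimal sketch in Python, assuming the input is an ordinary gzip file; the helper name `decompress_fastq` is made up for illustration and is not part of Trim Galore.

```python
import gzip
import shutil
from pathlib import Path


def decompress_fastq(gz_path):
    """Write an uncompressed copy of a .gz file next to it and return its path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path}")
    out_path = gz_path.with_suffix("")  # filename.fastq.gz -> filename.fastq
    # Stream the data so a large FASTQ file is never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

The returned path (for example filename.fastq) can then be passed to trim_galore, so zcat is never invoked on the compressed file.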
-
Originally posted by Rob Weeks:
I have only just begun to look at RRBS data. I am trying to use trim_galore to quality trim and adaptor trim my sequences. I am doing this in OS X.
Now when I run 'trim_adaptor filename.fastq.gz' it returns an error: "zcat: can't stat: filename.fastq.gz (filename.fastq.gz.Z): No such file or directory".
This is apparently a problem only in OS X, but it is not clear to me how I can get around this problem.
Any ideas would be appreciated
Cheers
I have changed the way Trim Galore reads gzipped files, from using a zcat stream to using gunzip -c, and it seems to work well. I am at the Festival of Genomics in London all day, but if you send me an email I can send you a copy tonight. Alternatively, you could try renaming your input file so that it ends in .gz.Z?
Good luck, Felix
-
I have only just begun to look at RRBS data. I am trying to use trim_galore to quality trim and adaptor trim my sequences. I am doing this in OS X.
Now when I run 'trim_adaptor filename.fastq.gz' it returns an error: "zcat: can't stat: filename.fastq.gz (filename.fastq.gz.Z): No such file or directory".
This is apparently a problem only in OS X, but it is not clear to me how I can get around this problem.
Any ideas would be appreciated
Cheers
-
Hi whargrea, the absence of documentation for parallelization does indeed mean that reads are trimmed by calling a single instance of Cutadapt at a time. Since trimming is a one-off process that doesn't really take that long (a matter of hours) compared to the data collection process (often a matter of several days) or other downstream operations (up to several weeks?), we don't tend to bother about it very much. The easiest solution would probably be to run all 48 of your trims in parallel (even though this might be quite intense on disk I/O), or to find another trimmer that supports parallel trimming natively.
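Running one Trim Galore process per file, several at a time, could look roughly like the sketch below. This is not a Trim Galore feature: the function names and the `max_workers` value are assumptions for illustration, and each job simply invokes `trim_galore <file>` with default settings.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(fastq_files, program="trim_galore"):
    """Return one argument list per input file (default settings only)."""
    return [[program, str(path)] for path in fastq_files]


def run_all(commands, max_workers=4):
    """Run the commands concurrently and return their exit codes in order."""
    def run(cmd):
        # Each call blocks on its own child process; threads are enough
        # because the real work happens outside the Python interpreter.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

A call such as `run_all(build_commands(files), max_workers=8)` would then process eight files at a time. Keeping `max_workers` well below 48 is one way to limit the disk I/O load mentioned above.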
-
Hi,
I've been going through the documentation and searching forum threads to see whether trim_galore can be run in a multi-core or multi-threaded manner. So far, the total lack of information on this seems to indicate that it has no such capability.
I'm not sure if this is the appropriate place to ask, but I was wondering why this is the case? I have 48 files of ~120 million reads each that I need to trim, and being able to parallelize would greatly speed this up. It seems to me that, since each read is trimmed independently, trimming software should easily scale to any number of cores. Am I correct in this assumption, or am I missing something?
Cheers.
-
Trim Galore should derive its output file names from the input file names, so the redirect (>) will only capture whatever else would have been printed to the screen. It is not especially useful, but it does no harm.
The algorithm used for quality trimming is described under the Cutadapt option -q:
Code:
-q [5'CUTOFF,]3'CUTOFF, --quality-cutoff=[5'CUTOFF,]3'CUTOFF
    Trim low-quality bases from 5' and/or 3' ends of reads before adapter removal. If one value is given, only the 3' end is trimmed. If two comma-separated cutoffs are given, the 5' end is trimmed with the first cutoff, the 3' end with the second. The algorithm is the same as the one used by BWA (see documentation). (default: no trimming)
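As a rough illustration of that BWA-style rule, here is a sketch of 3' quality trimming as the Cutadapt documentation describes it: subtract the cutoff from every quality, sum those differences from the 3' end inwards, and cut where the running sum is lowest. The function names are made up for illustration, and this is not Cutadapt's actual source code.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string that uses the Phred+33 (Sanger) offset."""
    return [ord(char) - 33 for char in quality_string]


def trim_index_3prime(qualities, cutoff):
    """Return how many bases to keep from the 5' end of the read."""
    running_sum = 0
    lowest_sum = 0
    keep = len(qualities)  # default: keep the whole read
    for i in reversed(range(len(qualities))):
        running_sum += qualities[i] - cutoff
        if running_sum > 0:
            # Good bases now outweigh the bad ones seen so far; stop scanning.
            break
        if running_sum < lowest_sum:
            lowest_sum = running_sum
            keep = i
    return keep


# A read whose quality decays towards the 3' end is cut back to 4 bases.
decaying = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(trim_index_3prime(decaying, 10))  # prints 4

# A single low base inside an otherwise good read is not removed.
dipped = [30, 30, 5, 30, 30]
print(trim_index_3prime(dipped, 20))  # prints 5 (the whole read is kept)
```

So with -q 20, a base below Q20 can remain in the output as long as it is followed by enough higher-quality bases, which is why a FastQC plot made after trimming can still show qualities under the cutoff.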
-
Originally posted by Alex852013:
Hello everybody,
This is the line I used for trimming on the Unix command line:
trim_galore ../name_R1_001.fastq ../name_R2_001.fastq -q 20 --paired --phred33 > trim_BAC-1_S9_R1_001.fastq
Therefore I have either misinterpreted something or done something wrong.
Could someone please tell me which it is?
Thanks a lot, Alex
@felix will confirm. I don't use trim_galore.
Edit: Looking at the trim_galore manual, -o is not strictly needed. The program will use the current directory by default.
Edit2: @felix clarified the effect of the output redirect in the post below.
Last edited by GenoMax; 12-03-2015, 08:59 AM.
-
Understand the quality trimming
Hello everybody,
This is the first time I have tried to use trim_galore for quality trimming of paired-end reads.
I checked the sequencing settings with testformat.sh from BBMap, which gives me:
sanger fastq raw single-ended 150bp
I'm not sure why it reports single-ended, since the run was paired-end.
Before I did the quality trimming, I checked the reads with FastQC.
The program didn't find adapter sequences any more (I guess they were already removed by the sequencing service) and showed the following pictures.
Picture before quality trimming:
This is the line I used for trimming on the Unix command line:
trim_galore ../name_R1_001.fastq ../name_R2_001.fastq -q 20 --paired --phred33 > trim_BAC-1_S9_R1_001.fastq
Picture after quality trimming:
I had expected that everything with a quality below 20 would be cut. Therefore I have either misinterpreted something or done something wrong.
Could someone please tell me which it is?
Thanks a lot, Alex
Last edited by Alex852013; 12-03-2015, 08:43 AM.
-
Oh dear, you should never post such things on the internet... but I'm glad it helped!
-
Hi Felix,
Thanks a lot for the quick response. It was really helpful.
I just performed a short experiment and wanted to share it with you. I randomly selected 1M reads and made the following 3 versions:
version 1: without any trimming
version 2: trimmed with Trim Galore with default settings
version 3: trimmed with Trim Galore with default settings, plus trimming of 'GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG' with cutadapt.
Mapping efficiency after aligning with bismark b2:
version 1: 39.7%
version 2: 58.9%
version 3: 58.2%
When I checked the quality in FastQC, even version 3 still showed some very short (less than 10 bp) overrepresented sequences as 'no hit'. So I guess there will always be some overrepresented sequences, and I need to understand exactly what I am trimming.
One notable thing here is that the efficiency did not improve from version 2 to version 3. Most of the overrepresented sequences have 'GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG' as their first part and the standard Illumina paired-end adapter as their second part. So those sequences are already excluded from the alignment after version 2, which is why version 3 doesn't change much.
By the way, I saw several posts containing 'felix is a great guy!'. Now it's making a lot more sense. Thanks again!
Originally posted by fkrueger:
Hi bluepoison,
The sequence you are seeing overrepresented is most likely some kind of adapter dimer, because it lacks the leading A that it would have gained from A-tailing of the fragments. It is not normally necessary to trim adapter dimers specifically, because they won't align to a reference genome anyway. You need to keep in mind, though, that the mapping efficiency will look worse because adapter dimers won't align.
It would be sufficient for Cutadapt as well as Trim Galore to specify just the first few bp, here GATCGGAAGAGCG, in order to trim all lengths of the occurring sequence. As I mentioned above, I would not bother though, because these sequences won't align anywhere.
Just generally, the overrepresented sequences plot in FastQC is meant as a quick guide for spotting sequences that are present in more than 0.1% of reads; it doesn't mean you should remove all of them from your sequenced library, especially not if you don't actually know what the sequence is. It might be a biological effect after all.
In short: running Trim Galore in default mode will almost certainly do the right thing. Cheers, Felix
-
Hi bluepoison,
The sequence you are seeing overrepresented is most likely some kind of adapter dimer, because it lacks the leading A that it would have gained from A-tailing of the fragments. It is not normally necessary to trim adapter dimers specifically, because they won't align to a reference genome anyway. You need to keep in mind, though, that the mapping efficiency will look worse because adapter dimers won't align.
It would be sufficient for Cutadapt as well as Trim Galore to specify just the first few bp, here GATCGGAAGAGCG, in order to trim all lengths of the occurring sequence. As I mentioned above, I would not bother though, because these sequences won't align anywhere.
Just generally, the overrepresented sequences plot in FastQC is meant as a quick guide for spotting sequences that are present in more than 0.1% of reads; it doesn't mean you should remove all of them from your sequenced library, especially not if you don't actually know what the sequence is. It might be a biological effect after all.
In short: running Trim Galore in default mode will almost certainly do the right thing. Cheers, Felix
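To illustrate why the short prefix is enough: every overrepresented sequence listed in the original question starts with GATCGGAAGAGCG, so a trimmer that finds the prefix removes all of those variants in one pass. The sketch below uses exact string matching only, which is a simplification; Cutadapt and Trim Galore use error-tolerant matching and also handle partial adapters at the end of a read. The helper name `trim_at_prefix` is hypothetical.

```python
ADAPTER_PREFIX = "GATCGGAAGAGCG"
FULL_ADAPTER = "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG"

# The overrepresented sequences from the question: the full adapter
# truncated to every length from 23 bp up to its full 32 bp.
overrepresented = [FULL_ADAPTER[:n] for n in range(23, len(FULL_ADAPTER) + 1)]


def trim_at_prefix(read, prefix=ADAPTER_PREFIX):
    """Cut the read at the first exact occurrence of the prefix, if any."""
    position = read.find(prefix)
    return read if position == -1 else read[:position]


# Every listed variant shares the prefix, so one short adapter covers them all.
print(all(seq.startswith(ADAPTER_PREFIX) for seq in overrepresented))  # prints True

# An insert followed by any of the variants is trimmed back to the insert.
print({trim_at_prefix("ACGTACGT" + seq) for seq in overrepresented})  # prints {'ACGTACGT'}
```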
-
Hi all,
This is my first time analysing sequencing data. I am having difficulty trimming the adapters/contaminants from the reads. I have got 50bp single paired read. I checked in FastQC and there are overrepresented sequences which are part of 'Illumina Paired End Adapter 2'. But if I trim using the whole 'Illumina Paired End Adapter 2', there will still be plenty of overrepresented sequences left!
Q1) In that case, how much should I trim?
I have these overrepresented sequences:
GATCGGAAGAGCGGTTCAGCAGG
GATCGGAAGAGCGGTTCAGCAGGA
GATCGGAAGAGCGGTTCAGCAGGAA
GATCGGAAGAGCGGTTCAGCAGGAAT
GATCGGAAGAGCGGTTCAGCAGGAATG
GATCGGAAGAGCGGTTCAGCAGGAATGC
GATCGGAAGAGCGGTTCAGCAGGAATGCC
GATCGGAAGAGCGGTTCAGCAGGAATGCCG
GATCGGAAGAGCGGTTCAGCAGGAATGCCGA
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG (Illumina Paired End Adapter 2)
I also have another sequence that all of the 'no hit' entries contain: 'GTTATTTTTTTGTTTTAGTTTTT'. I looked at the contaminant file and there is no match for it.
Q2) Should I trim this sequence without actually knowing where it comes from?
I planned to trim all the sequences from longest to shortest using cutadapt, because there is no way to trim multiple adapters at a time in Trim Galore. But later I will also use Trim Galore for quality trimming.
Q3) Is there any way to reduce the number of steps?
All the scenarios described above are true for all seven samples I analysed. Also, there is no way to know the actual adapters used from the dataset.
Thanks a lot!
-
I think that if a single base or a few bases dip in quality but the read then recovers, the read will actually survive. This is a sliding window model, which isn't super harsh on the data.