Quality-, adapter- and RRBS-trimming with Trim Galore!

fkrueger replied

01-14-2015, 01:59 AM
If you want to, you could cite its URL; there is no publication as such (apart from the Cutadapt reference). Cheers, Felix
frozenlyse replied

01-13-2015, 03:58 PM
Hi Felix - I'm writing the methods sections for a few WGBS papers where I've used trim_galore, is there a paper I can cite for it?
yasmin_friedmann replied

09-12-2014, 12:52 AM
trim_galore without adaptor trimming?

Hi All,

Here is my first question ever to this forum! :-)

I came across trim_galore when looking for a quality trimmer that would trim both reads of a pair together. My FASTQ files are from Illumina 1.9. I ran the following command:

trim_galore -q 20 --fastqc --gzip --paired filename1 filename3

I get the following error message:

No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Writing report to 'filename1_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: filename1
Trimming mode: paired-end
Trim Galore version: 0.3.7
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC'
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Running FastQC on the data once trimming has completed
Output file(s) will be GZIP compressed

Writing final adapter and quality trimmed output to filename1_trimmed.fq.gz

>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file filename1 <<<
Traceback (most recent call last):
File "/Users/yasmin/cutadapt-1.4.2/bin//cutadapt", line 9, in <module>
from cutadapt.scripts import cutadapt
File "/Users/yasmin/cutadapt-1.4.2/cutadapt/scripts/cutadapt.py", line 69, in <module>
from cutadapt.adapters import Adapter, ColorspaceAdapter, BACK, FRONT, PREFIX, ANYWHERE
File "/Users/yasmin/cutadapt-1.4.2/cutadapt/adapters.py", line 4, in <module>
from cutadapt import align, colorspace
File "/Users/yasmin/cutadapt-1.4.2/cutadapt/align.py", line 225, in <module>
from cutadapt._align import globalalign_locate, compare_prefixes
ImportError: dlopen(/Users/yasmin/cutadapt-1.4.2/cutadapt/_align.so, 2): no suitable image found. Did find:
/Users/yasmin/cutadapt-1.4.2/cutadapt/_align.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00

Cutadapt terminated with exit signal: '256'.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...

If anybody has come across this and solved it, please let me know!

Many thanks!
Yasmin

Last edited by yasmin_friedmann; 09-12-2014, 01:16 AM. Reason: added error message
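The first four of the eight bytes quoted in the ImportError, 0x7F 0x45 0x4C 0x46, are 0x7F followed by the ASCII letters "ELF", the signature of an ELF shared object as used on Linux, whereas dlopen on macOS expects a Mach-O file. That suggests this _align.so was compiled on (or copied from) a Linux machine rather than built on the Mac. A minimal Python sketch for checking a compiled extension, assuming only the standard ELF and Mach-O magic numbers (the helper names binary_format and file_format are hypothetical):

```python
import sys

# Leading "magic" bytes of compiled shared libraries.
ELF_MAGIC = b"\x7fELF"
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit Mach-O, big/little endian
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit Mach-O, big/little endian
    b"\xca\xfe\xba\xbe",                      # universal ("fat") Mach-O binary
}


def binary_format(header: bytes) -> str:
    """Classify a binary from its first four bytes."""
    magic = header[:4]
    if magic == ELF_MAGIC:
        return "ELF (Linux and most other Unix systems)"
    if magic in MACHO_MAGICS:
        return "Mach-O (macOS)"
    return "unknown"


def file_format(path: str) -> str:
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return binary_format(handle.read(4))


if __name__ == "__main__":
    # Usage: python check_format.py path/to/_align.so [more files...]
    for name in sys.argv[1:]:
        print(f"{name}: {file_format(name)}")
```

Fed the bytes from the error message, binary_format(bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00])) returns "ELF (Linux and most other Unix systems)". If a file on a Mac reports ELF, rebuilding or reinstalling Cutadapt on the Mac itself, so that the extension is compiled for macOS, is the likely remedy.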
rzwu0721 replied

08-14-2014, 11:55 PM
Hi, I use the software CLC Genomics Workbench, which can trim adapters such as CTGTCTCTTATACACATCT (mentioned above) in just several minutes. So I would recommend trying it.

Best Wishes!
Renzhi Woo,
Guangxi Academy of Sciences
fkrueger replied

07-28-2014, 09:41 AM
Originally posted by shawpa

I have used Trim Galore before Bismark many times, but it just occurred to me that there might be a problem in how I use the pipeline. What I usually do is trim the reads (both adaptor and quality trimming), align them with Bismark, remove duplicate reads using the deduplicate_bismark script provided, and then proceed with methylation calling. However, if I am trimming for quality, I am changing the start and end coordinates of the reads, which I think would affect the detection of duplicate reads. Could someone please let me know if this is correct? Is trimming for quality going to adversely affect the detection of duplicate reads?

No, trimming should not affect the deduplication:

Single-end deduplication uses the chromosome, the start coordinate and the orientation of a read. Since you are trimming from the 3' end of a read, this has no influence on the start coordinate. (For reverse-strand reads, the start coordinate is calculated by adding the read length to the leftmost mapping position, using the CIGAR string for gapped alignments if required.)

Paired-end deduplication uses the chromosome, the start coordinate of read 1, the end coordinate of read 2 and the orientation of the read pair (determined by read 1). Again, since you are trimming from the 3' end of both reads, the relevant parameters are not affected.
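The key logic described here can be sketched in Python as follows. This is an illustration of the idea, not Bismark's deduplicate_bismark code: positions are 1-based SAM POS values, alignments are assumed not to be soft-clipped, and for pairs where read 1 maps to the reverse strand the sketch simply takes each read's 5' end, which is an assumption. All function names are hypothetical.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment, e.g. 20M2D30M -> 52."""
    return sum(int(length) for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
               if op in REF_CONSUMING)


def five_prime_coord(pos: int, cigar: str, reverse: bool) -> int:
    """Reference coordinate of the read's 5' end.

    pos is the leftmost mapping position (SAM POS). For a reverse-strand read
    the 5' end is the rightmost aligned base, so the aligned length is added.
    Assumes no soft-clipping.
    """
    if not reverse:
        return pos
    return pos + reference_span(cigar) - 1


def single_end_key(chrom: str, pos: int, cigar: str, reverse: bool):
    """Chromosome, 5' start coordinate and orientation of a single-end read."""
    return (chrom, five_prime_coord(pos, cigar, reverse), "-" if reverse else "+")


def paired_end_key(chrom: str, read1, read2):
    """read1 and read2 are (pos, cigar, reverse) tuples.

    For read 1 on the forward strand this is the start of read 1 and the end
    of read 2; the orientation is taken from read 1.
    """
    return (chrom, five_prime_coord(*read1), five_prime_coord(*read2),
            "-" if read1[2] else "+")
```

Trimming ten bases from the 3' end of a reverse-strand read moves its POS from 100 to 110 and shortens 50M to 40M, yet five_prime_coord gives 149 in both cases, which is why 3' trimming leaves the duplicate key unchanged.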
shawpa replied

07-28-2014, 09:17 AM
I have used Trim Galore before Bismark many times, but it just occurred to me that there might be a problem in how I use the pipeline. What I usually do is trim the reads (both adaptor and quality trimming), align them with Bismark, remove duplicate reads using the deduplicate_bismark script provided, and then proceed with methylation calling. However, if I am trimming for quality, I am changing the start and end coordinates of the reads, which I think would affect the detection of duplicate reads. Could someone please let me know if this is correct? Is trimming for quality going to adversely affect the detection of duplicate reads?
fkrueger replied

07-16-2014, 07:06 AM
I have just released a small fix to Trim Galore (v0.3.7) that makes paired-end trimming work again (which I had accidentally broken by introducing a small change...). The manual has now also been updated.

Please find the latest release here: https://www.bioinformatics.babraham....s/trim_galore/
fkrueger replied

07-11-2014, 12:51 PM
First of all, apologies for not having released Trim Galore updates lately; I seem to have always postponed them and then forgotten them entirely...

A new version of Trim Galore (v0.3.6) is now available from its project page (http://www.bioinformatics.babraham.a...s/trim_galore/), which adds several features and fixes:

- Added the new options '--three_prime_clip_r1' and '--three_prime_clip_r2' to clip any number of bases from the 3' end after adapter/quality trimming has completed
- Added a check to see whether Cutadapt exits fine. Otherwise, Trim Galore will bail as well
- The option '--stringency' now needs to be spelled out, since '-s' was ambiguous with '--suppress_warn'
- Added the Trim Galore version number to the summary report
- Added single-end or paired-end mode to the summary report
- In paired-end mode, the Read 1 summary report will no longer state that no sequences have been discarded due to trimming. This will instead be stated in the trimming report of Read 2 once the validation step has been completed

(Edit: The manual needs a little updating, too, I'll work on that...)
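For a single read pair, the steps mentioned in these notes (quality trimming, the new 3' clipping, and the paired-end validation that removes a pair once either read falls below the minimum length) can be sketched in Python roughly as below. This is a simplified illustration, not Trim Galore's or Cutadapt's code: qualities are assumed to be Phred+33, the quality step follows the BWA-style algorithm described in the Cutadapt documentation, adapter trimming is left out, and the function names are hypothetical. Applying the clip before the length check, and treating the length limit as inclusive, are assumptions.

```python
def quality_trim_index(qualities: str, cutoff: int, base: int = 33) -> int:
    """Index at which to cut a read, using BWA-style 3' quality trimming.

    Walks from the 3' end, summing (cutoff - quality); the cut is placed where
    this running sum peaks, and the walk stops once the sum drops below zero.
    """
    running, best, cut = 0, 0, len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - (ord(qualities[i]) - base)
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def trim_read(seq: str, qual: str, cutoff: int = 20, three_prime_clip: int = 0):
    """Quality-trim one read, then remove a fixed number of 3' bases."""
    cut = quality_trim_index(qual, cutoff)
    seq, qual = seq[:cut], qual[:cut]
    # Adapter trimming (done by Cutadapt in the same pass) is omitted here.
    if three_prime_clip > 0:
        keep = max(len(seq) - three_prime_clip, 0)
        seq, qual = seq[:keep], qual[:keep]
    return seq, qual


def keep_pair(read1, read2, min_length: int = 20) -> bool:
    """Keep a pair only if both trimmed reads are at least min_length long."""
    return len(read1[0]) >= min_length and len(read2[0]) >= min_length


# Example: the last five bases of read 1 ('#' = Phred 2) fall below Q20 and
# are removed, then two more bases are clipped from the 3' end of each read.
r1 = trim_read("ACGTACGTACGTACG", "IIIIIIIIII#####", three_prime_clip=2)
r2 = trim_read("TTGCATTGCATTGCA", "IIIIIIIIIIIIIII", three_prime_clip=2)
print(r1, r2, keep_pair(r1, r2, min_length=8))
```

Walking from the 3' end and stopping once the running sum turns negative means an isolated low-quality base inside an otherwise good read does not trigger trimming.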
Kmok replied

07-09-2014, 08:54 AM
Thanks Brian,
I shall try running BBMerge and see.
Brian Bushnell replied

07-09-2014, 08:34 AM
Well... if you plot the insert size histogram, and see very sharp peaks at certain lengths, those may be some kind of non-genomic molecules. And once you know the length, you might be able to guess what they are considering all the different reagents that were used. Or you could look at reads with those specific insert sizes and see what the sequence is, to determine what they are. Once you know, you can easily filter them out (digitally). That is of course IF there are sharp peaks in the insert size histogram.

If they are non-genomic artifacts, you won't find them in the insert size histogram you would get from mapping, because they won't map. But (if you have paired reads) you can generate an insert size histogram by overlapping them with BBMerge.
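As a rough illustration of measuring insert size by overlapping paired reads, here is a minimal Python sketch. It is not BBMerge's algorithm, which is considerably more elaborate. It assumes adapter-trimmed reads and accepts the longest overlap whose mismatch fraction is small. The function names, the thresholds, and the sharp_peaks heuristic (a size whose count stands well above its neighbours) are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_size(r1: str, r2: str, min_overlap: int = 10, max_mismatch_frac: float = 0.1):
    """Insert size implied by the overlap of read 1 with reverse-complemented read 2.

    Tries the longest overlap first and accepts the first one whose mismatch
    fraction is small enough. Returns None if the reads do not overlap by at
    least min_overlap bases (the insert is too long to measure this way).
    Assumes adapters have already been trimmed off.
    """
    rc2 = revcomp(r2)
    for overlap in range(min(len(r1), len(rc2)), min_overlap - 1, -1):
        tail, head = r1[len(r1) - overlap:], rc2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            return len(r1) + len(rc2) - overlap
    return None


def insert_size_histogram(pairs, **kwargs) -> Counter:
    """Count insert sizes over (read1, read2) pairs, skipping non-overlapping ones."""
    sizes = (insert_size(r1, r2, **kwargs) for r1, r2 in pairs)
    return Counter(size for size in sizes if size is not None)


def sharp_peaks(histogram: Counter, fold: float = 5.0, window: int = 5):
    """Hypothetical heuristic: sizes whose count is > fold x their neighbours' mean."""
    peaks = []
    for size, count in sorted(histogram.items()):
        neighbours = [histogram.get(size + d, 0) for d in range(-window, window + 1) if d != 0]
        background = sum(neighbours) / len(neighbours)
        if count > fold * max(background, 1.0):
            peaks.append(size)
    return peaks
```

A narrow spike flagged at one length would be a candidate for the non-genomic molecules Brian describes; pulling out the reads with that insert size and looking at their sequence shows what they are.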
Kmok replied

07-09-2014, 01:21 AM
Hi Brian

Thanks very much.
I need to ask our lab about the QC of the library. How can we tell from the library insert-size distribution whether these are dimers? Would they show up as a peak of the same size as the later peaks we see?

Kin
Brian Bushnell replied

07-08-2014, 12:14 PM
I don't know about the later anomalies, but in my tests, Nextera seems to have highly irregular base frequencies for the first ~20bp (as you say, probably due to non-random binding). They are still fairly accurate and do not need to be trimmed.

It's possible that the later peaks are due to primer-dimers or other such artifacts. What is the insert-size distribution of the library?
Kmok replied

07-08-2014, 11:43 AM
Newbie on Trim Galore

I used Trim Galore to trim exome-seq data captured with Illumina Nextera. The command used is:
$myTrimGalore -q 15 -a CTGTCTCTTATACACATCT --stringency 3 --length 20 -e 0.1 -o $myoutDir --fastqc_args "--outdir $myoutDir" --dont_gzip --paired $myfastq1 $myfastq2

The FastQC results after running Trim Galore show a bias in nucleotide composition over the first 15 bp (Per base sequence content). I guess this may be related to the non-random binding of the transposase. There is also over-representation of k-mers at the 5' end as well as in the middle of the reads. Can anyone tell me what causes these k-mers (adapters? indexes?) and how they should be trimmed if they are adapters or indexes?

Thanks in advance
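To look at this kind of 5' composition bias and k-mer enrichment outside FastQC, a minimal Python sketch can tabulate base fractions per read position and count k-mers that start near the 5' end. It assumes the reads are already loaded as plain sequence strings (FASTQ parsing is omitted). The function names, k=5 and the 15-bp window are illustrative choices, not FastQC's own settings or code.

```python
from collections import Counter


def per_base_content(reads, max_pos=None):
    """Fraction of A, C, G and T at each read position (0-based), similar in
    spirit to FastQC's 'Per base sequence content' module."""
    counts = []
    for seq in reads:
        for i, base in enumerate(seq[:max_pos]):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    table = []
    for position in counts:
        total = sum(position[b] for b in "ACGT")  # N and other symbols are ignored
        table.append({b: position[b] / total if total else 0.0 for b in "ACGT"})
    return table


def kmers_near_start(reads, k=5, window=15):
    """Count k-mers whose start lies within the first `window` bases of a read."""
    counts = Counter()
    for seq in reads:
        for start in range(min(window, len(seq) - k + 1)):
            counts[seq[start:start + k]] += 1
    return counts
```

For example, kmers_near_start(reads).most_common(10) lists the most frequent k-mers in the first 15 bp, which can then be compared against the adapter and index sequences used for the library.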
maria.gr replied

05-30-2014, 03:40 AM
Thanks! Found it, but it again gave the error at line 471...
Actually I realised that I had to make the file executable ('chmod a+x build/cutadapt/bin/cutadapt')...
I don't know if this is needed for everyone after downloading Cutadapt, but I mention it in case somebody has the same problem.
So now it runs normally!
Thanks again!
fkrueger replied

05-30-2014, 01:38 AM
To supply the path to Cutadapt, you need to open Trim Galore in a text editor and change the path, which is set in one of the first lines.