Quality-, adapter- and RRBS-trimming with Trim Galore!

padmoo replied

06-09-2015, 01:41 AM
Hi everyone,

Not sure if this is the right thread, but I'll give it a go:

I'm trying to get rid of adapters in my sequences. Not many reads contain adapters, but I'd like to save what I can.

When I run my command I get the following error:

==================================================================
Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
/gpfs/scratch/cbh12wsu/trim/trim_galore -phred33 --illumina --paired -phred33 --path_to_cutadapt /gpfs/scratch/cbh12wsu/trim/cutadapt.py -o adaptrim.fastq -length 50 1_L.fastq 2_R.fastq
------------------------------------------------------------

Exited with exit code 2.

The output (if any) follows:

Path to Cutadapt set as: '/gpfs/scratch/cbh12wsu/trim/cutadapt.py' (user defined)
File "/gpfs/scratch/cbh12wsu/trim/cutadapt.py", line 99
print(rest, match.read.name, file=self.file)
^
SyntaxError: invalid syntax
Cutadapt seems to be working fine (tested command '/gpfs/scratch/cbh12wsu/trim/cutadapt.py --version')
File "/gpfs/scratch/cbh12wsu/trim/cutadapt.py", line 99
print(rest, match.read.name, file=self.file)
^
SyntaxError: invalid syntax
Failed to write to file '1_L.fastq_trimming_report.txt': No such file or directory
==================================================================

Does this indicate a problem with cutadapt.py?

Thanks!
create.share replied

05-30-2015, 12:15 AM
For the faint of heart (those who cannot use complex scripts on Linux and need a graphical interface), here is another free program for quality trimming:

An efficient SFF/FastQ viewer and editor (GUI)

The gray/green curves in the second graphic show the average quality before and after trimming the low-quality ends.

It only works on FASTA, FASTQ and SFF files for the moment.
Sorry.

Last edited by create.share; 05-30-2015, 12:19 AM.
LindsayR replied

05-29-2015, 06:35 AM
Thanks for helping, Felix. It was a simple typo on my end, unfortunately. Problem solved!
fkrueger replied

05-29-2015, 05:02 AM
Hi Lindsay,

This is a little odd... The way Trim Galore handles paired-end files (when you specify --paired) is to run single-end trimming on read 1 and read 2 separately, and then run a 'validation' step that checks the length of each read in a sequence pair to decide whether to keep or discard the entire read pair. Since reads are not discarded in the (single-end) trimming step, even if they are trimmed to a length of 0 bp, they should then be either kept or discarded as an entire pair. Is there a chance that the FastQ files you fed in did not match up or were truncated?

So, in a nutshell, the --paired option is not supposed to be passed through to Cutadapt (which only started supporting paired-end trimming recently) but is handled internally. If you keep having these problems, could you please send me a few reads from your FastQ files so I can try to reproduce the errors on my side? Thanks, Felix
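The validation step Felix describes can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Trim Galore's actual (Perl) code: reads are hypothetical `(header, sequence, quality)` tuples that have already been trimmed one file at a time, and the 20 bp default mirrors the "Minimum required sequence length for both reads" line in Lindsay's run summary. A pair is kept only if both reads reach the minimum length, and one file running out of records before the other is treated as an error, which is one way mismatched or truncated inputs would surface.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (header, sequence, quality)
Record = Tuple[str, str, str]


def validate_pairs(
    read1: Iterable[Record],
    read2: Iterable[Record],
    min_length: int = 20,
) -> Iterator[Tuple[Record, Record]]:
    """Yield only those pairs in which both trimmed reads are >= min_length.

    Reads were trimmed independently, so a read may be 0 bp long here;
    the keep/drop decision is always made for the whole pair.
    """
    for index, (r1, r2) in enumerate(zip_longest(read1, read2), start=1):
        if r1 is None or r2 is None:
            # One file ended early: the inputs do not match up (or one is truncated)
            raise ValueError(f"read 1 and read 2 files differ in length at pair {index}")
        if len(r1[1]) >= min_length and len(r2[1]) >= min_length:
            yield r1, r2
        # otherwise both mates are discarded together


if __name__ == "__main__":
    r1 = [("@a/1", "ACGT" * 8, "I" * 32), ("@b/1", "ACG", "III")]
    r2 = [("@a/2", "TTGA" * 6, "I" * 24), ("@b/2", "ACGT" * 8, "I" * 32)]
    kept = list(validate_pairs(r1, r2))
    print(len(kept))  # pair "b" is dropped because its read 1 is only 3 bp
```

Because the decision is made per pair, read 1 and read 2 outputs always stay in sync; an unequal number of reads in the two output files would therefore point to a problem upstream of this step.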
LindsayR replied

05-29-2015, 04:48 AM
TrimGalore paired end issue

I'm trying to run Trim Galore! v0.4.0, and I have Cutadapt 1.8.1 installed using Python 2.7.6. I think that Trim Galore is not passing the paired option to Cutadapt. I end up with an unequal number of reads in the read 1 vs read 2 files, and Bismark will not align them. This is the summary of trimming (I marked the part I think is wrong in the Cutadapt command line). Any ideas? Thanks so much! -Lindsay

SUMMARISING RUN PARAMETERS
==========================
Input filename: path/read1_R1_010.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.4.0
Cutadapt version: 1.8.1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed

This is cutadapt 1.8.1 with Python 2.7.6
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC no –p argument is specified here…… /path_R1_010.fastq.gz
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
Finished in 161.90 s (40 us/read; 1.48 M reads/minute).
KJohnson replied

05-07-2015, 10:29 AM
The data is Illumina 1.9. Yes, I can email you the report.

Thank you,
Kevin
fkrueger replied

05-07-2015, 10:26 AM
Is the data Illumina 1.9 encoded (phred33) or the old 1.5 encoding by any chance? Would you mind attaching or sending me the FastQC report via email? Cheers, Felix
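The encoding question matters because Trim Galore assumes Phred+33 unless told otherwise with --phred64, and the -q cutoff is applied to scores decoded with that offset (Lindsay's run summary shows "Quality encoding type selected: ASCII+33"). A rough heuristic for telling the two encodings apart is sketched below. The thresholds are assumptions based on the standard ASCII ranges (Phred+33 starts at '!', Phred+64/Solexa data never goes below ';'), and the function names are hypothetical. A file in which every base is high quality (for example all 'I', which is Q40 in Phred+33) would be misclassified as Phred+64, so inspect a reasonably large sample of reads.

```python
from typing import Iterable, List, Optional


def guess_phred_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Heuristically guess the ASCII offset (33 or 64) of FASTQ quality strings."""
    lowest = min((ord(c) for q in quality_strings for c in q), default=None)
    if lowest is None:
        return None  # no quality data to inspect
    if lowest < 59:
        # Characters below ';' (ASCII 59) cannot occur in Phred+64 / Solexa data
        return 33
    if lowest >= 64:
        # Everything at or above '@' is consistent with Phred+64 (Illumina 1.3-1.7)
        return 64
    return None  # ';' to '?' is ambiguous between the two encodings


def phred_scores(quality: str, offset: int) -> List[int]:
    """Decode a quality string into integer Phred scores."""
    return [ord(c) - offset for c in quality]


print(guess_phred_offset(["II#FF", "JJJ?@"]))  # '#' (ASCII 35) implies Phred+33
print(phred_scores("I#", 33))
```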
KJohnson replied

05-07-2015, 10:22 AM
Hi all,

I am running Trim Galore on Illumina paired-end data and am trying to figure out what is going wrong. I have set the quality cutoff to a Phred score of 30, but when trimming is complete and I view the FastQC report, the box-plot whiskers under the 'Per base sequence quality' tab go down to a Phred score of 13. Is there something I am doing wrong?

Thanks.

code:
trim_galore -fastqc -q 30 -paired -retain_unpaired Blue_trimmed_1.fq Blue_trimmed_2.fq
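One possible explanation, not confirmed in this thread: Trim Galore hands the -q cutoff to Cutadapt (note the "-q 20" in the Cutadapt command line Lindsay posted), and Cutadapt trims low-quality bases from the 3' end of each read only. A low-quality base in the middle of an otherwise good read is kept, so FastQC can still report low scores after trimming. The sketch below follows the BWA-style algorithm described in the Cutadapt documentation (subtract the cutoff from each quality, sum from the 3' end, cut where that partial sum is smallest). It is a simplified Python illustration, not Cutadapt's own code, and it assumes Phred+33 qualities.

```python
def quality_trim_index(qualities: str, cutoff: int, base: int = 33) -> int:
    """Return how many 5' bases to keep after BWA-style 3' quality trimming."""
    stop = len(qualities)
    running = 0  # running sum of (cutoff - quality), accumulated from the 3' end
    best = 0
    for i in reversed(range(len(qualities))):
        running += cutoff - (ord(qualities[i]) - base)
        if running < 0:
            # The 3' suffix starting here is, in sum, above the cutoff: stop scanning
            break
        if running > best:
            best = running
            stop = i
    return stop


reads = {
    "low 3' tail": "IIIII###",      # trailing Q2 bases are removed
    "low base inside": "II#IIIII",  # the Q2 base at position 3 survives trimming
}
for label, qual in reads.items():
    print(label, quality_trim_index(qual, cutoff=30))
```

With the cutoff of 30 from the command above, the first read is cut to 5 bases, while the second is kept at full length despite its internal Q2 base.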
fkrueger replied

05-07-2015, 05:12 AM
It might help if you could email me the FastQC HTML report so I can take a look.

In more general terms, it is quite possible that the library still contains fragments of TruSeq adapters, or especially PCR primers, after trimming, and that this is what FastQC warns you about. Quite often these are adapter or primer dimers that lack the A (from A-tailing) at the start of the sequence. These sequences are not removed from the file, and they generally don't need to be if alignment is your next step, because they simply won't align.

The adapter contamination you do care about is read-through contamination at the 3' end: reads that start in a genomic sequence of interest and then continue into adapter sequence. It would appear that trimming removed these efficiently.
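The read-through case can be illustrated with the simplified sketch below. It is a stand-in for what Cutadapt does, not its actual algorithm: it counts only mismatches (Cutadapt also allows insertions and deletions), takes the leftmost acceptable match, and uses the defaults from Lindsay's run summary (adapter AGATCGGAAGAGC, maximum error rate 0.1, minimum overlap 1 bp). The function name is hypothetical. With a 1 bp minimum overlap, even a single trailing 'A' that matches the first adapter base is trimmed.

```python
def trim_3prime_adapter(
    read: str,
    adapter: str = "AGATCGGAAGAGC",
    max_error_rate: float = 0.1,
    min_overlap: int = 1,
) -> str:
    """Cut the read at the leftmost position where the adapter (or its 5' part) starts.

    Simplification: only mismatches are counted; no insertions or deletions.
    """
    for start in range(len(read) - min_overlap + 1):
        # The adapter may run off the 3' end of the read, giving a partial overlap
        overlap = min(len(adapter), len(read) - start)
        allowed = int(max_error_rate * overlap)  # 0 errors for overlaps < 10 bp at 0.1
        mismatches = sum(
            1 for a, b in zip(read[start:start + overlap], adapter) if a != b
        )
        if mismatches <= allowed:
            return read[:start]
    return read  # no adapter found


print(trim_3prime_adapter("ACGTACGTAC" + "AGATCGGAAG"))  # read-through into the adapter
print(trim_3prime_adapter("CCCCCCCCCA"))  # a 1 bp overlap is enough at stringency 1
```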
rhinoceros replied

05-07-2015, 04:26 AM
That's great.

Do you have any hints on how to trim ScriptSeq-prepped samples? My PE reads clearly had TruSeq adapters, but after trim_galore, FastQC tells me that my R1 reads still contain a considerable amount of "TruSeq Adapter, Index 12 (100% over 58bp)" and some other "no hit" sequences, whereas my R2 reads apparently contain lots of "Illumina Single End PCR Primer 1 (100% over 52bp)" and "no hit" sequences. Both files have a massive k-mer bias at the 5' ends even after trim_galore. The first 13 bp of TruSeq adapters and ScriptSeq adapters are identical, so I'm somewhat baffled how these adapters are still present in some R1 reads after trimming. I presume the R2 sequences are related to 3'-terminal tagging and very short RNA molecules, so as a solution I could include the complete Illumina Paired End PCR Primer 1 sequence using the -a2 flag.

Last edited by rhinoceros; 05-07-2015, 04:58 AM.

fkrueger replied

05-07-2015, 12:34 AM

Originally posted by rhinoceros

There's a small problem with the zip file.

Code:

unzip trim_galore_v0.4.0.zip 
Archive:  trim_galore_v0.4.0.zip
  inflating: Trim_Galore_User_Guide.pdf  
  inflating: trim_galore             
  inflating: RRBS_Guide.pdf          
warning:  skipped "../" path component(s) in ../Bismark/license.txt
  inflating: Bismark/license.txt

Oops... but it is only the license file. I have replaced the zip file now. Cheers, Felix


fkrueger replied

05-06-2015, 12:55 AM
Trim Galore v0.4.0 released: Adapter auto-detection

We have just released Trim Galore version 0.4.0. This adds a few sanity checks and makes the specification of standard adapters more straightforward. In fact, we changed the default mode so that Trim Galore attempts to auto-detect which type of adapter was used in library construction, which results in 'one command to trim them all' for standard ClusterFlow processing of a highly diverse, full Illumina flowcell.

Here are the changes in more detail:

• Unless instructed otherwise, Trim Galore will now attempt to auto-detect the adapter that was used for library construction (choosing from the Illumina universal, Nextera transposase and Illumina small RNA adapters). For this, the first 1 million sequences of the first file specified are analysed. If no adapter can be detected within the first 1 million sequences, Trim Galore defaults to --illumina. The auto-detection behaviour can be overruled by specifying an adapter sequence or using --illumina, --nextera or --small_rna.

• Added the new options '--illumina', '--nextera' and '--small_rna' to use different default sequences for trimming (instead of -a):
Universal Illumina: AGATCGGAAGAGC (TruSeq or Sanger iTag)
Small RNA: ATGGAATTCTCG
Nextera: CTGTCTCTTATA

• Added a sanity check at the start of a Trim Galore run to see whether the (first) FastQ file in question contains any data at all or appears to be in SOLiD colorspace format, and bails if either is true. Trim Galore does not support colorspace trimming, but users wishing to do this are kindly referred to Cutadapt as a standalone program.

• Added a new option '--path_to_cutadapt /path/to/cutadapt'. Unless this option is specified, it is assumed that Cutadapt is in the PATH (equivalent to '--path_to_cutadapt cutadapt'). Also added a test to see whether Cutadapt seems to be working before the actual trimming is launched.

• Fixed an open command for a certain type of RRBS processing (was open() instead of open3())
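The two start-of-run checks described in these release notes (empty/colorspace sanity check, then adapter auto-detection with an --illumina fallback) can be sketched as follows. This is a hedged Python illustration, not Trim Galore's Perl implementation: it assumes detection works by counting reads that contain each listed adapter sequence verbatim, and the colorspace test (a base followed only by colour digits, as in `T0123...`) is likewise an assumption about what "appears to be in SOLiD colorspace" means. The dictionary keys are hypothetical names chosen to mirror the --illumina, --nextera and --small_rna options.

```python
from itertools import islice
from typing import Iterable

# Adapter sequences as listed in the release notes
ADAPTERS = {
    "illumina": "AGATCGGAAGAGC",
    "small_rna": "ATGGAATTCTCG",
    "nextera": "CTGTCTCTTATA",
}


def looks_like_colorspace(seq: str) -> bool:
    """Assumed test: a base followed only by colour digits, e.g. T0123..."""
    return len(seq) > 1 and seq[0] in "ACGT" and all(c in "0123." for c in seq[1:])


def detect_adapter(sequences: Iterable[str], limit: int = 1_000_000) -> str:
    """Return the adapter key found in the most reads among the first `limit`."""
    counts = dict.fromkeys(ADAPTERS, 0)
    seen = 0
    for seq in islice(sequences, limit):
        if seen == 0 and looks_like_colorspace(seq):
            raise ValueError("input appears to be SOLiD colorspace; not supported")
        seen += 1
        for name, adapter in ADAPTERS.items():
            if adapter in seq:
                counts[name] += 1
    if seen == 0:
        raise ValueError("input file contains no sequences")
    best = max(counts, key=counts.get)  # ties resolve to the first key, "illumina"
    # Nothing detected: fall back to the Illumina universal adapter (--illumina)
    return best if counts[best] > 0 else "illumina"


print(detect_adapter(["ACGTACGT" + "CTGTCTCTTATACACATCT", "GGGGGGGG"]))
```

Passing an explicit adapter or one of the three options would simply skip this step, which matches the "can be overruled" behaviour described above.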

Trim Galore is available from the Babraham Bioinformatics projects site.
rhinoceros replied

04-21-2015, 03:28 AM
Originally posted by gwilkie

I have also found that when using Nextera sample prep, you should trim at CTGTCTCTTATACACATCT instead of the usual AGATCGGAAGAGC.

Best wishes, Gavin

Is this still the case in 2015? That is, is "CTGTCTCTTATACACATCT" universal to Nextera-prepped samples?
MaximeG replied

03-30-2015, 04:23 AM
Hi all,
I have a question about Trim Galore's non-directional option.
After a lot of reflection, we have determined that our RRBS library was made in a directional, paired-end manner (R1 reads begin with C/TGG and R2 reads with CAA). However, the non-directional option trims the CA from R2.
Is it a better strategy to leave this CA in for Bismark and trim it afterwards?
We have run both:
With nd: 36.6% uniquely aligned pairs + 55.6% multiple pairs
Without nd: 37.8% uniquely aligned pairs + 55.2% multiple pairs
Thank you in advance,
Maxime
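For reference, the per-read effect of the non-directional option can be sketched as below. This assumes the behaviour described in the Trim Galore user guide for --non_directional (quality-trimmed reads that start with CAA or CGA lose their first two bases); please check the guide for your version, since this post only mentions the CA being cut from R2. The function name is hypothetical, and the sketch shows only the mechanics, not which strategy is better.

```python
from typing import Tuple


def trim_non_directional(seq: str, qual: str) -> Tuple[str, str]:
    """Remove the first 2 bp of reads that start with CAA or CGA.

    Assumed --non_directional behaviour; applied after quality trimming.
    """
    if seq.startswith(("CAA", "CGA")):
        return seq[2:], qual[2:]  # drop the first two bases and their qualities
    return seq, qual  # other reads (e.g. starting with TGG/CGG) are left unchanged


print(trim_non_directional("CAAGTTAGG", "IIIIIIIII"))
print(trim_non_directional("TGGCGTTAG", "IIIIIIIII"))
```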