
**kmcarr** · 12-20-2012, 12:47 PM

Originally posted by alisrpp

Thanks a lot!!!

Would you recommend making an individual FASTA file for each index?

No, I use a single file with all of the Illumina TruSeq sequences in it. I also don't bother designating any of them for "Palindrome" search. I've read the description of Palindrome search several times on the Trimmomatic site and frankly still don't understand it. I just do simple searches for them and it seems to work fine for me.

I have attached the file I use with Trimmomatic.

Attached Files

TruSeqForTimmomatic.fna.zip (1,002 Bytes)

**alisrpp** · 12-20-2012, 12:59 PM

Thanks for the answers!

About palindrome clipping: a few days ago I wrote to one of the creators of Trimmomatic asking for an alternative to the explanation on the website (I couldn't understand it either).
Here is the answer, which I found useful:

Simple clipping is just finding a contaminant sequence somewhere within a read. Conceptually, you take the contaminant and the read and slide them across each other until you get a perfect or close-enough match. So, with R being read bases and C being contaminant bases, you check:

1)
RRRRRRRRRRR
CCCC

2)
RRRRRRRRRRR
CCCC ->

etc.
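The sliding comparison described above can be sketched in Python. This is a simplified illustration, not Trimmomatic's code: Trimmomatic scores alignments using the seed-mismatch and clip-threshold values passed to ILLUMINACLIP, whereas this sketch substitutes a plain mismatch count. The function name `simple_clip` and the `max_mismatches`/`min_overlap` defaults are hypothetical. The contaminant is allowed to run off the 3' end of the read, so a partial adapter at the end is still found.

```python
def simple_clip(read: str, contaminant: str,
                max_mismatches: int = 1, min_overlap: int = 6) -> str:
    """Slide `contaminant` along `read` and cut at the first acceptable match.

    Hypothetical sketch: uses a plain mismatch count over the overlapping
    bases instead of Trimmomatic's scoring. The contaminant may hang off the
    3' end, so a partial adapter at the end of the read is also detected.
    """
    for offset in range(len(read)):
        # Number of contaminant bases that still overlap the read at this offset.
        overlap = min(len(contaminant), len(read) - offset)
        if overlap < min_overlap:
            break  # too little overlap left to call a match reliably
        window = read[offset:offset + overlap]
        mismatches = sum(a != b for a, b in zip(window, contaminant))
        if mismatches <= max_mismatches:
            return read[:offset]  # keep only the bases before the contaminant
    return read  # no acceptable match: read is left untouched


adapter = "AGATCGGAAGAGC"
print(simple_clip("TTTTTTTTTT" + adapter, adapter))      # TTTTTTTTTT (full adapter)
print(simple_clip("TTTTTTTTTT" + adapter[:7], adapter))  # TTTTTTTTTT (partial adapter)
```

The `min_overlap` cut-off is what keeps very short end matches from being clipped; lowering it finds shorter adapter fragments at the cost of more chance matches.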

Palindrome clipping is a bit more complex - and related to actual palindromes only in a twisted mind like mine. In this case, you 'ligate' the presumed adapter sequence to the start of each read in a pair, and try sliding them over each other.

So, with F being bases from the forward read, R being bases from the reverse read, and A being either adapter (technically the two adapters are different, but let's ignore that for now), you check:

AAAAAAFFFFFFF ->
<- RRRRRRRAAAAAA

In this case, the aligning region is much longer, since it consists of the entire read length plus part of the adapter. This gives very high confidence that an apparent 'read-through' is a true positive.

**kmcarr** · 12-20-2012, 01:06 PM

Originally posted by alisrpp

Palindrome clipping is a bit more complex - and related to actual palindromes only in a twisted mind like mine. [...]

Yeah, still clear as mud.

**claire.anderson1** · 03-22-2013, 03:05 PM

How does adapter trimming in Trimmomatic work?

I have two adapter sequences of 58 bp and 66 bp that I would like to remove from my Illumina data set (if present). Can Trimmomatic recognise partial matches to these adapter sequences? For example, if I am using 100 bp reads and a particular sequence contains 90 bp of DNA from the source organism, the remaining 10 bp at the end of the read might be from the adapter. Would Trimmomatic be able to pick this up? Or must it find a match to the whole adapter sequence?

I'm new at playing with NGS data, so any advice would be gratefully received!

**cllorens** · 03-22-2013, 03:30 PM

You could also check out cutadapt, which is also useful for Illumina data:

http://code.google.com/p/cutadapt/

**tonybolger** · 03-24-2013, 05:43 AM

Originally posted by claire.anderson1

Can Trimmomatic recognise partial matches to these adapter sequences? [...] Would Trimmomatic be able to pick this up? Or must it find a match to the whole adapter sequence?

In the case of paired-end data with adapter 'read-through' (where the DNA fragment is shorter than the read length, so the ends of the reads come from the 'opposite' adapter), trimmomatic can remove even a single adapter base (if you use sufficiently aggressive settings). Older versions of trimmomatic required at least 8 bp of adapter in this case, but that was probably too conservative, so I reduced it. The latest versions also include the recommended adapter sequences, which have been a common stumbling point.

For other, less common scenarios, where the adapter location/orientation isn't known in advance or where you're using single-end data, you'd typically want to be a bit more cautious, but 10 bp or more can usually be removed at a reasonable false positive rate.

Hope this helps.
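To put the "10 bp or more" rule of thumb into rough numbers: under the simplifying (and unrealistic) assumption that read bases are independent and uniformly distributed, the chance that a read's last k bases match the first k bases of a given adapter purely by accident is (1/4)^k, with extra terms if mismatches are tolerated. The helper below is a hypothetical back-of-the-envelope calculation, not part of Trimmomatic.

```python
from math import comb


def chance_match_probability(k: int, max_mismatches: int = 0) -> float:
    """Probability that k random bases match a fixed k-base sequence with at
    most `max_mismatches` mismatches.

    Assumes independent bases with a 1/4 chance of matching at each position
    (a simplification: real reads and adapters are not random).
    """
    p = 0.25
    # Binomial sum over the allowed number of mismatching positions.
    return sum(comb(k, m) * (1 - p) ** m * p ** (k - m)
               for m in range(max_mismatches + 1))


for k in (1, 4, 8, 10):
    exact = chance_match_probability(k)
    one_mm = chance_match_probability(k, 1)
    print(f"k={k:2d}  exact={exact:.2e}  with 1 mismatch={one_mm:.2e}")
```

At k = 10 an exact chance match happens about once in 4^10 (roughly 1.05 million) reads, so across 10 million reads on the order of 10 spurious trims would be expected from this term alone. At k = 1 the chance is 1 in 4, which is why a lone adapter base is only trustworthy in the paired-end read-through case, where the overlapping insert bases are checked at the same time and lengthen the matched region.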

**tonybolger** · 03-24-2013, 06:22 AM

Originally posted by kmcarr

Yeah, still clear as mud.

Sorry that my explanation of this obviously sucks. Now that the adapter sequences are included directly in trimmomatic, there's probably less need for everyone to understand it, but here goes anyway.

During adapter read-through with paired-end data (and assuming forward and reverse reads of the same length), we get pairs with:

* the forward read consisting of X useful bases, followed by Y bases from the end of the reverse read adapter
* the reverse read consisting of X useful bases, followed by Y bases from the end of the forward read adapter

The beauty is that those X bases in the forward and reverse reads are the same bases, though reverse complemented, and the Y bases are always specific known sequences starting immediately afterwards. So rather than fish for those Y bases in isolation (which is risky/difficult if Y is small), we can check for three things simultaneously:

* the first X bases of both reads are reverse complements of each other
* the additional bases from the forward read match the reverse adapter
* the additional bases from the reverse read match the forward adapter

Since all three must hold to support the 'read-through' hypothesis for a given read pair/position, the false positive rate is very low. Naturally we don't know what X is, but we can check every possible X from zero to the read length.
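The three checks above can be sketched as follows, assuming forward and reverse reads of equal length. The names are hypothetical, and to sidestep the forward/reverse adapter naming question, each adapter argument is given as the sequence that read runs into after the insert ends. A plain mismatch count stands in for Trimmomatic's actual scoring, so treat this as an illustration of the idea rather than the tool's implementation. The example sequences are TruSeq-like strings chosen for illustration only.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_read_through(fwd: str, rev: str, fwd_tail: str, rev_tail: str,
                      max_mismatches: int = 1):
    """Return the insert length X if the pair looks like adapter read-through,
    otherwise None.

    fwd_tail / rev_tail (hypothetical names): adapter sequence, as it would
    appear in the read, that the forward / reverse read runs into once the
    insert ends. Uses a simple total mismatch count across all three checks.
    """
    n = len(fwd)
    if len(rev) != n:
        raise ValueError("sketch assumes forward and reverse reads of equal length")
    if len(fwd_tail) < n or len(rev_tail) < n:
        raise ValueError("adapter tails must be at least as long as the reads")
    for x in range(n):  # X useful bases, Y = n - X adapter bases (Y >= 1)
        y = n - x
        pairs = (
            (fwd[:x], revcomp(rev[:x])),  # 1. insert bases are reverse complements
            (fwd[x:], fwd_tail[:y]),      # 2. forward read continues into its adapter
            (rev[x:], rev_tail[:y]),      # 3. reverse read continues into its adapter
        )
        mismatches = sum(a != b for s, t in pairs for a, b in zip(s, t))
        if mismatches <= max_mismatches:
            return x  # trim both reads to their first X bases
    return None


insert = "ACGTTGCAAGGCTTAC"            # 16 bp fragment, shorter than the 20 bp reads
fwd_tail = "AGATCGGAAGAGCACACGTC"
rev_tail = "AGATCGGAAGAGCGTCGTGT"
fwd = insert + fwd_tail[:4]
rev = revcomp(insert) + rev_tail[:4]
print(find_read_through(fwd, rev, fwd_tail, rev_tail))  # 16
```

Because the whole read length takes part in the comparison at each candidate X, even a short adapter tail (here 4 bases) is supported by the 16 overlapping insert bases, which is the source of the low false positive rate described in the post.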

**leda** · 04-23-2013, 02:34 PM

What do the four columns following the read identifier in the trimlog represent? I can't find this in the documentation.

thanks!

**mastal** · 04-23-2013, 03:18 PM

Introducing the Trimmomatic

This is an extract from the trimmomatic web page:

Specifying a trimlog file creates a log of all read trimmings, indicating the following details:

* the read name
* the surviving sequence length
* the location of the first surviving base, aka. the amount trimmed from the start
* the location of the last surviving base in the original read
* the amount trimmed from the end

USADELLAB.org - Trimmomatic: A flexible read trimming tool for Illumina NGS data (http://www.usadellab.org/cms/index.php?page=trimmomatic)
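A trimlog line can be split into those fields from the right, which keeps read names that contain spaces intact. This is a sketch assuming whitespace-separated fields in the order listed above; the class and function names are hypothetical, and the example line is made up rather than taken from a real log.

```python
from typing import NamedTuple


class TrimlogEntry(NamedTuple):
    read_name: str
    surviving_length: int
    first_surviving: int   # location of the first surviving base (= amount trimmed from the start)
    last_surviving: int    # location of the last surviving base in the original read
    trimmed_from_end: int  # amount trimmed from the end


def parse_trimlog_line(line: str) -> TrimlogEntry:
    """Split the four numeric columns off the right; whatever remains is the name."""
    name, length, first, last, end = line.rstrip("\r\n").rsplit(maxsplit=4)
    return TrimlogEntry(name, int(length), int(first), int(last), int(end))


entry = parse_trimlog_line("SRR522907.1 FCB01CWABXX:1:2205:1823:145892 72 5 77 13")
print(entry.read_name, entry.surviving_length, entry.trimmed_from_end)
```

Splitting from the right is the design choice that matters here: a left-to-right split would break names such as `SRR522907.1 FCB01CWABXX:...` into several columns.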

**helios** · 05-16-2013, 01:09 AM

Make trimmomatic a binary/executable

Hi guys,

in case you prefer to run trimmomatic as a binary (./trimmomatic), you can follow these steps:

1) download stub.sh.gz (attached) into the directory where trimmomatic-0.X.jar is located and gunzip it
2) cat stub.sh trimmomatic-0.30.jar > trimmomatic
3) chmod +x trimmomatic
4) add trimmomatic's directory to your PATH

ref: https://coderwall.com/p/ssuaxa

If you need to change Java's parameters, modify stub.sh accordingly.
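Steps 2 and 3 can also be done from Python, which makes the mechanism explicit: a shell-script stub is placed in front of the jar bytes and the result is marked executable. When run, `sh` executes the stub, which `exec`s Java on the same file; Java can still open it as a jar because a zip archive's central directory sits at the end of the file. The stub text here is an assumption (the attached stub.sh is not reproduced in the thread), and `JAVA_OPTS` and `make_self_executing_jar` are hypothetical names.

```python
import stat
from pathlib import Path

# Assumed minimal stub (hypothetical; the attached stub.sh may differ).
# $JAVA_OPTS is left unquoted on purpose so it can hold several options.
STUB = '#!/bin/sh\nexec java $JAVA_OPTS -jar "$0" "$@"\n'


def make_self_executing_jar(jar_path, out_path, stub=STUB):
    """Roughly `cat stub.sh app.jar > out && chmod +x out`."""
    out = Path(out_path)
    # Overwrite (not append), so re-running never leaves a duplicated jar.
    out.write_bytes(stub.encode("ascii") + Path(jar_path).read_bytes())
    # Add execute permission for user, group and others.
    out.chmod(out.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return out


# Example (paths are placeholders):
# make_self_executing_jar("trimmomatic-0.30.jar", "trimmomatic")
```

With a stub like this, Java options could be supplied through the `JAVA_OPTS` environment variable instead of editing the stub each time.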

Ciao.

Attached Files

stub.sh.gz (185 Bytes)

**rmdoyle** · 06-04-2013, 07:07 AM

Hi everyone,

I've recently used Trimmomatic on some Illumina HiSeq PE fastq files. I then attempted to run the post-Trimmomatic fastq files through FastQC. My original Illumina files run through FastQC just fine, but the post-Trimmomatic files get stuck, which makes me think I've corrupted the files somehow while using Trimmomatic.

When I run fastqc on my post-trimmomatic fastq files, I get the following output after inputting my sequences:

Exception in thread "Thread-4" java.lang.NullPointerException
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:141)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:105)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
at java.lang.Thread.run(Unknown Source)

I also did get one error message after running trimmomatic. This error was:

Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg

My original trimmomatic code was:

TrimmomaticPE: -phred64 -trimlog trimlog SRR522907_1.fastq SRR522907_2.fastq paired_output1.fastq unpaired_output1.fastq paired_output2.fastq unpaired_output2.fastq ILLUMINACLIP:TruSeq3_PE.fa:2:30:10 LEADING:20 TRAILING:20 MINLEN:30

I'd appreciate any thoughts on where I went wrong...
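For reference, the quality-based steps in that command can be modelled roughly as below. This is a simplified sketch based on the documented meanings of LEADING (cut bases off the start while below a quality threshold), TRAILING (the same from the end) and MINLEN (drop reads shorter than a length); it ignores ILLUMINACLIP entirely. `-phred64` means quality characters are offset by 64, so 'g' decodes to 39. The function names are hypothetical. Like Trimmomatic in the error quoted above, it rejects records whose sequence and quality lengths differ.

```python
def phred_scores(quality, offset=64):
    """Decode FASTQ quality characters; -phred64 uses ASCII offset 64."""
    return [ord(c) - offset for c in quality]


def leading_trailing_minlen(seq, qual, leading=20, trailing=20, minlen=30, offset=64):
    """Apply LEADING, TRAILING and MINLEN in that order (simplified sketch).

    Returns the trimmed (seq, qual) pair, or None if the read is dropped.
    """
    if len(seq) != len(qual):
        raise ValueError(f"Sequence and quality length don't match: {len(seq)} vs {len(qual)}")
    scores = phred_scores(qual, offset)
    start, end = 0, len(seq)
    while start < end and scores[start] < leading:     # LEADING:20
        start += 1
    while end > start and scores[end - 1] < trailing:  # TRAILING:20
        end -= 1
    if end - start < minlen:                           # MINLEN:30
        return None
    return seq[start:end], qual[start:end]


print(phred_scores("gfec"))  # [39, 38, 37, 35]
print(leading_trailing_minlen("N" + "A" * 40 + "NN", "B" + "g" * 40 + "BB"))
```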

**tonybolger** · 06-04-2013, 09:02 AM

Originally posted by rmdoyle

I'd appreciate any thoughts on where I went wrong...

Very strange indeed, and nothing I've seen before.

I would suspect something like a lack of disk space, or something killing the trimmomatic process. It may also be a one-off glitch, so running it again and checking whether the output is still broken might help.

**rmdoyle** · 06-05-2013, 05:51 AM

Hmmm... gave it another shot and still no dice. Any thoughts on the following error/warning, tonybolger?

Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg

**tonybolger** · 06-05-2013, 07:53 AM

Originally posted by rmdoyle

Any thoughts on the following error/warning, tonybolger?

Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: [...]

Ah, that was a trimmomatic error. Normally a FASTQ record should have the same number of bases and quality scores, and for some reason, this read appears to have fewer quality scores, which trimmomatic considers invalid (AFAIK this is correct behaviour). At this point, trimmomatic gives up, and probably leaves a partial output file, which may cause other issues.

The question is why the record is invalid. Can you find that fastq record within the file?

Of course, trimmomatic should really log the name of the record as well, rather than just the data, but I haven't seen this happen before.
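Until the error message includes the record name, a short script can locate offending records. A minimal sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); the function name is hypothetical:

```python
import io


def find_bad_records(handle):
    """Yield (record_number, header, seq_len, qual_len) for every 4-line FASTQ
    record whose sequence and quality strings differ in length."""
    record = 0
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        record += 1
        # Cheap sanity check that we are still aligned on record boundaries.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {record}: lines out of step near {header.strip()!r}")
        if len(seq) != len(qual):
            yield record, header.rstrip("\r\n"), len(seq), len(qual)


example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIII\n")
print(list(find_bad_records(example)))  # [(2, '@r2', 6, 3)]
```

For a real file, pass an open handle instead, e.g. `open("SRR522907_1.fastq")`. For paired-end data, any record removed from one file would also need its mate removed from the other to keep the pairs in step.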

**rmdoyle** · 06-05-2013, 09:34 AM

Yup, the complete record is:

@FCB01CWABXX:1:2205:1823:145892
GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC
+FCB01CWABXX:1:2205:1823:145892
ggggggggggggfeggggggggcgggeggggggggeggg18207:146312

I suppose I could just cut this record out?

Interestingly, if I leave out the ILLUMINACLIP:TruSeqForTrimmomatic.fna:2:30:10 option, and leave my code as:

trimmomatic paired-end -phred64 -trimlog trimlog SRR522907_1.fastq SRR522907_2.fastq paired_output1b.fastq unpaired_output1b.fastq paired_output2b.fastq unpaired_output2b.fastq LEADING:20 TRAILING:20 MINLEN:30

I get files that I CAN run through fastqc without any problems (the results don't look great, but I can run the files through). Does that set off any red flags?
