
**GenoMax** · 11-01-2014, 04:30 AM

Originally posted by rufessor

I may be answering in part my question- or at least clarifying my question.
I am no longer certain that the sleeping processes actually did anything to the cache. It's almost always used to capacity by Linux, so I guess my question is:

What the heck are those 40+ sleeping processes doing, and are they actually consuming any resources? Do they really hold RAM, or is it just cached?

Would be curious.

Most modern Linux distros use otherwise idle RAM for disk caching, so one can't depend on the "used" figure in htop/top alone. With a TB of RAM you should have no worries about memory consumption.

See: http://www.linuxatemyram.com/

**travelk** · 11-26-2014, 05:30 AM

Hey everyone,

I've used trimmomatic to clean up my reads, but when I use the cleaned files in tophat I get an error. When I looked a bit closer, I discovered that trimmomatic converted this:

@D3VDZHS1:119:H036PADXX:1:1202:12533:34018 2:N:0:GGACTCCTTAGATCGC
ATAGACAAATGCCTGCAACAACGCAGGGATCTCTTTCCCGGTAAACCAACCGTCGTCATTGAAGATATGCATGCTGGCTCGGGTATCCCATTGCTGATAC
+
@@CADDFFBBFHHJIGIJJIIJIBDDGE@ABGEHGGGIJIGF:CFFHGIJG<?8ADBDB@AC;.;>A>>@>:@ACC@@?C<BB(0>(:@(4::@(+4>A>
@D3VDZHS1:119:H036PADXX:1:1202:12611:34155 2:N:0:GGACTCCTTAGATCGC
GGTATCAACGCAGAGTACTTTTTTTTTTTCTTTTTTTTTTTTTTTTTTAAAGGAAAACCAGACAAATCATGAAGCCACATACGCTAGAGAAGCTCAATAC
+
B@@DFFDFHHGHHGIEHHIJIIJJJJJJI)BFGG):BCDDDDDDDDB#####################################################
@D3VDZHS1:119:H036PADXX:1:1202:12731:34205 2:N:0:GGACTCCTTAGATCGC
TAATAAATCCGCTACCGACGCTGACTAACATTTCGCGATCGTTCATCGCATCACCAAAGGCCGTGCAATCGCGCAACGATAAACCTAAATGTTGGGTCAG
+
@@CFFFFFHHHHHIJJIIJIIIJJJGIIJJJJJIIJJBAHG@GHEHFGF=BCEEEC?AC?AAB8888@CC??>BBDD>BBBBCDDAACDCDDC@8<?CBC

TO:

@D3VDZHS1:119:H036PADXX:1:1202:12533:34018 2:N:0:GGACTCCTTAGATCGC
ATAGACAAATGCCTGCAACAACGCAGGGATCTCTTTCCCGGTAAACCAACCGTCGTCATTGAAGATATGCATGCTGGCTCGGGTAT+
@@CADDFFBBFHHJIGIJJIIJIBDDGE@ABGEHGGGIJIGF:CFFHGIJG<?8ADBDB@AC;.;>A>>@>:@ACC@@?C<BB(0>
@D3VDZHS1:119:H036PADXX:1:1202:12731:34205 2:N:0:GGACTCCTTAGATCGC
TAATAAATCCGCTACCGACGCTGACTAACATTTCGCGATCGTTCATCGCATCACCAAAGGCCGTGCAATCGCGCAACGATAAACCTAAATGTTGGGTCAG
+
@@CFFFFFHHHHHIJJIIJIIIJJJGIIJJJJJIIJJBAHG@GHEHFGF=BCEEEC?AC?AAB8888@CC??>BBDD>BBBBCDDAACDCDDC@8<?CBC

Obviously trimmomatic didn't put + on the next line and now tophat can't read the line properly. This has happened in multiple files. Does anyone know if a) this is normal for trimmomatic and I need to fix this manually or if b) I did something wrong to cause it?

My input code:

Code:

java -jar /path/to/Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads 8 -phred33 -trimlog Sample1trimlog sample1_R1.fastq sample1_R2.fastq sample1_R1_TP.fastq sample1_R1_TU.fastq sample1_R2_TP.fastq sample1_R2_TU.fastq ILLUMINACLIP:/path/to/Trimmomatic-0.32/adapters/adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Thanks for your help!

**westerman** · 11-26-2014, 07:18 AM

Originally posted by travelk

Obviously trimmomatic didn't put + on the next line and now tophat can't read the line properly. This has happened in multiple files. Does anyone know if a) this is normal for trimmomatic and I need to fix this manually or if b) I did something wrong to cause it?

a) No. Not normal.

b) Probably. But your command line looks ok and nothing obvious is popping up.

**tonybolger** · 11-26-2014, 07:56 AM

Originally posted by travelk

Obviously trimmomatic didn't put + on the next line and now tophat can't read the line properly. This has happened in multiple files. Does anyone know if a) this is normal for trimmomatic and I need to fix this manually or if b) I did something wrong to cause it?

It is certainly not normal - it's one of the strangest trimmomatic issues I have heard of. Nonetheless, it should be possible to track down. Some questions:

1) Does this happen consistently on the same lines of the same files if they are run more than once?
1.1) If so, can you isolate and send me a short example where it happens?

2) What OS are you using?

Thanks,

Tony.

**travelk** · 11-27-2014, 05:55 AM

Ok, I re-ran the exact same script on the exact same files, as suggested, to check whether it happened consistently on the same line... but now there is no problem with the new output files and tophat runs them just fine. So I'm not sure what I did the first time around to corrupt the files like that.

Thank you nevertheless for taking the time to help me!
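For anyone who hits the same thing: a quick structural check on the trimmed output before handing it to tophat would catch a record like the one above, where the + ended up fused onto the sequence line. Below is a minimal Python sketch, assuming plain four-line FASTQ records (no wrapped sequences). The function name `first_fastq_problem` is made up for illustration and is not part of trimmomatic or tophat.

```python
import io


def first_fastq_problem(handle):
    """Return (line_number, message) for the first structural problem,
    or None if every record follows the four-line FASTQ layout."""
    record_start = 1  # 1-based line number of the current record's header
    while True:
        header = handle.readline()
        if header == "":
            return None  # reached end of file without finding a problem
        seq, plus, qual = (handle.readline().rstrip("\r\n") for _ in range(3))
        if not header.startswith("@"):
            return (record_start, "header does not start with '@'")
        if not plus.startswith("+"):
            return (record_start + 2, "separator line does not start with '+'")
        if len(seq) != len(qual):
            return (record_start + 1, "sequence and quality lengths differ")
        record_start += 4


# Tiny made-up example mimicking the corruption: '+' fused onto the sequence.
broken = "@r1\nACGT+\nIIII\n@r2\nGG\n+\nII\n"
print(first_fastq_problem(io.StringIO(broken)))
```

On a real file you would pass `open("sample1_R1_TP.fastq")` instead of the `StringIO` object. The check stops at the first problem because everything after a three-line record is out of frame anyway.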

**drdna** · 12-23-2014, 09:09 PM

Trimmomatic is not working correctly in paired end mode:

Read names in output files are not in the correct order. Correct phase was lost at read #27 and there are additional phase changes thereafter. Command line was as follows:

java -jar /opt/Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads -phred33 -trimlog Trim_Lesion6.txt 6_S1_L001_R1_001_clip2.fastq 6_S1_L001_R2_001_clip2.fastq 6_S1_L001_R1_001_paired.fastq 6_S1_L001_R1_001_unpaired.fastq 6_S1_L001_R2_001_paired.fastq 6_S1_L001_R2_001_unpaired.fastq MINLEN:40

**mastal** · 12-24-2014, 04:24 AM

The command seems OK, except that you haven't specified the number of threads.

Does trimmomatic give any error messages?

Is there something wrong with the format of read 27 in one of your files that causes it to be read incorrectly?

Do your 2 input files have the same number of reads, in the same order?

What does the trimmomatic log file indicate is happening when the output files get out of phase?
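One way to answer the "same number of reads, in the same order" question is to walk both input files in lockstep and report the first record where the read names disagree. This is only a sketch, assuming four-line FASTQ records and read names that match up to the first whitespace (optionally with a trailing /1 or /2). The helper names `read_ids` and `first_phase_break` are hypothetical.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID from each header line (every fourth line)."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            fields = line.split()
            name = fields[0][1:] if fields else ""  # drop the leading '@'
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # older Illumina-style mate suffix
            yield name


def first_phase_break(r1_handle, r2_handle):
    """Return (record_number, id_in_r1, id_in_r2) for the first record where
    the two files disagree, or None if they pair up all the way through.
    A missing ID is reported as None when one file is shorter."""
    ids = zip_longest(read_ids(r1_handle), read_ids(r2_handle))
    for n, (a, b) in enumerate(ids, start=1):
        if a != b:
            return (n, a, b)
    return None


# Made-up example: read 'b' has no mate in the second file.
r1 = "@a 1:N:0\nAC\n+\nII\n@b 1:N:0\nGG\n+\nII\n@c 1:N:0\nTT\n+\nII\n"
r2 = "@a 2:N:0\nAC\n+\nII\n@c 2:N:0\nTT\n+\nII\n"
print(first_phase_break(io.StringIO(r1), io.StringIO(r2)))
```

If this returns a record number, the inputs were already out of phase before trimming.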

**drdna** · 12-24-2014, 07:59 AM

Originally posted by mastal

Do your 2 input files have the same number of reads, in the same order?

This was the problem. Interestingly, there was, to my knowledge, no stipulation in the original publication, or in the online manual, that the program requires input files with perfectly paired reads. Maybe I'm stupid but I would have thought that a program that is designed to take raw reads, quality trim them and sort them into paired and unpaired datasets would realize that raw data coming off a sequencing machine often has large numbers of unpaired reads to start off with. As it stands, it appears I have to run one script to cull unpaired mates and then run Trimmomatic. How inefficient is that?

**westerman** · 12-24-2014, 08:07 AM

Originally posted by drdna

Maybe I'm stupid but I would have thought that a program that is designed to take raw reads, quality trim them and sort them into paired and unpaired datasets would realize that raw data coming off a sequencing machine often has large numbers of unpaired reads to start off with. As it stands, it appears I have to run one script to cull unpaired mates and then run Trimmomatic. How inefficient is that?

Inefficient indeed. But I want to know what type of machine you have that produces large numbers of unpaired reads. My MiSeqs and HiSeqs always pair reads -- assuming that I tell them that the project is paired.

**drdna** · 12-24-2014, 08:15 AM

We have a MiSeq which gives us scads of high quality data but almost never gives us perfectly paired reads. I'll have to check with our tech to see if she sets any kind of paired data flag. Where would that be?

**westerman** · 12-24-2014, 08:22 AM

Settings would be in the sample sheet.

Are you getting the reads directly from the sequencer or via BaseSpace? The latter may be doing some sort of trimming for you. Because we have a HiSeq and pre-existing pipelines, we do not use BaseSpace but rather just grab the raw reads. Thus I am not familiar with BaseSpace, but I do know that it can do a lot of useful processing.

**GenoMax** · 12-24-2014, 08:22 AM

The only way I could see that happening is if you consistently get bad-quality reads on one end that are removed by MiSeq Reporter/BaseSpace. Perhaps you should ask the tech to turn off on-instrument analysis (adapter trimming etc.) so you can do that offline.

**drdna** · 12-24-2014, 08:27 AM

That's probably it - we use BaseSpace. My guess is that BaseSpace is filtering out pairs with poor quality. I'll have to look into that.

**drdna** · 12-24-2014, 08:30 AM

Thanks for the insights. BaseSpace does a good job of adapter trimming and demultiplexing. It's probably more efficient to just run the reads through a script to pair them up properly before downstream processing.

**GenoMax** · 12-24-2014, 08:40 AM

repair.sh from BBTools will do that easily: http://seqanswers.com/forums/showpos...8&postcount=61
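To illustrate what "pairing them up" involves, here is a rough Python sketch that matches mates by read ID and separates out the singletons. It is not how repair.sh is implemented. It holds one whole file in memory, so it only suits small files, and it assumes four-line FASTQ records whose IDs match up to the first whitespace. The names `parse_fastq` and `repair_pairs` are hypothetical.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, record_text) for each four-line FASTQ record."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # end of file
        fields = lines[0].split()
        read_id = fields[0][1:] if fields else ""  # drop the leading '@'
        yield read_id, "".join(lines)


def repair_pairs(r1_handle, r2_handle):
    """Return (pairs, singles_r1, singles_r2), keeping R1 order for pairs."""
    r2_records = dict(parse_fastq(r2_handle))  # whole R2 file held in memory
    pairs, singles_r1 = [], []
    for read_id, record in parse_fastq(r1_handle):
        mate = r2_records.pop(read_id, None)
        if mate is None:
            singles_r1.append(record)
        else:
            pairs.append((record, mate))
    singles_r2 = list(r2_records.values())  # R2 reads with no R1 mate
    return pairs, singles_r1, singles_r2


# Made-up example: 'b' is missing from R2 and 'd' is missing from R1.
r1 = "@a 1:N:0\nAC\n+\nII\n@b 1:N:0\nGG\n+\nII\n@c 1:N:0\nTT\n+\nII\n"
r2 = "@a 2:N:0\nAC\n+\nII\n@c 2:N:0\nTT\n+\nII\n@d 2:N:0\nCC\n+\nII\n"
pairs, s1, s2 = repair_pairs(io.StringIO(r1), io.StringIO(r2))
print(len(pairs), len(s1), len(s2))
```

The paired records can then be written to two files in matching order, which is the layout Trimmomatic's PE mode expects as input.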
