You're the man! Thank you so much. Running from the command line with --threads 8 really helps when processing multiple samples, and it is so much faster in both setup and run time than clicking through in interactive mode.
Hello,
I get the following error trying to run Fastqc (v 0.11.2) on some of my files:
fastqc --outdir Fastqc/ --noextract ctcf.cont.fq
Started analysis of ctcf.cont.fq
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
at uk.ac.babraham.FastQC.Utilities.QualityCount.<init>(QualityCount.java:13)
at uk.ac.babraham.FastQC.Modules.PerTileQualityScores.processSequence(PerTileQualityScores.java:258)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:88)
at java.lang.Thread.run(Thread.java:662)
I can't figure out how to run Fastqc so that I can specify the memory (I don't really know anything about java). I've tried various things I found in the thread archives, along the lines of the command below, but get errors along the lines of "Could not find the main class"
java -Xmx500m -cp /path/to/FastQC
Comment
-
Originally posted by liz_is:
Hello,
I can't figure out how to run Fastqc so that I can specify the memory (I don't really know anything about java). I've tried various things I found in the thread archives, along the lines of the command below, but get errors along the lines of "Could not find the main class"
I guess odd things could also happen if you had some really long sequences, but they would have to be *very* long to cause problems.
Could the line endings thing be what's happening in your case?
Comment
-
I just tried unzipping a couple of the files and converting the line endings using mac2unix, and I get the same error for one of them. The other gives a different but presumably related error:
Code:
fastqc --outdir Fastqc/ --noextract ctcf.chip.fq
Started analysis of ctcf.chip.fq
Exception in thread "Thread-2" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.toCharArray(String.java:2725)
I have just noticed that for these two files, at least at the top of the file, the records have quality scores that are all "B". I checked another file that did work, and that has more varied quality scores. This suggests to me there might be another problem with the files themselves.
Edit: Update: my colleague tried with v0.10.1 and it finished! There's a lot of poor-quality reads... So I guess I can use an older version but ideally I'd like to get this working.
I also tried with a subset of the reads. With the head or tail 100,000 reads it runs fine; with 1 million it crashes ~20% of the way in. With 200,000 it says "Analysis complete for test.fq" but then also prints errors:
Code:
Approx 95% complete for test.fq
Analysis complete for test.fq
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
at java.lang.StringCoding.encode(StringCoding.java:272)
at java.lang.StringCoding.encode(StringCoding.java:284)
at java.lang.String.getBytes(String.java:986)
at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:144)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:163)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
at java.lang.Thread.run(Thread.java:662)
Last edited by liz_is; 10-01-2014, 06:33 AM.
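For anyone wanting to reproduce the subsetting test described above, here is a minimal Python sketch of taking the first N reads from a FASTQ file. It assumes every record is exactly four lines (header, sequence, "+", qualities), which may not hold for wrapped files. The function names and the file names in the usage comment are illustrative, not part of FastQC.

```python
from itertools import islice
from typing import Iterable, Iterator


def head_fastq(lines: Iterable[str], n_records: int) -> Iterator[str]:
    """Yield the lines belonging to the first n_records FASTQ records.

    Assumption: each record is exactly four lines long.
    """
    return islice(lines, 4 * n_records)


def write_head(src_path: str, dst_path: str, n_records: int) -> None:
    """Copy the first n_records records from src_path into dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(head_fastq(src, n_records))


# Example usage (file names are placeholders):
# write_head("ctcf.cont.fq", "test.fq", 200_000)
```

Reading lazily with `islice` means the whole file never has to sit in memory, which matters for files of this size.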
Comment
-
Could you possibly put a file which triggers this somewhere I can see it? If I can have a look at the data which causes this I stand a better chance of getting to the bottom of it. If you don't have a site you can upload to then drop me a mail to [email protected] and I'll send you login details for an FTP server you can push to.
Comment
-
The data is available on ENA here: http://www.ebi.ac.uk/ena/data/view/PRJEB3073
The first couple of files (which are the CTCF chip and input) are examples of files which are giving these errors. Some of the other files in this dataset work fine though, e.g the scc2 chip.
Thanks!
Comment
-
I'll have a look now to see if I can find anything obvious, but unfortunately I'm away from the office for the rest of this week so I might not get to the bottom of this until next week when I can do some proper profiling to figure out what's going wrong on this data.
Comment
-
Hi Simon,
Can you please explain the FastQC tile report in more detail?
I found this page:
I am not able to understand the meaning of
"This module will issue a warning if any tile shows a mean Phred score more than 2 less than the mean for that base across all tiles"
What is the meaning of "mean Phred score more than 2 less than the mean for that base across all tiles"?
Kindly help me out.
Thanks
Comment
-
The idea is that it shouldn't matter if the whole flowcell is good or bad, but all of the tiles should look roughly the same. If one is worse than the rest then this indicates that there is a specific problem which might need to be looked at.
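To make the quoted rule concrete, here is a small Python sketch of the comparison for a single base position. It is only an illustration, not FastQC's code: it assumes the "mean across all tiles" is the unweighted mean of the per-tile means, and the tile names and scores are made up.

```python
from statistics import mean
from typing import Dict, List

WARN_THRESHOLD = 2.0  # "more than 2 less than the mean", from the quoted help text


def flag_tiles(tile_means: Dict[str, float],
               threshold: float = WARN_THRESHOLD) -> List[str]:
    """Return tiles whose mean Phred score at one base position is more than
    `threshold` below the mean over all tiles at that position.

    Assumption: the overall mean is the plain average of the per-tile means.
    """
    overall = mean(tile_means.values())
    return sorted(tile for tile, score in tile_means.items()
                  if overall - score > threshold)


# One tile clearly worse than the rest: overall mean is 34.5, tile 1103 is 2.5 below.
print(flag_tiles({"1101": 36.0, "1102": 35.5, "1103": 32.0}))

# A uniformly poor flowcell: every tile is close to the overall mean, so nothing is flagged.
print(flag_tiles({"1101": 20.0, "1102": 20.5, "1103": 19.5}))
```

The second call shows the point made above: absolute quality does not matter for this module, only how far a tile falls below its neighbours.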
Comment
-
Originally posted by liz_is:
The data is available on ENA here: http://www.ebi.ac.uk/ena/data/view/PRJEB3073
The first couple of files (which are the CTCF chip and input) are examples of files which are giving these errors. Some of the other files in this dataset work fine though, e.g the scc2 chip.
Thanks!
The problem seems to be that these files use a variant of the Illumina header format, which is close enough to the ones we've seen before that the program tries to parse it, but then the field it extracts for the tile number is wrong and it predicts an enormous number of tiles, which makes everything die!
The formats we've seen before are either:
Code:
@HWI-1KL136:211:D1LGAACXX:1:1101:18518:48851 3:N:0:ATGTCA
or
Code:
@HWUSI-EAS493_0001:2:1:1000:16900#0/1
The IDs in the file you found looked like:
Code:
@HWI-EAS212_1:8:1:4130:3711:0:1
The quick fix is to edit the limits.txt file in your FastQC installation (in the Configuration directory) to turn off the per-tile quality module; you should then be able to process these files.
Does anyone here know if this format is something which is actually generated by an Illumina sequencer, or is it something an individual or maybe the ENA have done to the file? I can add a quick fix to just abandon the module if too many tiles are predicted, but if this is a format which might be more generally about then I should try to cope with this properly.
Cheers
Simon.
Last edited by simonandrews; 10-09-2014, 04:49 AM. Reason: Added code tags to remove smilies from illumina ids!
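The failure mode described above can be sketched in Python. The heuristic below (pick the tile field by counting colon-separated fields) is an assumption made for illustration and is not taken from the FastQC source; `guess_tile` is a hypothetical helper. It shows how an ID with seven fields can be mistaken for the newer layout, so a coordinate is read as the tile.

```python
from typing import Optional


def guess_tile(read_id: str) -> Optional[str]:
    """Guess the tile field of an Illumina read ID by counting ':' fields.

    Hypothetical heuristic, for illustration only:
      7 or more fields -> treat as the newer layout, tile is field index 4
      5 or 6 fields    -> treat as the older layout, tile is field index 2
    """
    fields = read_id.lstrip("@").split(":")
    if len(fields) >= 7:
        return fields[4]
    if len(fields) >= 5:
        return fields[2]
    return None


ids = [
    "@HWI-1KL136:211:D1LGAACXX:1:1101:18518:48851 3:N:0:ATGTCA",
    "@HWUSI-EAS493_0001:2:1:1000:16900#0/1",
    "@HWI-EAS212_1:8:1:4130:3711:0:1",
]
for read_id in ids:
    print(read_id, "->", guess_tile(read_id))
```

With this heuristic the third ID yields 3711, which looks like a coordinate rather than a tile. Since that field differs for nearly every read, the number of apparent tiles grows with the file, which fits the out-of-memory behaviour seen in this thread.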
Comment
-
Thanks for the reply.
I've tried what you suggested but it doesn't help! I've tried both specifying a limits file using --limits and editing 'limits.txt' in the Configuration directory of the installed FastQC to include the line
Code:
tile ignore 1
Code:
Started analysis of ctcf.cont.fq
Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
at uk.ac.babraham.FastQC.Modules.PerTileQualityScores.processSequence(PerTileQualityScores.java:258)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:88)
at java.lang.Thread.run(Thread.java:745)
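For reference, the limits edit described above can be scripted. This is a rough Python sketch that assumes the limits file uses whitespace-separated lines of the form "module ignore 0|1"; `set_ignore` is a hypothetical helper and the sample text is made up. Note that in this thread the setting did not stop the error with v0.11.2, so treat it as an illustration of the edit, not a guaranteed workaround.

```python
def set_ignore(limits_text: str, module: str, ignore: bool = True) -> str:
    """Return limits_text with the '<module> ignore <0|1>' line set.

    Assumption: fields are whitespace-separated. If no ignore line exists
    for the module, one is appended at the end.
    """
    flag = "1" if ignore else "0"
    new_line = f"{module}\tignore\t{flag}"
    out = []
    found = False
    for line in limits_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == module and parts[1] == "ignore":
            out.append(new_line)
            found = True
        else:
            out.append(line)  # comments and other settings pass through untouched
    if not found:
        out.append(new_line)
    return "\n".join(out) + "\n"


# Made-up sample content, not the real default file.
sample = "# example limits\ntile\tignore\t0\ntile\twarn\t5\n"
print(set_ignore(sample, "tile"))
```

The result can be written to a separate file and passed with --limits, which leaves the installed copy in the Configuration directory unchanged.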
Comment
-
I've just put up a development snapshot at http://www.bioinformatics.babraham.a...11.3_devel.zip which contains the fix for both of these issues. You should be able to use that to process these files.
Comment
-
Kmer overrepresentation and per base sequence content in Nextera XT libraries
Hi all,
After reading around on the forums and elsewhere on the internet, it seems like seeing weird results for Kmer overrepresentation and per base sequence content after running FastQC on Nextera XT libraries is common.
The data I have here are sequencing data (MiSeq V3, 300 bp reads) of mitochondrial genomes from wheat. The Nextera XT libraries were prepared from purified organellar DNA (~450 kb genome) so the coverage is really high (~400X after trimming).
The files with the no_trim_prefix are the raw data. You can see that the "per base sequence content" looks weird for the first few bases. Also, the Kmer content is high in the first few bases. I have tried blasting these sequences and get no hits. The "Sequence Duplication Levels" are high most likely because of the high coverage of a small genome. I suspect this because another library I sequenced has only 60X coverage and the duplication levels are fine.
The files with the trim_prefix are the trimmed data. The data were quality and length trimmed (min. length 250 bp) with Trimmomatic. Unfortunately the trimming did not make a difference in the per base content or the Kmer overrepresentation.
My question is, will this matter for mapping and assembly? I plan on mapping these reads to already available mitochondrial genomes, as well as performing de novo assembly with Geneious.
Thanks in advance for any suggestions you all may have!
Comment