Originally posted by duartemolha
I am having problems using clumpify with my fastqs, and I believe it is related to the UMI in the header of the fastq reads.
Here is a read from my read1 fastq:
@VL00773:6:AAFVNLMM5:1:1101:21412:1000:CTGGTGGTT 1:N:0:ACTCTCGA+CTGTACCA
GTGGGCACTAGCATACTTCCCAAGCTTGGGGTAGGGCAATATAGGCAAGTCGATCAAGCTTGCAGCTGACTCCCTTTGGGATCTTGGGCTTAACCTCCTTGGGCTTTACGAGGGCCTCGATAGCCTTGGCACGTGCACTCATGGCCTTGGC
+
CCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCC;CCCCCCCCCCCCCCCCCCCCCCCC
If I remove the :CTGGTGGTT from the end of the read name in the header, I can use clumpify,
but with it there it just fails:
clumpify.sh in1=sample1_R1_001.fastq.gz in2=sample1_R2_001.fastq.gz out1=sample1_dedup_R1_001.fastq.gz out2=sample1_dedup_R2_001.fastq.gz dedupe=t optical=t dupedist=40 spany=t t=1 -Xmx100g -Xms100g
openjdk version "1.8.0_112"
OpenJDK Runtime Environment (Zulu 8.19.0.1-linux64) (build 1.8.0_112-b16)
OpenJDK 64-Bit Server VM (Zulu 8.19.0.1-linux64) (build 25.112-b16, mixed mode)
java -ea -Xmx100g -Xms100g -cp .../bbtools/lib/current/ clump.Clumpify in1=sample1_R1_001.fastq.gz in2=sample1_R2_001.fastq.gz out1=sample1_dedup_R1_001.fastq.gz out2=sample1_dedup_R2_001.fastq.gz dedupe=t optical=t dupedist=40 spany=t t=1 -Xmx100g -Xms100g
Executing clump.Clumpify [in1=sample1_R1_001.fastq.gz, in2=sample1_R2_001.fastq.gz, out1=sample1_dedup_R1_001.fastq.gz, out2=sample1_dedup_R2_001.fastq.gz, dedupe=t, optical=t, dupedist=40, spany=t, t=1, -Xmx100g, -Xms100g]
Clumpify version 37.62
Read Estimate: 21805466
Memory Estimate: 16636 MB
Memory Available: 80430 MB
Set groups to 1
Executing clump.KmerSort [in1=sample1_R1_001.fastq.gz, in2=sample1_R2_001.fastq.gz, out1=sample1_dedup_R1_001.fastq.gz, out2=sample1_dedup_R2_001.fastq.gz, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, dedupe=t, t=1, -Xmx100g, -Xms100g]
Set threads to 1
Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Exception in thread "Thread-3" java.lang.AssertionError: VL00773:7:AAFYLV7M5:1:1101:18648:1000:TAACCCATC 1:N:0:ACTCCATC+GATCAAGG
at hiseq.FlowcellCoordinate.setFrom(FlowcellCoordinate.java:92)
at clump.ReadKey.<init>(ReadKey.java:46)
at clump.ReadKey.<init>(ReadKey.java:33)
at clump.ReadKey.makeKey(ReadKey.java:23)
at clump.KmerComparator.hash(KmerComparator.java:73)
at clump.KmerComparator.hash(KmerComparator.java:66)
at clump.KmerSort$FetchThread.run(KmerSort.java:816)
Fetch time: 0.076 seconds.
Closing input stream.
Combining thread output.
Combine time: 0.000 seconds.
Exception in thread "main" java.lang.AssertionError: 0, 400, true
at clump.KmerSort.fetchReads(KmerSort.java:718)
at clump.KmerSort.processInner(KmerSort.java:400)
at clump.KmerSort.process(KmerSort.java:320)
at clump.KmerSort.main(KmerSort.java:51)
at clump.Clumpify.process(Clumpify.java:247)
at clump.Clumpify.main(Clumpify.java:37)
Does anyone have a solution to make this work without having to lose all my UMI information?
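For illustration, the header edit described above (dropping the trailing :UMI field from the read name) can be done without discarding the UMI by moving it to the end of the header comment before deduplication and moving it back afterwards. The Python below is only a sketch: the function names `move_umi` and `restore_umi` and the `UMI:` tag are hypothetical, and whether Clumpify 37.62 accepts the extra token in the comment is an assumption that has not been tested here.

```python
import re

# Illumina-style read name: 7 colon-separated fields, followed here by a UMI
# as an 8th field, then an optional whitespace-separated comment.
UMI_IN_NAME = re.compile(r"^@((?:[^:\s]+:){6}[^:\s]+):([ACGTN]+)(\s.*)?$")

# Header produced by move_umi(): the UMI sits in a trailing "UMI:<seq>" token
# (a hypothetical tag chosen for this sketch).
UMI_IN_COMMENT = re.compile(r"^@(\S+)(\s.*?)? UMI:([ACGTN]+)$")


def move_umi(header: str) -> str:
    """Move an 8th-field UMI from the read name to the end of the comment."""
    m = UMI_IN_NAME.match(header)
    if m is None:
        return header  # no UMI field found; leave the header untouched
    name, umi, comment = m.group(1), m.group(2), m.group(3) or ""
    return f"@{name}{comment} UMI:{umi}"


def restore_umi(header: str) -> str:
    """Inverse of move_umi(): put the UMI back as the last read-name field."""
    m = UMI_IN_COMMENT.match(header)
    if m is None:
        return header
    name, comment, umi = m.group(1), m.group(2) or "", m.group(3)
    return f"@{name}:{umi}{comment}"


if __name__ == "__main__":
    h = "@VL00773:6:AAFVNLMM5:1:1101:21412:1000:CTGGTGGTT 1:N:0:ACTCTCGA+CTGTACCA"
    print(move_umi(h))
    print(restore_umi(move_umi(h)))
```

Applied to every fourth line (the header line) of both R1 and R2 before running clumpify, and reversed on the deduplicated output, this would keep the UMI attached to each read, provided the assumption about the comment field holds.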