Greetings, I am trying to normalize 5 sets of Illumina paired-end 100 reads with Jellyfish on a Mac Pro (12 core/128 GB RAM); however, only 2 of them complete. The other 3 generate this error:
tsmac:trinityrnaseq home$ ./util/normalize_by_kmer_coverage.pl --seqType fq --JM 100G --max_cov 30 --left 30l.fq --right -30r.fq --pairs_together --PARALLEL_STATS --JELLY_CPU 16 --output 30normalized
Converting input files. (both directions in parallel)CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /Users/home/Research/Trinity/trinityrnaseq/30l.fq >> left.fa
CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /Users/home/Research/Trinity/trinityrnaseq/-30r.fq >> right.fa
CMD finished (0 seconds)
CMD finished (264 seconds)
Done converting input files.CMD: cat left.fa right.fa > both.fa
CMD finished (247 seconds)
-------------------------------------------
----------- Jellyfish --------------------
-- (building a k-mer catalog from reads) --
-------------------------------------------
CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/jellyfish/bin/jellyfish count -t 16 -m 25 -s 14702953559 --both-strands both.fa
Warn: Bad character in sequence: o
Warn: Bad character in sequence: *
Warn: Bad character in sequence: /
Warn: Bad character in sequence: U
Warn: Bad character in sequence: e
Warn: Bad character in sequence: /
Warn: Bad character in sequence: o
Warn: Bad character in sequence: u
Warn: Bad character in sequence: z
Warn: Bad character in sequence: /
Warn: Bad character in sequence: e
Warn: Bad character in sequence: e
Warn: Bad character in sequence: /
Warn: Bad character in sequence: i
Warn: Bad character in sequence: i
Warn: Bad character in sequence: /
Warn: Bad character in sequence: i
Warn: Bad character in sequence: i
Warn: Bad character in sequence: e
Warn: Bad character in sequence: q
Warn: Bad character in sequence: /
Warn: Bad character in sequence: 3
Warn: Bad character in sequence: 0
Warn: Bad character in sequence: .
Warn: Bad character in sequence: f
Warn: Bad character in sequence: q
Warn: Bad character in sequence: *
Warn: Bad character in sequence: f
Warn: Bad character in sequence: i
Warn: Bad character in sequence: l
Warn: Bad character in sequence: e
Warn: Bad character in sequence: *
Warn: Bad character in sequence: f
Warn: Bad character in sequence: o
Warn: Bad character in sequence: u
Warn: Bad character in sequence: !
CMD finished (862 seconds)
CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/jellyfish/bin/jellyfish dump -L 2 mer_counts_0 >> jellyfish.K25.min2.kmers.fa
CMD finished (327 seconds)
CMD: touch jellyfish.K25.min2.kmers.fa.success
CMD finished (0 seconds)
CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads left.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 --DS > left.fa.K25.stats
CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads right.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 --DS > right.fa.K25.stats
-reading Kmer occurences...
-reading Kmer occurences...
done parsing 73199622 Kmers, 73199622 added, taking 1045 seconds.
done parsing 73199622 Kmers, 73199622 added, taking 1047 seconds.
CMD finished (1088 seconds)
CMD finished (5019 seconds)
CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//util/nbkc_merge_left_right_stats.pl --left left.fa.K25.stats --right right.fa.K25.stats > pairs.K25.stats
CMD finished (0 seconds)
CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//util/nbkc_normalize.pl pairs.K25.stats 30 100 > pairs.K25.stats.C30.pctSD100.accs
CMD finished (0 seconds)
Thread 6 terminated abnormally: Error: Couldn't open /Users/home/Research/Trinity/trinityrnaseq/-30r.fq
Error encountered with thread.
Error, at least one thread died at ./util/normalize_by_kmer_coverage.pl line 353.
tsmac:trinityrnaseq home$
The files I submitted for this run are "30l.fq" and "30r.fq". After the error message, there is a file in the working directory called "-30r.fq.normalized_K25_C30_pctSD100.fq". In the output directory, the left.fa file seems fine, but the right.fa file contains "No (path)/-20r.fq file found!"
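Since right.fa holds an error message rather than reads, the "Bad character" warnings may simply be Jellyfish reading that text as sequence. A minimal sketch for confirming this, assuming that only A, C, G, T and N count as valid sequence characters; `find_bad_fasta_lines` is a hypothetical helper written for illustration and is not part of Trinity or Jellyfish:

```python
import os
import tempfile

# Assumption: anything outside these characters is treated as "bad".
VALID_BASES = set("ACGTNacgtn")


def find_bad_fasta_lines(fasta_path, limit=10):
    """Return up to `limit` (line_number, text) pairs for non-header,
    non-blank lines that contain characters outside VALID_BASES."""
    bad = []
    with open(fasta_path) as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\n")
            if not text or text.startswith(">"):
                continue  # skip blank lines and FASTA header lines
            if not set(text) <= VALID_BASES:
                bad.append((number, text))
                if len(bad) >= limit:
                    break
    return bad


if __name__ == "__main__":
    # Small self-contained demo on a throwaway file that mimics the
    # situation described above (an error message stored as sequence).
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "right.fa")
        with open(demo, "w") as out:
            out.write(">read1/2\nACGTNACGT\nNo (path)/-30r.fq file found!\n")
        for number, text in find_bad_fasta_lines(demo):
            print(f"line {number}: {text}")
```

Running the same function on the real right.fa would show whether the file contains anything other than that one message.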
The other two sets that fail are named the same way but with their own numbers (20 and 31), and they produce exactly the same bad-character warnings as above. There are also -20r.fq.normalized_K25_C30_pctSD100.fq and -31r.fq.normalized_K25_C30_pctSD100.fq files in the directory.
Unfortunately, I'm not a bioinformaticist, just an end-user, but it seems this process should be straightforward based on the Haas paper published earlier this year and on the Trinity web page that discusses normalization. Since 2 of the 5 worked, perhaps there are errors in the other 3 files?
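One thing that can be checked before rerunning: the command line in the log passes `--right -30r.fq`, while the file on disk is named 30r.fq. A minimal sketch of a pre-flight check on the read files follows. `check_read_files` is a hypothetical helper, not part of Trinity, and the leading-hyphen test is only a guess at what might be going wrong:

```python
import os
import tempfile


def check_read_files(paths):
    """Return a list of human-readable problems for the given read files."""
    problems = []
    for path in paths:
        name = os.path.basename(path)
        if name.startswith("-"):
            # A leading hyphen usually means a stray character was typed
            # in front of the file name on the command line.
            problems.append(f"{path}: name starts with '-' (possible stray hyphen)")
        if not os.path.isfile(path):
            problems.append(f"{path}: file not found")
        elif os.path.getsize(path) == 0:
            problems.append(f"{path}: file is empty")
    return problems


if __name__ == "__main__":
    # Demo in a throwaway directory: 30l.fq and 30r.fq exist, but the
    # command line refers to -30r.fq, as in the log above.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("30l.fq", "30r.fq"):
            with open(os.path.join(tmp, name), "w") as out:
                out.write("@read1\nACGT\n+\nIIII\n")
        requested = [os.path.join(tmp, "30l.fq"), os.path.join(tmp, "-30r.fq")]
        for problem in check_read_files(requested):
            print(problem)
```

If a check like this reports the right-hand file as missing for runs 20, 30 and 31 but not for 18 and 29, the input files themselves are probably fine and only the file name given to `--right` needs correcting.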
If it's relevant, I ran them in this order:
18 (succeeded)
20 (failed)
29 (succeeded)
30 (failed)
31 (failed)
Thanks for any insight you might be able to provide.