  • Jellyfish problem

    Greetings, I am trying to normalize 5 sets of Illumina paired-end 100 reads with Jellyfish on a Mac Pro (12 core/128 GB RAM); however, only 2 of them complete. The other 3 generate this error:

    tsmac:trinityrnaseq home$ ./util/normalize_by_kmer_coverage.pl --seqType fq --JM 100G --max_cov 30 --left 30l.fq --right -30r.fq --pairs_together --PARALLEL_STATS --JELLY_CPU 16 --output 30normalized
    Converting input files. (both directions in parallel)CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /Users/home/Research/Trinity/trinityrnaseq/30l.fq >> left.fa
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /Users/home/Research/Trinity/trinityrnaseq/-30r.fq >> right.fa
    CMD finished (0 seconds)
    CMD finished (264 seconds)
    Done converting input files.CMD: cat left.fa right.fa > both.fa
    CMD finished (247 seconds)
    -------------------------------------------
    ----------- Jellyfish --------------------
    -- (building a k-mer catalog from reads) --
    -------------------------------------------

    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/jellyfish/bin/jellyfish count -t 16 -m 25 -s 14702953559 *--both-strands *both.fa
    Warn: Bad character in sequence: o
    Warn: Bad character in sequence: *
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: U
    Warn: Bad character in sequence: e
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: o
    Warn: Bad character in sequence: u
    Warn: Bad character in sequence: z
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: e
    Warn: Bad character in sequence: e
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: i
    Warn: Bad character in sequence: i
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: i
    Warn: Bad character in sequence: i
    Warn: Bad character in sequence: e
    Warn: Bad character in sequence: q
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: 3
    Warn: Bad character in sequence: 0
    Warn: Bad character in sequence: .
    Warn: Bad character in sequence: f
    Warn: Bad character in sequence: q
    Warn: Bad character in sequence: *
    Warn: Bad character in sequence: f
    Warn: Bad character in sequence: i
    Warn: Bad character in sequence: l
    Warn: Bad character in sequence: e
    Warn: Bad character in sequence: *
    Warn: Bad character in sequence: f
    Warn: Bad character in sequence: o
    Warn: Bad character in sequence: u
    Warn: Bad character in sequence: !
    CMD finished (862 seconds)
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/jellyfish/bin/jellyfish dump -L 2 mer_counts_0 >> jellyfish.K25.min2.kmers.fa
    CMD finished (327 seconds)
    CMD: touch jellyfish.K25.min2.kmers.fa.success
    CMD finished (0 seconds)
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads left.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 **--DS *> left.fa.K25.stats
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads right.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 **--DS *> right.fa.K25.stats
    -reading Kmer occurences...
    -reading Kmer occurences...

    done parsing 73199622 Kmers, 73199622 added, taking 1045 seconds.

    done parsing 73199622 Kmers, 73199622 added, taking 1047 seconds.
    CMD finished (1088 seconds)
    CMD finished (5019 seconds)
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//util/nbkc_merge_left_right_stats.pl --left left.fa.K25.stats --right right.fa.K25.stats *> pairs.K25.stats
    CMD finished (0 seconds)
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//util/nbkc_normalize.pl pairs.K25.stats 30 100 > pairs.K25.stats.C30.pctSD100.accs
    CMD finished (0 seconds)
    Thread 6 terminated abnormally: Error: Couldn't open /Users/home/Research/Trinity/trinityrnaseq/-30r.fq
    Error encountered with thread.
    Error, at least one thread died at ./util/normalize_by_kmer_coverage.pl line 353.
    tsmac:trinityrnaseq home$


    The submitted files for this run are "30l.fq" and "30r.fq". After the error message, there's a file in the working directory called "-30r.fq.normalized_K25_C30_pctSD100.fq". In the output directory, the left.fa file seems fine, but the right.fa file contains "No (path)/-20r.fq file found!"

    The other two sets of files that fail are named the same way, but with their own numbers (20 and 31), and they produce exactly the same bad-character warnings as above. There are also -20r.fq.normalized_K25_C30_pctSD100.fq and -31r.fq.normalized_K25_C30_pctSD100.fq files in the directory.
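A quick way to confirm that a converted FASTA file holds a stray message rather than reads is to look for non-header lines containing characters other than nucleotides. The sketch below is only an illustration: the miniature right.fa and the message line in it are made up for the demo, and the A/C/G/T/N character set is a simplifying assumption, not the exact rule Jellyfish applies.

```shell
# Work in a scratch directory so no real data is touched.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/fa_check.XXXXXX")
cd "$workdir"

# Hypothetical miniature right.fa: one valid record plus a stray message line.
printf '>read1/2\nACGTNACGT\nstand-in message: file not found!\n' > right.fa

# Skip header lines (starting with ">") and count the lines that contain
# anything other than A, C, G, T or N (assumed alphabet for this sketch).
bad_lines=$(grep -v '^>' right.fa | grep -c '[^ACGTNacgtn]')
echo "non-sequence lines: $bad_lines"

# Show the first few offending lines for inspection.
grep -v '^>' right.fa | grep -n '[^ACGTNacgtn]' | head -n 5
```

If the count is above zero on a real right.fa, the conversion step most likely wrote an error message into the file instead of sequence, which would fit the "Bad character" warnings in the log.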

    Unfortunately, I'm not a bioinformaticist, just an end-user, but it seems this process should be straightforward based on the Haas paper published earlier this year and on the Trinity web page that discusses normalization. Since 2 of the 5 worked, perhaps there are errors in the other 3 files?

    If it's relevant, I ran them in this order:
    18 (succeeded)
    20 (failed)
    29 (succeeded)
    30 (failed)
    31 (failed)

    Thanks for any insight you might be able to provide.

  • #2
    A couple of suggestions that may be more general in nature:

    1. It is not a good idea to have folder/file names that start with "-". As you have figured out, Unix commands take options that start with "-".
    2. There may be some sort of file read permission error with one or more of the files. In a terminal window, can you verify with the file list command ($ ls -l) that there are read permissions for all (three "r" permissions in the long listing)?
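Both suggestions can be tried out in a scratch directory. This is a minimal sketch, not part of the Trinity workflow: the file created here is a tiny stand-in, and the names simply mirror the ones in the thread. It shows two common ways to refer to a file whose name starts with "-", how to rename it, and how to check that it is readable.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/dash_demo.XXXXXX")
cd "$workdir"

# Create a stand-in file whose name starts with "-" (hypothetical content).
printf '@read1\nACGT\n+\nIIII\n' > ./-30r.fq

# "--" ends option parsing, so the name is treated as a file, not as flags.
ls -l -- -30r.fq

# Prefixing with "./" also works, even with tools that do not honor "--".
ls -l ./-30r.fq

# Rename the file so later commands need no special handling.
mv -- -30r.fq 30r.fq

# Check that the file is readable by the current user before rerunning.
if [ -r 30r.fq ]; then
    readable=yes
else
    readable=no
fi
echo "30r.fq readable: $readable"
```

In the run above, the simplest fix is probably to pass the real file name (30r.fq, without the leading "-") to --right; the "--" and "./" forms are only needed when a file really is named that way.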
    Last edited by GenoMax; 11-10-2013, 04:55 AM.



    • #3
      Thanks very much - the '-' is what caused the problem. I'm not sure how I put it in there for those three files, but removing it seems to have fixed it. Thanks again - I'm moving forward again.
