
  • Jellyfish problem

    Greetings. I am trying to normalize 5 sets of Illumina paired-end 100 bp reads with Jellyfish on a Mac Pro (12 cores/128 GB RAM); however, only 2 of them complete. The other 3 generate this error:

    tsmac:trinityrnaseq home$ ./util/normalize_by_kmer_coverage.pl --seqType fq --JM 100G --max_cov 30 --left 30l.fq --right -30r.fq --pairs_together --PARALLEL_STATS --JELLY_CPU 16 --output 30normalized
    Converting input files. (both directions in parallel)CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /Users/home/Research/Trinity/trinityrnaseq/30l.fq >> left.fa
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /Users/home/Research/Trinity/trinityrnaseq/-30r.fq >> right.fa
    CMD finished (0 seconds)
    CMD finished (264 seconds)
    Done converting input files.CMD: cat left.fa right.fa > both.fa
    CMD finished (247 seconds)
    -------------------------------------------
    ----------- Jellyfish --------------------
    -- (building a k-mer catalog from reads) --
    -------------------------------------------

    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/jellyfish/bin/jellyfish count -t 16 -m 25 -s 14702953559 *--both-strands *both.fa
    Warn: Bad character in sequence: o
    Warn: Bad character in sequence: *
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: U
    Warn: Bad character in sequence: e
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: o
    Warn: Bad character in sequence: u
    Warn: Bad character in sequence: z
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: e
    Warn: Bad character in sequence: e
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: i
    Warn: Bad character in sequence: i
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: i
    Warn: Bad character in sequence: i
    Warn: Bad character in sequence: e
    Warn: Bad character in sequence: q
    Warn: Bad character in sequence: /
    Warn: Bad character in sequence: 3
    Warn: Bad character in sequence: 0
    Warn: Bad character in sequence: .
    Warn: Bad character in sequence: f
    Warn: Bad character in sequence: q
    Warn: Bad character in sequence: *
    Warn: Bad character in sequence: f
    Warn: Bad character in sequence: i
    Warn: Bad character in sequence: l
    Warn: Bad character in sequence: e
    Warn: Bad character in sequence: *
    Warn: Bad character in sequence: f
    Warn: Bad character in sequence: o
    Warn: Bad character in sequence: u
    Warn: Bad character in sequence: !
    CMD finished (862 seconds)
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//trinity-plugins/jellyfish/bin/jellyfish dump -L 2 mer_counts_0 >> jellyfish.K25.min2.kmers.fa
    CMD finished (327 seconds)
    CMD: touch jellyfish.K25.min2.kmers.fa.success
    CMD finished (0 seconds)
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads left.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 **--DS *> left.fa.K25.stats
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads right.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 **--DS *> right.fa.K25.stats
    -reading Kmer occurences...
    -reading Kmer occurences...

    done parsing 73199622 Kmers, 73199622 added, taking 1045 seconds.

    done parsing 73199622 Kmers, 73199622 added, taking 1047 seconds.
    CMD finished (1088 seconds)
    CMD finished (5019 seconds)
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//util/nbkc_merge_left_right_stats.pl --left left.fa.K25.stats --right right.fa.K25.stats *> pairs.K25.stats
    CMD finished (0 seconds)
    CMD: /Users/home/Research/Trinity/trinityrnaseq/util/..//util/nbkc_normalize.pl pairs.K25.stats 30 100 > pairs.K25.stats.C30.pctSD100.accs
    CMD finished (0 seconds)
    Thread 6 terminated abnormally: Error: Couldn't open /Users/home/Research/Trinity/trinityrnaseq/-30r.fq
    Error encountered with thread.
    Error, at least one thread died at ./util/normalize_by_kmer_coverage.pl line 353.
    tsmac:trinityrnaseq home$


    The submitted files for this run are "30l.fq" and "30r.fq". After the error message, there is a file in the working directory called "-30r.fq.normalized_K25_C30_pctSD100.fq". In the output directory, the left.fa file seems fine, but the right.fa file contains "No (path)/-20r.fq file found!"

    The other two sets of files that fail are named the same way, with their own numbers (20 and 31), and they produce exactly the same bad-character warnings as above. There are also -20r.fq.normalized_K25_C30_pctSD100.fq and -31r.fq.normalized_K25_C30_pctSD100.fq files in the directory.
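
One way to confirm that a converted file holds a stray message rather than sequence is to list every non-header line containing characters other than A, C, G, T or N. The sketch below is only an illustration: the scratch directory, the three-line stand-in for right.fa, and the placeholder path in the message are all hypothetical, and on real data only the `awk` line would be needed.

```shell
# Hypothetical miniature of a converted file: one good record followed by a
# stray message line like the one described above (the path is a placeholder).
workdir=$(mktemp -d)
printf '%s\n' '>read1/2' 'ACGTNACGTTGCA' 'No /some/path/-30r.fq file found!' > "$workdir/right.fa"

# Skip FASTA header lines, then print the line number and text of any line
# containing a character other than A, C, G, T or N (either case).
bad_lines=$(awk '!/^>/ && /[^ACGTNacgtn]/ { print NR ": " $0 }' "$workdir/right.fa")
echo "$bad_lines"
```

If this prints anything for a file that should contain only reads, the conversion step did not find or could not read its input, and a k-mer counter reading that file would have good reason to warn about bad characters.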

    Unfortunately, I'm not a bioinformaticist, just an end user, but it seems this process should be straightforward based on the Haas paper published earlier this year and on the Trinity web page that discusses normalization. Since 2 of the 5 worked, perhaps there are errors in the other 3 files?

    If it's relevant, I ran them in this order:
    18 (succeeded)
    20 (failed)
    29 (succeeded)
    30 (failed)
    31 (failed)

    Thanks for any insight you might be able to provide.

  • #2
    A couple of suggestions that may be more general in nature:

    1. It is not a good idea to have folder or file names that start with "-". As you have figured out, Unix commands treat arguments that start with "-" as options.
    2. There may be a read-permission error on one or more of the files. In a terminal window, can you verify with the file list command ($ ls -l) that everyone has read permission (three "r" entries in the permissions column of the long listing)?
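
Both suggestions can be tried with a few commands. This is a minimal sketch assuming a POSIX shell; the scratch directory and the empty stand-in file are hypothetical and exist only so the commands can be run safely. On the real data, only the `mv` and `ls -l` lines would apply.

```shell
# Work in a throwaway directory so nothing real is touched (hypothetical setup).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create an empty stand-in for the misnamed file. The ./ prefix keeps the
# utility from parsing "-30r.fq" as a bundle of options.
touch ./-30r.fq

# Rename it. "--" marks the end of options, so the next word is a file name.
mv -- -30r.fq 30r.fq

# Long listing: the first column shows the read (r) permissions.
ls -l 30r.fq
```

Either form, the `./` prefix or the `--` separator, lets standard utilities handle a name that begins with a dash. Renaming the file avoids the problem entirely for tools that do not follow those conventions.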
    Last edited by GenoMax; 11-10-2013, 04:55 AM.



    • #3
      Thanks very much - the '-' is what caused the problem. I'm not sure how I put it in there for those three files, but removing it seems to have fixed it. Thanks again - I'm moving forward again.
