  • Trinity, running really slowly or just not working?

    Hi all,

    I am attempting to use Trinity for the first time on a combined dataset with ~25m reads. I had a few issues with the memory size of my VM but I hope I have managed to fix that now, and Trinity is currently running with --max_memory 100G.

    However, I am really paranoid that it is not actually working, as the terminal is currently sitting on one of the first lines in inchworm (which happens to be the one it sat on for a day before I added an extra disk).

    I know that the inchworm is memory intensive and this dataset could take a pretty long time, but I have a feeling that nothing is happening. I have been checking the trinity_out_dir and none of the files have been updated since it started this step, including the supposed output of this step inchworm.K25.L25.fa.temp which hours later still has a size of 0.

    However when I use the top command, inchworm is using lots of the CPU and memory, but I am not sure if this is giving me false hope.

    I am very new to using Linux, so I am sorry if this is a stupid question, but could someone advise me whether they think Trinity is running but just very, very slowly, or whether it's not working and I need to figure out what is happening? I think CPU may be an issue.

    Would greatly appreciate any suggestions.

    Thanks,
    Hannah

    For reference my exact command line was;

    Trinity --seqType fq --SS_lib_type_RF --left ALL.1.fq --right ALL.2.fq --CPU 4 --max_memory 100G

  • #2
    It isn't a stupid question. Programs all too often 'play dead' yet are still doing work. ABySS is a good example -- it can take forever to generate the first file without a sign of anything happening.

    As for your problem: Inchworm usually does not take that long to run. It should take hours, not days, so I suspect a problem. Several things to do:

    0) First make sure you are running the most recent version of Trinity. v.2.1.1 is what I use.

    1) You could stop the process and start over but this time capture the output like so:

    Code:
    ..trinity command line ..  2>&1 | tee stderr.stdout.txt
    This might give a clue as to what is happening via looking at the 'stderr.stdout.txt' file while the program is running. At the very least it would give you a file that you could show the rest of us.


    2) Alternatively, keep the process running but do a deeper inspection into the process. Linux (which I presume you are using) has all sorts of information under the '/proc/PID' directories. This may be too much of a learning experience for a newbie but it can be fun.

    a) Find the PID (process ID) of the program. My example is not Trinity but another program I happen to be running.

    To get all of the programs I am running in any of my sessions:

    ps -fu my_user_name_which_is_gcore

    I get, along with other items

    Code:
    gcore    31931 31927 99 11:13 pts/0    02:53:42  program_name
    The PID is 31931. Then I can do:

    Code:
    ls -l /proc/31931/fd
    Which gets me all of the open files like so:

    Code:
    0 -> /dev/null
    1 -> M13_GAGCA.fastq.counts
    2 -> /dev/pts/0
    3 -> yeast_strains.txt
    4 -> M13_GAGCA.fastq
    I want to see how far my program is within my input file (the .fastq and yeast_strains files) and the output file. So I can do:

    Code:
    cat /proc/31931/fdinfo/4
    Which gets me back
    Code:
    pos:    1236590592
    flags:  0100000
    Do this several times in a row (with at least a couple of minutes in between) and I can see the 'pos' change and thus I know that the program is reading in the file. If the pos is not changing then something is amiss. My other input file - the yeast one - does not show any change but this is expected for the current program since it reads in the yeast file first and then the fastq file. I can also look at the output file.
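The repeated check described above can be scripted. This is only a minimal sketch, assuming a Linux system with /proc available; the function names fd_pos and watch_fd_pos are made up for illustration, and the PID and file descriptor number have to come from your own ps and ls -l /proc/PID/fd output.

```shell
# Print the current offset ("pos") of one open file descriptor of a
# running process, as reported by Linux in /proc/PID/fdinfo/FD.
fd_pos() {
    local pid="$1" fd="$2"
    awk '$1 == "pos:" { print $2 }' "/proc/${pid}/fdinfo/${fd}"
}

# Take several readings of the offset and report whether it moved.
# Arguments: PID FD [INTERVAL_SECONDS] [NUMBER_OF_SAMPLES]
watch_fd_pos() {
    local pid="$1" fd="$2" interval="${3:-120}" samples="${4:-3}"
    local first last i
    first="$(fd_pos "$pid" "$fd")" || return 1
    last="$first"
    echo "$(date '+%H:%M:%S') pos=${last}"
    i=1
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        last="$(fd_pos "$pid" "$fd")" || return 1
        echo "$(date '+%H:%M:%S') pos=${last}"
        i=$((i + 1))
    done
    if [ "$last" != "$first" ]; then
        echo "offset changed: the process is reading or writing this file"
    else
        echo "offset unchanged: either stalled or not using this file right now"
    fi
}
```

With the numbers from the example above, watch_fd_pos 31931 4 120 3 would take three readings of descriptor 4, two minutes apart. An unchanged offset is not proof of a hang, since a program may simply have finished with that file, as with the yeast file here.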

    Anyway the above may be too much for a newbie but it is one way to look into a program. There may be GUI programs that will do this for you instead of command line. I just happen to be more familiar with the command line.
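For the simpler first option (capturing the output), the 2>&1 | tee pattern can be wrapped in a small helper so it works for any command. This is a sketch only; run_logged is a made-up name, not part of Trinity or any other tool.

```shell
# Run any command, showing its output on screen and saving a copy to a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    local logfile="$1"
    shift
    # 2>&1 merges stderr into stdout so that tee captures both streams.
    "$@" 2>&1 | tee "$logfile"
}
```

While the program runs, tail -f on the log file from a second terminal shows new lines as they arrive. Note that the log can only contain what the program prints, so a program that is silent on screen will be silent in the file too.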



    • #3
      Are these trimmed reads? I had terrible problems with Trinity last year since many read sets had substantial levels of Ns. Using more extensive quality trimming solved the problem.
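A quick way to see whether a read set carries many Ns is to count the reads that contain at least one. The helper below is a rough sketch, assuming the usual four-lines-per-record FASTQ layout with unwrapped sequences; count_n_reads is a made-up name for illustration.

```shell
# Report how many reads in a FASTQ file contain at least one N base call.
# Assumes 4 lines per record, so the sequence is always line 2 of each group.
count_n_reads() {
    awk 'NR % 4 == 2 { total++; if ($0 ~ /[Nn]/) with_n++ }
         END { printf "%d of %d reads contain N\n", with_n, total }' "$1"
}
```

For the files in this thread that would be count_n_reads ALL.1.fq and count_n_reads ALL.2.fq. What counts as "substantial" is a judgment call; the count only tells you whether more aggressive trimming is worth trying.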

      Another thing: are you sure the --SS_lib_type_RF is correct?

      In my data usually FR not RF is typical.

      The typical Trinity command leaves this setting at its default (at least for version 2.0.6).

      # *Note, a typical Trinity command might be:
      #
      # Trinity --seqType fq --max_memory 50G --left reads_1.fq --right reads_2.fq --CPU 6



      • #4
        Thank you both so much for your responses!

        I managed to use the /proc/PID directories to find out what is going on (thanks so much for your very clear instructions); it will definitely be a useful thing to know for the future. The position in the input file is ticking over nicely, so I am comforted that something is happening.

        I also tried adding the text file to the command line, but for some reason this text file only contained the same information printed in the terminal, so I think the position information will be more useful for me.

        The reads I used were from SRA so I wasn't sure if they had already been trimmed, but in the relevant publication they are described as "raw reads" so I decided to use the trimmomatic parameter, which appears to have worked. Also thank you so much for noticing my RF/FR issue. I am using Illumina reads so I think you are right that FR is the correct parameter for me to use.

        So I have set Trinity up again, following all your advice, and it appears to be working; time will tell!!

        Thanks again for all your help/advice, it is really appreciated and I have learnt lots already today!!


        Thanks,
        Hannah



        • #5
          When dealing with a new program for the first time, for a difficult problem like assembly that could take anywhere from a minute to 1000 years (and is basically unbounded), I highly recommend you "fire and forget". Just set a long time limit - say, a week - and if it is not done by that time, or before you get bored, terminate it and contact the author, telling them it is too slow for your use. Bear in mind that some programs may give you useful results in months or years, but it's impossible to say when a program might finish.

          Actually, I recommend contacting the author for all slow programs; but you should wait at least a week first for programs that have a good reason to be slow, like assemblers.
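The "set a long time limit" idea can be enforced by the shell rather than by memory. A minimal sketch, assuming GNU coreutils timeout is installed; run_with_limit is a made-up helper name, and the one-week limit is just the figure suggested above.

```shell
# Run a command with a wall-clock limit using GNU coreutils `timeout`.
# Usage: run_with_limit DURATION COMMAND [ARGS...]
# DURATION takes timeout's suffixes: s, m, h or d (for example 7d).
run_with_limit() {
    local limit="$1" status
    shift
    timeout "$limit" "$@"
    status=$?
    # timeout exits with status 124 when it had to stop the command.
    if [ "$status" -eq 124 ]; then
        echo "stopped: still running after $limit" >&2
    fi
    return "$status"
}
```

For example, run_with_limit 7d followed by the assembler command line would stop the run after a week. Starting it under nohup or inside a screen or tmux session keeps it alive after you log out.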
          Last edited by Brian Bushnell; 02-04-2016, 09:08 PM.
