  • #61
    Examples of STAR usage in documentation?

    Dear Alex and Shawn,


    Could you please try to run 2-3 jobs (without killing your server), pausing for 10 sec between them, and send me the Log.out outputs for each job.
    Our server is slightly busy at the moment (Nicolas Nalpas is pulling all the blankets to himself), but I'll try ASAP.

    It does make sense that jobs submitted simultaneously can't see the genome being loaded into shared memory (we have a looped script submitting a bunch of STAR jobs all within a few seconds from each other). For the record, we recently submitted 10 jobs in such a loop, and we went over our 256GB RAM + 4GB swap, which slowed the server down.
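A minimal sketch (not from the thread) of how a submission loop could pause between launches, as requested above. The function name `launch_staggered` and its parameters are illustrative. Whether a 10-second pause is long enough for later jobs to see the genome in shared memory is an assumption that has not been tested here.

```python
import subprocess
import time


def launch_staggered(commands, delay_s=10, runner=subprocess.Popen, sleep=time.sleep):
    """Start each command in turn, pausing delay_s seconds between launches.

    commands is a list of argument lists, e.g. [["STAR", ...], ...].
    runner and sleep are injectable so the loop can be dry-run without STAR.
    """
    handles = []
    for i, cmd in enumerate(commands):
        if i > 0:
            sleep(delay_s)  # wait before starting the next job
        handles.append(runner(cmd))
    return handles


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    demo = [["echo", "job1"], ["echo", "job2"], ["echo", "job3"]]
    launch_staggered(demo, delay_s=0, runner=print)
```

With the default `runner`, each entry is started as a background process and the returned handles can be waited on with `.wait()`.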


    I think the documentation should include some examples because the explanation is a little confusing.
    I support this idea, as we previously also had weird experiences with the outFilter(NminMatch) and (others) parameters, which seem to filter on the number of consecutive matches rather than the total number of matches, for instance. I'd rather have a detailed description of each option (or maybe a short one and a more detailed one) than a doubt that requires additional testing and guessing, which impinges on my real project time.

    Regarding the original point, example usage would be appreciated too. Maybe users could actually participate in such an effort, as we could rapidly gather a diversity of applications along with the combinations of options we successfully used? If so, we'd need some other place than this thread to share our commands.

    Kevin



    • #62
      Are there any plans to make STAR work for fusion genes? The existing tools are too slow for me...



      • #63
        Originally posted by ymc
        Are there any plans to make STAR work for fusion genes? The existing tools are too slow for me...
        STAR can detect chimeric alignments both "spanning" and "encompassing" chimeric junctions. However, you would need to do all the post-processing: filtering alignments, collapsing the chimeric junctions, annotating fused genes. There is some discussion about it on the STAR forum: https://groups.google.com/d/msg/rna-...U/yxj5C8LaovIJ



        • #64
          Originally posted by alexdobin
          STAR can detect chimeric alignments both "spanning" and "encompassing" chimeric junctions. However, you would need to do all the post-processing: filtering alignments, collapsing the chimeric junctions, annotating fused genes. There is some discussion about it on the STAR forum: https://groups.google.com/d/msg/rna-...U/yxj5C8LaovIJ
          Thanks for your reply. I think I can count spanning chimeric junctions from Chimeric.out.junction. How can I count "encompassing"? Which output file should I look at?



          • #65
            Originally posted by ymc
            Thanks for your reply. I think I can count spanning chimeric junctions from Chimeric.out.junction. How can I count "encompassing"? Which output file should I look at?
            The Chimeric.out.junction file contains the encompassing junctions as well. They are marked with -1 in column 7 (junction type). Of course, to assign the encompassing reads to a chimeric junction, you have to know the coordinates of the junction, or somehow cluster the inner ends of the encompassing mates.
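A minimal sketch (not from the thread) of the counting described above. It assumes the file is tab-separated and relies only on the statement in this post that column 7 holds the junction type, with -1 marking encompassing rows. Every other value is counted as "spanning". The function name is illustrative, and the counts are rows in the file, not distinct junctions.

```python
from collections import Counter

JUNCTION_TYPE_COLUMN = 7  # 1-based column index, as stated in the post above


def count_chimeric_rows(lines):
    """Count encompassing vs. spanning rows of Chimeric.out.junction."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < JUNCTION_TYPE_COLUMN:
            continue  # blank or truncated line
        try:
            junction_type = int(fields[JUNCTION_TYPE_COLUMN - 1])
        except ValueError:
            continue  # header or malformed row
        counts["encompassing" if junction_type == -1 else "spanning"] += 1
    return counts
```

Usage would look like `with open("Chimeric.out.junction") as fh: print(count_chimeric_rows(fh))`. Assigning encompassing reads to a particular junction still needs the clustering step mentioned above.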



            • #66
              I am unable to get STAR to use the shared memory option between different instances. I followed the instructions given at
              Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


              I am trying to run two instances (with delays of at least 2-3 mins), but each individual instance seems to allocate its own memory

              $top
              PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
              5291 xxx 20 0 26.6g 25g 25g R 100 20.6 17:37.64 STAR
              5299 xxx 20 0 26.6g 25g 25g R 100 20.4 13:18.87 STAR

              What else can I try?

              STAR Version: STAR_2.3.0e.Linux_x86_64

              $ uname -a
              Linux XXX 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux


              $ipcs
              ------ Shared Memory Segments --------
              key shmid owner perms bytes nattch status
              0x17000006 2588672 xxx 666 28271287966 2

              ------ Semaphore Arrays --------
              key semid owner perms nsems

              ------ Message Queues --------
              key msqid owner perms used-bytes messages



              • #67
                Can I use STAR's SAM output to call SNPs? If so, how?



                • #68
                  Originally posted by ymc
                  Can I use STAR's SAM output to call SNPs? If so, how?
                  Just search this site for "RNAseq SNP" for a plethora of examples, like this.



                  • #69
                    Originally posted by wildtypegoose
                    I am unable to get STAR to use the shared memory option between different instances. I followed the instructions given at
                    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                    I am trying to run two instances (with delays of at least 2-3 mins), but each individual instance seems to allocate its own memory

                    $top
                    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
                    5291 xxx 20 0 26.6g 25g 25g R 100 20.6 17:37.64 STAR
                    5299 xxx 20 0 26.6g 25g 25g R 100 20.4 13:18.87 STAR

                    What else can I try?
                    It seems to me that the shared memory is working fine - since ipcs shows two instances attached to the same shared memory piece. What you see in 'top' is memory usage per process - and 25GB out of 26.6GB are shared for each process. To test it you can try to run >5 jobs at the same time - they would not be able to run without sharing memory.
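A minimal sketch (not from the thread) of reading the attachment count programmatically. It parses text in the layout of the `ipcs` output quoted earlier in this thread. The column order (key, shmid, owner, perms, bytes, nattch) is taken from that output and may differ on other systems. The function name is illustrative.

```python
def parse_ipcs_shm(text):
    """Return one dict per row of the shared memory section of `ipcs` output."""
    segments = []
    in_shm = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("------"):
            # Section banner: only the shared memory section is of interest.
            in_shm = "Shared Memory" in stripped
            continue
        parts = stripped.split()
        # Data rows start with a hex key such as 0x17000006.
        if in_shm and len(parts) >= 6 and parts[0].startswith("0x"):
            segments.append({
                "shmid": int(parts[1]),
                "owner": parts[2],
                "bytes": int(parts[4]),
                "nattch": int(parts[5]),
            })
    return segments


# Sample copied from the ipcs output posted in this thread.
SAMPLE = """\
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x17000006 2588672 xxx 666 28271287966 2

------ Semaphore Arrays --------
key semid owner perms nsems
"""

for seg in parse_ipcs_shm(SAMPLE):
    gib = seg["bytes"] / 2**30
    print(f"shmid {seg['shmid']}: {gib:.1f} GiB, {seg['nattch']} attached process(es)")
```

An `nattch` value equal to the number of running STAR processes is the sign that they are attached to a single segment, not to separate copies.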



                    • #70
                      Originally posted by alexdobin
                      It seems to me that the shared memory is working fine - since ipcs shows two instances attached to the same shared memory piece. What you see in 'top' is memory usage per process - and 25GB out of 26.6GB are shared for each process. To test it you can try to run >5 jobs at the same time - they would not be able to run without sharing memory.
                      The global "used" memory (as reported by top) increases linearly if I run another STAR process, which made me think that probably the shared mem option is not working correctly. I'll try your suggestion of running >5 jobs once the server is free.

                      Thanks for your input!



                      • #71
                        Originally posted by wildtypegoose
                        The global "used" memory (as reported by top) increases linearly if I run another STAR process, which made me think that probably the shared mem option is not working correctly. I'll try your suggestion of running >5 jobs once the server is free.

                        Thanks for your input!
                        STAR will use 1-2 GB of memory per process for temporary storage and I/O buffers; however, the ~25 GB of genome files are shared. The "used" memory reported by top includes "cached", so it's hard to determine how much physical RAM the process is actually using.
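A back-of-the-envelope sketch (not from the thread) of what these figures imply. It uses the ~25 GB genome and the upper bound of 2 GB per process quoted above. The function name and the simple additive model are assumptions, not a description of how STAR accounts for memory.

```python
def estimated_ram_gb(n_jobs, genome_gb=25, per_job_gb=2, shared=True):
    """Rough total RAM estimate for n_jobs concurrent STAR processes."""
    if shared:
        # One copy of the genome plus private buffers for each job.
        return genome_gb + n_jobs * per_job_gb
    # Every job holds its own copy of the genome.
    return n_jobs * (genome_gb + per_job_gb)


for n in (2, 7, 10):
    print(n, estimated_ram_gb(n), estimated_ram_gb(n, shared=False))
```

Under this model, 10 unshared jobs come to roughly 270 GB, which is in line with the report earlier in the thread of 10 simultaneous jobs exceeding 256 GB RAM + 4 GB swap.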



                        • #72
                          Hello

                          I am happy with STAR but not too happy with the MAPQ scores: it is difficult to filter out reads based on their mapping quality when the only values are 255, 3, 2, 1 and 0 (I believe 0 is one of them, I forget). Does anyone know of a program that can convert these 4 or 5 values to Phred-scaled values based on the CIGAR information of the read?

                          Thanks,
                          Nino
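A minimal sketch (not from the thread) of the coarse filtering that is possible with these values. It does not convert anything to a Phred scale. It keeps header lines and alignments whose MAPQ (SAM column 5) equals 255, assuming that 255 is the value STAR assigns to uniquely mapped reads in its default output. The function name is illustrative.

```python
def unique_only(sam_lines, unique_mapq=255):
    """Yield SAM header lines and alignments whose MAPQ equals unique_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.split("\t")
        # MAPQ is the fifth tab-separated field of a SAM alignment line.
        if len(fields) > 4 and fields[4] == str(unique_mapq):
            yield line
```

Usage would look like `with open("Aligned.out.sam") as fh: kept = list(unique_only(fh))`.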



                          • #73
                            Originally posted by Nino
                            Hello

                            I am happy with STAR but not too happy with the MAPQ scores: it is difficult to filter out reads based on their mapping quality when the only values are 255, 3, 2, 1 and 0 (I believe 0 is one of them, I forget). Does anyone know of a program that can convert these 4 or 5 values to Phred-scaled values based on the CIGAR information of the read?

                            Thanks,
                            Nino
                            That turns out to be a surprisingly difficult thing to do, as you often end up needing to realign everything so that you know how many second-best alignments there are and what their score is (unless one of STAR's more verbose output modes provides this).



                            • #74
                              Hey Devon,

                              It turns out it is not that difficult: a group from Case Western Reserve University, Cleveland, OH, published a paper on a program they developed called LoQuM, which does exactly what I wanted. I have not tried the program yet, but here is the title of the article if you would like to read it yourself:

                              "Accurate estimation of short read mapping quality for next-generation genome sequencing"

                              Thanks,
                              Nino



                              • #75
                                Originally posted by alexdobin
                                STAR will use 1-2 GB of memory per process for temporary storage and I/O buffers; however, the ~25 GB of genome files are shared. The "used" memory reported by top includes "cached", so it's hard to determine how much physical RAM the process is actually using.

                                Hi Alex,
                                You are right: I was able to run 7 STAR jobs simultaneously on our server with 128 GB of memory. I used the LoadAndExit option of the genomeLoad flag to first load the genome into shared memory, and then used LoadAndRemove for all the simultaneous STAR jobs.
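A minimal sketch (not the poster's actual commands) of how the two kinds of invocation described here could be assembled. The helper names, the genome directory, the read files and the output prefixes are hypothetical. Only the STAR parameter names (`--genomeDir`, `--genomeLoad`, `--readFilesIn`, `--outFileNamePrefix`, `--runThreadN`) are real.

```python
def preload_command(genome_dir):
    """Command that loads the genome into shared memory and then exits."""
    return ["STAR", "--genomeDir", genome_dir, "--genomeLoad", "LoadAndExit"]


def align_command(genome_dir, fastq, prefix, threads=1):
    """Alignment command that attaches to the genome in shared memory."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--genomeLoad", "LoadAndRemove",
        "--readFilesIn", fastq,
        "--outFileNamePrefix", prefix,
        "--runThreadN", str(threads),
    ]


if __name__ == "__main__":
    genome = "/path/to/genome"  # hypothetical location
    samples = ["s1.fastq", "s2.fastq"]  # hypothetical read files
    print(" ".join(preload_command(genome)))
    for fq in samples:
        print(" ".join(align_command(genome, fq, fq.replace(".fastq", "_"))))
```

The printed lines can be checked by eye before being handed to a scheduler or to `subprocess.run`.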

                                Although memory usage was fine, I noticed that the server at times became unresponsive due to heavy I/O (many of the STAR processes showed the D state in the "top" output). Any suggestions for avoiding this bottleneck?

                                Thanks a lot!
