  • #61
    Examples of STAR usage in documentation?

    Dear Alex and Shawn,


    Could you please try to run 2-3 jobs (without killing your server ) pausing for 10sec between them, and send me the Log.out outputs for each job.
    Our server is slightly busy at the moment (Nicolas Nalpas is pulling all the blanket for himself), but I'll try asap.

    It does make sense that jobs submitted simultaneously can't see the genome being loaded into shared memory (we have a looped script submitting a bunch of STAR jobs all within a few seconds from each other). For the record, we recently submitted 10 jobs in such a loop, and we went over our 256GB RAM + 4GB swap, which slowed the server down.
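A minimal sketch of a staggered submission loop, assuming the jobs are started from Python rather than a shell loop. The function name `launch_staggered` and its parameters are made up for illustration; the 10-second pause is the one suggested above.

```python
import subprocess
import time


def launch_staggered(commands, pause_seconds=10,
                     launch=subprocess.Popen, sleep=time.sleep):
    """Start each command in turn, pausing between consecutive launches.

    `commands` is a list of argv lists. `launch` and `sleep` are injectable
    so the loop can be dry-run without starting real processes.
    """
    handles = []
    for index, argv in enumerate(commands):
        if index > 0:
            # Give the previous job time to start before launching the next
            # one, so the jobs do not all start within the same few seconds.
            sleep(pause_seconds)
        handles.append(launch(argv))
    return handles
```

Passing `launch=print` and `sleep=lambda s: None` gives a dry run that only prints the command lines.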


    I think the documentation should include some examples because the explanation is a little confusing.
    I support this idea, as we previously had odd experiences with the outFilter(NminMatch) and (others) parameters, which seem to filter on the number of consecutive matches rather than the total number of matches, for instance. I'd rather have a detailed description (or maybe a short one and a more detailed one) of each option than a doubt that requires additional testing and guessing, which impinges on my real project time.

    Regarding the original point, example usage would be appreciated too. Maybe users could actually participate in such an effort, as we could rapidly gather a diversity of applications along with the combinations of options we successfully used? If so, we'd need some place other than this thread to share our commands.

    Kevin



    • #62
      Are there any plans to make STAR work for fusion genes? The existing tools are too slow for me...



      • #63
        Originally posted by ymc View Post
        Are there any plans to make STAR work for fusion genes? The existing tools are too slow for me...
        STAR can detect chimeric alignments both "spanning" and "encompassing" chimeric junctions. However, you would need to do all the post-processing: filtering alignments, collapsing the chimeric junctions, annotating fused genes. There is some discussion about it on the STAR forum: https://groups.google.com/d/msg/rna-...U/yxj5C8LaovIJ



        • #64
          Originally posted by alexdobin View Post
          STAR can detect chimeric alignments both "spanning" and "encompassing" chimeric junctions. However, you would need to do all the post-processing: filtering alignments, collapsing the chimeric junctions, annotating fused genes. There is some discussion about it on the STAR forum: https://groups.google.com/d/msg/rna-...U/yxj5C8LaovIJ
          Thanks for your reply. I think I can count spanning chimeric junctions from Chimeric.out.junction. How can I count "encompassing"? Which output file should I look at?



          • #65
            Originally posted by ymc View Post
            Thanks for your reply. I think I can count spanning chimeric junctions from Chimeric.out.junction. How can I count "encompassing"? Which output file should I look at?
            The Chimeric.out.junction file contains the encompassing junctions as well. They are marked with -1 in column 7 (junction type). Of course, to assign the encompassing reads to a chimeric junction, you have to know the coordinates of the junction, or somehow cluster the inner ends of the encompassing mates.
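As a rough illustration of the column-7 rule, here is a sketch that tallies encompassing versus spanning records. It assumes a tab-separated file with the junction type in column 7 and treats every value other than -1 as "spanning"; the function name is made up, and any line whose column 7 is not an integer (for example a header line) is skipped.

```python
from collections import Counter


def count_junction_types(lines):
    """Tally records by junction type (column 7; -1 means encompassing)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # blank or truncated line
        try:
            junction_type = int(fields[6])
        except ValueError:
            continue  # non-numeric column 7, e.g. a header line
        key = "encompassing" if junction_type == -1 else "spanning"
        counts[key] += 1
    return counts
```

Typical use would be `with open("Chimeric.out.junction") as handle: print(count_junction_types(handle))`. This only counts records; assigning the encompassing reads to a particular junction still needs the coordinate matching or clustering described above.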



            • #66
              I am unable to get STAR to use the SHARED MEMORY option between different instances. I followed the instructions given at
              Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


              I am trying to run two instances (with delays of at least 2-3 mins), but each individual instance seems to allocate its own memory

              $top
              PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
              5291 xxx 20 0 26.6g 25g 25g R 100 20.6 17:37.64 STAR
              5299 xxx 20 0 26.6g 25g 25g R 100 20.4 13:18.87 STAR

              What else can I try?

              STAR Version: STAR_2.3.0e.Linux_x86_64

              $ uname -a
              Linux XXX 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux


              $ipcs
              ------ Shared Memory Segments --------
              key shmid owner perms bytes nattch status
              0x17000006 2588672 xxx 666 28271287966 2

              ------ Semaphore Arrays --------
              key semid owner perms nsems

              ------ Message Queues --------
              key msqid owner perms used-bytes messages



              • #67
                Can I use STAR's sam output to call SNP? If so, how?



                • #68
                  Originally posted by ymc View Post
                  Can I use STAR's sam output to call SNP? If so, how?
                  Just search this site for "RNAseq SNP" for a plethora of examples, like this.



                  • #69
                    Originally posted by wildtypegoose View Post
                    I am unable to get STAR to use the SHARED MEMORY option between different instances. I followed the instructions given at
                    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                    I am trying to run two instances (with delays of at least 2-3 mins), but each individual instance seems to allocate its own memory

                    $top
                    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
                    5291 xxx 20 0 26.6g 25g 25g R 100 20.6 17:37.64 STAR
                    5299 xxx 20 0 26.6g 25g 25g R 100 20.4 13:18.87 STAR

                    What else can I try?
                    It seems to me that the shared memory is working fine - since ipcs shows two instances attached to the same shared memory piece. What you see in 'top' is memory usage per process - and 25GB out of 26.6GB are shared for each process. To test it you can try to run >5 jobs at the same time - they would not be able to run without sharing memory.
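The ipcs check can be scripted. The sketch below assumes the Linux `ipcs` layout shown in the quoted post (columns key, shmid, owner, perms, bytes, nattch) and reads only the shared memory section; the function name is made up for illustration.

```python
def parse_shared_segments(ipcs_text):
    """Return one dict per row of the 'Shared Memory Segments' section."""
    segments = []
    in_shm_section = False
    for raw_line in ipcs_text.splitlines():
        line = raw_line.strip()
        if line.startswith("------"):
            # Section banner: only rows under the shared memory banner count.
            in_shm_section = "Shared Memory Segments" in line
            continue
        fields = line.split()
        if in_shm_section and len(fields) >= 6 and fields[0].startswith("0x"):
            segments.append({
                "key": fields[0],
                "shmid": int(fields[1]),
                "owner": fields[2],
                "bytes": int(fields[4]),
                "nattch": int(fields[5]),  # number of attached processes
            })
    return segments


# Sample text copied from the ipcs output posted earlier in this thread.
SAMPLE_IPCS = """------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x17000006 2588672 xxx 666 28271287966 2

------ Semaphore Arrays --------
key semid owner perms nsems
"""

for segment in parse_shared_segments(SAMPLE_IPCS):
    size_gib = segment["bytes"] / 2**30
    print(f"{segment['key']}: {size_gib:.1f} GiB, {segment['nattch']} attached")
```

An `nattch` value equal to the number of running STAR processes suggests they are all attached to the same genome segment.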



                    • #70
                      Originally posted by alexdobin View Post
                      It seems to me that the shared memory is working fine - since ipcs shows two instances attached to the same shared memory piece. What you see in 'top' is memory usage per process - and 25GB out of 26.6GB are shared for each process. To test it you can try to run >5 jobs at the same time - they would not be able to run without sharing memory.
                      The global "used" memory (as reported by top) increases linearly if I run another STAR process, which made me think that probably the shared mem option is not working correctly. I'll try your suggestion of running >5 jobs once the server is free.

                      Thanks for your input!



                      • #71
                        Originally posted by wildtypegoose View Post
                        The global "used" memory (as reported by top) increases linearly if I run another STAR process, which made me think that probably the shared mem option is not working correctly. I'll try your suggestion of running >5 jobs once the server is free.

                        Thanks for your input!
                        STAR will use 1-2 GB of memory per process for temporary storage and I/O buffers; however, the ~25GB of genome files are shared. The "used" memory reported by top includes "cached", and it's hard to determine how much physical RAM the process is actually using.



                        • #72
                          Hello

                          I am happy with STAR but not too happy with the MQ scores: it is difficult to filter out reads based on their mapping quality score when the only values are 255, 3, 2, 1 and 0 (I believe 0 is one of them, I forget). Does anyone know of any program that can convert these four or five values into Phred-scaled values based on the CIGAR information of the read?

                          Thanks,
                          Nino



                          • #73
                            Originally posted by Nino View Post
                            Hello

                            I am happy with STAR but not too happy with the MQ scores: it is difficult to filter out reads based on their mapping quality score when the only values are 255, 3, 2, 1 and 0 (I believe 0 is one of them, I forget). Does anyone know of any program that can convert these four or five values into Phred-scaled values based on the CIGAR information of the read?

                            Thanks,
                            Nino
                            That turns out to be a surprisingly difficult thing to do, as you often end up needing to realign everything so that you know how many second-best alignments there are and what their score is (unless one of STAR's more verbose output modes provides this).
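Short of recomputing the qualities, the coarse values can still be used as a filter. The sketch below keeps only records whose MAPQ (SAM column 5) equals 255, on the assumption that 255 marks uniquely mapped reads in the STAR output being filtered; the function name and the `unique_mapq` parameter are made up for illustration.

```python
def unique_alignments(sam_lines, unique_mapq=255):
    """Yield header lines plus alignment records whose MAPQ == unique_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the SAM header untouched
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM record has 11 mandatory fields; MAPQ is the fifth.
        if len(fields) >= 11 and int(fields[4]) == unique_mapq:
            yield line
```

This does not turn the values into a Phred scale; it only separates the top category from the rest.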



                            • #74
                              Hey Devon,

                              It turns out it is not that difficult, since a group from Case Western Reserve University, Cleveland, OH published a paper on a program they developed called LoQuM, which does exactly what I wanted. I have not tried the program yet, but here is the title of the article if you would like to read it yourself:

                              "Accurate estimation of short read mapping quality for next-generation genome sequencing"

                              Thanks,
                              Nino



                              • #75
                                Originally posted by alexdobin View Post
                                STAR will use 1-2 GB of memory per process for temporary storage and I/O buffers; however, the ~25GB of genome files are shared. The "used" memory reported by top includes "cached", and it's hard to determine how much physical RAM the process is actually using.

                                Hi Alex,
                                You are right: I was able to run 7 STAR jobs simultaneously on our server with 128GB of memory. I used the LoadAndExit option of the genomeLoad flag to first load the genome into shared memory, and then used LoadAndRemove for all the simultaneous STAR jobs.
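A sketch of that two-step workflow as command lines built in Python. The `LoadAndExit` and `LoadAndRemove` values are the ones named in this post; the other options (`--genomeDir`, `--readFilesIn`, `--outFileNamePrefix`) are written as I recall them from the STAR manual and should be checked against the installed version. The function name, genome directory, FASTQ names and output prefixes are all hypothetical.

```python
def star_shared_genome_plan(genome_dir, fastq_files, star="STAR"):
    """Build one preload command and one mapping command per input file.

    Nothing is executed here; the argv lists can be inspected or passed to
    subprocess by the caller.
    """
    # Step 1: load the genome into shared memory, then exit.
    preload = [star, "--genomeDir", genome_dir, "--genomeLoad", "LoadAndExit"]
    # Step 2: mapping jobs that attach to the already loaded genome.
    jobs = []
    for index, fastq in enumerate(fastq_files, start=1):
        jobs.append([
            star,
            "--genomeDir", genome_dir,
            "--genomeLoad", "LoadAndRemove",
            "--readFilesIn", fastq,
            "--outFileNamePrefix", f"job{index}_",  # hypothetical prefix
        ])
    return preload, jobs
```

Giving each job its own output prefix keeps the per-job Log.out files apart, which also makes them easier to compare afterwards.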

                                Although memory usage was fine, I noticed that the server at times became unresponsive due to heavy I/O (as shown by the D state for many of the STAR processes in the "top" output). Any suggestions for avoiding this bottleneck?

                                Thanks a lot!
