  • sindrle
    Senior Member
    • Aug 2013
    • 266

    #91
    Quick question: on one FASTQ file, TopHat2 takes about 3-4 h, while STAR finishes in anywhere from 5 min to 1.5 h.

    How come only 5 min? That's almost unbelievable. It says success, but I'm very sceptical.

    Comment

    • dpryan
      Devon Ryan
      • Jul 2011
      • 3478

      #92
      That sounds about right. STAR is vastly faster than tophat, but requires much more memory.

      Comment

      • sindrle
        Senior Member
        • Aug 2013
        • 266

        #93
        Holy moly.

        Comment

        • apredeus
          Senior Member
          • Jul 2012
          • 151

          #94
          Yeah, the file Log.out has the actual mapping speed.

          So check it to see; it's usually somewhere around hundreds of millions of reads per hour on 8 CPUs.

          If it's substantially lower, something is going wrong (e.g. if you use UCSC mRNAs as a reference, there are a lot of junctions of "-1" length for whatever reason, which slows STAR down a lot).
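A rough way to automate that check, as a sketch only. It assumes the speed is reported on a line of the form `Mapping speed, Million of reads per hour | <value>`, which is how STAR's summary log (Log.final.out) is laid out as far as I know. The 100 M reads/hour cutoff and both function names are made up for illustration.

```python
def mapping_speed(log_text):
    """Return the mapping speed in millions of reads per hour, or None.

    Assumes a line like 'Mapping speed, Million of reads per hour | 403.20'.
    """
    for line in log_text.splitlines():
        if "Mapping speed" in line and "|" in line:
            value = line.split("|", 1)[1].strip()
            try:
                return float(value)
            except ValueError:
                return None
    return None


def looks_slow(speed, threshold=100.0):
    """Flag a run whose speed is known and below a (hypothetical) cutoff."""
    return speed is not None and speed < threshold


# Made-up sample line, not taken from a real run.
sample = "Mapping speed, Million of reads per hour |\t403.20\n"
speed = mapping_speed(sample)
print(speed, looks_slow(speed))
```

In practice you would read the log with `open(path).read()` and pass the text in; pick the cutoff to suit your own hardware and thread count.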

          Comment

          • dxkorall
            Junior Member
            • Jun 2014
            • 4

            #95
            Alex, is it possible to integrate your tool into the Tool Panel of the main public Galaxy server (usegalaxy.org) as a working tool?

            Comment

            • alexdobin
              Senior Member
              • Feb 2009
              • 161

              #96
              Originally posted by dxkorall:
              Alex, is it possible to integrate your tool into the Tool Panel of the main public Galaxy server (usegalaxy.org) as a working tool?
              This is certainly doable; however, such decisions are in the hands of the Galaxy Team. Please make a request to Anton Nekrutenko & Co.

              Cheers
              Alex

              Comment

              • kjusto
                Junior Member
                • Apr 2014
                • 5

                #97
                Hi guys,
                1. Is there any package available for STAR for easy installation?
                2. My machine reports Architecture: i686, CPU op-mode(s): 32-bit, 64-bit, CPU(s): 2. Is it compatible with STAR, either from the binary or from source?
                Thanks

                Comment

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #98
                  Pre-compiled linux binary is available here: https://code.google.com/p/rna-star/d...4.tgz&can=2&q=

                  Comment

                  • emmanouela
                    Junior Member
                    • Jul 2014
                    • 4

                    #99
                    Reads with very long "deletions"

                    Hello,

                    I used STAR to map our RNA-seq single-end reads, which are 50 bp long (both with and without a GTF file). However, I get quite a few reads that supposedly have huge deletions/gaps of hundreds of kb, which look like mapping issues.
                    Sometimes the "deletions" even span entire genes.

                    Two examples are:
                    HISEQ2000-02:509:C4C7EACXX:4:1306:5718:46438 0 chr10 94874729 255 44M196485N7M * 0 0 TAACGGAACTCCTACTAGATACATCAGGATGCAAACTATAAAAGGGTCAGT @@@DDD?D@DDHB>?B<B<<CAC?BEDG?9*)1CF;<??BF*??B)?90?? NH:i:1 HI:i:1 AS:i:45 nM:i:1 jM:B:c,1 jI:B:i,94874773,95071257

                    HISEQ2000-02:509:C4C7EACXX:4:2303:5831:46194 0 chr10 95008269 255 23M125529N28M * 0 0 CAATAAAAACGTATACCGATTGGCAAAAAAAAAGAAAAAAAAAAAAAAAAA CBCFFFFFHHFHHJJJJHIIJHEHJJJJJJJJJ-5@GIJHFDDDDDDDDDD NH:i:1 HI:i:1 AS:i:39 nM:i:0 jM:B:c,5 jI:B:i,95008292,95133820


                    Has anyone else seen these? Is there any way to filter them out???
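One blunt way to pull such reads out is to look at the `N` operations in the CIGAR string (field 6 of a SAM record), since that is where the gap length is stored. The sketch below is only an illustration: the function names and the cutoff you pass in are hypothetical, and it assumes ordinary tab-separated SAM lines.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_gap(cigar):
    """Length of the longest N (skipped region) operation, 0 if there is none."""
    gaps = [int(length) for length, op in CIGAR_OP.findall(cigar) if op == "N"]
    return max(gaps, default=0)


def keep_alignment(sam_line, max_gap):
    """True for header lines and for alignments whose longest gap is <= max_gap."""
    if sam_line.startswith("@"):
        return True
    fields = sam_line.rstrip("\n").split("\t")
    return longest_gap(fields[5]) <= max_gap


# CIGAR strings of the two reads quoted above.
print(longest_gap("44M196485N7M"))
print(longest_gap("23M125529N28M"))
```

A hard cutoff like this also throws away any genuine long introns, so it is a diagnostic more than a fix.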

                    Comment

                    • Brian Bushnell
                      Super Moderator
                      • Jan 2014
                      • 2709

                      Not sure about the second one, but the first one, with a ~200 kbp deletion anchored by only 7 bp of read sequence, looks like a probable mapping error to me, considering that a 7 bp exact match would be expected purely by chance within about 16 kbp of any random location. However, if that 200 kbp corresponds exactly to a known intron in the GTF file, and only occurs when using the GTF file, it's probably OK. Does it?
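The 16 kbp figure is just 4^7. A small sketch of the arithmetic, assuming uniformly random, independent bases (real genomes are not, and the splice-site motif requirement is ignored here); the function names are only for illustration:

```python
def expected_spacing(k, alphabet_size=4):
    """Average distance between chance exact matches of one k-mer."""
    return alphabet_size ** k


def expected_chance_hits(k, window):
    """Expected number of chance exact k-mer matches inside a window of bases."""
    return window / expected_spacing(k)


print(expected_spacing(7))                        # 16384
print(round(expected_chance_hits(7, 196485), 1))  # 12.0
```

So within a 196485-base gap one would expect roughly a dozen places where a 7 bp overhang matches by chance alone.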

                      Comment

                      • kjusto
                        Junior Member
                        • Apr 2014
                        • 5

                        Originally posted by GenoMax:
                        Pre-compiled linux binary is available here: https://code.google.com/p/rna-star/d...4.tgz&can=2&q=
                        Thanks for the link. I had to use proxies to get it, though (Google issues here). My question was about 32-bit Linux: are there any binaries for it?

                        Comment

                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          I don't think Alex provides 32-bit binaries. If you have a large genome (about human-sized), 32-bit may not work.

                          Build from source if you must have 32-bit: https://code.google.com/p/rna-star/d...e.tgz&can=2&q=

                          Comment

                          • emmanouela
                            Junior Member
                            • Jul 2014
                            • 4

                            Originally posted by Brian Bushnell:
                            However, if that 200kbp corresponds exactly to a known intron in the GTF file, and only occurs when using the GTF file, it's probably OK. Does it?
                            Hi Brian,
                            No, I didn't use a GTF for the mapping in this case. Also, the mapped read corresponds to a known intron (of a short gene) on one side, but on the other side it lands in a random intergenic region well past the end of the gene it starts in (at least according to UCSC). The 200 kb also overlaps 4 other known genes. So to my eyes that's definitely a mapping error too. The question now is how to filter those out (because there are quite a few of them).

                            Comment

                            • kjusto
                              Junior Member
                              • Apr 2014
                              • 5

                              Hi,
                              I'm trying to generate a genome from the rice reference and I get the following error. I have tried several of the available STAR patches:

                              biostat1@biostat[STAR_2.3.1z10] ./STAR --runMode genomeGenerate --genomeDir IRGSP_genome --genomeFastaFiles L1_1.fq L1_2.fq
                              Jul 17 10:55:11 ..... Started STAR run
                              Jul 17 10:55:11 ... Starting to generate Genome files
                              terminate called after throwing an instance of 'std::out_of_range'
                              what(): vector::_M_range_check
                              zsh: abort ./STAR --runMode genomeGenerate --genomeDir IRGSP_genome --genomeFastaFiles

                              Any ideas,
                              Thanks!

                              Comment

                              • alexdobin
                                Senior Member
                                • Feb 2009
                                • 161

                                Originally posted by emmanouela:
                                Hi Brian,
                                No, I didn't use a GTF for the mapping in this case. Also, the mapped read corresponds to a known intron (of a short gene) on one side, but on the other side it lands in a random intergenic region well past the end of the gene it starts in (at least according to UCSC). The 200 kb also overlaps 4 other known genes. So to my eyes that's definitely a mapping error too. The question now is how to filter those out (because there are quite a few of them).
                                Hi Emma,

                                These long-gap splices, often connecting adjacent genes, are somewhat common in RNA-seq data. It's hard to say whether they are biochemically real "read-through transcription" events or some kind of wet-lab or mapping artifact.
                                They would clearly be mapping artifacts if "better" alignments of these sequences could be found; however, BLATing or BLASTing them did not turn up any better alignments.
                                One way to get rid of them is to prohibit long gaps completely with --alignIntronMax N, which prohibits any gap longer than N (by default this is ~600000). However, if you make this too small, say 100000, you may miss a number of valid junctions, as mammalian introns can be hundreds of kilobases long.
                                A better approach is to filter out long-gap alignments supported by too few reads, e.g.:
                                --outFilterType BySJout --outSJfilterIntronMaxVsReadN 10000 20000 50000 100000
                                This would only allow unannotated junctions <=10kb supported by >=1 spliced read, <=20kb supported by >=2 reads, <=50kb by >=3 reads, and <=100kb by >=4 reads.
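To make the rule concrete, here is a small sketch of the per-junction decision using the four thresholds from the command line above. It is not STAR's code, just an illustration with hypothetical function names. It also assumes, from my reading of the STAR manual, that gaps beyond the last listed value need one more read than the number of values given and are then capped only by --alignIntronMax, and that the rule applies to unannotated junctions only.

```python
THRESHOLDS = (10000, 20000, 50000, 100000)  # values passed to --outSJfilterIntronMaxVsReadN


def min_reads_required(gap, thresholds=THRESHOLDS):
    """Smallest number of supporting spliced reads that allows a gap of this length."""
    for reads, max_gap in enumerate(thresholds, start=1):
        if gap <= max_gap:
            return reads
    # Assumption: longer gaps need more reads than there are listed values.
    return len(thresholds) + 1


def junction_passes(gap, reads, thresholds=THRESHOLDS):
    """True if an unannotated junction has enough read support for its gap length."""
    return reads >= min_reads_required(gap, thresholds)


# The two gaps from the reads discussed earlier in the thread.
for gap in (196485, 125529):
    print(gap, min_reads_required(gap))
```

With these settings, a junction with a gap like the ones reported above would survive only if several independent spliced reads support it, which is the point of filtering by support rather than by length alone.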

                                There is more discussion on this type of filtering in this post.

                                Cheers
                                Alex

                                Comment
