
  • #91
    Quick question: on a FASTQ file, TopHat2 spends about 3-4 h, while STAR runs in 5 min to 1.5 h.

    How come only 5 min? That's almost unbelievable. It says success, but I'm very sceptical.

    Comment


    • #92
      That sounds about right. STAR is vastly faster than TopHat, but requires much more memory.

      Comment


      • #93
        Holy moly.

        Comment


        • #94
          yeah file Log.out has actual mapping speed

          so check it to see, it's usually somewhere around hundreds of millions (reads) per hour on 8 CPUs

          if it's substantially lower, something is going wrong (e.g. if you use UCSC mRNAs as a reference, there are a lot of junctions that are of "-1" length for whatever reason, which slows STAR down a lot)

          Comment


          • #95
            Alex, is it possible to integrate your instrument to the Tool Panel of Main Public Galaxy Server (usegalaxy.org) as a working tool?

            Comment


            • #96
              Originally posted by dxkorall View Post
              Alex, is it possible to integrate your instrument to the Tool Panel of Main Public Galaxy Server (usegalaxy.org) as a working tool?
              This is certainly doable, however, such decisions are in the hands of the Galaxy Team - please make a request of Anton Nekrutenko & Co.

              Cheers
              Alex

              Comment


              • #97
                Hi guys,
                1. Is there any package available for STAR for easy installation?
                2. My machine reports Architecture: i686, CPU op-mode(s): 32-bit, 64-bit, CPU(s): 2. Is it compatible with STAR, either from the binary or from source?
                Thanks

                Comment


                • #98
                  Pre-compiled linux binary is available here: https://code.google.com/p/rna-star/d...4.tgz&can=2&q=

                  Comment


                  • #99
                    Reads with very long "deletions"

                    Hello,

                    I used STAR to map our rna-seq single-end reads which are 50bp long (both with and without a gtf file). However, I get quite a few reads which supposedly have these huge deletions/gaps of hundreds of kb, which look like mapping issues.
                    Sometimes the "deletions" even span entire genes.

                    Two examples are:
                    HISEQ2000-02:509:C4C7EACXX:4:1306:5718:46438 0 chr10 94874729 255 44M196485N7M * 0 0 TAACGGAACTCCTACTAGATACATCAGGATGCAAACTATAAAAGGGTCAGT @@@DDD?D@DDHB>?B<B<<CAC?BEDG?9*)1CF;<??BF*??B)?90?? NH:i:1 HI:i:1 AS:i:45 nM:i:1 jM:B:c,1 jI:B:i,94874773,95071257

                    HISEQ2000-02:509:C4C7EACXX:4:2303:5831:46194 0 chr10 95008269 255 23M125529N28M * 0 0 CAATAAAAACGTATACCGATTGGCAAAAAAAAAGAAAAAAAAAAAAAAAAA CBCFFFFFHHFHHJJJJHIIJHEHJJJJJJJJJ-5@GIJHFDDDDDDDDDD NH:i:1 HI:i:1 AS:i:39 nM:i:0 jM:B:c,5 jI:B:i,95008292,95133820


                    Has anyone else seen these? Is there any way to filter them out???
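
One way to flag such reads after mapping is to look at the N (skipped region) operations in the CIGAR field, which is where the gaps in the two examples above are recorded (196485N and 125529N). The following is a minimal Python sketch, not a STAR feature: the function names and the 100000 bp cutoff are assumptions chosen for illustration, and the cutoff would need tuning for the organism in question.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_gap(cigar):
    """Return the longest N (skipped region) length in a CIGAR string, or 0."""
    return max(
        (int(length) for length, op in CIGAR_OP.findall(cigar) if op == "N"),
        default=0,
    )


def keep_sam_line(line, max_allowed_gap=100_000):
    """Keep header lines and alignments whose longest gap is within the cutoff.

    Assumes a standard tab-separated SAM record with CIGAR in column 6.
    The default cutoff is a hypothetical value, not a STAR default.
    """
    if line.startswith("@"):
        return True
    fields = line.rstrip("\n").split("\t")
    return max_gap(fields[5]) <= max_allowed_gap


if __name__ == "__main__":
    for cigar in ("44M196485N7M", "23M125529N28M", "51M"):
        print(cigar, max_gap(cigar))
```

Note that a hard length cutoff like this also discards genuine long introns, so it is a blunt instrument.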

                    Comment


                    • Not sure about the second one, but the first one with a 200kbp deletion anchored by a 7bp of read sequence looks like a probable mapping error to me, considering that a 7bp exact match would be expected purely by chance within about 16kbp of any random location. However, if that 200kbp corresponds exactly to a known intron in the GTF file, and only occurs when using the GTF file, it's probably OK. Does it?
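
The "about 16kbp" figure follows from 4^7 = 16384: under the simplifying assumption of uniformly random, independent bases, a given 7-mer is expected to start at roughly one position in every 16384. A small Python sketch of that arithmetic (the function name is made up for illustration):

```python
def expected_spacing(k, alphabet_size=4):
    """Expected distance between chance occurrences of one specific k-mer,
    assuming uniformly random, independent bases (a simplification)."""
    return alphabet_size ** k


if __name__ == "__main__":
    # Spacing for the 7 bp anchor of the first example read.
    print(expected_spacing(7))
```

Real genomes are not uniformly random, so this is only an order-of-magnitude guide.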

                      Comment


                      • Originally posted by GenoMax View Post
                        Pre-compiled linux binary is available here: https://code.google.com/p/rna-star/d...4.tgz&can=2&q=
                        Thanks for the link. I had to use proxies to get it, though (Google issues here). My question was about 32-bit Linux: are there any binaries for it?

                        Comment


                        • Don't think Alex provides 32-bit binaries. If you have a large genome (~ human) 32-bit may not work.

                          Build from source if you must have 32-bit: https://code.google.com/p/rna-star/d...e.tgz&can=2&q=

                          Comment


                          • Originally posted by Brian Bushnell View Post
                            However, if that 200kbp corresponds exactly to a known intron in the GTF file, and only occurs when using the GTF file, it's probably OK. Does it?
                            Hi Brian,
                            No, I didn't use a GTF to do the mapping in this case. Plus, the mapped read corresponds to a known intron (of a short gene) on one side, but to a random intergenic region well past the end of the gene in which it starts (at least according to UCSC) on the other side. And the 200kb overlaps with 4 other known genes too. So to my eyes that's definitely a mapping error too. The question now is how to filter those out (because there are quite a few of them).

                            Comment


                            • Hi,
                              I'm trying to generate a genome from the rice reference and I get the following error. I have tried several of the available STAR patches:

                              biostat1@biostat[STAR_2.3.1z10] ./STAR --runMode genomeGenerate --genomeDir IRGSP_genome --genomeFastaFiles L1_1.fq L1_2.fq
                              Jul 17 10:55:11 ..... Started STAR run
                              Jul 17 10:55:11 ... Starting to generate Genome files
                              terminate called after throwing an instance of 'std::out_of_range'
                              what(): vector::_M_range_check
                              zsh: abort ./STAR --runMode genomeGenerate --genomeDir IRGSP_genome --genomeFastaFiles

                              Any ideas,
                              Thanks!
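
One thing that may be worth checking in the command above: both files passed to --genomeFastaFiles end in .fq, which usually marks FASTQ reads, whereas genome generation expects the reference sequence in FASTA format. Whether this is the cause of the crash is not established in the thread. A minimal Python sketch for checking a file before running STAR (the function names are hypothetical, and the check only looks at the first non-empty line of an uncompressed file):

```python
def sniff_first_line(first_line):
    """Guess the sequence format from the first non-empty line of a file."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    return "unknown"


def sniff_file(path):
    """Open an uncompressed text file and guess its format from its first record."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                return sniff_first_line(line)
    return "unknown"


if __name__ == "__main__":
    print(sniff_first_line(">chr01 example header"))
    print(sniff_first_line("@HISEQ2000-02:509:C4C7EACXX:4:1306:5718:46438"))
```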

                              Comment


                              • Originally posted by emmanouela View Post
                                Hi Brian,
                                No, I didn't use a GTF to do the mapping in this case. Plus, the mapped read corresponds to a known intron (of a short gene) on one side, but to a random intergenic region well past the end of the gene in which it starts (at least according to UCSC) on the other side. And the 200kb overlaps with 4 other known genes too. So to my eyes that's definitely a mapping error too. The question now is how to filter those out (because there are quite a few of them).
                                Hi Emma,

                                these long-gap splices, often connecting adjacent genes, are somewhat common in RNA-seq data. It's hard to say whether they are biochemically real "read-through transcription" events, or some kind of wet-lab or mapping artifacts.
                                They would be clearly mapping artifacts if "better" alignments of these sequences can be found, however, BLATing or BLASTing them did not result in any better alignments.
                                One way to get rid of them is to completely prohibit long gaps with --alignIntronMax N, which would prohibit any gap longer than N (by default this is ~600000). However, if you make this too small, say 100000, you may miss a number of valid junctions, as mammalian introns can be hundreds of kilobases long.
                                A better approach is to filter out long-gap alignments supported by too few reads, e.g.:
                                --outFilterType BySJout --outSJfilterIntronMaxVsReadN 10000 20000 50000 100000
                                This would only allow unannotated junctions <=10kb supported by >=1 spliced read, <=20kb supported by >=2 reads, <=50kb by >=3 reads, <=100kb by >=4 reads.
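
The rule described above can be sketched in Python to make the thresholds concrete. This is an illustration of the logic only, not STAR's implementation: the function name is made up, the ~600000 ceiling is the approximate default quoted above, and the treatment of junctions with more supporting reads than there are listed values (limited only by the overall intron ceiling) is an assumption.

```python
def junction_passes(intron_len, n_reads,
                    max_len_by_reads=(10000, 20000, 50000, 100000),
                    align_intron_max=600000):
    """Decide whether an unannotated junction survives the filter.

    max_len_by_reads[i] is the longest gap allowed for a junction
    supported by i + 1 spliced reads. Junctions with more reads than
    listed values are assumed to be capped only by align_intron_max.
    """
    if n_reads < 1:
        return False
    if n_reads <= len(max_len_by_reads):
        return intron_len <= max_len_by_reads[n_reads - 1]
    return intron_len <= align_intron_max


if __name__ == "__main__":
    # The 196485 bp gap from the first example read, seen in a single read.
    print(junction_passes(196485, 1))
    # A 60 kb gap supported by 3 reads versus 4 reads.
    print(junction_passes(60000, 3), junction_passes(60000, 4))
```

With these values, a long gap seen in only one or two reads is dropped, while the same gap with enough independent support is kept.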

                                There is more discussion on this type of filtering in this post.

                                Cheers
                                Alex

                                Comment
