  • #91
    Quick question: on one FASTQ file, TopHat2 takes about 3-4 h, while STAR runs in 5 min to 1.5 h.

    How come only 5 min? That's almost unbelievable. It reports success, but I'm very sceptical.


    • #92
      That sounds about right. STAR is vastly faster than TopHat, but requires much more memory.


      • #93
        Holy moly.


        • #94
          Yes, the Log.out file reports the actual mapping speed.

          Check it to see: the speed is usually somewhere around hundreds of millions of reads per hour on 8 CPUs.

          If it's substantially lower, something is going wrong (e.g. if you use UCSC mRNAs as a reference, there are a lot of junctions of "-1" length for whatever reason, which slows STAR down a lot).
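The speed check described above can be scripted. The following is a minimal sketch, assuming the log contains a line of the form `Mapping speed, Million of reads per hour | <number>` (recent STAR versions write such a line to Log.final.out; the exact file and wording may differ between versions). The function names and the threshold of 100 are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional

# Assumed line format (may differ between STAR versions):
#   Mapping speed, Million of reads per hour |	257.14
SPEED_RE = re.compile(
    r"Mapping speed, Million of reads per hour\s*\|\s*([0-9.]+)"
)


def mapping_speed_mreads_per_hour(log_text: str) -> Optional[float]:
    """Return the reported mapping speed, or None if no such line exists."""
    match = SPEED_RE.search(log_text)
    return float(match.group(1)) if match else None


def looks_slow(log_text: str, threshold: float = 100.0) -> bool:
    """True if a speed was found and it is below the (hypothetical) threshold."""
    speed = mapping_speed_mreads_per_hour(log_text)
    return speed is not None and speed < threshold
```

To use it, read the log file into a string and pass it to `looks_slow`; a `True` result suggests the run deserves a closer look.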


          • #95
            Alex, is it possible to integrate your instrument to the Tool Panel of Main Public Galaxy Server (usegalaxy.org) as a working tool?


            • #96
              Originally posted by dxkorall
              Alex, is it possible to integrate your instrument to the Tool Panel of Main Public Galaxy Server (usegalaxy.org) as a working tool?
              This is certainly doable; however, such decisions are in the hands of the Galaxy Team. Please make a request to Anton Nekrutenko & Co.

              Cheers
              Alex


              • #97
                Hi guys,
                1. Is there any package available for STAR for easy installation?
                2. My machine reports Architecture: i686, CPU op-mode(s): 32-bit, 64-bit, CPU(s): 2. Is it compatible with STAR, either from the binary or from source?
                Thanks


                • #98
                  Pre-compiled linux binary is available here: https://code.google.com/p/rna-star/d...4.tgz&can=2&q=


                  • #99
                    Reads with very long "deletions"

                    Hello,

                    I used STAR to map our RNA-seq single-end reads, which are 50 bp long (both with and without a GTF file). However, I get quite a few reads that supposedly have huge deletions/gaps of hundreds of kb, which look like mapping issues.
                    Sometimes the "deletions" even span entire genes.

                    Two examples are:
                    HISEQ2000-02:509:C4C7EACXX:4:1306:5718:46438 0 chr10 94874729 255 44M196485N7M * 0 0 TAACGGAACTCCTACTAGATACATCAGGATGCAAACTATAAAAGGGTCAGT @@@DDD?D@DDHB>?B<B<<CAC?BEDG?9*)1CF;<??BF*??B)?90?? NH:i:1 HI:i:1 AS:i:45 nM:i:1 jM:B:c,1 jI:B:i,94874773,95071257

                    HISEQ2000-02:509:C4C7EACXX:4:2303:5831:46194 0 chr10 95008269 255 23M125529N28M * 0 0 CAATAAAAACGTATACCGATTGGCAAAAAAAAAGAAAAAAAAAAAAAAAAA CBCFFFFFHHFHHJJJJHIIJHEHJJJJJJJJJ-5@GIJHFDDDDDDDDDD NH:i:1 HI:i:1 AS:i:39 nM:i:0 jM:B:c,5 jI:B:i,95008292,95133820


                    Has anyone else seen these? Is there any way to filter them out???
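For anyone who wants to flag such reads after mapping, here is a minimal sketch that extracts the longest `N` gap and the shortest aligned block from a CIGAR string, as in the two examples above. The function names and the default thresholds (`max_gap`, `min_anchor`) are hypothetical illustrations, not STAR options.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def gap_and_anchor(cigar: str) -> Tuple[int, int]:
    """Return (longest N gap, shortest aligned block between gaps)."""
    ops = parse_cigar(cigar)
    longest_gap = max((n for n, op in ops if op == "N"), default=0)
    blocks = []
    current = 0
    for n, op in ops:
        if op == "N":          # a gap closes the current aligned block
            blocks.append(current)
            current = 0
        elif op in "M=X":      # bases aligned to the reference
            current += n
    blocks.append(current)
    return longest_gap, min(blocks)


def is_suspicious(cigar: str, max_gap: int = 100000, min_anchor: int = 10) -> bool:
    """Flag alignments with a very long gap held by a very short anchor."""
    gap, anchor = gap_and_anchor(cigar)
    return gap > max_gap and anchor < min_anchor
```

With these illustrative thresholds, the first read (`44M196485N7M`) is flagged and the second (`23M125529N28M`) is not, because its shorter side is still 23 bases long.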


                    • Not sure about the second one, but the first one, with a 200 kbp "deletion" anchored by only 7 bp of read sequence, looks like a probable mapping error to me, considering that a 7 bp exact match would be expected purely by chance within about 16 kbp of any random location. However, if that 200kbp corresponds exactly to a known intron in the GTF file, and only occurs when using the GTF file, it's probably OK. Does it?
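The "about 16 kbp" figure follows from a simple model: if bases are uniform and independent, a specific 7-mer matches at a given position with probability 4^-7, so one chance match is expected roughly every 4^7 = 16384 positions on one strand. A minimal sketch of that arithmetic (the function names are illustrative, and the 600000 window is just the approximate default maximum gap mentioned later in this thread):

```python
def expected_spacing(k: int, alphabet_size: int = 4) -> int:
    """Expected distance between chance exact matches of a given k-mer."""
    return alphabet_size ** k


def expected_chance_hits(window: int, k: int, alphabet_size: int = 4) -> float:
    """Expected number of chance k-mer matches inside a search window."""
    return window / expected_spacing(k, alphabet_size)


print(expected_spacing(7))              # 16384
print(expected_chance_hits(600000, 7))  # roughly 36.6
```

Under this simplified model, a 7-base anchor has dozens of candidate positions within a 600 kb window, which is why such a short overhang is weak evidence on its own.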


                      • Originally posted by GenoMax
                        Pre-compiled linux binary is available here: https://code.google.com/p/rna-star/d...4.tgz&can=2&q=
                        Thanks for the link. I had to use proxies to get it, though, because of Google access issues here. My question was about a 32-bit Linux OS: are there any binaries for it?


                        • I don't think Alex provides 32-bit binaries. If you have a large genome (~ human), 32-bit may not work.

                          Build from source if you must have 32-bit: https://code.google.com/p/rna-star/d...e.tgz&can=2&q=


                          • Originally posted by Brian Bushnell
                            However, if that 200kbp corresponds exactly to a known intron in the GTF file, and only occurs when using the GTF file, it's probably OK. Does it?
                            Hi Brian,
                            No, I didn't use a GTF to do the mapping in this case. Also, the mapped read corresponds to a known intron (of a short gene) on one side, but on the other side it lands in a random intergenic region well past the end of the gene it starts in (at least according to UCSC). The 200 kb gap also overlaps 4 other known genes. So to my eyes that's definitely a mapping error too. The question now is how to filter those out (because there are quite a few of them).


                            • Hi,
                              I am trying to generate a genome from the rice reference and I get the following error. I have tried several of the available STAR patches:

                              biostat1@biostat[STAR_2.3.1z10] ./STAR --runMode genomeGenerate --genomeDir IRGSP_genome --genomeFastaFiles L1_1.fq L1_2.fq
                              Jul 17 10:55:11 ..... Started STAR run
                              Jul 17 10:55:11 ... Starting to generate Genome files
                              terminate called after throwing an instance of 'std::out_of_range'
                              what(): vector::_M_range_check
                              zsh: abort ./STAR --runMode genomeGenerate --genomeDir IRGSP_genome --genomeFastaFiles

                              Any ideas?
                              Thanks!
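One thing that may be worth ruling out (an assumption, not something confirmed in this thread): `--genomeFastaFiles` expects the reference genome in FASTA format, whereas `L1_1.fq` and `L1_2.fq` look like FASTQ read files judging by their extension. A minimal sketch for checking what a file actually contains, based only on its first non-empty line (the function names are hypothetical):

```python
def sniff_sequence_format(first_line: str) -> str:
    """Guess the format from the first record line: '>' FASTA, '@' FASTQ."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    return "unknown"


def sniff_file(path: str) -> str:
    """Apply the guess to the first non-empty line of a plain-text file."""
    with open(path, "r") as handle:
        for line in handle:
            if line.strip():
                return sniff_sequence_format(line)
    return "unknown"
```

If `sniff_file` reports `fastq` for a file passed to `--genomeFastaFiles`, the genome generation step is probably being given reads instead of the reference sequence.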


                              • Originally posted by emmanouela
                                Hi Brian,
                                No, I didn't use a GTF to do the mapping in this case. Also, the mapped read corresponds to a known intron (of a short gene) on one side, but on the other side it lands in a random intergenic region well past the end of the gene it starts in (at least according to UCSC). The 200 kb gap also overlaps 4 other known genes. So to my eyes that's definitely a mapping error too. The question now is how to filter those out (because there are quite a few of them).
                                Hi Emma,

                                these long-gap splices, often connecting adjacent genes, are fairly common in RNA-seq data. It's hard to say whether they are biochemically real "read-through transcription" events or some kind of wet-lab or mapping artifact.
                                They would clearly be mapping artifacts if "better" alignments of these sequences could be found; however, BLATing or BLASTing them did not produce any better alignments.
                                One way to get rid of them is to prohibit long gaps entirely with --alignIntronMax N, which prohibits any gap longer than N (by default this is ~600000). However, if you make this too small, say 100000, you may miss a number of valid junctions, as mammalian introns can be hundreds of kilobases long.
                                A better approach is to filter out long-gap alignments supported by too few reads, e.g.:
                                --outFilterType BySJout --outSJfilterIntronMaxVsReadN 10000 20000 50000 100000
                                This would only allow unannotated junctions <=10kb supported by >=1 spliced read, <=20kb by >=2 reads, <=50kb by >=3 reads, and <=100kb by >=4 reads.
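The read-count rule for `--outSJfilterIntronMaxVsReadN 10000 20000 50000 100000` can be written out as a small function. This is only a sketch of the rule as described in this post, not STAR's actual implementation. It assumes that junctions supported by more reads than there are listed thresholds are limited only by the maximum intron length (taken here as the approximate default of 600000), and the function and parameter names are hypothetical.

```python
from typing import Sequence


def junction_passes(
    gap: int,
    n_reads: int,
    max_gap_by_reads: Sequence[int] = (10000, 20000, 50000, 100000),
    align_intron_max: int = 600000,  # approximate default quoted above
) -> bool:
    """Decide whether an unannotated junction survives the filter.

    max_gap_by_reads[i] is the largest gap allowed for a junction
    supported by i + 1 spliced reads.
    """
    if n_reads < 1:
        return False
    if n_reads <= len(max_gap_by_reads):
        limit = max_gap_by_reads[n_reads - 1]
    else:
        # Assumption: beyond the listed thresholds only the overall
        # maximum intron length applies.
        limit = align_intron_max
    return gap <= limit
```

For example, a 196485-base gap supported by a single read, like the first alignment discussed in this thread, would be rejected, while a 15 kb junction needs at least two supporting reads to pass.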

                                There is more discussion on this type of filtering in this post.

                                Cheers
                                Alex
