Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • alexdobin
    Senior Member
    • Feb 2009
    • 161

    Originally posted by kjusto View Post
    Hi,
    Trying to generate genome from Rice reference and I get the following error,have tried several STAR patches available:

    biostat1@biostat[STAR_2.3.1z10] ./STAR --runMode genomeGenerate --genomeDir IRGSP_genome --genomeFastaFiles L1_1.fq L1_2.fq
    Jul 17 10:55:11 ..... Started STAR run
    Jul 17 10:55:11 ... Starting to generate Genome files
    terminate called after throwing an instance of 'std:ut_of_range'
    what(): vector::_M_range_check
    zsh: abort ./STAR --runMode genomeGenerate --genomeDir IRGSP_genome --genomeFastaFiles

    Any ideas,
    Thanks!
    Hi @kjusto,
    Is it possible that your L1_1.fq L1_2.fq files are not fasta files - they seem to have FASTQ extension?
    If they are valid fasta, please send me a link to them, and I will try to figure out the problem.
    Cheers
    Alex

    Comment

    • ndaniel
      Member
      • Feb 2009
      • 33

      Originally posted by alexdobin View Post
      Hi Emma,

      these long-gap splices, often connecting adjacent genes, are somewhat common in RNA-seq data. It's hard to say whether they are biochemically real "read-through transcription" events, or some kind of wet-lab or mapping artifacts.
      They would be clearly mapping artifacts if "better" alignments of these sequences can be found, however, BLATing or BLASTing them did not result in any better alignments.
      One way to get rid of them is to completely prohibit long gaps with --alignIntronMax N, which would prohibit any gap longer than N (by default this is ~600000). However, if you make this too small, say 100000, you may miss a number of valid junctions, as mammalian introns can be hundred of kilobases long.
      A better approach is filter out long-gap alignments supported by too few reads, e.g. :
      --outFilterType BySJout --outSJfilterIntronMaxVsReadN 10000 20000 50000 100000
      This would only allow unannotated junctions <=10kb supported by >=1 spliced read, <=20kb supported by >=2 reads, <=50kb by >= 3 reads, <=100kb by >=4 reads.
      This looks like a very bad idea. There are plenty of fusion genes which are known to be "adjacent" but still are genuine fusion genes.

      For example, the famous fusion gene FGFR3-TACC3 (for more info about it please Google it) has the genes FGFR3 and TACC3 as adjacent and the distance between them is less than 50kb. If this fusion is expressed moderately there could be, for example, 2 reads supporting the fusion and it will be discarded according to the above criteria.

      Out there are several fusion finders which can handle this kind of cases with very low false positives rate, e.g. https://code.google.com/p/fusioncatcher/wiki/comparison
      Last edited by ndaniel; 09-03-2014, 11:18 AM.

      Comment

      • coryfunk
        Junior Member
        • Mar 2014
        • 8

        In using STAR for my alignments, I'm subsequently using samtools for some basic evaluations (and GATK after that to generate a vcf file). An issue I'm having is that even though STAR is correctly including the mapping location for each read in the sam file, it is not setting the flag that says if a read is mapped or not (-f 4). As such, when I use samtools to count the number of mapped reads, it's always zero. Samtools accurately counts the number of total reads and unmapped reads.

        Is this an issue with STAR or possibly something I'm doing incorrectly?

        Comment

        • dpryan
          Devon Ryan
          • Jul 2011
          • 3478

          STAR doesn't output unmapped reads in the SAM file (at least not by default), so bit 4 shouldn't be set there for any record.

          Comment

          • coryfunk
            Junior Member
            • Mar 2014
            • 8

            That's a bit of an issue because the default of not having a flag set is to call it "unmapped".

            Running:

            samtools view -c -F 4 <file.name>

            counts all the reads as unmapped. Downstream tools that rely on that flag are thus rendered incompatible.

            Comment

            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              No, not setting that bit means the read is mapped.
              Edit:I'll add that samtools view -c -F 4 foo.bam will count the mapped reads. Using -f 4 will count the unmapped reads, which will be 0.

              Comment

              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                0x4 means 'unmapped'. If no flags are set, that means it's a single-ended read mapped to the plus strand.

                Comment

                • coryfunk
                  Junior Member
                  • Mar 2014
                  • 8

                  My mistake. Thanks!

                  Comment

                  • serash
                    Junior Member
                    • Nov 2014
                    • 3

                    Hi,
                    I'm currently doing 2 pass alignment with the --twopass1readsN option, which adds the 1st pass junctions into genome in memory. I'm curious to know if this works when a genome in memory i used (--genomeLoad LoadAndKeep) from previous runs. And will this give the correct results if I run additional 2 pass alignments with this genome in memory or do I need to reload the genome for each run?

                    Thanks!

                    Comment

                    • alexdobin
                      Senior Member
                      • Feb 2009
                      • 161

                      Originally posted by serash View Post
                      Hi,
                      I'm currently doing 2 pass alignment with the --twopass1readsN option, which adds the 1st pass junctions into genome in memory. I'm curious to know if this works when a genome in memory i used (--genomeLoad LoadAndKeep) from previous runs. And will this give the correct results if I run additional 2 pass alignments with this genome in memory or do I need to reload the genome for each run?

                      Thanks!
                      Hi,

                      the --twopass1readsN cannot be used with --genomeLoad LoadAndKeep option, you will need to re-load the genome for every sample. For each of the samples the 2-nd pass genome indices will be different, since the junctions discovered in the 1st pass will change from sample to sample.

                      For multiple samples, one option is to map (2-pass with --twopass1readsN -1 which will map all reads in the first pass) all the samples together, and use the Read Group tags to mark different samples.

                      Another option is the manual 2-step operation: run the 1st pass on all samples, collect the detected junctions from all samples, re-generate the genome, and run 2nd pass with the new genome, which can now be loaded with --genomeLoad LoadAndKeep option.

                      Cheers
                      Alex

                      Comment

                      • serash
                        Junior Member
                        • Nov 2014
                        • 3

                        Hi again,

                        It makes sense I need to rebuild the genome first and can than use it again for all samples. However, I noticed that rebuilding the genome takes longer than the alignment itself. If I run the 2-pass alignment with --twopass1readsN -1 the outcome is achieved much faster by adding the SJ.out.tab in memory from the first run.
                        This said, I was wondering if its possible to do the same as the 2-pass run in shared memory, to add the data (give the file by --sjdbFileChrStartEnd?) to an existing genome in memory to avoid rebuilding the entire genome and thus saving time.

                        cheers,
                        Dries

                        Comment

                        • alexdobin
                          Senior Member
                          • Feb 2009
                          • 161

                          Originally posted by serash View Post
                          Hi again,

                          It makes sense I need to rebuild the genome first and can than use it again for all samples. However, I noticed that rebuilding the genome takes longer than the alignment itself. If I run the 2-pass alignment with --twopass1readsN -1 the outcome is achieved much faster by adding the SJ.out.tab in memory from the first run.
                          This said, I was wondering if its possible to do the same as the 2-pass run in shared memory, to add the data (give the file by --sjdbFileChrStartEnd?) to an existing genome in memory to avoid rebuilding the entire genome and thus saving time.

                          cheers,
                          Dries
                          This is a good suggestion, and it's already on my (short) TODO-list - adding annotated junctions on the fly, so that there no need to re-generate the whole genome index when you only want to change annotations.

                          Cheers
                          Alex

                          Comment

                          • serash
                            Junior Member
                            • Nov 2014
                            • 3

                            That's great to know! Thanks

                            Comment

                            • dan
                              wiki wiki
                              • Jul 2008
                              • 194

                              Star is generating files like this in the directory I'm running it from:
                              • SJ.out.tab
                              • Log.std.out
                              • Log.out
                              • Log.final.out
                              • Log.progress.out


                              How can I put those files into a specific directory so that I can run multiple jobs in parallel from a single directory?


                              Cheers,
                              Dan.
                              Homepage: Dan Bolser
                              MetaBase the database of biological databases.

                              Comment

                              • dpryan
                                Devon Ryan
                                • Jul 2011
                                • 3478

                                Just use the --outFileNamePrefix option to give everything an appropriate prefix.

                                Comment

                                Latest Articles

                                Collapse

                                • GATTACAT
                                  Reply to Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                  by GATTACAT
                                  Love this - good data definitely starts from good input, and poor input can only give relatively poor data. I particularly like the mention of Nanodrop/absorbance based methods for quantification. It's such a toss up if you'll get an accurate reading or what amounts to a randomly generated number, and a lot of library/sequencing related issues can be traced back to poor quant.
                                  07-01-2026, 11:43 AM
                                • SEQadmin2
                                  Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                  by SEQadmin2


                                  I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                                  Here are nine questions we think about, in roughly the order they matter, before...
                                  06-18-2026, 07:11 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by SEQadmin2, 07-02-2026, 11:08 AM
                                0 responses
                                18 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-30-2026, 05:37 AM
                                0 responses
                                19 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-26-2026, 11:10 AM
                                0 responses
                                21 views
                                0 reactions
                                Last Post SEQadmin2  
                                Started by SEQadmin2, 06-17-2026, 06:09 AM
                                0 responses
                                54 views
                                0 reactions
                                Last Post SEQadmin2  
                                Working...