  • arcolombo698
    Senior Member
    • Nov 2013
    • 142

    STAR RNA-seq aligner installation

    Hello.
    I am having issues installing the STAR RNA-seq aligner.

    I downloaded the reference genome from the website but am not sure how to generate the genome following the manual.

    When I untar the hg19 folder, there is a file titled "Genome" with no extension. I tried to "head" the file to check whether it is the genome.fa (which I hope it is), but I am not able to view the contents.

    Here are my parameters and the error I get when trying to make the reference genome.


    [acolombo@hpc-login2 hg19]$ /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/hg19 --genomeFastaFiles /auto/rcf-proj/sa1/data/hg19/Genome --runThreadN 16 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 0
    Jan 06 21:41:42 ..... Started STAR run
    Jan 06 21:41:42 ... Starting to generate Genome files
    terminate called after throwing an instance of 'std::out_of_range'
    what(): vector::_M_range_check
    Abort
  • shunyip
    Member
    • Oct 2013
    • 20

    #2
    Hello Arcolombo,

    I believe the problem is your genome file, just as you suspect. The genome file should be a FASTA file, e.g. "genome.fa".

    I hope this post will help you find what you need: http://seqanswers.com/forums/showthread.php?t=5996
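
One way to test that suspicion is to look at the first byte of the file: a FASTA file starts with a ">" header line, whereas a binary file generally will not. The sketch below assumes a POSIX-style shell with "head" and "tr" available; the function name "looks_like_fasta" is made up for illustration and is not part of STAR.

```shell
# looks_like_fasta FILE
# Succeeds (exit 0) if FILE's first byte is '>', the FASTA header marker.
# Returns 2 if FILE is missing or unreadable, 1 if it does not look like FASTA.
# (Hypothetical helper, not a STAR command.)
looks_like_fasta() {
    local file="$1"
    [ -r "$file" ] || return 2
    # Read a single byte; strip a NUL so binary files do not trigger shell warnings.
    local first
    first=$(head -c 1 "$file" | tr -d '\0')
    [ "$first" = ">" ]
}
```

For example, running looks_like_fasta /auto/rcf-proj/sa1/data/hg19/Genome && echo "FASTA" || echo "not FASTA" would report which case applies to the file in question.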

    • ffinkernagel
      Senior Member
      • Oct 2009
      • 110

      #3
      The genomes you can download from the STAR website have already been prepared - no need to run genomeGenerate on them again. Just skip ahead to the alignment stage.
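
A rough way to tell whether a downloaded directory is already a prepared index is to look for the files STAR writes during genomeGenerate. The sketch below assumes the index contains files named Genome, SA, SAindex and genomeParameters.txt; that layout is an assumption that may differ between STAR versions. "is_star_index" is a hypothetical helper, and the paths and read file names in the comment are placeholders.

```shell
# is_star_index DIR
# Succeeds if DIR contains the files assumed to make up a prepared STAR index.
# The file list is an assumption; check it against the index you downloaded.
is_star_index() {
    local dir="$1"
    local name
    for name in Genome SA SAindex genomeParameters.txt; do
        [ -f "$dir/$name" ] || return 1
    done
    return 0
}

# Example use (placeholder paths; requires STAR to be installed):
# if is_star_index /path/to/hg19; then
#     STAR --genomeDir /path/to/hg19 \
#          --readFilesIn reads_1.fastq reads_2.fastq \
#          --runThreadN 16
# fi
```

If the check passes, the directory can be passed straight to --genomeDir for alignment, without running genomeGenerate first.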

      • arcolombo698
        Senior Member
        • Nov 2013
        • 142

        #4
        If I wish to recreate the genome directory from a previously used directory, using the junctions BED file found on the STAR website, how should I proceed?

        I ran a STAR command line that used a genome.fa from the UCSC site that I previously used for TopHat. I set the parameters to point to the genome directory (the UCSC hg19 directory) and to the genome.fa (from the previously used hg19 file). In the genome creation step I also added the junctions file (according to the manual, this is more accurate).

        I still get an error:

        [acolombo@hpc-login2 STAR]$ /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta --genomeFastaFiles /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa --runThreadN 1 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 1 genomeChrBinNbits 12
        Jan 07 11:10:23 ..... Started STAR run
        Jan 07 11:10:24 ... Starting to generate Genome files
        Jan 07 11:15:40 ... finished processing splice junctions database ...
        Jan 07 11:16:55 ... starting to sort Suffix Array. This may take a long time...
        Jan 07 11:17:26 ... sorting Suffix Array chunks and saving them to disk...
        terminate called after throwing an instance of 'std::bad_alloc'
        what(): std::bad_alloc
        Abort

        • arcolombo698
          Senior Member
          • Nov 2013
          • 142

          #5
          This issue was reported in the previous announcement of the STAR release, and the workaround was to use the following parameters:

          /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/hg19/Sequence --genomeFastaFiles /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa --runThreadN 1 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 1 genomeChrBinNbits 6 --genomeSAindexNbases 4


          Yet this has been running for somewhere between 45 minutes and 1.5 hours, which is very slow.

          • arcolombo698
            Senior Member
            • Nov 2013
            • 142

            #6
            Here are the results.

            It gives an error about not having enough SA indices... I am currently re-running it.
            Attached Files

            • ffinkernagel
              Senior Member
              • Oct 2009
              • 110

              #7
              How much memory do you have?

              • shunyip
                Member
                • Oct 2013
                • 20

                #8
                Originally posted by ffinkernagel
                How much memory do you have?
                Agreed, you may have run out of disk space. How much free space do you have on your hard drive?

                • ffinkernagel
                  Senior Member
                  • Oct 2009
                  • 110

                  #9
                  Not disk space, RAM. STAR uses quite a lot of RAM to generate its index (last time I checked, 16 GB was not enough for a human genome).
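
To see how much RAM a Linux machine has before starting genomeGenerate, the total can be read from /proc/meminfo. This is a minimal sketch: "mem_total_gib" and "has_ram_gib" are hypothetical helper names, and the 32 GiB figure in the usage note is an assumption for a human-sized genome, not a number taken from this thread.

```shell
# mem_total_gib [FILE]
# Print total RAM in whole GiB, read from a meminfo-style file
# (defaults to /proc/meminfo, which exists on Linux).
mem_total_gib() {
    local file="${1:-/proc/meminfo}"
    # MemTotal is reported in kB; dividing by 1024 twice gives GiB (truncated).
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "$file"
}

# has_ram_gib NEED [FILE]
# Succeed if total RAM is at least NEED GiB.
has_ram_gib() {
    local need="$1"
    local total
    total=$(mem_total_gib "${2:-/proc/meminfo}")
    [ -n "$total" ] && [ "$total" -ge "$need" ]
}
```

For example, has_ram_gib 32 || echo "probably not enough RAM for a human index" gives a quick warning before a long run. On a shared cluster the login node and the compute nodes may report different totals, so the check is best run where the job will actually execute.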

                  • anagd
                    Junior Member
                    • Oct 2016
                    • 1

                    #10
                    Hello, I am having problems trying to generate the index for mouse GRCm38 from Ensembl.
                    STAR stops while "... sorting Suffix Array chunks and saving them to disk..." is running, without reporting any error, so the Genome file needed for the next step is not generated.

                    I am running STAR using Cygwin on Windows and I have 64 GB of RAM.
                    I heard that the problem may lie with STAR's pre-compiled build. I am not an expert in informatics, and RNA-seq analysis is also new to me, so I don't really understand how to compile the STAR executable. What I did was set the working directory with cd STAR/source and run STAR from there. I also added the path to the STAR executable to the PATH environment variable in the Windows system settings. Has anyone had similar problems?

                    I have been stuck on this step for several days and I don't know what to do. Any help is welcome. Could I use an index already generated by STAR in case I cannot build my own?

                    I have an Intel Xeon CPU at 3.5 GHz with 4 cores and 8 logical processors. I downloaded the mouse genome and genes.gtf files from the iGenomes website, and I am using the WholeGenome.fa file from Ensembl. Is this genome too big, and do I have a RAM limitation? Should I generate my index chromosome by chromosome? How long should the index generation take?

                    This is my command:

                    ./STAR --runMode genomeGenerate --genomeDir /cygdrive/c/Ana_Gómez_Secuenciación/CM1_FACS/20160818_Carpeta_de_trabajo_H3YJLBGXY/index --genomeFastaFiles /cygdrive/c/Ana_Gómez_Secuenciación/Genome/reference/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa --runThreadN 6 --sjdbGTFfile /cygdrive/c/Ana_Gómez_Secuenciación/Genome/GTF_files/referenceGTF/genes.gtf --sjdbOverhang 75 --genomeSAsparseD parameter 1

                    • cmbetts
                      Senior Member
                      • Jun 2012
                      • 120

                      #11
                      Generally, you can't just drop a Linux binary into a Cygwin environment and expect it to run. As you alluded to, you almost certainly have to use MinGW to compile your own binaries. As someone who has slammed their head into a wall repeatedly trying to compile NGS analysis tools in Cygwin (I really wish I'd documented how I got samtools to compile properly that one time!), I'd highly recommend running Linux, either in a VM (I run Ubuntu Server under VirtualBox on my work-mandated Windows PC) or natively as a dual boot. You'll find nothing but pain trying to get a usable NGS environment going on Windows, while almost everything you'd want to use was designed for Linux and probably has a precompiled binary available for it (not to mention a competent command line, which is how nearly all of the tools are run).
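
Before trying a downloaded binary, it can help to confirm what environment the shell is actually running in. The sketch below classifies the output of "uname -s"; the function name "platform_kind" and its category labels are made up for illustration, and the prefixes matched are the ones these environments commonly report.

```shell
# platform_kind [UNAME_STRING]
# Print a coarse label for the platform. With no argument, classify the
# output of `uname -s` on the current machine. (Hypothetical helper.)
platform_kind() {
    local name="${1:-$(uname -s)}"
    case "$name" in
        Linux*)        echo linux ;;
        CYGWIN*)       echo cygwin ;;
        MINGW*|MSYS*)  echo mingw ;;
        Darwin*)       echo macos ;;
        *)             echo other ;;
    esac
}
```

If this prints cygwin, a binary built for Linux is not expected to run there, which matches the advice above about using a Linux VM or a dual boot instead.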
