STAR rna seq Aligner installation

  • STAR rna seq Aligner installation

    I am having issues installing the STAR RNAseq aligner.

    I downloaded the reference genome from the website but am not sure how to generate the genome by following the manual.

    When I untar the hg19 folder, there is a file titled "Genome" with no extension. I tried to "head" the file to check whether it is the genome.fa (which I hope it is), but I am not able to view the contents.

    Here are my parameters and the error I get when trying to generate the reference genome.

    [[email protected] hg19]$ /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/hg19 --genomeFastaFiles /auto/rcf-proj/sa1/data/hg19/Genome --runThreadN 16 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 0
    Jan 06 21:41:42 ..... Started STAR run
    Jan 06 21:41:42 ... Starting to generate Genome files
    terminate called after throwing an instance of 'std::out_of_range'
    what(): vector::_M_range_check
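
A quick way to tell whether a file such as "Genome" is plain-text FASTA is to look at its first byte: a FASTA file starts with a ">" header line, whereas the files in a prebuilt STAR index are generally not human-readable. A minimal sketch, assuming a POSIX shell with head and tr; looks_like_fasta is a made-up helper name, not something that ships with STAR:

```shell
# looks_like_fasta FILE
# Succeeds (exit status 0) if the first byte of FILE is '>', the character
# that starts a FASTA header line. Hypothetical helper, not part of STAR.
looks_like_fasta() {
    # Read a single byte so that multi-gigabyte files are cheap to test;
    # strip a NUL byte, which a shell variable cannot hold.
    first_byte=$(head -c 1 "$1" 2>/dev/null | tr -d '\0')
    [ "$first_byte" = ">" ]
}
```

Something like `looks_like_fasta /path/to/file && echo FASTA || echo "not FASTA"` would then report the result. It only inspects the first character, so a compressed FASTA file will be reported as not FASTA.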

  • #2
    Hello Arcolombo,

    I believe the problem is your genome file, just as you are suspecting. A genome file should be called "genome.fa".

    I hope this post will help you find what you need:


    • #3
      The genomes you can download from the STAR website have already been prepared - no need to run genomeGenerate on them again. Just skip ahead to the alignment stage.
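
With a prebuilt index, the alignment step would look roughly like the command below. This is only a sketch: star_align_cmd is a made-up helper that prints the command instead of running it, the read file names are placeholders, and it uses only options documented in the STAR manual (--runMode alignReads, --genomeDir, --readFilesIn, --runThreadN).

```shell
# star_align_cmd GENOME_DIR READS_1 READS_2 [THREADS]
# Prints a STAR alignment command for a prebuilt genome directory.
# Hypothetical helper: it only builds the command line, it does not run STAR.
star_align_cmd() {
    # $1 = directory holding the prebuilt index (e.g. the untarred hg19 folder)
    # $2 = mate-1 FASTQ, $3 = mate-2 FASTQ, $4 = thread count (default 8)
    printf 'STAR --runMode alignReads --genomeDir %s --readFilesIn %s %s --runThreadN %s\n' \
        "$1" "$2" "$3" "${4:-8}"
}
```

Printing the command first makes it easy to check the paths before committing to a long run; once it looks right, it can be pasted into the shell.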


      • #4
        If I want to re-create the genome directory from a directory I used previously, together with the junctions BED file from the STAR website, how should I proceed?

        I ran a STAR command that uses a genome.fa from the UCSC site that I already use for TopHat. I added the parameters that point to the genome directory (the UCSC hg19 directory) and to the genome.fa (from the previously used hg19 file), and for genome generation I also added the junctions file (according to the manual, this is more accurate).

        I still get an error:

        [[email protected] STAR]$ /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta --genomeFastaFiles /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa --runThreadN 1 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 1 genomeChrBinNbits 12
        Jan 07 11:10:23 ..... Started STAR run
        Jan 07 11:10:24 ... Starting to generate Genome files
        Jan 07 11:15:40 ... finished processing splice junctions database ...
        Jan 07 11:16:55 ... starting to sort Suffix Array. This may take a long time...
        Jan 07 11:17:26 ... sorting Suffix Array chunks and saving them to disk...
        terminate called after throwing an instance of 'std::bad_alloc'
        what(): std::bad_alloc


        • #5
          This issue was reported in the previous STAR release announcement, and the workaround was to use the following parameters:

          /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/hg19/Sequence --genomeFastaFiles /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa --runThreadN 1 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 1 genomeChrBinNbits 6 --genomeSAindexNbases 4

          Yet this has now been processing for 45 minutes to 1.5 hours, which is very slow.


          • #6
            Here are the results

            It gives an error about not enough SA indices. I am currently re-running it.
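
For context, recent versions of the STAR manual suggest scaling --genomeSAindexNbases with genome length as min(14, log2(GenomeLength)/2 - 1). That leaves the default of 14 in place for a human-sized genome, so a value of 4 is far below the suggested range. A minimal sketch of that arithmetic, assuming awk is available; suggest_sa_index_nbases is a made-up helper name, and truncating to an integer is my own choice:

```shell
# suggest_sa_index_nbases GENOME_LENGTH
# Prints min(14, log2(GENOME_LENGTH)/2 - 1), truncated to an integer.
# Hypothetical helper; the formula is the one given in the STAR manual.
suggest_sa_index_nbases() {
    awk -v glen="$1" 'BEGIN {
        n = int(log(glen) / log(2) / 2 - 1)   # log2(glen)/2 - 1, truncated
        if (n > 14) n = 14                    # never exceed the default of 14
        print n
    }'
}
```

For example, `suggest_sa_index_nbases 3100000000` (roughly the length of hg19) prints 14, while a 100 kb genome gives 7.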


            • #7
              How much memory do you have?


              • #8
                Originally posted by ffinkernagel
                How much memory do you have?
                Agreed, you may have run out of disk space. How much free space do you have on your hard drive?


                • #9
                  Not disk space, RAM. STAR uses quite a lot of RAM to generate its index (last time I checked, 16 GB was not enough for a human genome).
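
On Linux, total RAM can be read from /proc/meminfo before starting a long genomeGenerate run. The sketch below assumes that file format (a "MemTotal:" line in kB); mem_total_gb and enough_ram are made-up helper names, and the threshold is whatever you pass in (the STAR manual quotes roughly 32 GB for a human genome):

```shell
# mem_total_gb [MEMINFO_FILE]
# Prints total RAM in whole GiB, read from a /proc/meminfo-style file.
# Hypothetical helper; defaults to /proc/meminfo (Linux only).
mem_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# enough_ram REQUIRED_GB [MEMINFO_FILE]
# Succeeds if total RAM is at least REQUIRED_GB.
enough_ram() {
    [ "$(mem_total_gb "${2:-/proc/meminfo}")" -ge "$1" ]
}
```

A call such as `enough_ram 32 || echo "probably not enough RAM for a human index"` gives a quick answer. It reports installed RAM, not what is currently free.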


                  • #10
                    Hello, I am having problems trying to generate the index for mouse GRCm38 from Ensembl.
                    STAR stops while "... sorting Suffix Array chunks and saving them to disk..." is running, without any error, so the Genome file for the next step is not generated.

                    I am running STAR using Cygwin on Windows and I have 64 GB of RAM.
                    I have heard that the problem may lie with STAR's pre-compiled build. I am not an expert in informatics, and RNA-seq analysis is also new to me, so I don't really understand how to compile the STAR executable. What I did was change the working directory to STAR/source (cd STAR/source) and run STAR from there. I also added the path to the STAR executable to the PATH environment variable in the Windows system settings. Has anyone had similar problems?

                    I have been stuck on this step for several days and I don't know what to do. Any help is welcome. Could I use an index already generated by STAR in case I cannot build my own?

                    I have an Intel Xeon CPU at 3.5 GHz with 4 cores and 8 logical processors. I downloaded the mouse genome and genes.gtf files from the iGenome website, and I am using the WholeGenome.fa file from Ensembl. Is this genome too big, so that I am hitting a RAM limitation? Should I generate my index chromosome by chromosome? How long should the index generation take?

                    This is my command:

                    ./STAR --runMode genomeGenerate --genomeDir /cygdrive/c/Ana_Gómez_Secuenciación/CM1_FACS/20160818_Carpeta_de_trabajo_H3YJLBGXY/index --genomeFastaFiles /cygdrive/c/Ana_Gómez_Secuenciación/Genome/reference/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa --runThreadN 6 --sjdbGTFfile /cygdrive/c/Ana_Gómez_Secuenciación/Genome/GTF_files/referenceGTF/genes.gtf --sjdbOverhang 75 --genomeSAsparseD parameter 1


                    • #11
                      Generally, you can't just drop a Linux binary into a Cygwin environment and expect it to run. As you alluded to, you almost certainly have to use MinGW to compile your own binaries. As someone who has slammed their head into a wall repeatedly trying to compile NGS analysis tools in Cygwin (I really wish I'd documented how I got samtools to compile properly that one time!), I'd highly recommend running Linux in a VM (I run Ubuntu Server under VirtualBox on my work-mandated Windows PC) or natively as a dual boot. You'll find nothing but pain trying to get a usable NGS environment going on Windows, while almost everything you'd want to use was designed for Linux and probably has a precompiled binary available for it (not to mention a competent command line, which is how nearly all of the tools are run).
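
To check whether the STAR executable you are trying to run is a Linux binary in the first place, you can look for the ELF magic number (the bytes 7f 45 4c 46) at the start of the file. A minimal sketch, assuming head, od and tr are available; is_linux_elf is a made-up helper name, and the standard `file` command gives the same information in more detail:

```shell
# is_linux_elf FILE
# Succeeds if FILE starts with the 4-byte ELF magic number (0x7f 'E' 'L' 'F').
# Hypothetical helper; it does not check architecture or whether it is executable.
is_linux_elf() {
    # Dump the first four bytes as hex and strip all whitespace.
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d '[:space:]')
    [ "$magic" = "7f454c46" ]
}
```

If `is_linux_elf ./STAR` succeeds under Cygwin, the file is a Linux build and would need a Linux environment (a VM or dual boot, as above) rather than Windows.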