
  • STAR rna seq Aligner installation

    Hello.
    I am having issues installing the STAR RNAseq aligner.

    I downloaded the reference genome from the website, but I am not sure how to generate the genome following the manual.

    When I untar the hg19 folder, it contains a file titled "Genome" with no extension. I tried to "head" the file to check whether it is the genome.fa (which I hope it is), but I am not able to view the contents.

    Here are my parameters, followed by the error I get when trying to make the reference genome.


    [acolombo@hpc-login2 hg19]$ /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/hg19 --genomeFastaFiles /auto/rcf-proj/sa1/data/hg19/Genome --runThreadN 16 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 0
    Jan 06 21:41:42 ..... Started STAR run
    Jan 06 21:41:42 ... Starting to generate Genome files
    terminate called after throwing an instance of 'std::out_of_range'
    what(): vector::_M_range_check
    Abort
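
A quick way to test whether a file such as "Genome" is FASTA text is to look at its first byte, since a FASTA file begins with a `>` header line. The sketch below assumes a POSIX shell with `od` available; `looks_like_fasta` is a hypothetical helper name, not part of STAR.

```shell
#!/bin/sh
# Hypothetical helper (not part of STAR): succeed if the file's first byte
# is '>', the character that opens a FASTA header line.
looks_like_fasta() {
    # od prints the first byte as hex; tr strips the padding around it.
    first_byte=$(od -An -tx1 -N1 "$1" 2>/dev/null | tr -d ' \n')
    [ "$first_byte" = "3e" ]   # 0x3e is '>'
}

# Usage: pass the path of the file in question as the first argument.
if [ -n "$1" ]; then
    if looks_like_fasta "$1"; then
        echo "$1 starts like a FASTA file"
    else
        echo "$1 does not start like a FASTA file"
    fi
fi
```

If the check fails on a file that `head` also cannot display, the file is probably binary rather than a FASTA sequence, and it should not be passed to --genomeFastaFiles.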

  • #2
    Hello Arcolombo,

    I believe the problem is your genome file, just as you suspect. A genome file should be called "genome.fa".

    I hope this post will help you find what you need: http://seqanswers.com/forums/showthread.php?t=5996



    • #3
      The genomes you can download from the STAR website have already been prepared - no need to run genomeGenerate on them again. Just skip ahead to the alignment stage.
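
Before skipping genomeGenerate, it may help to confirm that a directory already holds a generated index. This is a minimal sketch, assuming the index contains the files Genome, SA and SAindex (the exact set can differ between STAR versions); `looks_like_star_index` is a hypothetical helper, and the read file names in the comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the directory contains the files assumed
# here to make up a prepared STAR index (Genome, SA, SAindex).
looks_like_star_index() {
    for f in Genome SA SAindex; do
        [ -f "$1/$f" ] || return 1
    done
    return 0
}

# Usage: pass the candidate genome directory as the first argument.
if [ -n "$1" ]; then
    if looks_like_star_index "$1"; then
        echo "$1 looks like a prepared index; go straight to alignment, e.g.:"
        # reads_1.fastq and reads_2.fastq are placeholder file names.
        echo "STAR --genomeDir $1 --readFilesIn reads_1.fastq reads_2.fastq --runThreadN 8"
    else
        echo "$1 is missing index files; genomeGenerate may still be needed"
    fi
fi
```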



      • #4
        If I want to recreate the Genome directory from a directory I used previously, using the junctions BED file found on the STAR website, how should I proceed?

        I ran a STAR command line that uses the genome.fa from the UCSC site that I previously used for TopHat. I set the parameters to point to the genome directory (the UCSC hg19 directory) and to the genome.fa (from the previously used hg19 files), and I added the junctions file to the genome generation (according to the manual, this is more accurate).

        I still get an error:

        [acolombo@hpc-login2 STAR]$ /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta --genomeFastaFiles /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa --runThreadN 1 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 1 genomeChrBinNbits 12
        Jan 07 11:10:23 ..... Started STAR run
        Jan 07 11:10:24 ... Starting to generate Genome files
        Jan 07 11:15:40 ... finished processing splice junctions database ...
        Jan 07 11:16:55 ... starting to sort Suffix Array. This may take a long time...
        Jan 07 11:17:26 ... sorting Suffix Array chunks and saving them to disk...
        terminate called after throwing an instance of 'std::bad_alloc'
        what(): std::bad_alloc
        Abort



        • #5
          This issue was reported in the thread for the previous STAR release announcement, and the workaround was to use these parameters:

          /auto/rcf-proj/sa1/software/STAR_2.3.0e/STAR --runMode genomeGenerate --genomeDir /auto/rcf-proj/sa1/data/hg19/Sequence --genomeFastaFiles /auto/rcf-proj/sa1/data/Homo_sapiens1/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa --runThreadN 1 --sjdbFileChrStartEnd /auto/rcf-proj/sa1/data/Junctions_Annotations --sjdbOverhang 1 genomeChrBinNbits 6 --genomeSAindexNbases 4


          Yet this has been processing for between 45 minutes and 1.5 hours, which is very slow.
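
For reference, later versions of the STAR manual suggest scaling --genomeSAindexNbases for small genomes to min(14, log2(GenomeLength)/2 - 1). The sketch below computes that value with awk; `sa_index_nbases` is a hypothetical helper, and whether this guidance applies unchanged to STAR 2.3.0e is an assumption. By this rule a human-sized genome stays at 14, so a value as low as 4 may be one reason the run is slow.

```shell
#!/bin/sh
# Hypothetical helper: print min(14, floor(log2(genome_length) / 2 - 1)),
# the scaling rule quoted from later STAR manuals.
sa_index_nbases() {
    awk -v n="$1" 'BEGIN {
        v = int(log(n) / log(2) / 2 - 1)   # log2 via natural logs
        if (v > 14) v = 14                 # cap at 14
        print v
    }'
}

sa_index_nbases 3100000000   # roughly human-sized genome: prints 14
sa_index_nbases 100000       # a very small genome: prints 7
```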



          • #6
            Here are the results.

            It gives an error about not enough SA indices. I am currently re-running.
            Attached Files



            • #7
              How much memory do you have?



              • #8
                Originally posted by ffinkernagel
                How much memory do you have?
                Agreed, you may have run out of disk space. How much free space do you have on your hard drive?



                • #9
                  Not disk space, RAM. STAR uses quite a lot of RAM to generate its index (last time I checked, 16 GB was not enough for a human genome).
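
To answer the RAM question on a Linux machine, total memory can be read from /proc/meminfo. This is a minimal sketch, assuming a Linux system; `mem_total_gib` and `has_at_least_gib` are hypothetical helpers, and the 32 GiB threshold in the usage lines is only a placeholder, so check STAR's documentation for the actual requirement of a given genome.

```shell
#!/bin/sh
# Hypothetical helper: print total RAM in whole GiB (rounded down), read from
# a meminfo-style file. Defaults to /proc/meminfo, which exists on Linux only.
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: succeed if total RAM is at least $1 GiB.
has_at_least_gib() {
    total=$(mem_total_gib "${2:-/proc/meminfo}")
    [ -n "$total" ] && [ "$total" -ge "$1" ]
}

# Usage: 32 is a placeholder threshold, not an official STAR figure.
if [ -r /proc/meminfo ]; then
    echo "Total RAM: $(mem_total_gib) GiB"
    if has_at_least_gib 32; then
        echo "At least 32 GiB available in total"
    else
        echo "Less than 32 GiB in total; index generation may fail with std::bad_alloc"
    fi
fi
```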



                  • #10
                    Hello, I am having problems trying to generate the index for mouse GRCm38 from Ensembl.
                    STAR stops during "sorting Suffix Array chunks and saving them to disk..." without reporting any error, so the Genome file needed for the next step is not generated.

                    I am running STAR using Cygwin on Windows, and I have 64 GB of RAM.
                    I have heard that the problem may lie with STAR's pre-compiled build. I am not an expert in informatics, and RNA-seq analysis is also new to me, so I do not fully understand how to compile the STAR executable. What I did was set the working directory with cd STAR/source and run STAR from there. I also added the path to the STAR executable to the PATH environment variable in the Windows system settings. Has anyone had similar problems?

                    I have been stuck on this step for several days and I do not know what to do. Any help is welcome. Could I use an index already generated by STAR in case I cannot build my own?

                    I have an Intel Xeon CPU at 3.5 GHz with 4 cores and 8 logical processors. I downloaded the mouse genome and genes.gtf files from the iGenomes website, and I am using the WholeGenome.fa file from Ensembl. Is this genome too big, so that I am hitting a RAM limitation? Should I generate my index chromosome by chromosome? How long should the index generation take?

                    This is my command:

                    ./STAR --runMode genomeGenerate --genomeDir /cygdrive/c/Ana_Gómez_Secuenciación/CM1_FACS/20160818_Carpeta_de_trabajo_H3YJLBGXY/index --genomeFastaFiles /cygdrive/c/Ana_Gómez_Secuenciación/Genome/reference/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa --runThreadN 6 --sjdbGTFfile /cygdrive/c/Ana_Gómez_Secuenciación/Genome/GTF_files/referenceGTF/genes.gtf --sjdbOverhang 75 --genomeSAsparseD parameter 1



                    • #11
                      Generally, you can't just drop a Linux binary into a Cygwin environment and expect it to run. As you alluded to, you almost certainly have to use MinGW to compile your own binaries. As someone who has slammed their head into a wall repeatedly trying to compile NGS analysis tools in Cygwin (I really wish I'd documented how I got samtools to compile properly that one time!), I'd highly recommend running Linux, either in a VM (I run Ubuntu Server under VirtualBox on my work-mandated Windows PC) or natively as a dual boot. You'll find nothing but pain trying to get a usable NGS environment going on Windows, while almost everything you'd want to use was designed for Linux and probably has a precompiled binary available for it (not to mention a competent command line, which is how nearly all of the tools are run).
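
A script can guard against being run in the wrong environment by inspecting the kernel name reported by `uname -s`. The sketch below is illustrative only; `classify_platform` is a hypothetical helper, and the name patterns are assumptions about what each environment typically reports.

```shell
#!/bin/sh
# Hypothetical helper: map a kernel name (default: output of `uname -s`)
# to a short platform label.
classify_platform() {
    case "${1:-$(uname -s)}" in
        Linux*)        echo linux ;;
        CYGWIN*)       echo cygwin ;;
        MINGW*|MSYS*)  echo mingw ;;
        Darwin*)       echo macos ;;
        *)             echo unknown ;;
    esac
}

platform=$(classify_platform)
if [ "$platform" != "linux" ]; then
    echo "Detected $platform: a precompiled Linux binary is unlikely to run here"
fi
```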
