  • #46
    Dear Alex,

    I am currently using STAR on a new strand-specific paired-end RNA-seq dataset, and I am very happy with the output.
    I just have one question (as a matter of interest, really): does STAR calculate the MAPQ value for the reads? I am asking because at the moment, with my new dataset (and also with a simulated dataset), all my mapped reads have a MAPQ value of 255, see below:
    Code:
    FCC1GLHACXX:3:1101:16858:25563#GTGAAA   163     5       19275940        255     1S65M24S        =       19275940        65      GCCTTTTGAGACAATACAAATCAAAATATTTACAGAGATAAGGCAGAATCAAACTACATTAAGGAGGGTCTGAGATCGGAAGAGCGTCGT      bbbeeeeegggggiiiiiiiiiiiiiiiiiiiiihiiiiiiiiiiiiihihiiiiiiiiihiiiiiiidggggggeeeeccccccccccc      NH:i:1  HI:i:1	AS:i:124        nM:i:2
    FCC1GLHACXX:3:1101:16858:25563#GTGAAA   83      5       19275940        255     19S65M6S        =       19275940        -65     CGTGTGCTCTTCCGATCTGCCTTTTGAGACAATACAAATCAAAATATTTACAGAGATAAGGCAGAATCAAACTACATTAAGGAGGGTCTG      cdddeeeegggihhhiiiiiihhfhiihghiihgiiihghiiiiiiiiiiihiiiiiiiiiiiiiiiiiiihhgiiigggggeeeeebbb      NH:i:1  HI:i:1	AS:i:124        nM:i:2
    FCC1GLHACXX:3:1101:16918:25584#GTGAAA   99      14      81019100        255     90M     =       81019191        181     TTGTACCAGTTATCAAACTGTGTTTTGATGGGATAGAGATTGATATTTTGTTTGCAAGATTAGCACTGCAGACTATTCCAGAAGACTTGG      bbbeeeeeggggghiiiiiiehdghiifhhiifghiegfhhhihiiiiiigiihifhihhhiiiiiiiiiiiiiiiihgggfggeee_cd      NH:i:1  HI:i:1  AS:i:176	nM:i:1
    FCC1GLHACXX:3:1101:16918:25584#GTGAAA   147     14      81019191        255     90M     =       81019100        -181    CTTAAGAGATGACAGTCTGCTTAAAAATTTAGATATAAGATGTATAAGAAGTCTTAACGGTTGCAGGGTAACCGATGAAATTTTACATCT      ddddddeeedeeeggggihdeiiiiiiiiiiihiihiiihhiihhhhiiihhihiihihhhiiiiiiiiiiiiiiiigggggeeeeeab_      NH:i:1  HI:i:1  AS:i:176	nM:i:1
    Thanks a lot for your help. Regards,
    Nicolas



    • #47
      Hey Nicolas,

      STAR doesn't actually have a true mapping quality score for the reads; theirs is set up like this:

      255 = uniquely mapped reads
      3 = read maps to 2 locations
      2 = read maps to 3 locations
      1 = reads maps to 4-9 locations
      0 = reads maps to 10 or more locations

      So in essence there are only five possible values for the mapping quality score: 255, 3, 2, 1, and 0, as listed above. I hope this helps answer your question.
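      The table above can be sketched as a small Python helper (the function name is hypothetical, and the thresholds come from the table in this post, not from STAR's source code):

      ```python
      def star_mapq(n_loci):
          """Map the number of alignment locations (the SAM NH tag)
          to the MAPQ value STAR reports, per the table above."""
          if n_loci == 1:
              return 255        # uniquely mapped
          if n_loci == 2:
              return 3
          if n_loci == 3:
              return 2
          if 4 <= n_loci <= 9:
              return 1
          return 0              # 10 or more locations

      print(star_mapq(1))  # 255, matching the NH:i:1 records above
      ```

      This is why a run restricted to uniquely mapping reads shows MAPQ 255 on every record.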

      Thanks,
      Nino



      • #48
        Dear Nino,
        That's great, thanks a lot for your answer.
        It makes a lot of sense since I set up the alignment to output only uniquely mapping reads.
        Thanks for the help.
        Regards,
        Nicolas



        • #49
          Originally posted by alexdobin View Post
          I believe it's best to feed Cufflinks only with the highest confidence alignments, and non-canonical junctions in my experience contain more false positives.
          Also, many non-canonical splices occur just a few bases away from the highly expressed canonical, which could be caused by sequencing/mapping errors, and possibly by spliceosome errors. These splices will likely throw Cufflinks assembly off.
          To add to that recommendation: I've also found that Cufflinks seems to perform worse if your alignments contain primary and secondary positions for reads. It's best to have only one alignment per read or pair.
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */



          • #50
            Issue with shared memory

            Dear STAR community and developers,

            STAR is a great and efficient tool for RNA read mapping; thank you.

            Meanwhile, I think I have an issue: STAR does not seem to share the genome between parallel runs.
            When running htop while submitting parallel STAR jobs, I observe a proportional increase in the amount of RAM used, about 25 GB per genome for the bovine genome (the memory bar was essentially empty before submitting the jobs, and the cache filled by 25 GB × 5 after submitting 5 jobs).

            For the record, in htop the increase in RAM used affects the cache memory (yellow bars), while the "used" RAM (green) remains very low (about 5 GB).

            The system is Ubuntu 12.04.2 LTS (GNU/Linux 3.5.0-34-generic x86_64)

            Here are two of the several commands I submitted in parallel (some paths replaced with ...):

            STAR --runMode alignReads --genomeDir /.../STAR2.3.0e_no_annotation/ --genomeLoad LoadAndRemove --readFilesIn /.../N1855_CN_0H_pe1.fastq /.../N1855_CN_0H_pe2.fastq --runThreadN 3 --outFilterMultimapNmax 10 --outSAMmode Full --outSAMattributes Standard --outFileNamePrefix ./N1855_CN_0H_ --outReadsUnmapped Fastx

            STAR --runMode alignReads --genomeDir /.../STAR2.3.0e_no_annotation/ --genomeLoad LoadAndRemove --readFilesIn /.../N1178_CN_0H_pe1.fastq /.../N1178_CN_0H_pe2.fastq --runThreadN 3 --outFilterMultimapNmax 10 --outSAMmode Full --outSAMattributes Standard --outFileNamePrefix ./N1178_CN_0H_ --outReadsUnmapped Fastx

            Here is the memory status (as output from top) after 5 jobs submitted.
            Mem: 264131104k total, 193475272k used, 70655832k free, 89844k buffers


            Note:
            I obtained the numbers above by submitting the jobs right after restarting the machine. About 15min into the jobs, the 256 GB of RAM were fully in use:
            Mem: 264131104k total, 263666924k used, 464180k free, 62412k buffers


            Am I right in stating that the genome is loaded into memory separately for the different jobs, as they seem to add up in the htop memory bar? If so, is something wrong in the command lines I submitted?

            Many thanks,
            Kevin Rue



            • #51
              I may not be entirely correct, as I suppose I am still a novice, but yes, I do believe STAR loads it all into memory. However, after the first run the remaining alignments are faster (at least this is what I've observed). I also attempted to submit multiple runs at one time, and I ended up just killing the process. If you have the computing power, just run STAR with a few more threads, one alignment at a time. It really is a fast tool.



              • #52
                Issue with shared memory (2nd post)

                Hi NitaC,

                Thank you for your answer. Submitting one job at a time with more resources is an option, but the sharing of genome in memory is definitely a powerful and attractive feature which I would like to use.

                I am not sure what you mean by:
                loads it all into memory
                From my point of view, "all" would be a single instance of the bovine genome. I suspect you meant "loads all the genomes for each separate job"?
                From what the manual states, if one instance is already present in memory, any subsequent job should use that instance instead of loading another instance of the same genome (the LoadAndRemove option).

                Meanwhile, what I observed is that for each concurrent job submitted (all pointing at the same genome folder), the amount of cache RAM used increases by approx. 25GB.
                In htop, for each job, I read the three columns:
                VIRT RES SHR
                25.9G 25.7G 25.1G

                These numbers suggest that the genome is loaded into shared memory. Yet we recently submitted 10 jobs simultaneously, and the machine almost froze (extremely slow, even to log in), and even the SWAP memory was fully used.

                I hope this explains my dilemma more clearly: my understanding of the manual (one instance loaded) versus my observations (increasing RAM usage, frozen machine).

                Does anyone have further insight into genome sharing?

                Many thanks
                Kevin



                • #53
                  First load the genome into shared memory without aligning anything at all:
                  Code:
                  STAR --genomeDir /.../STAR2.3.0e_no_annotation/ --genomeLoad LoadAndExit
                  Then for each alignment job you need to specify the option '--genomeLoad LoadAndKeep'. This instructs STAR to look for the genome in shared memory, use it for the run, and then leave it loaded in shared memory.

                  When you're finally done aligning you need to run the following to unload the genome from shared memory:

                  Code:
                  STAR --genomeDir /.../STAR2.3.0e_no_annotation/ --genomeLoad Remove
                  They also suggest that if a genome has been loaded into shared memory for some time, it may need to be unloaded and reloaded, because it may get "paged out" by the system if it wasn't being used. This can have a serious impact on STAR's performance.
                  /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                  Salk Institute for Biological Studies, La Jolla, CA, USA */



                  • #54
                    Issue with shared memory (solved)

                    Hi sdriscoll,

                    Many thanks for the very helpful answer, it seems to work on our system exactly the way you described it.

                    For those interested, I attached a file containing the memory usage as output by top when submitting multiple jobs the way sdriscoll described.

                    In short:
                    • Loading the genome was accompanied by a 52 GB increase in RAM usage (probably some independent process accounts for part of this, as the bovine genome is expected to be ~27 GB)
                    • Each job was accompanied by only a marginal increase in RAM usage (jobs confirmed by a 3% increase in CPU usage)
                    • Killing the jobs left the RAM usage constant (but reduced the CPU usage)
                    • Removing the genome reduced the RAM usage by ~27 GB (the expected size of the bovine genome)


                    Our confusion was that we understood from the STAR manual that the LoadAndRemove option would share a genome the same way as LoadAndKeep, except for removing the genome from memory after the last job finishes.
                    Apparently, this is not the case.

                    If I got it right:
                    • each job using LoadAndRemove will ignore any pre-loaded instance of the genome, load a new instance and remove it when done
                    • each job using LoadAndKeep will share the existing instance of the genome, or load one if absent


                    Please correct me if I am wrong! (peer review always appreciated)
                    Kevin



                    • #55
                      I think the documentation should include some examples because the explanation is a little confusing. Glad that worked, though.
                      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                      Salk Institute for Biological Studies, La Jolla, CA, USA */



                      • #56
                        Hi everyone,

                        At the moment I am having an issue with STAR 2.3.0e, which I am trying to use to align a 2×90 base paired-end read sample against the Mycobacterium bovis genome NC_002945.3 from NCBI.
                        I obtain the error below:

                        Code:
                        Jun 19 19:45:09 ..... Started STAR run
                        Jun 19 19:45:09 ..... Started mapping
                        Segmentation fault (core dumped)
                        The command used to generate the genome was (and it seems to work fine):

                        Code:
                        STAR --runMode genomeGenerate --genomeDir /path/STAR2.3.0e --genomeFastaFiles /path/Mycobacterium_bovis_NC_002945.3.fasta --runThreadN 1
                        The command used to align the reads was (and it gives me the error above):

                        Code:
                        STAR --runMode alignReads --genomeDir /workspace/storage/genomes/Mycobacterium_bovis/NC_002945.3/STAR2.3.0e --genomeLoad NoSharedMemory --readFilesIn /home/dmagee/scratch/ALV_MAC_RNAseq/fastq_sequence/N1178_CN_0H_pe1.fastq /home/dmagee/scratch/ALV_MAC_RNAseq/fastq_sequence/N1178_CN_0H_pe2.fastq --runThreadN 1 --outFilterMultimapNmax 10 --outSAMmode Full --outSAMattributes Standard --outFileNamePrefix ./N1178_CN_0H_ --outReadsUnmapped Fastx
                        I am not sure what is going on there, since my command worked perfectly when I aligned the exact same read sample against the Bos taurus genome.

                        Any idea what the problem could be would be very much appreciated.
                        Thanks a lot.
                        Best wishes,
                        Nicolas



                        • #57
                          Hi Kevin, Shawn,

                          it's great that Shawn's workaround worked. However, the --genomeLoad LoadAndRemove option is supposed to work the same way, allowing one copy of the genome to be shared between the jobs. The only thing I can think of at the moment is that if you submit the jobs at precisely the same moment, they might not see each other's shared memory, and each decides to allocate its own. Could you please try to run 2-3 jobs (without killing your server ), pausing for 10 sec between them, and send me the Log.out outputs for each job? Also, while they are running, you can run the Linux 'ipcs' command, which will tell us which shared memory segments are being used.

                          The fact that 50GB of RAM is used for the 27GB index also concerns me a bit.

                          Cheers
                          Alex



                          • #58
                            Originally posted by Nicolas Nalpas View Post
                            Hi everyone,

                            At the moment I am having an issue with STAR 2.3.0e, which I am trying to use to align a 2×90 base paired-end read sample against the Mycobacterium bovis genome NC_002945.3 from NCBI.
                            I obtain the error below:

                            Code:
                            Jun 19 19:45:09 ..... Started STAR run
                            Jun 19 19:45:09 ..... Started mapping
                            Segmentation fault (core dumped)
                            Nicolas
                            Hi Nicolas,

                            This is a known problem for very small genomes. At the genome generation step, please try to reduce the value of --genomeSAindexNbases to <=8, and then re-run the mapping step.
                            Generally, --genomeSAindexNbases needs to be scaled with the genome length, as ~min(14,log2(ReferenceLength)/2 - 1). I will need to incorporate this scaling in the future releases.
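                            Alex's scaling rule can be checked numerically. Here is a quick sketch in Python (the function name is made up for illustration, and ~4.35 Mb is used as a rough length for a small bacterial genome; check your own reference length):

                            ```python
                            import math

                            def suggested_sa_index_nbases(genome_length):
                                """Suggested --genomeSAindexNbases per the rule above:
                                min(14, log2(genome_length)/2 - 1), rounded down."""
                                return min(14, int(math.log2(genome_length) / 2 - 1))

                            # ~4.35 Mb bacterial genome: well below the default of 14
                            print(suggested_sa_index_nbases(4_350_000))   # → 10
                            # ~3 Gb mammalian genome: the cap of 14 applies
                            print(suggested_sa_index_nbases(3_000_000_000))  # → 14
                            ```

                            For very small genomes the value drops quickly, which is why the default of 14 causes trouble here.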

                            Cheers
                            Alex



                            • #59
                              Dear Alex,

                              Thanks for your help. I have tried to generate my Mycobacterium bovis genome as you recommended:

                              Code:
                              STAR --runMode genomeGenerate --genomeDir /workspace/storage/genomes/Mycobacterium_bovis/NC_002945.3/STAR2.3.0e --genomeFastaFiles /workspace/storage/genomes/Mycobacterium_bovis/NC_002945.3/source_file/Mycobacterium_bovis_NC_002945.3.fasta --genomeSAindexNbases 8 --runThreadN 1
                              And I now obtain a different error when doing the alignment:

                              Code:
                              Jun 20 09:23:01 ...... FATAL ERROR, exiting
                              Jun 20 09:23:01 ..... Started STAR run
                              
                              EXITING because of FATAL error, could not open file /path/sjdbInfo.txt
                              SOLUTION: check that the path to genome files, specified in --genomDir is correct and the files are present, and have user read permsissions
                              So I checked the genome generation directory, where read and write permissions are granted; however, there is no such "sjdbInfo.txt" file.

                              My command for alignment was:

                              Code:
                              STAR --runMode alignReads --genomeDir /path/STAR2.3.0e --genomeLoad LoadAndKeep --readFilesIn /path/N1178_CN_0H_pe1.fastq /path/N1178_CN_0H_pe2.fastq --runThreadN 1 --outFilterMultimapNmax 10 --outSAMmode Full --outSAMattributes Standard --outFileNamePrefix ./N1178_CN_0H_ --outReadsUnmapped Fastx
                              Any idea on how I can sort this out?

                              Thanks a lot for all your help.
                              Regards,
                              Nicolas



                              • #60
                                Dear Alex,

                                I actually sorted it out by re-reading your previous answer: I tried --genomeSAindexNbases 7, and it seems to work fine now, no errors so far.

                                Thanks again for all your help, very appreciated.
                                Regards,
                                Nicolas
