  • Does the 2nd column indicate the strand? I checked a few lines of my data. The frequency and number (2nd column value) are listed below.

    Frequency Number
    847 147
    75 163
    902 339
    94 355
    97 403
    909 419
    71 83
    847 99

    Does the 419/339 pair stand for the "+" strand and the 147/99 pair for the "-" strand? What about the others?

    Comment


    • Originally posted by harrike View Post
      Does the 2nd column indicate the strand? I checked a few lines of my data. The frequency and number (2nd column value) are listed below.

      Frequency Number
      847 147
      75 163
      902 339
      94 355
      97 403
      909 419
      71 83
      847 99

      Does the 419/339 pair stand for the "+" strand and the 147/99 pair for the "-" strand? What about the others?
      See the added info in my post above.
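For anyone reading along: assuming the 2nd column is the SAM FLAG field, each number is a sum of bit flags and can be decoded directly. Below is a minimal Python sketch; the helper name `decode_flag` and the label strings are my own, while the bit values are the ones defined in the SAM specification.

```python
# Bit values from the SAM specification; the label strings are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# The eight values listed in the question.
for flag in (83, 99, 147, 163, 339, 355, 403, 419):
    print(flag, decode_flag(flag))
```

Decoded this way, 99/147 and 83/163 are the two mates of properly paired reads in the two possible orientations, and 355/403 and 339/419 are the same combinations with the secondary-alignment bit (256) set. Which orientation corresponds to the "+" transcript strand depends on the library protocol.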

      Comment


      • Hi Genomax,

        Thanks for providing the info. It is quite helpful. I am clear now.

        Rui

        Comment


        • Hi Alex,

          This time I am running STAR on another set of data, which is strand-specific, paired-end, with 150 bp read length. The command I used is:

          STAR-STAR_2.4.2a/bin/Linux_x86_64/STAR --genomeDir Zmay_AGPv2_STAR_index/ --runThreadN 24 --readFilesIn Zm_TGACCA_L001_R1_001.fastq Zm_TGACCA_L001_R2_001.fastq --outSAMtype BAM Unsorted --outFilterMultimapNmax 20 --alignIntronMax 10000 --alignMatesGapMax 10000 --alignEndsType EndToEnd --outFilterIntronMotifs RemoveNoncanonical --outFilterType BySJout --outFileNamePrefix Zm_TGACCA_L001_R1R2_

          56.34% of the reads are unmapped as "too short"; see the Log.final.out file below:

          Started job on | Feb 19 07:56:41
          Started mapping on | Feb 19 07:57:00
          Finished on | Feb 19 08:02:30
          Mapping speed, Million of reads per hour | 209.57

          Number of input reads | 19210657
          Average input read length | 302
          UNIQUE READS:
          Uniquely mapped reads number | 5863394
          Uniquely mapped reads % | 30.52%
          Average mapped length | 301.78
          Number of splices: Total | 5527054
          Number of splices: Annotated (sjdb) | 5329770
          Number of splices: GT/AG | 5445210
          Number of splices: GC/AG | 76305
          Number of splices: AT/AC | 5539
          Number of splices: Non-canonical | 0
          Mismatch rate per base, % | 0.83%
          Deletion rate per base | 0.07%
          Deletion average length | 3.02
          Insertion rate per base | 0.07%
          Insertion average length | 1.98
          MULTI-MAPPING READS:
          Number of reads mapped to multiple loci | 1335209
          % of reads mapped to multiple loci | 6.95%
          Number of reads mapped to too many loci | 5625
          % of reads mapped to too many loci | 0.03%
          UNMAPPED READS:
          % of reads unmapped: too many mismatches | 4.69%
          % of reads unmapped: too short | 56.34%
          % of reads unmapped: other | 1.46%
          What are the possible reasons for this low mapping rate? Thanks,

          Rui
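As a side note, the category percentages in a Log.final.out file can be pulled out programmatically to see which class dominates. This is a rough Python sketch that assumes the "name | value" layout shown above; the function names are hypothetical, and the excerpt only copies the read-category lines from the log in this post.

```python
def parse_final_log(text):
    """Return {field name: value string} for every 'name | value' line."""
    fields = {}
    for line in text.splitlines():
        if "|" in line:
            name, value = line.split("|", 1)
            fields[name.strip()] = value.strip()
    return fields


def percent_fields(fields):
    """Keep only the fields whose value ends in '%', converted to float."""
    return {k: float(v.rstrip("%")) for k, v in fields.items() if v.endswith("%")}


# Read-category lines copied from the log above.
LOG_EXCERPT = """\
Uniquely mapped reads % | 30.52%
% of reads mapped to multiple loci | 6.95%
% of reads mapped to too many loci | 0.03%
% of reads unmapped: too many mismatches | 4.69%
% of reads unmapped: too short | 56.34%
% of reads unmapped: other | 1.46%
"""

pct = percent_fields(parse_final_log(LOG_EXCERPT))
total = sum(pct.values())
largest = max(pct, key=pct.get)
print(f"categories sum to {total:.2f}%; largest: {largest}")
```

On a full log, per-base rates such as the mismatch rate also end in "%", so those fields would need to be excluded before summing.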

          Comment


          • I suggest that you start by looking at a few (10-20) unmapped reads and blast them against nt to see what they are aligning to. You may be surprised by what you find and it may provide an explanation for the low % alignment.
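One way to prepare such a sample is to convert the first few records of a FASTQ file of unmapped reads into FASTA, which can be pasted into a BLAST search. The sketch below is only illustrative: the function name is hypothetical, and it assumes plain 4-line FASTQ records. In STAR, unmapped reads can be written out with `--outReadsUnmapped Fastx`.

```python
import itertools


def fastq_head_to_fasta(fastq_lines, n_reads=20):
    """Convert the first n_reads FASTQ records (4 lines each) to FASTA text."""
    out = []
    lines = iter(fastq_lines)
    for _ in range(n_reads):
        record = list(itertools.islice(lines, 4))
        if len(record) < 4:
            break  # ran out of complete records
        header = record[0].rstrip("\n")
        seq = record[1].rstrip("\n")
        out.append(">" + header[1:])  # swap the leading '@' for '>'
        out.append(seq)
    return "\n".join(out) + ("\n" if out else "")


# Tiny in-memory example; in practice pass an open file handle instead.
example = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "GGCC\n", "+\n", "IIII\n"]
print(fastq_head_to_fasta(example, n_reads=2), end="")
```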

            Comment


            • Hi Rui,

              here are a few suggestions in addition to @GenoMax's suggestion.

              1. You are using --alignEndsType EndToEnd, which requires end-to-end alignment for each read (no soft clipping). This might be too harsh for longer reads, which are more likely to have poor quality tails, adapters at the ends etc. Please try to map without this option.
              2. Map read1 and read2 separately - you may have a problem with one of the reads.
              3. Check sequencing quality by plotting quality scores vs position in read (Illumina pipelines typically produce these plots). If sequencing quality drops towards the ends of the reads for a substantial portion of the reads, this would explain poor mappability.
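If no quality plots are at hand, suggestion 3 can be approximated with a few lines of Python that compute the mean quality score at each read position. This is a sketch under assumptions: it expects 4-line FASTQ records with Phred+33 encoded qualities, and the function name is hypothetical.

```python
def mean_quality_by_position(fastq_lines, offset=33):
    """Mean Phred score at each read position (assumes 4-line FASTQ, Phred+33)."""
    sums, counts = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # only every 4th line holds the quality string
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(sums):  # first read that reaches this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(ch) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# Two toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
toy = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACGT\n", "+\n", "II##\n"]
print(mean_quality_by_position(toy))
```

A steady drop in the means towards the end of the read would point to the poor-quality tails mentioned in suggestion 1.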

              Cheers
              Alex

              Comment


              • Hi Alex,

                Thanks for your suggestions.

                I manually checked a couple of reads as Genomax suggested, and found that the major reason for the low mapping rate is that most of the reads contain adapter sequence, due to poor construction of the RNA-seq library. I am going to trim the adapters and do the mapping again. The read quality is good according to FastQC.

                I will also try relaxing the --alignEndsType option to see whether the mapping improves.
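A quick way to estimate how widespread the adapter contamination is, before and after trimming, is to count the reads that contain the start of the adapter. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the default sequence is the common Illumina TruSeq adapter prefix, which may not be the adapter used for this library.

```python
def adapter_fraction(fastq_lines, adapter="AGATCGGAAGAGC"):
    """Fraction of reads containing `adapter` (assumes 4-line FASTQ records).

    The default is the common Illumina TruSeq adapter prefix; replace it
    with the adapter actually used for the library.
    """
    n_reads = n_hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # sequence line of each record
            n_reads += 1
            if adapter in line:
                n_hits += 1
    return n_hits / n_reads if n_reads else 0.0


# Toy input: the first read runs into the adapter, the second does not.
toy_reads = [
    "@r1\n", "ACGTACGTAGATCGGAAGAGCACACG\n", "+\n", "I" * 26 + "\n",
    "@r2\n", "ACGTACGTACGTACGTACGTACGTAC\n", "+\n", "I" * 26 + "\n",
]
print(adapter_fraction(toy_reads))
```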

                Rui

                Comment


                • Thank you for the article. It is very helpful.

                  Comment


                  • Just a quick question here. Is the parameters file used with --parametersFile just a list of command-line options in the same way I type in the console?

                    Comment


                    • Originally posted by SamCurt View Post
                      Just a quick question here. Is the parameters file used with --parametersFile just a list of command-line options in the same way I type in the console?
                      The file with parameters should have each parameter on a separate line:
                      <parameterName> <parameterValue(s)>
                      parameterName should not contain --
                      For instance,
                      genomeChrBinNbits 18
                      genomeSAsparseD 1
                      readFilesIn Read1 Read2
                      readFilesCommand -
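To illustrate the format described above, here is a small Python sketch that turns a list of command-line tokens into parameters-file lines by dropping the leading "--" and keeping each parameter with its values on one line. The helper name `args_to_parameters_file` is hypothetical and is not part of STAR.

```python
def args_to_parameters_file(args):
    """Turn ['--name', 'value', ...] tokens into 'name value(s)' lines."""
    entries = []
    for token in args:
        if token.startswith("--"):
            entries.append([token[2:]])  # new parameter, '--' removed
        elif entries:
            entries[-1].append(token)  # value of the current parameter
        else:
            raise ValueError(f"value {token!r} appears before any --parameter")
    if not entries:
        return ""
    return "\n".join(" ".join(parts) for parts in entries) + "\n"


cli = ["--genomeChrBinNbits", "18", "--genomeSAsparseD", "1",
       "--readFilesIn", "Read1", "Read2", "--readFilesCommand", "-"]
print(args_to_parameters_file(cli), end="")
```

A single "-" (as in `readFilesCommand -`) is treated as a value, because only tokens starting with "--" begin a new parameter.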

                      Comment


                      • Thank you for the quick reply, Alex.

                        I also have another problem here. My new institution only has 2.4.0j on their cluster, and it'd take about a week to get a newer version installed. Do you think it's safe to run the first pass using 2.4.0j, and use its SJ.out.tab files for --sjdbFileChrStartEnd when I get, say, 2.5.1b?

                        Comment


                        • Originally posted by SamCurt View Post
                          Thank you for the quick reply, Alex.

                          I also have another problem here. My new institution only has 2.4.0j on their cluster, and it'd take about a week to get a newer version installed. Do you think it's safe to run the first pass using 2.4.0j, and use its SJ.out.tab files for --sjdbFileChrStartEnd when I get, say, 2.5.1b?

                          Hi Sam,

                          This would generally be safe; however, when you publish your method, the reviewers and readers will have a bone to pick with you.
                          STAR does not really require installation: you can download a pre-compiled executable and run it instead of the one "installed" on your cluster.
                          I recommend re-generating the genome indexes for 2.5.1b.

                          Cheers
                          Alex

                          Comment


                          • Originally posted by alexdobin View Post
                            I recommend re-generating the genome indexes for 2.5.1b.

                            Cheers
                            Alex
                            @Alex: Does that mean indexes generated with older versions won't work, or do you just recommend that they be regenerated?

                            Comment


                            • Originally posted by GenoMax View Post
                              @Alex: Does that mean indexes generated with older versions won't work, or do you just recommend that they be regenerated?
                              The new versions of STAR may not work with old genome indexes in rare cases, hence my recommendation to re-generate with 2.5.1, which is very stable.

                              Comment


                              • So, just for gene expression profiling purposes, should I keep my sjDb file set for second-pass alignment constant?

                                Complete story: I have a set of ~40 samples that have already gone through the complete two-pass alignment, for both gene expression and variation analysis purposes. The sjDb files from the first passes of these samples were used for their second-pass alignments.

                                Now I have received a further ~15 samples within the same project, for which I will do gene expression analysis only. I wonder whether I should do a first pass on these new samples and pool their sjDb's with the old ones for the second pass, or just do a "second pass" with the old sjDb's. My concern is obviously not about time, but rather whether using a different sjDb set would make the gene counts less comparable.
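For the pooling option mentioned above, this is one possible sketch of merging the junctions from several first-pass SJ.out.tab files into a single non-redundant list. It does not settle which option gives more comparable counts. It assumes the tab-separated SJ.out.tab layout in which the first four columns are chromosome, intron start, intron end, and strand (0 = undefined, 1 = "+", 2 = "-"); the function name is hypothetical.

```python
# Map SJ.out.tab strand codes to the symbols used in junction lists.
STRAND_SYMBOL = {"0": ".", "1": "+", "2": "-"}


def pool_junctions(sj_texts):
    """Union of (chrom, start, end, strand) over several SJ.out.tab contents."""
    junctions = set()
    for text in sj_texts:
        for line in text.splitlines():
            cols = line.split("\t")
            if len(cols) < 4:
                continue  # skip blank or malformed lines
            chrom, start, end, strand = cols[:4]
            junctions.add(
                (chrom, int(start), int(end), STRAND_SYMBOL.get(strand, strand))
            )
    return sorted(junctions)


# Toy contents of two first-pass files; the shared junction is kept once.
old_run = "chr1\t100\t200\t1\t1\t1\t5\t0\t30\n"
new_run = "chr1\t100\t200\t1\t1\t1\t9\t2\t40\nchr2\t50\t80\t2\t1\t0\t3\t0\t25\n"
for chrom, start, end, strand in pool_junctions([old_run, new_run]):
    print(chrom, start, end, strand, sep="\t")
```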

                                Comment
