
  • #16
    Originally posted by westerman View Post
    That is low, but it depends on your reference, your DNA and your organism, which I do not think you have stated. Given this thread, I presume that your reference is genomic and your DNA is microRNA. In that case you have to ask yourself, "how much of the genome do I expect to be miRNA as opposed to other RNA, genes, and structural sequence?" If the answer is that you expect only 0.3% of your genome to be miRNA, then your mapping is fine.

    microRNA are so newly discovered -- i.e., since I've been out of school -- that I am not sure how much of a genome should be miRNA. I could tell you roughly how much of a genome should be gene, and thus how much coverage an mRNA experiment should have, but not for miRNAs.
    But according to this ABI document, they are getting 50% reads mapped to miRNA so 0.3% is worrying. And it is already enriched for small RNA. I am skeptical of the 50% claim though, wonder what other people are getting?


    • #17
      Originally posted by westerman View Post
      2 mismatches is great for SNP discovery since any given read is unlikely to have more than 1 SNP in it. Anything else can be discarded as error.
      The fraction of possible 50bp reads with X SNPs (from hg18 and dbsnp) is:

      0 84.08%
      1 13.02%
      2 2.30%
      3 0.40%
      4 0.10%
      5 0.03%
      ...

      so make your own judgment.

      On the other hand, some of us have to deal with DNA from species only partially related to our known (and often incomplete) reference sequence. We then use larger mismatch parameters and are thankful for what information we do get back.
      I think that, where possible, you should align with the greatest sensitivity you can, since you will recover the most data. SOLiD color error rates are non-trivial, and color errors can be easily corrected (when dynamic programming is used correctly, not valid-adjacent rules). I would recommend allowing somewhere around 10% color differences (in most cases SNPs count as two, color errors as one).
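As a rough illustration of the trade-off discussed here, the sketch below accumulates the SNP fractions quoted above to show what share of 50 bp reads a given SNP allowance would cover, and turns a 10% color-difference budget into a SNP count using the "SNP = two colors, error = one color" rule of thumb. The function names are hypothetical, and the only data used are the percentages quoted in this post.

```python
# Percent of possible 50 bp reads containing X SNPs, as quoted in the post.
SNP_FRACTIONS = {0: 84.08, 1: 13.02, 2: 2.30, 3: 0.40, 4: 0.10, 5: 0.03}


def cumulative_coverage(fractions):
    """Return {x: percent of reads with at most x SNPs}."""
    running = 0.0
    covered = {}
    for x in sorted(fractions):
        running += fractions[x]
        covered[x] = round(running, 2)
    return covered


def color_budget(read_length, percent=10):
    """Color differences allowed for a read, rounded down (integer math)."""
    return read_length * percent // 100


def max_snps_within_budget(budget, color_errors=0):
    """SNPs that fit in the budget when each SNP costs two colors and each
    sequencing error costs one (the rule of thumb from the post)."""
    return max(budget - color_errors, 0) // 2


if __name__ == "__main__":
    for x, pct in cumulative_coverage(SNP_FRACTIONS).items():
        print(f"<= {x} SNPs: {pct:.2f}% of reads")
    budget = color_budget(50)
    print("color budget:", budget)
    print("SNPs alongside one color error:", max_snps_within_budget(budget, 1))
```

Under these assumptions, a 50-color read gets a budget of 5 color differences, which leaves room for two SNPs plus one sequencing error.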



      • #18
        Originally posted by Sheila View Post
        In the configuration file you can choose between "all" or "unique".
        all = all mapping positions
        unique= unique mapping positions
        Thanks. Even with this, the pipeline still discards reads that map to multiple places, even when a read maps to one reference with 0 mismatches and to another with 2 mismatches.
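One way to keep such reads would be to post-process the multi-mapping output and retain a read whenever a single hit has strictly fewer mismatches than all the others. The sketch below is only an illustration of that idea: the `(reference, mismatches)` tuple layout and the function name are hypothetical and are not part of the pipeline.

```python
def best_unique_hit(hits):
    """Return the single best hit for a read, or None if the best is tied.

    `hits` is a list of (reference, mismatches) tuples. This layout is an
    assumption made for the example, not the pipeline's own data structure.
    """
    if not hits:
        return None
    fewest = min(mismatches for _, mismatches in hits)
    best = [hit for hit in hits if hit[1] == fewest]
    # Keep the read only when exactly one hit reaches the minimum.
    return best[0] if len(best) == 1 else None


if __name__ == "__main__":
    print(best_unique_hit([("miR-A", 0), ("miR-B", 2)]))  # clear winner
    print(best_unique_hit([("miR-A", 1), ("miR-B", 1)]))  # tie, ambiguous
```

A read with one 0-mismatch hit and one 2-mismatch hit is kept and assigned to the 0-mismatch reference, while a genuine tie is still treated as ambiguous.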



        • #19
          Originally posted by fishtank View Post
          I am wondering where you came to the conclusion that last bases of the miRNA that are close to the adaptor have a high error rate. Could these be due to miRNA editing?
          Hi,
          It is known that the last bases close to the adaptor have a higher error rate, so I would not use 0 mismatches: first, because you would not detect any isomiR with a 1 nt difference (polymorphic or not), and second, because of the higher error rate at the end of the sequences.
          I'm still playing with the parameters; it's hard to define what's best.

          S.



          • #20
            I am trying to figure out how the *.csfasta_extend.counts.35.6 gets generated from .csfasta_extend.ma.35.6. In the .csfasta_extend.ma.35.6, what does

            >1_17_829_F3,220_-79.6.21
            T13100202312110020020101102011303111

            mean? I saw some documents that say it should be
            >TAG_ID,LOCATION,MISMATCHES.

            So 1_17_829_F3 is the TAG_ID.
            Is 6 the number of mismatches? And how do I decode the location part?

            Thanks.



            • #21
              Using rna2map, it seems to me that the start/end chromosome coordinates in the *.csfasta_extend.counts.35.6 file are offset by 1 relative to the reads, i.e. to view the read sequence correctly, I have to enter chr:start-1 to end-1 in the UCSC genome browser.
              But if I take the chromosome location specified in the generated mirBase.13.0.fasta, I do not need the offset to view the reference sequence. Why the difference?
              Can someone confirm this?



              • #22
                Originally posted by fishtank View Post
                I am trying to figure out how the *.csfasta_extend.counts.35.6 gets generated from .csfasta_extend.ma.35.6. In the .csfasta_extend.ma.35.6, what does

                >1_17_829_F3,220_-79.6.21
                T13100202312110020020101102011303111
                PANEL_XCOORD_YCOORD_[F3/BC],FASTASEQNUMBER_LOCATION.MISMATCHES.LENGTH


                where FASTASEQNUMBER is the 1-indexed sequence number in your multi-entry fasta file.
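The field layout above can be unpacked with a few lines of Python. This is a minimal sketch, assuming the header always follows the pattern given in this post and that additional hits, if any, are appended as further comma-separated fields. The function name and the dictionary keys are hypothetical. The sign of LOCATION is kept as-is because its meaning is not stated in this thread.

```python
import re

# FASTASEQNUMBER_LOCATION.MISMATCHES.LENGTH, e.g. "220_-79.6.21"
HIT_RE = re.compile(r"^(\d+)_(-?\d+)\.(\d+)\.(\d+)$")


def parse_ma_header(line):
    """Split a .ma header line into the tag fields and a list of hits."""
    tag, *hit_fields = line.strip().lstrip(">").split(",")
    # PANEL_XCOORD_YCOORD_[F3/BC]
    panel, x, y, read_type = tag.split("_")
    hits = []
    for field in hit_fields:
        match = HIT_RE.match(field)
        if match is None:
            raise ValueError(f"unexpected hit field: {field!r}")
        seq_number, location, mismatches, length = map(int, match.groups())
        hits.append({
            "seq_number": seq_number,  # 1-indexed entry in the fasta file
            "location": location,      # signed, meaning of sign not assumed
            "mismatches": mismatches,
            "length": length,
        })
    return {"tag": tag, "panel": int(panel), "x": int(x), "y": int(y),
            "read_type": read_type, "hits": hits}


if __name__ == "__main__":
    print(parse_ma_header(">1_17_829_F3,220_-79.6.21"))
```

For the header quoted above, this yields fasta sequence number 220, location -79, 6 mismatches and an aligned length of 21.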



                • #23
                  Originally posted by OneManArmy View Post
                  PANEL_XCOORD_YCOORD_[F3/BC],FASTASEQNUMBER_LOCATION.MISMATCHES.LENGTH

                  where FASTASEQNUMBER is the 1-indexed sequence number in your multi-entry fasta file.
                  Thanks. It took me a while to realize that the location is the sequence number in the fasta file. Is there any explanation for the chromosome coordinate "offset" in *.csfasta_extend.counts that I posted about earlier? Thanks again.



                  • #24
                    I can provide some statistics concerning the small RNA matching pipeline from AB.
                    I used a purified human small RNA sample in a barcoding experiment with 7.3M reads.

                    I've run the pipeline many times with different parameters:
                    - SeedMM : 0,1,2,3
                    - ExtendMM : 1, 3 or 6
                    - ReadType : random or unique

                    R_0_6 = Random, 0 seed MM and 6 Extend MM
                    For Tag count, Total beads and uniquely placed beads

                    Run          Tags       Total      Unique
                    R_0_6     983.679   1.023.809     527.973
                    R_1_6   1.377.737   1.433.096     752.479
                    R_2_6   1.677.397   1.739.800     925.693
                    R_3_6   1.762.540   1.834.924     981.906

                    R_0_1     441.813     469.826     162.466


                    I do not perform genome mapping, but we get between 13% and 24% of usable reads
                    mapped to a miRNA reference (the more mismatches we allow, the more reads map to a miR).
                    Note that the proportion of uniquely placed beads stays roughly constant (~55%),
                    whereas I would expect that the more MM we allow, the more likely a read is to match
                    multiple miR references and so not be uniquely mapped... Any idea where I'm wrong?

                    Anyway, it seems from the later analysis that miR expression is not
                    clearly affected by the parameters we used to run the pipeline (hopefully).
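The percentages can be recomputed from the table with a short script. This is a sketch under two assumptions: the "7.3M reads" figure is taken as exactly 7,300,000, and the 13% to 24% range is taken to be the Tags column divided by that total (the post does not say which column was used). The variable and function names are hypothetical.

```python
TOTAL_READS = 7_300_000  # "7.3M reads" from the post, taken as exact here

# run name -> (tags, total beads, uniquely placed beads), from the table
RUNS = {
    "R_0_6": (983_679, 1_023_809, 527_973),
    "R_1_6": (1_377_737, 1_433_096, 752_479),
    "R_2_6": (1_677_397, 1_739_800, 925_693),
    "R_3_6": (1_762_540, 1_834_924, 981_906),
    "R_0_1": (441_813, 469_826, 162_466),
}


def summarize(runs, total_reads):
    """Return {run: (tags as % of reads, unique beads as % of total beads)}."""
    summary = {}
    for name, (tags, total_beads, unique_beads) in runs.items():
        summary[name] = (100.0 * tags / total_reads,
                         100.0 * unique_beads / total_beads)
    return summary


if __name__ == "__main__":
    for name, (mapped_pct, unique_pct) in summarize(RUNS, TOTAL_READS).items():
        print(f"{name}: {mapped_pct:5.1f}% of reads, {unique_pct:5.1f}% unique")
```

Computed this way, the mapped fraction rises from about 13% to about 24% across the four runs with 6 extend mismatches, while the uniquely placed fraction stays a little above 50%.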



                    • #25
                      Hi:
                      I was wondering what people are doing with their miRNA data to quantitate
                      miR and miR* from their sequence reads, especially novel miR*.
                      (I wish they would change the miR* nomenclature to the more sensible 3p/5p one.)
                      Is anybody aware of any computational approaches to automate miR vs miR*
                      quantitation?

                      Also, I was wondering how people are addressing sense/antisense
                      mapping issues related to ds regions in pre-miRs?
                      We are still not sure how the small RNA pipeline handles strand information, or how it counts reads when they map to both strands (it looks like it double-counts them).
                      And how do we summarize read counts efficiently in table form (not as a GB track), with strand information preserved?
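For the table question, one simple option is to tally hits per reference and strand and write them out as tab-separated text. The sketch below assumes the hits have already been extracted as `(name, strand)` pairs with strand given as "+" or "-". That input layout and the function names are hypothetical, and how the pipeline itself records strand is not covered here.

```python
from collections import Counter


def count_by_strand(hits):
    """Tally hits per reference and strand.

    `hits` is an iterable of (name, strand) pairs, strand being "+" or "-".
    Returns rows of (name, plus_count, minus_count), sorted by name.
    """
    counts = Counter(hits)
    names = sorted({name for name, _ in counts})
    return [(name, counts[(name, "+")], counts[(name, "-")]) for name in names]


def to_tsv(rows):
    """Render the rows as a tab-separated table with a header line."""
    lines = ["name\tplus\tminus"]
    lines += [f"{name}\t{plus}\t{minus}" for name, plus, minus in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    example = [("miR-1", "+"), ("miR-1", "+"), ("miR-1", "-"), ("miR-2", "-")]
    print(to_tsv(count_by_strand(example)))
```

A read that maps to both strands contributes one count to each column, so any double counting stays visible in the table instead of being merged into a single number.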

                      Thanks



                      • #26
                        Realistic miRNA mapping from SREK

                        Hi. I have long experience in miRNA identification from 454 data, and since March of this year I have been grinding my teeth on SOLiD SREK results. I am using SHRiMP and custom-made scripts both for genome mapping and for mapping against the miRBase reference (both mature and hairpin).

                        Even biologically, the claim of 50% miRNAs in a sample is unbelievable. I think this number includes tRNAs (yes, there are many tRNA stem fragments that are very similar to miRNAs), snoRNAs, etc. I am very cautious and conservative in this classification. I would say that the mapping percentage of small RNAs from a SREK experiment against the Hs genome will be between 50% and 60% of the reads. Known (i.e. well-established) miRNAs will be 5% to 15% of the total beads, i.e. 10% to 30% of the mappable reads. You should also be well aware of the danger of false positives in known miRNA identification. More details on request.


                        Originally posted by fishtank View Post
                        But according to this ABI document, they are getting 50% reads mapped to miRNA so 0.3% is worrying. And it is already enriched for small RNA. I am skeptical of the 50% claim though, wonder what other people are getting?



                        • #27
                          Is there any documentation of the algorithms inside rna2map?



                          • #28
                            Has anyone ever found a detailed description of the rna2map tool?



                            • #29
                              How mismatches are calculated

                              In an ideal world, I would expect the rna2map pipeline to report the number of mismatches between "the part of the read that aligns to the reference" and "the reference sequence". That is, in the old BLAST way of doing things, the mismatches within high-scoring pairs.
                              However, after digging into the code of the rna2map pipeline and analyzing mapping results, I have discovered that rna2map stupidly includes the mismatches found against the adaptor sequence in the count it reports for the alignment.

                              Therefore, if the alignment reads as follows
                              >TagID1,1_1000.6.22
                              >TagID2,1_1000.6.22

                              it means that there are six errors in total for each of the two tags.
                              Now consider this. In the first case, your miRNA aligns with 0 mismatches to the reference over 22 bp (which is great) but the adaptor aligns with 6 mismatches (who cares).
                              In the second case, your miRNA aligns with 6 mismatches to the reference over 22 bp (which is not so great) but the adaptor aligns with 0 mismatches (who cares).

                              If we looked only at the alignment file and not at the reads that actually align, we would be tempted to give both reads equal weight. In the real world, however, it would not be such a great idea to use a read with six mismatches over 22 bp (~73% match).
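The two cases can be told apart by counting mismatches separately for the aligned part of the read and for the adaptor tail. The sketch below is a simplified base-space illustration of that idea, not rna2map's own code: real SOLiD reads are in color space, and the function names, the example sequences and the assumption that the first `insert_length` positions are the miRNA are all hypothetical.

```python
def count_mismatches(a, b):
    """Number of positions at which two sequences differ (over the shorter)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def split_mismatches(read, reference, adaptor, insert_length):
    """Return (mismatches in the insert, mismatches in the adaptor tail).

    The first `insert_length` positions of the read are compared with the
    reference, and the remainder with the start of the adaptor.
    """
    insert, tail = read[:insert_length], read[insert_length:]
    return (count_mismatches(insert, reference[:insert_length]),
            count_mismatches(tail, adaptor[:len(tail)]))


def percent_identity(mismatches, length):
    """Percent of matching positions over the aligned length."""
    return 100.0 * (length - mismatches) / length


if __name__ == "__main__":
    reference, adaptor = "ACGTAC", "TTAA"
    print(split_mismatches("ACGTAC" + "TTTT", reference, adaptor, 6))
    print(split_mismatches("AGGTAC" + "TTAA", reference, adaptor, 6))
    print(f"{percent_identity(6, 22):.1f}%")
```

Filtering on the first number only would keep the read whose errors all fall in the adaptor and drop the one whose errors fall in the miRNA itself.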

                              Has anybody looked into this kind of thing before, or accounted for it in their analysis?

                              Please share your views and opinions so that we can discuss it further.

                              cheers

                              hardip
                              Post-doctoral Fellow
                              John Curtin School of Medical Research
                              Australian National University, Canberra, ACT, Australia
