  • Only 40% of reads were mapped successfully

    I used Bowtie (3 mismatches) and BWA (4 mismatches) to map reads to the Neurospora genome. I don't know why only 40% of the reads could be mapped; the rest were unmappable. I have never seen this before. What are the possible reasons? Does anyone have any idea?

  • #2
    Often: contamination.
    Try assembling the unmappable reads, for example with ABySS, BLAST the resulting contigs to get an idea of what's in there, then try aligning again against the organisms you identified in the contamination.

    You have checked the duplication ratio of your reads first though, right?
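
The first step of that workflow, collecting the reads that did not align, can be sketched as follows. This is a minimal stdlib-only sketch assuming the alignments were written as SAM text: it keeps primary records whose FLAG has bit 0x4 (unmapped) set and writes them as FASTQ for an assembler such as ABySS. The function names are illustrative, not part of any tool; if samtools is available, `samtools view -f 4` applies the same unmapped-flag filter.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

UNMAPPED = 0x4                # FLAG bit: segment unmapped
NOT_PRIMARY = 0x100 | 0x800   # FLAG bits: secondary / supplementary record


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, qualities) for primary records flagged unmapped."""
    for line in sam_lines:
        if line.startswith("@"):            # header line
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:                # not a complete alignment line
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not flag & UNMAPPED or flag & NOT_PRIMARY:
            continue
        if seq == "*" or qual == "*":       # nothing usable to write as FASTQ
            continue
        yield name, seq, qual


def write_fastq(records: Iterable[Tuple[str, str, str]], out: TextIO) -> int:
    """Write records as four-line FASTQ; return how many were written."""
    n = 0
    for name, seq, qual in records:
        out.write(f"@{name}\n{seq}\n+\n{qual}\n")
        n += 1
    return n


if __name__ == "__main__":
    sam = io.StringIO(
        "@HD\tVN:1.6\n"
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"   # mapped
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII\n"          # unmapped
    )
    out = io.StringIO()
    print(write_fastq(unmapped_reads(sam), out))
    print(out.getvalue(), end="")
```

The resulting FASTQ is what would be handed to the assembler before BLASTing the contigs.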

    • #3
      Poor quality sequence, contamination, enrichment of repetitive sequence... plenty of possible reasons.

      I'd suggest running some QC on your raw sequence to see if that turns up any problems before delving any further into the failures. 40% isn't disastrously low, so it may not be too serious a problem.
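
One basic check of this kind, per-position and per-read base quality, can be sketched without any QC package. This is a minimal sketch assuming plain four-line FASTQ records and Phred+33 quality encoding (older Illumina pipelines used an offset of 64, so `offset` is a parameter); the mean-quality cutoff of 20 is an arbitrary illustrative choice, and `quality_summary` is a hypothetical helper name.

```python
import io
from typing import Dict, List, TextIO


def quality_summary(fastq: TextIO, offset: int = 33,
                    min_mean_q: float = 20.0) -> Dict[str, object]:
    """Per-position mean Phred score and fraction of reads whose mean
    quality falls below min_mean_q. offset=33 assumes Sanger/Illumina 1.8+
    encoding; pass offset=64 for older Illumina (1.3-1.7) files."""
    pos_sum: List[int] = []
    pos_count: List[int] = []
    n_reads = n_low = 0
    while True:
        header = fastq.readline()
        if not header:                      # end of file
            break
        fastq.readline()                    # sequence line (unused here)
        fastq.readline()                    # '+' separator line
        scores = [ord(c) - offset for c in fastq.readline().rstrip("\r\n")]
        for i, q in enumerate(scores):
            if i == len(pos_sum):           # first read reaching this position
                pos_sum.append(0)
                pos_count.append(0)
            pos_sum[i] += q
            pos_count[i] += 1
        n_reads += 1
        if scores and sum(scores) / len(scores) < min_mean_q:
            n_low += 1
    return {
        "reads": n_reads,
        "per_position_mean": [s / c for s, c in zip(pos_sum, pos_count)],
        "fraction_low_quality": n_low / n_reads if n_reads else 0.0,
    }


if __name__ == "__main__":
    fq = io.StringIO("@a\nACGT\n+\nIIII\n@b\nACGT\n+\n####\n")
    print(quality_summary(fq))
```

A steep drop in the per-position means toward the 3' end, or a large low-quality fraction, would be one explanation for reads failing to align with only 3 or 4 mismatches allowed.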

      • #4
        Originally posted by ffinkernagel
        Often: contamination.
        Try assembling the unmappable reads, for example with ABySS, BLAST the resulting contigs to get an idea of what's in there, then try aligning again against the organisms you identified in the contamination.

        You have checked the duplication ratio of your reads first though, right?
        What is the "duplication ratio", and how should I estimate it? Thanks.

        • #5
          Originally posted by hannat
          What is the "duplication ratio", and how should I estimate it? Thanks.
          It's a measure of how often each unique sequence is seen. High duplication levels indicate that your sequence may have been overamplified during library preparation. The QC report I linked to will show you a duplication level plot to see how many times you see unique, duplicated, triplicated etc. sequences. It will also spot heavily overrepresented sequences in case you have a small number of heavy contaminants (e.g. primers).
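
The counting behind a duplication-level plot can be sketched in a few lines. This sketch assumes the reads are already loaded as plain strings and treats only exact sequence matches as duplicates; the function names are hypothetical, and the 0.1% cutoff for "overrepresented" is an assumed illustrative default rather than a value taken from any particular QC tool.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def duplication_levels(seqs: Sequence[str]) -> Tuple[Dict[int, int], float]:
    """Return {copies: number of distinct sequences seen that many times}
    and the fraction of reads that are extra copies of an earlier read."""
    counts = Counter(seqs)                 # sequence -> times observed
    levels = Counter(counts.values())      # times observed -> distinct sequences
    total = len(seqs)
    dup_fraction = 1 - len(counts) / total if total else 0.0
    return dict(sorted(levels.items())), dup_fraction


def overrepresented(seqs: Sequence[str],
                    min_fraction: float = 0.001) -> List[Tuple[str, int]]:
    """Sequences making up more than min_fraction of all reads, most common first."""
    counts = Counter(seqs)
    cutoff = min_fraction * len(seqs)
    return [(s, n) for s, n in counts.most_common() if n > cutoff]


if __name__ == "__main__":
    reads = ["AAAA", "AAAA", "AAAA", "CCCC", "GGGG", "GGGG"]
    print(duplication_levels(reads))   # ({1: 1, 2: 1, 3: 1}, 0.5)
    print(overrepresented(reads, 0.4)) # [('AAAA', 3)]
```

Because matching is exact, sequencing errors split true duplicates into separate entries, so this tends to under-count duplication for long reads.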

          • #6
            Originally posted by simonandrews
            It's a measure of how often each unique sequence is seen. High duplication levels indicate that your sequence may have been overamplified during library preparation. The QC report I linked to will show you a duplication level plot to see how many times you see unique, duplicated, triplicated etc. sequences. It will also spot heavily overrepresented sequences in case you have a small number of heavy contaminants (e.g. primers).
            I see a rise at the end of the duplication plot, so I have a large number of sequences that were duplicated.

            • #7
              Originally posted by hannat
              I see a rise at the end of the duplication plot, so I have a large number of sequences that were duplicated.
              Actually, I'd be more concerned about the front of the plot. It shows that you have a very high percentage of sequences that are replicated a small number of times (say, up to 5). This means either that you have a huge fold coverage over the region you're sequencing, or that your library has suffered from over-amplification.

              What you would hope to see on these plots is that the duplication rate immediately falls to very close to zero and stays there. Any significant amount of duplication is something to be concerned about.

              • #8
                Going along with what Simon said (ooh, bad pun), how many reads are in your data set? The Neurospora crassa genome is ~40 Mb. If you have close to, or more than, 40 million reads, you would expect to see some degree of low-level duplication. The rise at the high end of the plot may be due to overrepresentation of the mt plasmid.
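
To put a rough number on the duplication expected from coverage alone, here is a back-of-envelope sketch. It assumes single-end reads whose start positions fall uniformly at random on either strand of a ~40 Mb genome, and it ignores mappability, GC bias and extra-copy elements such as the mt plasmid. With P = 2G possible start positions and N reads, about P(1 - e^(-N/P)) distinct positions end up occupied; every read beyond those is a "duplicate" by position. The function name is hypothetical.

```python
import math


def expected_duplicate_fraction(n_reads: float, genome_size: float,
                                strands: int = 2) -> float:
    """Expected fraction of reads whose start position (and strand) was already
    taken by another read, if starts are uniform over genome_size * strands
    positions. Ignores mappability, GC bias and copy-number differences."""
    if n_reads <= 0:
        return 0.0
    positions = genome_size * strands
    lam = n_reads / positions                  # mean reads per start position
    distinct = -positions * math.expm1(-lam)   # P * (1 - e^-lam) occupied positions
    return 1 - distinct / n_reads


if __name__ == "__main__":
    genome = 40e6  # ~40 Mb, as stated in the post above
    for n in (1e6, 10e6, 40e6):
        print(f"{n / 1e6:>4.0f}M reads: {expected_duplicate_fraction(n, genome):.1%}")
```

Under these assumptions, 1, 10 and 40 million reads give roughly 0.6%, 6% and 21% positional duplicates, so some low-level duplication at ~40 million reads is expected even from a perfect library; duplication well above such a baseline points more toward over-amplification. Paired-end data, where both ends must coincide, would lower these figures.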

                • #9
                  What's wrong with my ChIP-seq data?

                  I performed an H3K9me3 ChIP experiment and built the library according to Illumina's ChIP-seq library protocol. The analysis results are as follows:

                  raw reads: 46891730
                  mapped reads: 42812364
                  unique reads: 40442380
                  used reads: 13409805
                  mapped ratio: 91.30%
                  unique ratio: 86.25%
                  used ratio: 28.6%
                  regions: 253


                  The number of used reads, the used ratio and the number of regions are all too low. I cannot figure out the problem; could anyone help? Thanks!
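
The percentages can be checked against the raw counts. This sketch shows that all three reported ratios are taken relative to the raw read count, and adds one more: used reads as a share of uniquely mapped reads. The post does not say which filter produces the "used" reads; if it is PCR-duplicate removal (an assumption), roughly two thirds of the uniquely mapped reads were discarded as duplicates, which ties back to the duplication discussion earlier in the thread.

```python
# Counts copied from the post; the dictionary keys are illustrative labels.
counts = {
    "raw": 46_891_730,
    "mapped": 42_812_364,
    "unique": 40_442_380,
    "used": 13_409_805,
}


def ratios(c: dict) -> dict:
    """The first three ratios are relative to raw reads (matching the post);
    'used/unique' shows how many uniquely mapped reads survived filtering."""
    raw = c["raw"]
    return {
        "mapped/raw": c["mapped"] / raw,
        "unique/raw": c["unique"] / raw,
        "used/raw": c["used"] / raw,
        "used/unique": c["used"] / c["unique"],
    }


if __name__ == "__main__":
    for name, value in ratios(counts).items():
        print(f"{name:>12}: {value:.2%}")
```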
