  • NGSfan
    Senior Member
    • Apr 2009
    • 181

    #31
    Hello to everyone and thanks for adding to the discussion here. Sorry I did not have the time to reply. I really appreciated everyone's input and I even learned more things than I expected.

    Alim, from what I understand of your post, you think the ~60% mapping rate of the raw 25 bp reads from your experiment is most likely due to the higher error rate of the old Illumina sequencing chemistry.

    The newer chemistry, then, should reduce sequencing errors and so increase the mapping rate. I will take a closer look at this once the longer-read data sets are available.

    • Lizex
      Member
      • Feb 2011
      • 22

      #32
      RNA-Seq reads mapping very low

      Hi All
      I have been mapping reads generated on a MiSeq instrument (150 bp) against the apple genome. The reads were trimmed and filtered with the fastx_toolkit, and the reads I mapped are over 100 bp long. Using TopHat (version 1.4) to align them, I get an average of about 12% of reads mapped: for example, out of 5 Gb of reads (read1 and read2), only 600 Mb of data ends up in the accepted_hits.bam file, which I take to be the mapped reads. Is there something wrong, or am I missing something here? Any suggestions on how to address this very low mapping rate? Surely the percentage of mapped reads should be much better than 12%?

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #33
        Originally posted by Lizex
        Hi All
        I have been mapping reads generated on a MiSeq instrument (150 bp) against the apple genome. The reads were trimmed and filtered with the fastx_toolkit, and the reads I mapped are over 100 bp long. Using TopHat (version 1.4) to align them, I get an average of about 12% of reads mapped: for example, out of 5 Gb of reads (read1 and read2), only 600 Mb of data ends up in the accepted_hits.bam file, which I take to be the mapped reads. Is there something wrong, or am I missing something here? Any suggestions on how to address this very low mapping rate? Surely the percentage of mapped reads should be much better than 12%?
        Since you have paired-end data, be careful using fastx_toolkit. I've seen a lot of people desyncing their paired-end reads with it.
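
One way to check for the desynchronisation mentioned above is to walk both FASTQ files in parallel and compare read names. This is only a minimal sketch: it assumes plain-text, 4-line FASTQ records, and the function names and file paths are made up for illustration.

```python
import itertools


def read_names(path):
    """Yield the read name from each 4-line FASTQ record."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of a record
                # Drop '@' and anything after the first whitespace.
                name = line[1:].split()[0]
                # Drop an old-style /1 or /2 mate suffix if present.
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def first_desync(r1_path, r2_path):
    """Return the 1-based record number where the files disagree, or None."""
    pairs = itertools.zip_longest(read_names(r1_path), read_names(r2_path))
    for n, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:  # also catches one file being shorter
            return n
    return None
```

Calling `first_desync("read1.fq", "read2.fq")` (hypothetical file names) returns `None` when every record lines up, and otherwise the first record at which the mates no longer match.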

        • lukas1848
          Member
          • Jun 2011
          • 54

          #34
          Did you check for adapter contamination of your reads?
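
A quick, rough check for adapter contamination is to count how many reads contain the start of the adapter. The sketch below assumes the library used Illumina TruSeq-style adapters, whose shared prefix is `AGATCGGAAGAGC`; that choice, the function name, and the `max_reads` cap are assumptions for illustration, so substitute the adapter from your own kit.

```python
def adapter_fraction(fastq_path, adapter="AGATCGGAAGAGC", max_reads=100000):
    """Return the fraction of the first max_reads reads containing the adapter."""
    total = 0
    hits = 0
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:  # only look at sequence lines
                continue
            total += 1
            if adapter in line:
                hits += 1
            if total >= max_reads:
                break
    return hits / total if total else 0.0
```

This only finds exact matches of the prefix, so it underestimates contamination when the adapter is partial or contains sequencing errors; a dedicated trimmer is still the right tool for removal.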

          • Lizex
            Member
            • Feb 2011
            • 22

            #35
            Why low mapping

            Thanks for the reply. I removed adaptors from the reads with cutadapt. I'm also curious about how the reads map. Please see the attached picture, which shows how they map and also how many Ns are present. Should I worry about the Ns and remove them, or should I leave them in? Any suggestions?
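
If the Ns are in the reads themselves (rather than the `N` operations that spliced aligners write in CIGAR strings for introns), their extent can be measured directly from the FASTQ file. A minimal sketch, assuming plain-text 4-line FASTQ records; the function name and returned keys are made up for illustration.

```python
def n_summary(fastq_path):
    """Count reads and bases, and how many of each are N."""
    reads = reads_with_n = bases = n_bases = 0
    with open(fastq_path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:  # only look at sequence lines
                continue
            seq = line.strip().upper()
            n_count = seq.count("N")
            reads += 1
            bases += len(seq)
            n_bases += n_count
            if n_count:
                reads_with_n += 1
    return {
        "reads": reads,
        "reads_with_n": reads_with_n,
        "bases": bases,
        "n_bases": n_bases,
    }
```

Comparing `reads_with_n` with `reads` shows whether the Ns are confined to a handful of reads or spread across the whole run.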

            • Lizex
              Member
              • Feb 2011
              • 22

              #36
              Originally posted by dpryan
              Since you have paired-end data, be careful using fastx_toolkit. I've seen a lot of people desyncing their paired-end reads with it.
              Thanks, dpryan. What do you suggest I use?

              • dpryan
                Devon Ryan
                • Jul 2011
                • 3478

                #37
                Originally posted by Lizex
                Thanks, dpryan. What do you suggest I use?
                trim_galore or trimmomatic are common suggestions. I've had good luck in the past with trim_galore, which is also quite flexible.

                • Lizex
                  Member
                  • Feb 2011
                  • 22

                  #38
                  Originally posted by dpryan
                  trim_galore or trimmomatic are common suggestions. I've had good luck in the past with trim_galore, which is also quite flexible.
                  Thanks. I'll give it a try.

                  • Lizex
                    Member
                    • Feb 2011
                    • 22

                    #39
                    Originally posted by Lizex
                    Thanks. I'll give it a try.
                    Hi dpryan

                    I've tried Trimmomatic. read1.fq and read2.fq each contain 1 492 345 reads. After mapping with TopHat 1.4.0, the stats for the accepted_hits.bam file look like this:

                    samtools flagstat /Data_Analysis/E0.2.3/E0_tophat/accepted_hits.bam
                    1404454 + 0 in total (QC-passed reads + QC-failed reads)
                    0 + 0 duplicates
                    1404454 + 0 mapped (100.00%:nan%)
                    1404454 + 0 paired in sequencing
                    682904 + 0 read1
                    721550 + 0 read2
                    1200618 + 0 properly paired (85.49%:nan%)
                    1243330 + 0 with itself and mate mapped
                    161124 + 0 singletons (11.47%:nan%)
                    0 + 0 with mate mapped to a different chr
                    0 + 0 with mate mapped to a different chr (mapQ>=5)

                    Is this a good mapping or bad? How should I interpret this result?

                    • dpryan
                      Devon Ryan
                      • Jul 2011
                      • 3478

                      #40
                      That looks pretty reasonable. You started with ~1.5 million reads and aligned ~1.4 million, of which ~85% were properly paired. That's certainly a vast improvement over the original 12% mapping rate that you reported!
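
For anyone wanting to redo the arithmetic, the flagstat counts can be related to the input as below. This is a rough sketch under two assumptions: that the 1 492 345 figure is the number of reads in each file (so the input is twice that many reads), and that flagstat is counting alignment records, so a read reported at several locations may be counted more than once. The variable names are made up for illustration.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Numbers copied from the post above.
pairs_in = 1_492_345         # reads in each of read1.fq and read2.fq
records = 1_404_454          # "in total" line of flagstat
read1_records = 682_904
read2_records = 721_550
properly_paired = 1_200_618
singletons = 161_124

rates = {
    # Alignment records relative to all input reads (both mates).
    "records_vs_input_reads": percent(records, 2 * pairs_in),
    # Per-mate records relative to the reads in that mate's file.
    "read1_records_vs_input": percent(read1_records, pairs_in),
    "read2_records_vs_input": percent(read2_records, pairs_in),
    # Fractions of the alignment records themselves, as flagstat reports them.
    "properly_paired_of_records": percent(properly_paired, records),
    "singletons_of_records": percent(singletons, records),
}

for name, value in rates.items():
    print(f"{name}: {value:.2f}%")
```

Because accepted_hits.bam only contains aligned reads, the "mapped (100.00%)" line says nothing about the input; the first three rates are the ones that compare against what went into the aligner.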

                      • Lizex
                        Member
                        • Feb 2011
                        • 22

                        #41
                        Originally posted by dpryan
                        That looks pretty reasonable. You started with ~1.5 million reads and aligned ~1.4 million, of which ~85% were properly paired. That's certainly a vast improvement over the original 12% mapping rate that you reported!
                        Thanks for the reply. This result was for the paired reads output by Trimmomatic. What should I do with the unpaired reads it also outputs? The two files are very uneven: read1 has 896 804 reads and read2 has 13 476. Should I map these with TopHat as well?

                        • dpryan
                          Devon Ryan
                          • Jul 2011
                          • 3478

                          #42
                          Depending on exactly what you want to do with the reads, you can either map read1 as single-ended with tophat or just ignore them (the read2 file will mostly be crap in my experience). Given how many of your pairs became singletons, you might want to go ahead and align read1 just so you have a bit more data (I haven't ever lost many reads).

                          • Lizex
                            Member
                            • Feb 2011
                            • 22

                            #43
                            Originally posted by dpryan
                            Depending on exactly what you want to do with the reads, you can either map read1 as single-ended with tophat or just ignore them (the read2 file will mostly be crap in my experience). Given how many of your pairs became singletons, you might want to go ahead and align read1 just so you have a bit more data (I haven't ever lost many reads).
                            Thanks for the advice.

                            • shrutimish@gmail.com
                              Member
                              • Dec 2012
                              • 12

                              #44
                              Hi, I have run RNA-seq on human samples and got very low alignment percentages with both TopHat and RSEM. I used the Illumina Ribo-Zero TruSeq kit for library prep. What could be the reason for the low alignment rate? Right now only 11% of my reads align to the transcriptome in RSEM. Can I do something to fix this?

                              • dpryan
                                Devon Ryan
                                • Jul 2011
                                • 3478

                                #45
                                Replying to a roughly year-old thread is not normally the most efficient way to get help.

                                Did you adapter-trim your data? Have you tried aligning to the genome? Have you tried BLASTing a few unaligned reads?
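
To BLAST a few unaligned reads, they first need to be pulled out as FASTA. A minimal sketch using reservoir sampling, assuming the unaligned reads are already in a plain-text 4-line FASTQ file; the function name, defaults, and file name are made up for illustration.

```python
import random


def sample_fastq_to_fasta(fastq_path, n=10, seed=0):
    """Return up to n randomly chosen reads from a FASTQ file as FASTA text."""
    rng = random.Random(seed)
    sample = []
    seen = 0
    with open(fastq_path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            seen += 1
            entry = (header[1:].strip(), seq)
            if len(sample) < n:
                sample.append(entry)
            else:
                # Reservoir sampling: keep each read with probability n / seen.
                j = rng.randrange(seen)
                if j < n:
                    sample[j] = entry
    return "".join(f">{name}\n{seq}\n" for name, seq in sample)
```

The text returned by `sample_fastq_to_fasta("unaligned.fq")` (hypothetical file name) can be pasted into a BLAST search to see whether the reads come from rRNA, a contaminant, or adapter dimers.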
