  • nr23
    Member
    • Oct 2012
    • 42

    Samtools flagstat - low % reads mapping

    Hi,

    I'm working with RNA-Seq and using bowtie and tophat to align 65 bp paired-end (PE) reads to a reference genome. My reads were sequenced from X. laevis and I'm attempting to map first to X. tropicalis (the X. laevis genome is still a draft version).

    After trimming and filtering my reads I am left with 2 x 31M = 62M reads, but running samtools flagstat on my accepted_hits.bam file shows that only 12M reads have mapped in total. I'm completely confused about why the number of reads mapping is so low. I've tried fine-tuning the options in tophat (-r value, -N value) and using differently trimmed reads, but have seen little improvement on the ~20% mapping rate.
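
    The mapping rate quoted above can be checked with a short calculation. This is only a sketch using the rounded read counts given in the post; the function name `mapping_rate` is made up for illustration and is not part of samtools.

```python
def mapping_rate(mapped_reads: int, total_reads: int) -> float:
    """Return the percentage of reads that mapped."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# Rounded figures from the post: 31M read pairs, i.e. 62M reads in total,
# of which about 12M are reported as mapped.
reads_per_mate = 31_000_000
total_reads = 2 * reads_per_mate
mapped_reads = 12_000_000

rate = mapping_rate(mapped_reads, total_reads)
print(f"{rate:.1f}% of reads mapped")
```

    With these rounded inputs the result is a little under 20%, which matches the figure in the post.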

    In addition, almost none of my reads pair properly (samtools flagstat 'properly paired' = 0.01%).

    Any help would be hugely appreciated,

    Thanks
  • chadn737
    Senior Member
    • Jan 2009
    • 392

    #2
    How have you trimmed your reads? Have you looked for adaptor sequence in your reads?


    • nr23
      Member
      • Oct 2012
      • 42

      #3
      I've trimmed the reads using fastq_quality_trimmer, fastq_quality_filter and fastx_trimmer.

      One of the problems I've had is that the RNA fragment size is ~130 bp (post adapter removal) and my 100bp reads therefore overlap considerably. I've been using fastx_trimmer to cut the reads to 65bp to ensure no overlap - but they don't seem to be pairing properly in mapping.
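
      The overlap arithmetic described above can be sketched as follows. The helper names are hypothetical, and the 130 bp fragment size is the approximate value from the post; real libraries have a spread of fragment sizes, so shorter fragments will still overlap even after trimming to 65 bp. The inner distance computed here is the quantity that tophat's -r option expects.

```python
def mate_overlap(fragment_len: int, read_len: int) -> int:
    """Bases shared by the two mates of a pair (0 if they do not overlap)."""
    return max(0, 2 * read_len - fragment_len)


def mate_inner_distance(fragment_len: int, read_len: int) -> int:
    """Gap between the mates; negative when the mates overlap."""
    return fragment_len - 2 * read_len


fragment_len = 130  # approximate fragment size from the post

for read_len in (100, 65):
    print(
        f"read length {read_len} bp: "
        f"overlap = {mate_overlap(fragment_len, read_len)} bp, "
        f"inner distance = {mate_inner_distance(fragment_len, read_len)} bp"
    )

# Hypothetical shorter fragment (110 bp), to show that 65 bp reads can
# still overlap when the fragment is below the average size.
print(mate_overlap(110, 65))
```

      For a 130 bp fragment, untrimmed 100 bp mates share 70 bp, while 65 bp mates just meet with an inner distance of 0.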

      I haven't checked for adapters - I ran the .txt files through fastqc and there were no over-represented sequences.

      N


      • chadn737
        Senior Member
        • Jan 2009
        • 392

        #4
        That's what I thought.

        Even at 65 bp you may still have overlap and/or adaptor sequence.

        Is it critical that you have paired end data? I had a similar situation with some paired end data. I simply dispensed with the second set of reads and treated the first set as single end reads. With that amount of overlap, it's probably going to be impossible for tophat to get the insert size right.

        Also try adaptor trimming with a trimmer that can handle variable lengths of adaptor sequence. I have used cutadapt with great success. Then try realigning without the second read of each pair and you should get better results.
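
        To illustrate what handling variable lengths of adaptor sequence means, here is a minimal sketch. It is not cutadapt's algorithm (cutadapt aligns the adaptor to the read and tolerates mismatches); this toy version only finds exact matches. The function name, the `min_overlap` parameter and the example adaptor sequence are assumptions for illustration and would need to be replaced with the adaptor actually used in the library prep.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adaptor from a read, using exact matching only.

    Handles both a full adaptor inside the read and a partial adaptor
    (a prefix of the adaptor) at the very end of the read.
    """
    # Case 1: the whole adaptor occurs in the read; cut from its start.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends with a prefix of the adaptor.
    # Try the longest possible prefix first, down to min_overlap bases.
    longest = min(len(read), len(adapter) - 1)
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    # No adaptor found; return the read unchanged.
    return read


# Example adaptor sequence, assumed here purely for illustration.
ADAPTER = "AGATCGGAAGAGC"

full = trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", ADAPTER)
partial = trim_3prime_adapter("ACGTACGTAGATCG", ADAPTER)
clean = trim_3prime_adapter("ACGTACGT", ADAPTER)
print(full, partial, clean)
```

        A fixed-length trimmer such as fastx_trimmer cuts the same number of bases from every read, so it cannot remove adaptor that starts at a different position in each read.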

        Otherwise... make a new library.


        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          If your reads are overlapping significantly, you may want to try FLASH as an alternative, to stitch the two ends together.



          Updated citation:

          Tanja Magoč and Steven L. Salzberg

          FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics (2011) 27(21): 2957-2963
          Last edited by GenoMax; 11-01-2012, 08:04 AM.


          • nr23
            Member
            • Oct 2012
            • 42

            #6
            I ran the 65bp trimmed reads through FLASH (http://genomics.jhu.edu/software/FLASH/index.shtml) to confirm that, post trim, there's no overlap.

            As I understand it, bowtie and tophat map the two reads of a pair independently, so I would expect that discarding half of my reads would give the same % of mapped reads. Maybe I'm wrong though?

            My primary concern is that the % of reads mapped is so low. I'm less concerned about the pairing of the reads (I'm interested in differential expression rather than resolving isoforms etc.), but I can't help feeling that the two are linked...
