  • Same species mapping problem with Tophat

    Hi,

    I'm working with Illumina PE (100bp) RNA-Seq reads from Xenopus laevis. I've previously had a lot of trouble mapping these to the X.tropicalis genome with bowtie/tophat (very low % reads mapped, and almost 0% reads 'properly paired'). Assuming that this was due to mismatches, which I could not get around because of the limit of N-3 mismatches per segment in bowtie, I switched to STAMPY (http://www.well.ox.ac.uk/project-stampy), which allows multiple mismatches, and achieved very good results.

    Recently the X.laevis genome has been released. I've tried re-mapping my reads using tophat/bowtie, but still get the same results (<20% reads mapping and a very low fraction 'properly paired'). This is really confusing: I would expect the occasional mismatch due to allelic differences, but should still see almost all of my reads mapping.

    In addition, on inspecting the bowtie log files, I can see that ~72% of both left and right reads map. The trouble seems to be with the way tophat interprets the alignments produced by bowtie: tophat seems to keep only a very small fraction of the reads (6M reads / ~ 90M) in its output, and samtools flagstat reports 100% mapped for these reads.

    I'll paste some of the stats I'm seeing below:

    Log file from bowtie run (X.laevis reads vs X.laevis genome):

    logs> more bowtie.left_kept_reads.fixmap.log
    # reads processed: 31151246
    # reads with at least one reported alignment: 22576653 (72.47%)
    # reads that failed to align: 8249899 (26.48%)
    # reads with alignments suppressed due to -m: 324694 (1.04%)

    logs> more bowtie.right_kept_reads.fixmap.log
    # reads processed: 33478582
    # reads with at least one reported alignment: 24249964 (72.43%)
    # reads that failed to align: 8880054 (26.52%)
    # reads with alignments suppressed due to -m: 348564 (1.04%)
    Reported 30987873 alignments to 1 output stream(s)
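
For anyone who wants to check these figures, here is a minimal sketch that parses the two bowtie logs above and recomputes the percentages. The function name `parse_bowtie_log` and the embedded strings are illustrative only (not part of bowtie or tophat); the numbers are copied from the logs in this post.

```python
import re

# Log contents copied from the post above (left and right reads).
LEFT_LOG = """\
# reads processed: 31151246
# reads with at least one reported alignment: 22576653 (72.47%)
# reads that failed to align: 8249899 (26.48%)
# reads with alignments suppressed due to -m: 324694 (1.04%)
"""

RIGHT_LOG = """\
# reads processed: 33478582
# reads with at least one reported alignment: 24249964 (72.43%)
# reads that failed to align: 8880054 (26.52%)
# reads with alignments suppressed due to -m: 348564 (1.04%)
"""


def parse_bowtie_log(text):
    """Hypothetical helper: map each '# label: count' line to an integer."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"#\s*(.+?):\s*(\d+)", line.strip())
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def summarize(stats):
    """Return (processed, aligned, percent aligned, categories add up?)."""
    processed = stats["reads processed"]
    aligned = stats["reads with at least one reported alignment"]
    failed = stats["reads that failed to align"]
    suppressed = stats["reads with alignments suppressed due to -m"]
    consistent = aligned + failed + suppressed == processed
    return processed, aligned, round(100.0 * aligned / processed, 2), consistent


left = summarize(parse_bowtie_log(LEFT_LOG))
right = summarize(parse_bowtie_log(RIGHT_LOG))
print("left :", left)
print("right:", right)
# The two logs report different numbers of processed reads.
print("difference in reads processed:", right[0] - left[0])
```

The three categories in each log add up to the number of reads processed, so the bowtie step itself looks internally consistent.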


    Samtools flagstat on same tophat run:

    > samtools flagstat accepted_hits.bam
    6401438 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    6401438 + 0 mapped (100.00%:-nan%)
    6401438 + 0 paired in sequencing
    2050216 + 0 read1
    4351222 + 0 read2
    10784 + 0 properly paired (0.17%:-nan%)
    205892 + 0 with itself and mate mapped
    6195546 + 0 singletons (96.78%:-nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
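
The flagstat output above can be put next to the bowtie totals with a short script. This is only a sketch: `parse_flagstat` is a hypothetical helper written for the output format shown here, and the bowtie total is the sum of the two "reads processed" values from the logs in this post.

```python
import re

# flagstat output copied from the post above.
FLAGSTAT = """\
6401438 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
6401438 + 0 mapped (100.00%:-nan%)
6401438 + 0 paired in sequencing
2050216 + 0 read1
4351222 + 0 read2
10784 + 0 properly paired (0.17%:-nan%)
205892 + 0 with itself and mate mapped
6195546 + 0 singletons (96.78%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
"""


def parse_flagstat(text):
    """Hypothetical helper: map each flagstat label to its QC-passed count."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+) \+ (\d+) (.*)", line)
        if not m:
            continue
        # Drop a trailing percentage such as "(0.17%:-nan%)" from the label.
        label = re.sub(r"\s*\([\d.]+%:.*\)$", "", m.group(3).strip())
        stats[label] = int(m.group(1))
    return stats


fs = parse_flagstat(FLAGSTAT)
total = fs["in total (QC-passed reads + QC-failed reads)"]
pct_proper = round(100.0 * fs["properly paired"] / total, 2)
pct_singletons = round(100.0 * fs["singletons"] / total, 2)

# Reads processed by bowtie (left + right), taken from the logs in this post.
bowtie_processed = 31151246 + 33478582
pct_of_bowtie_input = round(100.0 * total / bowtie_processed, 2)

print("properly paired %:", pct_proper)
print("singletons %:", pct_singletons)
print("records in accepted_hits.bam as % of reads given to bowtie:", pct_of_bowtie_input)
```

Note that read1 and read2 sum to the total, as do "with itself and mate mapped" and "singletons", so the flagstat numbers are self-consistent; the open question is why so few records reach accepted_hits.bam.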


    I'm really stumped with this - my STAMPY results are great (~85% reads mapped, ~70% reads paired properly) and eyeballing the results in IGV confirms that reads stack up nicely across expressed regions, and contain very few mismatches.

    Any help would be tremendously appreciated!

    Many thanks and all the best,

    Nick
