Same species mapping problem with Tophat

nr23

02-07-2013, 03:47 AM

Hi,

I'm working with Illumina PE (100bp) RNA-Seq reads from Xenopus laevis. I previously had a lot of trouble mapping these to the X.tropicalis genome with bowtie/tophat (very low % of reads mapped, and almost 0% of reads 'properly paired'). Assuming this was due to mismatches, which I couldn't get around because of the limit of N-3 mismatches per segment in bowtie, I switched to STAMPY (http://www.well.ox.ac.uk/project-stampy), which allows multiple mismatches, and achieved very good results.

Recently the X.laevis genome was released. I've tried re-mapping my reads using tophat/bowtie, but I still get the same results (<20% of reads mapping and a very low fraction 'properly paired'). This is really confusing: I would expect the occasional mismatch due to allelic differences, but I should still see almost all of my reads mapping.

In addition, on inspecting the bowtie log files, I can see that ~72% of both left and right reads map. The trouble seems to be with the way tophat interprets the alignments produced by bowtie: tophat seems to include only a very small fraction of them (6M reads / ~90M), and samtools flagstat reports 100% mapped for these reads.

I'll paste some of the stats I'm seeing below:

Log file from bowtie run (X.laevis reads vs X.laevis genome):

logs> more bowtie.left_kept_reads.fixmap.log
# reads processed: 31151246
# reads with at least one reported alignment: 22576653 (72.47%)
# reads that failed to align: 8249899 (26.48%)
# reads with alignments suppressed due to -m: 324694 (1.04%)

logs> more bowtie.right_kept_reads.fixmap.log
# reads processed: 33478582
# reads with at least one reported alignment: 24249964 (72.43%)
# reads that failed to align: 8880054 (26.52%)
# reads with alignments suppressed due to -m: 348564 (1.04%)
Reported 30987873 alignments to 1 output stream(s)
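
As a rough cross-check of the two logs above, here is a minimal Python sketch that recomputes the alignment percentages and compares the number of reads processed for each mate. The helper name `parse_bowtie_log` is hypothetical (it is not part of bowtie or tophat), and the log text is simply pasted in as strings rather than read from the real files.

```python
import re

def parse_bowtie_log(text):
    """Extract read counts from a bowtie summary log like the ones above."""
    patterns = {
        "processed": r"# reads processed: (\d+)",
        "aligned": r"# reads with at least one reported alignment: (\d+)",
        "failed": r"# reads that failed to align: (\d+)",
        "suppressed": r"# reads with alignments suppressed due to -m: (\d+)",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = int(match.group(1))
    return stats

# Log contents copied from the post (not read from disk)
left_log = """
# reads processed: 31151246
# reads with at least one reported alignment: 22576653 (72.47%)
# reads that failed to align: 8249899 (26.48%)
# reads with alignments suppressed due to -m: 324694 (1.04%)
"""
right_log = """
# reads processed: 33478582
# reads with at least one reported alignment: 24249964 (72.43%)
# reads that failed to align: 8880054 (26.52%)
# reads with alignments suppressed due to -m: 348564 (1.04%)
"""

left = parse_bowtie_log(left_log)
right = parse_bowtie_log(right_log)

for name, s in (("left", left), ("right", right)):
    pct = 100.0 * s["aligned"] / s["processed"]
    print(f"{name}: {s['aligned']} / {s['processed']} aligned ({pct:.2f}%)")

# The left and right logs do not report the same number of processed reads
print("processed difference (right - left):",
      right["processed"] - left["processed"])
```

The sketch only restates what the logs already say. One thing it makes visible is that the left and right logs report different totals of processed reads, which may be worth looking into, since paired-end reporting generally depends on mates being matched up.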

Samtools flagstat on same tophat run:

> samtools flagstat accepted_hits.bam
6401438 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
6401438 + 0 mapped (100.00%:-nan%)
6401438 + 0 paired in sequencing
2050216 + 0 read1
4351222 + 0 read2
10784 + 0 properly paired (0.17%:-nan%)
205892 + 0 with itself and mate mapped
6195546 + 0 singletons (96.78%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
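
To put the flagstat numbers next to the bowtie totals, here is a minimal Python sketch. The helper `parse_flagstat` is hypothetical (not a samtools API), the flagstat text is pasted in as a string, and the denominator is an assumption: it uses the "reads processed" totals from the two bowtie logs above, whereas accepted_hits.bam counts alignment records, so the ratio is only a rough indication.

```python
import re

def parse_flagstat(text):
    """Map each flagstat category label to its QC-passed count."""
    counts = {}
    for line in text.strip().splitlines():
        match = re.match(r"(\d+) \+ (\d+) (.+)", line)
        if not match:
            continue
        # Drop a trailing "(xx.xx%:...)" annotation from the label, if present
        label = re.sub(r"\s*\(\d.*\)$", "", match.group(3))
        counts[label] = int(match.group(1))
    return counts

# flagstat output copied from the post (not produced by running samtools)
flagstat_text = """
6401438 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
6401438 + 0 mapped (100.00%:-nan%)
6401438 + 0 paired in sequencing
2050216 + 0 read1
4351222 + 0 read2
10784 + 0 properly paired (0.17%:-nan%)
205892 + 0 with itself and mate mapped
6195546 + 0 singletons (96.78%:-nan%)
"""

counts = parse_flagstat(flagstat_text)
total = counts["in total (QC-passed reads + QC-failed reads)"]

# Assumption: "reads processed" from the left and right bowtie logs
bowtie_processed = 31151246 + 33478582

print(f"properly paired: {100.0 * counts['properly paired'] / total:.2f}%")
print(f"singletons:      {100.0 * counts['singletons'] / total:.2f}%")
print(f"read1 / read2:   {counts['read1']} / {counts['read2']}")
print(f"records in BAM vs reads given to bowtie: "
      f"{100.0 * total / bowtie_processed:.1f}%")
```

Besides the low properly-paired fraction, this shows that read1 and read2 are unevenly represented in accepted_hits.bam and that almost every record is a singleton.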

I'm really stumped with this - my STAMPY results are great (~85% reads mapped, ~70% reads paired properly) and eyeballing the results in IGV confirms that reads stack up nicely across expressed regions, and contain very few mismatches.

Any help would be tremendously appreciated!

Many thanks and all the best,

Nick