**dpryan** · 01-29-2014, 01:06 PM

There are a couple of things likely contributing to this. First, bowtie2 applies its own scoring threshold, which you can modify through tophat2 with the --b2-score-min option. Second, remember that blast does local alignments, whereas bowtie2/tophat2 do end-to-end (global) alignments by default. So if the ends of the reads don't map, then you're unlikely to map them with tophat2.

One interesting thing to do would be to look in run.log and see whether modifying --read-edit-dist actually changes the --score-min passed to bowtie2, which by default allows only about 2 mismatches. If it doesn't, then increasing --read-edit-dist probably won't do much (it'll only have an effect when mismatches occur at positions with low phred scores).
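
To make the run.log check concrete, here is a minimal sketch that pulls a --score-min value out of a command string and estimates how many full-penalty mismatches it tolerates in end-to-end mode. It assumes bowtie2's documented function syntax (C, L, S, G with a constant term and a coefficient) and its default maximum mismatch penalty of 6. The helper names and the command string in the usage note are made up for illustration; they are not real tophat2 output, so check what your own run.log actually contains.

```python
import math
import re


def extract_score_min(command_line: str):
    """Return the value following --score-min in a command string, or None."""
    match = re.search(r"--score-min\s+(\S+)", command_line)
    return match.group(1) if match else None


def min_score(spec: str, read_len: int) -> float:
    """Evaluate a bowtie2-style function spec such as 'L,-0.6,-0.6' for a read length."""
    parts = spec.split(",")
    kind = parts[0].upper()
    const = float(parts[1])
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    if kind == "C":  # constant
        return const
    if kind == "L":  # linear
        return const + coeff * read_len
    if kind == "S":  # square root
        return const + coeff * math.sqrt(read_len)
    if kind == "G":  # natural log
        return const + coeff * math.log(read_len)
    raise ValueError(f"unknown function type: {kind}")


def max_mismatches(spec: str, read_len: int, mismatch_penalty: int = 6) -> int:
    """Count how many full-penalty mismatches fit above the minimum score.

    Assumes end-to-end mode, where a perfect alignment scores 0 and each
    high-quality mismatch costs `mismatch_penalty` (bowtie2's default MX is 6).
    """
    return math.floor(-min_score(spec, read_len) / mismatch_penalty)
```

For example, `max_mismatches("L,-0.6,-0.6", 100)` evaluates bowtie2's own end-to-end default for a 100 bp read, and `extract_score_min("bowtie2 --score-min C,-14,0 -x idx")` returns `"C,-14,0"` from a hypothetical command string. Mismatches at low-quality positions cost less than the maximum penalty, so the real tolerance can be somewhat higher than this estimate.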

**sphil** · 01-30-2014, 12:38 AM

You can also try to generate local alignments with bowtie2 and check whether you get more aligned sequences.
Here are the preset options in --local mode:

--very-fast-local: same as -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
--fast-local: same as -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
--sensitive-local: same as -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)
--very-sensitive-local: same as -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
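
As a rough sketch of how these presets could be used, the snippet below builds (but does not run) a paired-end bowtie2 command in --local mode and can expand a preset into the individual options listed above. The function names and the file and index names in the usage note are placeholders, not anything from this thread.

```python
import shlex

# Preset expansions as listed in the bowtie2 manual for --local mode.
LOCAL_PRESETS = {
    "--very-fast-local": "-D 5 -R 1 -N 0 -L 25 -i S,1,2.00",
    "--fast-local": "-D 10 -R 2 -N 0 -L 22 -i S,1,1.75",
    "--sensitive-local": "-D 15 -R 2 -N 0 -L 20 -i S,1,0.75",
    "--very-sensitive-local": "-D 20 -R 3 -N 0 -L 20 -i S,1,0.50",
}


def expand_preset(preset: str) -> list:
    """Return the individual options that a local-mode preset stands for."""
    return shlex.split(LOCAL_PRESETS[preset])


def bowtie2_local_cmd(index, mate1, mate2, out_sam, preset="--sensitive-local"):
    """Build an argument list for a paired-end local alignment (not executed here)."""
    if preset not in LOCAL_PRESETS:
        raise ValueError(f"not a --local preset: {preset}")
    return ["bowtie2", "--local", preset,
            "-x", index, "-1", mate1, "-2", mate2, "-S", out_sam]
```

Calling `bowtie2_local_cmd("genome_idx", "unmapped_1.fq", "unmapped_2.fq", "local.sam", "--very-sensitive-local")` gives a list that can be passed to `subprocess.run` once the placeholder paths are replaced with real ones.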

**mparida85** · 02-01-2014, 10:01 PM

tophat/bowtie mapping

Hi Guys
Thanks for your replies.
I never completely understood what --read-edit-dist actually does.
I am using -N 5, --read-edit-dist 5 and --b2-sensitive. All I know is that --read-edit-dist has to be >= the -N option.

dpryan, please elaborate a little on that.
I would greatly appreciate your help.
Thanks for your suggestion too, sphil. I will definitely try that experiment.

I am learning a lot from this blog. I also did some more digging into my unmapped.bam file, and it turns out there are paired-end reads where one read BLATs well and the other maps to nothing. Maybe the other read is a contaminant of some sort. I read in the manual of trim_galore (a fastq trimmer, Simon Andrews) that bowtie rejects pairs "whenever a start/end coordinate is contained within the other read".

Does anyone have ideas on whether that might apply to my issue?

**dpryan** · 02-02-2014, 05:59 AM

An edit distance is a generalization of the concept of number of mismatches (in point of fact, it's a common distance metric for string comparisons). The general idea is that the edit distance is the number of changes to string A required to produce string B. If the only difference between the two is mismatches (e.g. you have an A in one and a T at the same place in another), then the edit distance and number of mismatches are the same. If you have an insertion or deletion between the two strings, then the number of mismatches will be less than the edit distance, as the former lacks any conception of what an insertion or deletion is. Since having insertions/deletions is relatively common when dealing with sequencing data, the concept of an edit distance is rather more useful than the number of mismatches.
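
To illustrate the difference, here is a small sketch that computes the edit (Levenshtein) distance between two sequences and then walks back through the alignment to count substitutions and indels separately. It is a textbook dynamic-programming version written for this example, not what tophat2 or bowtie2 run internally, and `edit_ops` is a made-up name.

```python
def edit_ops(a: str, b: str):
    """Return (edit_distance, substitutions, indels) for one optimal alignment of a to b."""
    n, m = len(a), len(b)
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = int(a[i - 1] != b[j - 1])
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or mismatch
                           dp[i - 1][j] + 1,         # base present only in a
                           dp[i][j - 1] + 1)         # base present only in b
    # Trace back through the table, preferring the diagonal, to classify each edit.
    i, j, subs, indels = n, m, 0, 0
    while i > 0 or j > 0:
        cost = int(a[i - 1] != b[j - 1]) if i > 0 and j > 0 else None
        if cost is not None and dp[i][j] == dp[i - 1][j - 1] + cost:
            subs += cost
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            indels += 1
            i -= 1
        else:
            indels += 1
            j -= 1
    return dp[n][m], subs, indels
```

With a single substitution, `edit_ops("ACGTACGT", "ACGTTCGT")` reports an edit distance of 1 made up of 1 mismatch. With a single extra base, `edit_ops("ACGTACGT", "ACGTTACGT")` still reports an edit distance of 1, but it consists of 0 mismatches and 1 indel, which is the case where a mismatch limit alone would not describe the difference.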

Which part of my previous reply would you like me to expound upon?

Regarding what you read in the trim_galore manual, keep in mind that this is dependent on the version of bowtie that you're using. Bowtie1 doesn't deal with overlapping reads well at all. Bowtie2, however, can deal properly with these, provided you allow it to. Bowtie2 defaults to allowing alignment where one mate is contained either partially or entirely within the other. It doesn't allow "dovetail" alignments unless you pass the "--dovetail" flag, which I don't think tophat2 allows.
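
The cases above can be sketched as a simple interval classification. This is a simplified geometric reading of the containment, overlap and dovetail cases for a forward/reverse pair, with made-up function and label names and half-open coordinates assumed; it is not bowtie2's actual implementation.

```python
def classify_pair(fwd, rev):
    """Classify how two mates sit relative to each other on the reference.

    fwd and rev are (start, end) half-open intervals for the forward-strand
    and reverse-strand mate, respectively.
    """
    fs, fe = fwd
    rs, re_ = rev
    if fe <= rs or re_ <= fs:
        return "no_overlap"   # the mates do not share any reference position
    if (fs <= rs and re_ <= fe) or (rs <= fs and fe <= re_):
        return "contain"      # one mate lies entirely within the other
    if rs < fs:
        return "dovetail"     # reverse mate starts upstream of the forward mate
    return "overlap"          # partial overlap in the expected order
```

For instance, `classify_pair((100, 200), (150, 250))` is a plain overlap, `classify_pair((100, 200), (120, 180))` is containment, and `classify_pair((150, 250), (100, 200))` is the dovetail case that bowtie2 treats as non-concordant unless --dovetail is given.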

Relatedly, you might consider allowing "mixed" and "discordant" alignments, if you've told tophat2 to disallow them.

**mparida85** · 02-02-2014, 08:03 AM

reply to dpryan

Hi
dpryan
You already explained read-edit-distance in your first paragraph. That's what I was asking you to explain. Thanks a lot.

FYI, I am using bowtie2/tophat2 for mapping. I have allowed discordant and mixed alignments; I can see them in my alignment summary reports.
I don't think I allowed dovetail alignments. I can try running some of the unmapped reads to experiment with that.

Question:
a) If there are some overrepresented sequences in my fastq file and most of them are non-coding RNA (rRNA, mitochondrial), is it good practice to let them map in the bowtie2/tophat2 pipeline? They have good phred quality scores, but their per-base sequence content (a.k.a. the ATGC plot) shows bias.

I think that as long as a sequence is of good quality we should allow it to map, no matter where it came from, with the exception of adapter contamination and poor-quality reads.

Again, I cannot thank you enough for the time and knowledge you are sharing with me.
Rocky

**dpryan** · 02-02-2014, 11:44 AM

I would normally only trim off adapter sequence and low-quality bases. Reads will often show bias in the first 10-13 bases; that's nothing to worry too much about. Similarly, having rRNA show up as an over-represented sequence is pretty normal and nothing to worry about. I should add that having a high duplication rate is also normal for RNAseq datasets.

**mparida85** · 02-03-2014, 03:50 PM

rRNA reads

Hi dpryan
Question:
1) Do we remove the rRNA reads before calculating FPKM using cufflinks?
I ask because I am seeing some rRNA genes in my list of significantly differentially expressed genes.
Please comment.
Rocky

**dpryan** · 02-04-2014, 01:54 AM

It's usually recommended to do so, at least unless you're actually interested in looking at rRNA. I think cufflinks has an option where you can mask some regions from analysis. My understanding, at least, is that it's geared toward avoiding rRNAs or other very highly expressed transcripts that are likely to suppress FPKM/RPKM scores and increase variance.
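
A quick back-of-the-envelope sketch of the suppression effect, using the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments). The numbers in the usage note are invented for illustration, and cufflinks' actual abundance estimation and normalization are more involved than this single formula.

```python
def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """FPKM = 1e9 * fragments / (total mapped fragments * transcript length in bp)."""
    return 1e9 * fragments / (total_mapped * length_bp)


def fpkm_without(fragments: int, length_bp: int, total_mapped: int, excluded: int) -> float:
    """FPKM after dropping `excluded` fragments (e.g. rRNA) from the total."""
    return fpkm(fragments, length_bp, total_mapped - excluded)
```

Take a hypothetical 2,000 bp gene with 1,000 fragments in a library of 20 million mapped fragments, 5 million of which are rRNA. `fpkm(1000, 2000, 20_000_000)` gives 25.0, while `fpkm_without(1000, 2000, 20_000_000, 5_000_000)` gives about 33.3. If the rRNA fraction varies between samples, that difference alone shifts every other gene's FPKM, which is the kind of variance masking is meant to reduce. In the Cufflinks manual the masking option appears as -M/--mask-file, which takes a GTF of transcripts to ignore.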


tophat mapping