
**whsqwghlm** · 01-29-2010, 07:12 AM

We've been using the following (to report up to 101 exact matches per read):
bowtie -k 101 -v 0

Our workflow uniquifies (collapses) the sequences before alignment, so we're not concerned about quality values. I'm also guessing that the miRNA sequences are sufficiently conserved that we needn't worry about mismatches.
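Uniquifying reads before alignment could look roughly like the sketch below. Identical sequences are merged, and each unique sequence keeps its copy number so counts can be restored after mapping. It is a minimal Python sketch that assumes plain, uncompressed FASTQ with four-line records. The `>seqN_xCOUNT` header format and the function names are hypothetical choices made for this illustration, not part of the poster's pipeline.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def collapse_reads(fastq_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse identical sequences from 4-line FASTQ records.

    Returns (sequence, copy_number) pairs, most abundant first. Quality
    strings are discarded, so they can no longer inform the alignment.
    """
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # line 2 of every record holds the sequence
            counts[line.strip().upper()] += 1
    return counts.most_common()


def to_fasta(collapsed: List[Tuple[str, int]]) -> str:
    """Render collapsed reads as FASTA.

    The '>seqN_xCOUNT' header is a hypothetical convention that keeps the
    copy number attached to each unique sequence through alignment.
    """
    records = [f">seq{n}_x{count}\n{seq}"
               for n, (seq, count) in enumerate(collapsed, start=1)]
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    demo = [
        "@r1", "TGAGGTAGTAGGTTGTATAGTT", "+", "IIIIIIIIIIIIIIIIIIIIII",
        "@r2", "TGAGGTAGTAGGTTGTATAGTT", "+", "IIIIIIIIIIIIIIIIIIIIII",
        "@r3", "TAGCTTATCAGACTGATGTTGA", "+", "IIIIIIIIIIIIIIIIIIIIII",
    ]
    print(to_fasta(collapse_reads(demo)), end="")
```

The collapsed file can then be given to Bowtie with `-f` (FASTA input), and each hit's read count can be recovered from the read name.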

However, I'm very interested in the views of others on this.

**yjhua2110** · 02-02-2010, 07:21 AM

In our deepBase database, we use the options -k 200 -v 0, which instruct Bowtie to report up to 200 perfect hits for each read.
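The reporting rule behind `-k 200 -v 0` (every perfect hit on either strand, stopping once k alignments have been reported) can be shown with a naive Python sketch on a toy string. It models only what gets reported. Bowtie searches with a Burrows-Wheeler index, not a string scan, and the order of hits here (forward strand first, left to right) need not match Bowtie's. The function names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def exact_hits(read: str, reference: str, k: int) -> List[Tuple[str, int]]:
    """Return up to k perfect (0-mismatch) hits as (strand, 0-based offset).

    Models the reporting behaviour of '-k <k> -v 0' only; the search
    order used here is arbitrary.
    """
    hits: List[Tuple[str, int]] = []
    for strand, query in (("+", read), ("-", revcomp(read))):
        start = reference.find(query)
        while start != -1:
            hits.append((strand, start))
            if len(hits) == k:  # stop once the -k cap is reached
                return hits
            start = reference.find(query, start + 1)
    return hits
```

For example, `exact_hits("ACG", "ACGTTACGAACGT", k=2)` stops after the first two forward-strand hits, even though more perfect hits exist on both strands.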

deepBase is a platform for annotating and discovering small and long ncRNAs from next generation sequencing data. It is available at http://deepbase.sysu.edu.cn

**houhuabin** · 02-02-2010, 07:49 AM

Are you looking for this?

mirTools for microRNA profiling and discovery based on high-throughput sequencing - SEQanswers

http://seqanswers.com/forums/showthread.php?t=3407&highlight=mirtools


**whsqwghlm** · 02-02-2010, 07:55 AM

Could well be. However, the link is broken. I would be very grateful if you could fix it. Thanks!

**houhuabin** · 02-02-2010, 07:59 AM

Sorry about that; it's fixed now.

Thanks!

**whsqwghlm** · 02-02-2010, 10:37 AM

After a few days of struggling with quality/homopolymer/adaptor trimming of my reads, and reading about 3' RNA edits and so forth, I've decided to try something similar to staylor's original suggestion (similar to the algorithm used by miRanalyzer):

bowtie -n 0 -l 15 --best

This should report the best match(es) with an exact 15 bp 5' seed. If anyone is interested in a direct comparison between this and the original (-v 0) parameters, or has another view on this, please let me know.
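To see what `-n 0 -l 15` accepts, here is a simplified Python model of Bowtie's seed rule for a single ungapped placement. The first `seed_len` bases (5' end) may carry at most `seed_mismatches` mismatches, and the qualities at all mismatched positions may sum to at most `max_qual_sum`, which is the role of `-e` (default 70). The model assumes a uniform Phred quality of 40 when none is supplied and skips the Maq-style rounding Bowtie applies to quality values. It is a sketch of the rule, not Bowtie's implementation, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def passes_seed_rule(read: str, ref_window: str, seed_len: int = 15,
                     seed_mismatches: int = 0,
                     quals: Optional[Sequence[int]] = None,
                     max_qual_sum: int = 70) -> bool:
    """Simplified model of Bowtie's -n/-l/-e acceptance test.

    read and ref_window are equal-length strings with the read's 5' end
    at index 0 (reverse-complement the window for minus-strand hits).
    """
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    if quals is None:
        quals = [40] * len(read)  # assumption: uniform Phred 40
    mismatches = [i for i, (a, b) in enumerate(zip(read, ref_window)) if a != b]
    in_seed = sum(1 for i in mismatches if i < seed_len)  # -n applies to the seed only
    qual_sum = sum(quals[i] for i in mismatches)          # -e applies to the whole read
    return in_seed <= seed_mismatches and qual_sum <= max_qual_sum
```

Under this model, mismatches in the 3' tail are limited only by the quality sum. That is why raising `max_qual_sum`, as the `-e 99999` in the final command does, lets a read align despite several 3' mismatches, provided its 15 bp seed is exact.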

**bioinfosm** · 02-03-2010, 12:58 PM

So what is your post-processing? What is the reference sequence? And how do you summarize the data?

**whsqwghlm** · 02-04-2010, 01:50 AM

In terms of post-processing, we're loading the alignments into an Ensembl database so that we can screen for known genes and repeats. We then predict novel small RNAs and estimate transcript counts for all loci based on read coverage. It's designed to be a generic pipeline for metazoans. As everything is in an Ensembl database, the results can be browsed and ad hoc reports generated.
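The post doesn't give enough detail to reproduce the counting step, but a coverage-based count per locus could, in its simplest form, look like the Python sketch below. It sums collapsed read counts for every alignment that overlaps a locus, using half-open coordinates. The tuple layouts and function name are hypothetical. Each alignment of a multi-mapping read is counted in full here, which is an assumption rather than the poster's method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Alignment = Tuple[str, int, int, int]  # (chrom, start, end, read_count), half-open
Locus = Tuple[str, str, int, int]      # (name, chrom, start, end), half-open


def count_per_locus(alignments: Iterable[Alignment],
                    loci: Iterable[Locus]) -> Dict[str, int]:
    """Sum read counts of alignments overlapping each locus (naive scan)."""
    loci = list(loci)
    counts: Dict[str, int] = {name: 0 for name, _, _, _ in loci}
    by_chrom: Dict[str, List[Tuple[str, int, int]]] = defaultdict(list)
    for name, chrom, start, end in loci:
        by_chrom[chrom].append((name, start, end))
    for chrom, a_start, a_end, n_reads in alignments:
        for name, l_start, l_end in by_chrom.get(chrom, []):
            if a_start < l_end and l_start < a_end:  # half-open interval overlap
                counts[name] += n_reads
    return counts
```

A real pipeline would use an interval index rather than this linear scan and would need an explicit policy for reads that map to several loci.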

**staylor** · 02-25-2010, 05:23 AM

Originally posted by whsqwghlm

In terms of post-processing, we're loading the alignments into an Ensembl database so that we can screen for known genes and repeats. We then predict novel small RNAs and estimate transcript counts for all loci based on read coverage. It's designed to be a generic pipeline for metazoans. As everything is in an Ensembl database, the results can be browsed and ad hoc reports generated.

For some reason I didn't get emailed about the activity on my post, so I thought no one was interested! Looks like people have been thinking about it...

whsqwghlm - how did you get on with the mapping? Did the parameters work?

**whsqwghlm** · 02-25-2010, 05:49 AM

Yes! We ended up using:
bowtie -n 0 -l 15 -e 99999 -k 200 --best --chunkmbs 128

We then post-processed the alignments to keep the one with the longest 5' exact match (we could not find a way to get Bowtie to do this natively). The preparation of our library helped: it had been poly-A filled, and the 3' primer was terminated with a poly-T chain. We did not bother to poly-A trim the reads (i.e. remove the primer), as we did not want to lose any 'real' As off the ends of sequences.
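The post-processing step described above (keeping, for each read, the alignment with the longest exact match from the 5' end) could be sketched in Python as follows. The sketch assumes Bowtie's default tab-delimited output. There, column 5 holds the read sequence and column 8 lists mismatches as `offset:reference-base>read-base`, with offsets counted from the read's 5' end. So the 5' exact-match length is the smallest mismatch offset, or the full read length when there are none. The post doesn't say how ties were resolved, so this sketch keeps all tied alignments. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def five_prime_exact_length(fields: List[str]) -> int:
    """Length of the exact match from the read's 5' end for one Bowtie hit."""
    read_len = len(fields[4])
    mismatch_field = fields[7] if len(fields) > 7 else ""
    if not mismatch_field:
        return read_len  # no mismatches: the whole read matches exactly
    # Each descriptor looks like '18:A>T'; the offset is 0-based from the 5' end.
    return min(int(desc.split(":", 1)[0]) for desc in mismatch_field.split(","))


def best_by_five_prime_match(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Keep, per read name, the hit(s) with the longest 5' exact match."""
    best_len: Dict[str, int] = {}
    best_hits: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        read, length = fields[0], five_prime_exact_length(fields)
        if read not in best_len or length > best_len[read]:
            best_len[read] = length
            best_hits[read] = [line]  # new best: discard shorter matches
        elif length == best_len[read]:
            best_hits[read].append(line)  # tie: keep every equally good hit
    return dict(best_hits)
```

Because `-k 200` is in effect, all reported hits for a read arrive in the output, so a single streaming pass like this is enough to pick the longest 5' match.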

I'm still generating comparisons with other bowtie configs, and I also need to test the pipeline against a GEO data set with 'normal' primers.

**staylor** · 02-25-2010, 07:15 AM

Ah excellent. I will try that. Thanks for the tip!

**bioinfosm** · 02-25-2010, 01:55 PM

Originally posted by whsqwghlm

In terms of post-processing, we're loading the alignments into an Ensembl database so that we can screen for known genes and repeats. We then predict novel small RNAs and estimate transcript counts for all loci based on read coverage. It's designed to be a generic pipeline for metazoans. As everything is in an Ensembl database, the results can be browsed and ad hoc reports generated.

Are you using miRBase for mapping, or the whole human genome?

**whsqwghlm** · 02-28-2010, 01:52 PM

We're aligning against the whole genome. Reads that do not align to the genome are then aligned to miRBase (all species), in case the assembly is incomplete.
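The genome-first, miRBase-second strategy could be wired up with Bowtie's `--un <filename>` option, which writes reads that fail to align to a file. That file then becomes the input of the second pass. In the Python sketch below, only the flags come from this thread. The index names, file names and the choice to reuse the genome parameters for the miRBase pass are assumptions, and the reads are assumed to be collapsed FASTA (hence `-f`).

```python
import subprocess
from typing import List

# Flags taken from the command posted in this thread.
BOWTIE_OPTS = ["-n", "0", "-l", "15", "-e", "99999", "-k", "200",
               "--best", "--chunkmbs", "128"]


def two_pass_commands(reads_fa: str, genome_index: str, mirbase_index: str,
                      prefix: str) -> List[List[str]]:
    """Build the genome pass and the miRBase pass (hypothetical file names)."""
    unaligned = f"{prefix}.genome_unaligned.fa"
    genome_pass = ["bowtie", *BOWTIE_OPTS, "-f", "--un", unaligned,
                   genome_index, reads_fa, f"{prefix}.genome.bowtie"]
    # Assumption: the same parameters are reused for the miRBase pass.
    mirbase_pass = ["bowtie", *BOWTIE_OPTS, "-f",
                    mirbase_index, unaligned, f"{prefix}.mirbase.bowtie"]
    return [genome_pass, mirbase_pass]


def run_two_pass(reads_fa: str, genome_index: str, mirbase_index: str,
                 prefix: str) -> None:
    """Run both passes in order; requires bowtie on PATH and built indexes."""
    for cmd in two_pass_commands(reads_fa, genome_index, mirbase_index, prefix):
        subprocess.run(cmd, check=True)
```

miRBase distributes its sequences with U rather than T, so they would presumably need converting to DNA before building the miRBase index.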

**staylor** · 03-01-2010, 06:33 AM

So are you filtering on the one with the smallest NM value with the longest read?

If you get multiple matches and they all score equally do you pick one at random?
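The thread doesn't record an answer to this question. For illustration only, the Python sketch below shows two ways equally scoring alignments are sometimes handled. The first picks one at random with a seeded generator, so reruns give the same assignment. The second splits the read's count evenly across all tied loci. Neither is attributed to either poster, and the function names are hypothetical.

```python
import random
from typing import Dict, Sequence, TypeVar

T = TypeVar("T")


def pick_random(tied: Sequence[T], rng: random.Random) -> T:
    """Choose one tied alignment; a seeded rng makes the choice reproducible."""
    return rng.choice(tied)


def fractional_weights(tied: Sequence[str], read_count: int) -> Dict[str, float]:
    """Split a read's count evenly across all tied loci."""
    share = read_count / len(tied)
    weights: Dict[str, float] = {}
    for locus in tied:
        weights[locus] = weights.get(locus, 0.0) + share
    return weights
```

Random assignment keeps integer counts but adds noise between runs with different seeds. Fractional weighting is deterministic but produces non-integer counts per locus.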

miRNA mapping using BOWTIE