
**zee** · 01-15-2009, 08:05 PM

Hi kmcarr,

I have a similar project on the go with A. thaliana, where I initially aligned all my reads to the whole genome and then intersected those results with the known locations of miRBase mature and precursor sequences. I found it easier this way because at a later stage I can look for potentially novel miRNAs.

I used novoalign (www.novocraft.com) to simultaneously align and strip off the 3' adaptor sequence. Parameters are

novoalign -d genome -f <reads in fastq|prb format> -s<adaptor sequence> > output

SOAP2 and MAQ may also be used for this purpose, but I found that novoalign offered favourable performance and sensitivity. Bowtie may also do a good job, but I haven't tried it for this work.

Once I have the alignments, I sort them by genome sequence and ascending position. I then cross-reference these positions against the locations of precursor microRNAs with a perl script. At this stage I get counts for each miRBase miRNA from my short reads, and I can convert these to reads-per-million counts.
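For anyone who wants to see the shape of that cross-referencing step, here is a minimal Python sketch (not zee's perl script). It assumes alignments and precursor locations are already parsed into tuples with 1-based inclusive coordinates, ignores strand, and counts a read only if it lies entirely inside a precursor. The function names and the example coordinates are made up for illustration.

```python
from collections import defaultdict

def count_reads_per_precursor(alignments, precursors):
    """Count reads falling entirely inside each precursor.

    alignments: iterable of (chrom, start, end)
    precursors: iterable of (name, chrom, start, end)
    Coordinates are assumed to be 1-based and inclusive; strand is ignored.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in precursors:
        by_chrom[chrom].append((start, end, name))

    counts = defaultdict(int)
    for chrom, r_start, r_end in alignments:
        for p_start, p_end, name in by_chrom.get(chrom, []):
            if r_start >= p_start and r_end <= p_end:
                counts[name] += 1
    return dict(counts)

def reads_per_million(counts, total_mapped_reads):
    """Scale raw counts to reads per million mapped reads."""
    return {name: n * 1_000_000 / total_mapped_reads
            for name, n in counts.items()}

# Hypothetical example data
precursors = [("mirA", "chr1", 100, 200)]
alignments = [("chr1", 110, 130), ("chr1", 190, 210), ("chr2", 110, 130)]
counts = count_reads_per_precursor(alignments, precursors)
rpm = reads_per_million(counts, total_mapped_reads=len(alignments))
```

The inner loop scans every precursor on the chromosome, which is fine for a few hundred miRNAs; with sorted alignments a sweep or an interval tree would scale better.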

Contact me privately if you would like more info.

It would be nice if other people doing similar work could share their protocols for this type of bioinformatics analysis. We could all learn something new.

**myrna** · 01-22-2009, 08:20 AM

miRNA alignment

I use a very similar approach, but I first collapse identical reads before aligning (to avoid aligning the same let-7 and other abundant miRNA reads hundreds of thousands of times). You can then count the number of copies of each read in the original file to generate counts. The only problem with this is that you lose the sequence quality information (if you have a need for that).
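A rough Python sketch of the collapsing step, assuming the read sequences are already in memory as plain strings. The `readN_xCOUNT` header convention is just an illustrative choice for carrying the copy number through the aligner, not a standard.

```python
from collections import Counter

def collapse_reads(sequences):
    """Return (sequence, copy_number) pairs, most abundant first."""
    return Counter(sequences).most_common()

def write_collapsed_fasta(collapsed, handle):
    """Write one FASTA record per unique read, with the copy number in the header."""
    for i, (seq, n) in enumerate(collapsed, start=1):
        handle.write(f">read{i}_x{n}\n{seq}\n")

# Hypothetical example: three reads, two of them identical
collapsed = collapse_reads(["TGAGGTAGTAGG", "ACGTACGT", "TGAGGTAGTAGG"])
```

After alignment, the copy number can be parsed back out of the header and added to the count for whatever miRNA the unique read hit.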

Ryan

**chris** · 01-29-2009, 06:18 AM

I agree. Collapsing the reads to unique examples is a very useful step, as miRNA Solexa runs are very over-sampled, e.g. 3M reads often represent only 200k unique reads.

I tend to remove adaptor tags and quality filter reads before matching to miRBase. This also reduces the search space significantly.
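As an illustration of that kind of pre-filtering (not chris's actual scripts), here is a small Python sketch that trims an exact 3' adaptor match, including a partial adaptor at the end of the read, and then applies a simple length and quality filter. The thresholds and the Phred offset are assumptions; older Illumina pipelines used an offset of 64 rather than 33, so check your own data.

```python
def trim_adapter(seq, qual, adapter, min_overlap=6):
    """Trim an exact adaptor match from the 3' end of a read.

    Looks for the full adaptor anywhere in the read first, then for a
    prefix of the adaptor (at least min_overlap bases) at the read end.
    """
    pos = seq.find(adapter)
    if pos == -1:
        longest = min(len(adapter), len(seq)) - 1
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]

def passes_filter(seq, qual, min_len=18, min_qual=20, phred_offset=33):
    """Keep reads that are long enough, have no N, and no low-quality base."""
    if len(seq) < min_len or "N" in seq:
        return False
    return all(ord(c) - phred_offset >= min_qual for c in qual)

# Hypothetical example: a 22 nt insert followed by 8 bases of adaptor
adapter = "TCGTATGCCGTCTTCTGCTTG"
read = "TGAGGTAGTAGGTTGTATAGTT" + adapter[:8]
trimmed_seq, trimmed_qual = trim_adapter(read, "I" * len(read), adapter)
```

Reads that come out shorter than `min_len` after trimming are dropped by `passes_filter`, which also keeps the search space down.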

**zee** · 01-29-2009, 06:24 AM

You guys are correct, I forgot to add that after my first analysis I started collapsing reads.
When I do my miRBase counts, I have an option to factor in the frequency of each tag.
I recently had a look at software for counting tags overlapping miRNAs. I found ERANGE and am still trying to make it work on my genome of interest.
Anybody care to share what they're using? I have a very crude pipeline in perl that will automate the counting and graph miRNA matches.

**chris** · 01-30-2009, 01:39 AM

I have my own perl scripts for handling the raw data and managing searches of the reads against miRBase, etc.

Then I load the data into MySQL for analysis. It allows the easy tracking of the 'abundance' of each read following collapsing of the data.
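To give a flavour of the database approach, here is a sketch using Python's built-in sqlite3 in place of MySQL, so that it runs without a server. The table and column names are invented for the example; the SQL itself would look much the same in MySQL.

```python
import sqlite3

def load_reads(conn, collapsed):
    """Store each unique read with its abundance (copy number)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        "seq TEXT PRIMARY KEY, abundance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO reads (seq, abundance) VALUES (?, ?)", collapsed)
    conn.commit()

def load_hits(conn, hits):
    """Store which unique read matched which miRNA."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (seq TEXT NOT NULL, mirna TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO hits (seq, mirna) VALUES (?, ?)", hits)
    conn.commit()

def abundance_per_mirna(conn):
    """Sum the abundance of all unique reads assigned to each miRNA."""
    rows = conn.execute(
        "SELECT h.mirna, SUM(r.abundance) "
        "FROM hits h JOIN reads r ON r.seq = h.seq "
        "GROUP BY h.mirna ORDER BY h.mirna"
    ).fetchall()
    return dict(rows)

# Hypothetical example data
conn = sqlite3.connect(":memory:")
load_reads(conn, [("AAA", 5), ("CCC", 2), ("GGG", 1)])
load_hits(conn, [("AAA", "miR-x"), ("CCC", "miR-x"), ("GGG", "miR-y")])
totals = abundance_per_mirna(conn)
```

Keeping abundance on the `reads` table means the search results only ever need to reference unique sequences.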

**demis001** · 04-16-2009, 12:14 PM

I also use my own script to process the results. I usually predict miRNAs first and then group them as known or novel in the last step. Aligning to miRBase is a trivial issue once you have candidate miRNAs.

DD

**andrea_maso** · 07-14-2009, 02:30 PM

Hi all,
I have a question similar to the one posted by kmcarr. We need to align miRNA sequences obtained on the Solexa/Illumina platform and we are not interested (for now) in discovering new miRNA species. Is there a precompiled or assembled short sequence comprising the sequences of all miRNA species (mature and hairpin) that one can use for alignment instead of the whole genome? I am thinking of something like
----seqMir-1....seqMir2....seqmir3.....-----
I think the alignment algorithm should run faster that way.

Has anyone thought of such a solution? Would it work? How can I assemble such a sequence automatically?
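One way such a concatenated reference could be assembled is sketched below in Python: read a multi-FASTA file, join the sequences with runs of N as spacers, and keep an index so a position in the concatenated sequence can be mapped back to the miRNA it came from. The spacer length and function names are arbitrary choices for illustration, and coordinates here are 0-based, half-open. Many aligners index a multi-FASTA file directly, in which case this step is unnecessary.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from the lines of a multi-FASTA file."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def concatenate(records, spacer_len=50):
    """Join sequences with N spacers; return (sequence, index).

    index is a list of (start, end, name) with 0-based, half-open coordinates.
    """
    parts, index, offset = [], [], 0
    for name, seq in records:
        index.append((offset, offset + len(seq), name))
        parts.append(seq)
        parts.append("N" * spacer_len)
        offset += len(seq) + spacer_len
    return "".join(parts), index

def locate(index, pos):
    """Map a position in the concatenated sequence back to (name, offset)."""
    for start, end, name in index:
        if start <= pos < end:
            return name, pos - start
    return None  # position falls in a spacer

# Hypothetical two-record example
fasta_lines = [">a some description", "ACGU", "AC", ">b", "GGG"]
reference, index = concatenate(read_fasta(fasta_lines), spacer_len=2)
```

The spacers should be at least as long as a read so that no read can align across two neighbouring miRNAs. The sequences may also need converting from U to T before aligning DNA reads against them.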

Thanks.
Andrea

**kmcarr** · 07-15-2009, 05:26 AM

Andrea,

miRBase has what you are looking for:

miRBase

http://microrna.sanger.ac.uk

Go to the Download tab and you will find fasta files with either the hairpin or mature sequences. They also provide GFF files with the genome coordinates of the miRNAs.

Happy mapping.

**andrea_maso** · 07-15-2009, 08:06 AM

Dear kmcarr,
yes, I know that miRBase has the sequences and GFF coordinates, but they are in multi-FASTA format and not a single-sequence file (I am thinking of Mapview, which requires a single FASTA sequence...).
I will try to use Bowtie and SAMtools to align and view the sequences, but I do not know which format they require.
Do you have any idea?

Thanks and bye for now.
Andrea

**David_H** · 07-21-2009, 12:45 AM

I've also used SOAP to get rid of adapters and map reads, but right now I need something to do fuzzy identification and trimming of adapters on WINDOWS (for teaching purposes). I've finally found a mapper that works on Windows (PASS), but it won't cut the adapters.

Any ideas gratefully received

David

**chris** · 07-21-2009, 01:33 AM

Andrea,

If you're using miRBase to search for miRNAs, I'd recommend you use the hairpin.fasta file only, as many search algorithms cope badly when the sequence being searched is shorter than the query, as is often the case when searching against the mature sequences. You then need to parse the miRNA.dat file to determine whether your hairpin matches align to known mature regions.
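The last step, deciding whether a hit on a hairpin lines up with a known mature region, could look something like this Python sketch. It assumes the mature coordinates within each hairpin have already been extracted (for instance from miRNA.dat, parsing not shown) as 1-based inclusive positions. The names and the 80% overlap threshold are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of positions shared by two 1-based inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

def classify_hit(hit_start, hit_end, mature_regions, min_fraction=0.8):
    """Return the mature miRNA name a hairpin hit corresponds to, or None.

    mature_regions: list of (name, start, end) within the hairpin.
    A hit is assigned to the mature region covering the largest part of it,
    provided at least min_fraction of the hit lies inside that region.
    """
    hit_len = hit_end - hit_start + 1
    best = None
    for name, m_start, m_end in mature_regions:
        shared = overlap(hit_start, hit_end, m_start, m_end)
        if shared / hit_len >= min_fraction and (best is None or shared > best[1]):
            best = (name, shared)
    return best[0] if best else None

# Hypothetical hairpin with two annotated mature products
mature = [("miR-x-5p", 10, 31), ("miR-x-3p", 50, 71)]
on_arm = classify_hit(11, 32, mature)
in_loop = classify_hit(33, 48, mature)
```

Hits that return None still map to the hairpin but fall outside the annotated mature products, for example in the loop.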

All this is simple to do with the data in a database.
Cheers,

Chris

**chris** · 07-21-2009, 01:36 AM

David,

Have you tried cygwin on Windows? The vast majority of code is available for Linux only, so it's probably best to try that avenue rather than look for things available for Windows, as you may miss out on the best applications.
Cheers,

Chris

**kmcarr** · 07-21-2009, 05:21 AM

David,

The EMBOSS package contains a program called fuzznuc which does what you want, fuzzy identification of nucleotide sequences (http://embossgui.sourceforge.net/dem...l/fuzznuc.html).

EMBOSS is a huge package, primarily supported on unix and unix-like environments, but there is a native Windows port (ftp://emboss.open-bio.org/pub/EMBOSS/windows/). I have never used the Windows port, but if it is anything like the unix versions it will require some commitment to get it installed and running properly.
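For teaching purposes it may also help to show what fuzzy adapter identification boils down to. The sketch below is plain Python (so it runs on Windows) and is not how fuzznuc works internally: it slides the adapter along the read and accepts the leftmost position where the Hamming distance is within an allowed mismatch rate, so substitutions are tolerated but indels are not. The parameter names and defaults are assumptions.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_adapter_fuzzy(seq, adapter, max_mismatch_rate=0.1, min_overlap=5):
    """Return the leftmost position where the adapter starts, or None.

    At each position the read is compared with as much of the adapter as
    fits before the read ends, allowing a fixed fraction of mismatches.
    """
    for pos in range(len(seq) - min_overlap + 1):
        window = seq[pos:pos + len(adapter)]
        allowed = int(len(window) * max_mismatch_rate)
        if hamming(window, adapter[:len(window)]) <= allowed:
            return pos
    return None

def trim_fuzzy(seq, adapter, **kwargs):
    """Cut the read at the fuzzy adapter match, if there is one."""
    pos = find_adapter_fuzzy(seq, adapter, **kwargs)
    return seq if pos is None else seq[:pos]

# Hypothetical example: 15 bases of adapter with one sequencing error
adapter = "TCGTATGCCGTCTTCTGCTTG"
read = "TGAGGTAGTAGGTTGTATAGTT" + "TCGTATGCCGACTTC"
trimmed = trim_fuzzy(read, adapter)
```

Short partial matches at the very end of a read are allowed no mismatches at the default rate, which limits false trimming.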

**sgombar** · 07-30-2009, 12:29 PM

Hello,

What are you guys doing for the actual statistical model once you know the abundance of each miRNA in each sample? Are you doing a pooled comparison as in SAGE, or are you taking a linear model approach like limma?

If you are taking the second approach, which off-the-shelf programs are you using?


miRNA aligning/counting
