Hi everybody,
I'm working on HTS microRNA data (Illumina GAIIe). I'm opening this thread to collect ideas on how to analyze such data; I think it can help a lot of people.
Don't hesitate to reply with your own way of analyzing miRNA data!
I created a wiki page, since it's simpler for collaborative editing: microRNA Analysis
Here is my start:
1. Pre-Processing of raw data
1.1 Trim the 3' adapter
Before analyzing the data, the first thing to do is to trim the 3' adapter. You can find the adapter sequence here or here. In general, this is the sequence you have to trim: UCGUAUGCCGUCUUCUGCUUGU (in the reads, which are DNA, it appears with T in place of U).
For this purpose, you can use:
- trimLRPatterns from the Biostrings package in R
- a BioPerl script
- an alignment program (bowtie, soap, ...) to align the adapter to the 3' part of the read sequence
- ...
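As a minimal sketch of the trimming idea (not a replacement for the tools listed above), the Python function below removes the adapter from the 3' end of a read. It assumes the reads are DNA, so the adapter is written with T instead of U. The function name `trim_3p_adapter` and the `min_overlap` parameter are made up for this illustration, and only exact matches are handled (no tolerance for sequencing errors).

```python
ADAPTER = "TCGTATGCCGTCTTCTGCTTGT"  # UCGUAUGCCGUCUUCUGCUUGU written as DNA (U -> T)


def trim_3p_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Return the read with the 3' adapter removed (exact matching only)."""
    # Case 1: the whole adapter is present somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter
    # (the read was too short to contain the full adapter).
    # Try the longest candidate prefix first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    # No adapter found: return the read unchanged.
    return read


# A 36-nt read: a 22-nt insert followed by the first 14 nt of the adapter.
insert = "TGAGGTAGTAGGTTGTATAGTT"
print(trim_3p_adapter(insert + ADAPTER[:14]) == insert)
```

The `min_overlap` threshold is a judgment call: a lower value trims more partial adapters but also removes more genuine 3' bases by chance.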
1.2 Filter reads on size
Because mature microRNAs are around 22 nt long (roughly 17 to 25 nt), you can discard all reads shorter than 15 nt.
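The size filter is simple enough to sketch in a few lines of Python. The function name `filter_by_length` is hypothetical, the cutoff of 15 comes from the text above, and the reads are assumed to be already adapter-trimmed strings.

```python
def filter_by_length(reads, min_len=15):
    """Keep only reads that are at least min_len nucleotides long."""
    return [r for r in reads if len(r) >= min_len]


# Two 22-nt reads and one 4-nt fragment (for example, an adapter dimer remnant).
reads = ["TGAGGTAGTAGGTTGTATAGTT", "ACGT", "TAGCTTATCAGACTGATGTTGA"]
kept = filter_by_length(reads)
print(kept)
```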
1.3 Filter on quality
You can discard all reads of poor quality. I consider a Phred score under 20 to be poor quality (a score of 20 corresponds to 99% base-call accuracy).
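To make the Phred arithmetic concrete, here is a small Python sketch. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q = 20 means a 1% chance that the base call is wrong. The filter below uses the mean score of the read, which is one possible reading of "under 20" and an assumption on my side; the function names are hypothetical. The ASCII offset is 33 for standard FASTQ, while some older Illumina pipelines wrote qualities with an offset of 64, so check which one your files use.

```python
def phred_to_error_prob(q):
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def mean_quality(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (ASCII offset 33 or 64)."""
    if not quality_string:
        return 0.0
    scores = [ord(c) - offset for c in quality_string]
    return sum(scores) / len(scores)


def passes_quality(quality_string, min_mean_q=20, offset=33):
    """True if the read's mean Phred score is at least min_mean_q."""
    return mean_quality(quality_string, offset) >= min_mean_q


print(phred_to_error_prob(20))   # error probability at Q20
print(passes_quality("IIII"))    # 'I' encodes Q40 with offset 33
print(passes_quality("####"))    # '#' encodes Q2 with offset 33
```

Filtering on the minimum score per read, or trimming low-quality 3' ends instead of discarding whole reads, are equally reasonable variants.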
1.4 Alignment to a reference genome
Align the reads to a reference genome and discard the reads that don't match perfectly.
You can use bowtie, soap, maq, ...
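Just to illustrate what "perfect match" means here, the toy Python sketch below keeps a read only if it occurs exactly in a reference sequence on either strand. This is an assumption-laden illustration with made-up function names: it uses plain substring search on a short string, whereas a real genome needs an indexed aligner such as the ones listed above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def perfectly_matching_reads(reads, reference):
    """Keep reads that occur exactly in the reference, on either strand."""
    kept = []
    for read in reads:
        if read in reference or reverse_complement(read) in reference:
            kept.append(read)
    return kept


# Toy reference and reads, invented for this example.
reference = "AAACCCGGGTTTACGTACGATCG"
reads = [
    "CCCGGGTTT",                       # forward-strand match
    reverse_complement("GTTTACGTAC"),  # reverse-strand match
    "TTTTTTTTTT",                      # no match on either strand
]
print(perfectly_matching_reads(reads, reference))
```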
1.5 Filtering on other RNA species
Other RNAs, such as snoRNA, tRNA, piRNA, ..., may be present in the reads. To discard these RNAs (or to analyze them later), you can match the reads against the Rfam database.
2. Differential Expression Analysis
Before the DE analysis step, you can align the reads to the miRBase database to find known miRNAs. miRanalyzer can do that.
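The output of this step is essentially a table of read counts per known miRNA. A rough Python sketch of that counting is shown below. The small `KNOWN_MIRNAS` dictionary is a stand-in for a table built from miRBase mature sequences (written as DNA), and `count_known_mirnas` is a hypothetical name. Only exact matches are counted, which is stricter than what a real tool does, since length and sequence variants of the same miRNA would be missed.

```python
from collections import Counter

# Stand-in for a lookup table built from miRBase mature sequences (as DNA).
KNOWN_MIRNAS = {
    "TGAGGTAGTAGGTTGTATAGTT": "hsa-let-7a-5p",
    "TAGCTTATCAGACTGATGTTGA": "hsa-miR-21-5p",
}


def count_known_mirnas(reads, known=KNOWN_MIRNAS):
    """Count reads per known miRNA and collect reads that match none."""
    counts = Counter()
    unmatched = []
    for read in reads:
        name = known.get(read)
        if name is None:
            unmatched.append(read)
        else:
            counts[name] += 1
    return counts, unmatched


reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TAGCTTATCAGACTGATGTTGA",
    "TGAGGTAGTAGGTTGTATAGTT",
    "ACGTACGTACGTACGTACGT",
]
counts, unmatched = count_known_mirnas(reads)
print(dict(counts))
print(unmatched)
```

The unmatched reads are the ones that would go on to the other filters or to novel miRNA prediction.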
To compare different samples, you have to normalize them. Several methods exist. I don't know many methods for this step, so I ask you to complete this list:
- edgeR: an R package for DE analysis
- DESeq: another R package
- t-test
- ANOVA
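To show why normalization matters before comparing samples, here is a deliberately simple Python sketch that scales raw counts to counts per million reads (CPM). This is plain library-size scaling, not the method used by edgeR or DESeq, and the function name and the counts are invented for the example.

```python
def counts_per_million(counts):
    """Scale the raw read counts of one sample to counts per million reads."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 1_000_000 * c / total for name, c in counts.items()}


# Invented counts: sample_b was sequenced ten times deeper than sample_a.
sample_a = {"miR-A": 300, "miR-B": 700}
sample_b = {"miR-A": 3000, "miR-B": 7000}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(cpm_a)
print(cpm_b)
```

The raw counts differ tenfold between the two samples, but after scaling both give the same values, so the difference was only sequencing depth. The packages listed above use more robust scaling factors, because a few very abundant miRNAs can dominate the total.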
Some reading:
- A scaling normalization method for differential expression analysis of RNA-seq data, Mark D Robinson and Alicia Oshlack
- Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments, James H Bullard, Elizabeth Purdom, Kasper D Hansen and Sandrine Dudoit
- Differential expression analysis for sequence count data, Simon Anders and Wolfgang Huber
- Normalization strategies for microRNA profiling experiments: a 'normal' way to a hidden layer of complexity?, Meyer SU, Pfaffl MW, Ulbrich SE.
3. Prediction of novel microRNAs
After alignment to the miRBase database, some reads may not match any miRNA in miRBase. They may be unknown miRNAs.
Some methods exist to predict novel miRNAs:
- miRDeep
- miRTRAP
- ...
4. Prediction of miRNA Targets
To predict the targets of miRNAs, for example the differentially expressed ones, several programs exist:
- TargetScan
- miRanda
- PicTar
- RNAhybrid
- miTarget
- DIANA-microT
- ...
General reading:
- Next Generation Sequencing of miRNAs – Strategies, Resources and Methods, Susanne Motameny, Stefanie Wolters, Peter Nürnberg and Björn Schumacher
Sorry for my English; I'm not very fluent in written English.
I'm aware that this is only a small part of the analysis process for HTS miRNA data.
Don't hesitate to reply with your own way of analyzing miRNA data!
Nicolas