**tinacai** · 09-27-2010, 05:37 PM

Hi Yushan Hsiao,
I would like to ask whether your problem has been solved and whether the process has gone smoothly. If you are studying miRNA, do you know of any good software for finding mirtrons (one type of miRNA)? I have run into some difficulties in this process and hope you can help. Thanks very much.

Chong Chen

**NicoBxl** · 10-03-2010, 11:20 PM

up

I have the same question.

**davehendrix** · 03-19-2011, 11:18 AM

miRTRAP question

Hi everyone, this is Dave Hendrix. miRTRAP is my software, and I am happy to answer any questions. You can email me directly (my email is in the manuscript as a corresponding author). A description of the steps of miRTRAP is at:

http://flybuzz.berkeley.edu/miRTRAP.html

These instructions have been updated to add more clarity. You can also download a more up-to-date version of the software.

In general, there should be error messages printed out if things don't work with the program. You can post those messages to this thread for more detail. I will attempt to answer these questions one-by-one.

1. "readListFile" - the aligned data in gff format (I've changed mine from Soap2 output to gff format)

The readListFile is a tab separated list of files, with a label and the file name, like this:

tissue1 tissue1_reads.gff
tissue2 tissue2_reads.gff
tissue3 tissue3_reads.gff

where the reads are size-selected (around 17-25 nt) sequencing data in gff format. Each file name must include the full path to the file if the file is not in the directory that you are running the scripts from.
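As a minimal sketch of the layout described above (one tab-separated label and file name per line, with full paths), the following Python helpers write and re-read such a list. The function names `write_read_list` and `read_read_list` are hypothetical and are not part of miRTRAP.

```python
import os


def write_read_list(entries, out_path):
    """Write (label, gff_path) pairs, one tab-separated pair per line."""
    with open(out_path, "w") as out:
        for label, gff_path in entries:
            # Store absolute paths so the list works from any working directory.
            out.write(f"{label}\t{os.path.abspath(gff_path)}\n")


def read_read_list(path):
    """Parse the list back into (label, gff_path) tuples; reject malformed lines."""
    entries = []
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 tab-separated fields, got {len(fields)}"
                )
            entries.append((fields[0], fields[1]))
    return entries
```

Reading the file back with `read_read_list` is a quick way to catch lines that were separated with spaces instead of tabs.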

3. "repeatRegionsFile" - What's the difference from genomeFile? (With mask?)

The genome file is the actual fasta file of the genome. Each chromosome/scaffold should be a separate entry of the fasta file. The repeatRegionsFile is a list of the genomic coordinates in the form (chrom start stop) separated by tabs as in:

Scaffold_1631 1739 1818
Scaffold_1631 2189 2258
Scaffold_1631 4125 4178
Scaffold_1631 4369 4415
Scaffold_1631 4505 4588
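A malformed repeat regions file can be hard to diagnose from inside the pipeline, so it may help to check the format beforehand. The sketch below only validates the three-column, tab-separated layout shown above; `parse_repeat_regions` is a hypothetical helper, not a miRTRAP function, and it makes no assumption about whether coordinates are 0-based or 1-based.

```python
def parse_repeat_regions(lines):
    """Parse 'chrom<TAB>start<TAB>stop' lines into (chrom, start, stop) tuples."""
    regions = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, got {len(fields)}"
            )
        chrom = fields[0]
        start, stop = int(fields[1]), int(fields[2])  # int() raises ValueError on bad input
        if start > stop:
            raise ValueError(f"line {line_no}: start {start} is greater than stop {stop}")
        regions.append((chrom, start, stop))
    return regions
```

For a file on disk, `parse_repeat_regions(open(path))` can be used, since a file handle iterates over its lines.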

Please send any other questions my way, as I am interested in improving the explanation of the software. Also, in general it doesn't hurt to read through the main perl module miRTRAP.pm to become more familiar with how it reads in files and processes them. Best wishes and good luck on your search for microRNAs.

Dave

**jay2008** · 06-23-2012, 03:45 AM

There are several tools to predict miRs, such as miRDeep and MIReNA. What are the advantages of the different miR prediction tools?

Yu

**davehendrix** · 06-24-2012, 10:27 AM

Originally posted by jay2008:

> There are several tools to predict miRs, such as miRDeep and MIReNA. What are the advantages of the different miR prediction tools?
>
> Yu

There is a new updated version of miRDeep called miRDeep2 that you should try. This is probably the most recent piece of software of this type.

I will say that miRTRAP takes into account a lot of information. It is necessary for you to align the reads allowing a lot of hits to the genome for each read, because the program takes this information into account in its prediction. Loci with reads that have a lot of hits to other places in the genome (greater than the maxHit parameter) are excluded. Furthermore, loci that are surrounded by such repetitive small RNAs are also filtered out. In my experience, miRDeep has very few false negatives, but some false positives. miRDeep has very few false positives, but some false negatives. Depending on your purposes and your available data, either could work.
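To illustrate the idea of counting genomic hits per read and comparing against a limit, here is a small Python sketch. It is not miRTRAP's implementation: the function name, the `(read_id, chrom, start)` tuple layout, and the `max_hits` argument are assumptions made for the example, with `max_hits` standing in for the maxHit parameter mentioned above.

```python
from collections import Counter


def reads_over_hit_limit(alignments, max_hits):
    """Return the IDs of reads that align to more than max_hits genomic positions.

    alignments: iterable of (read_id, chrom, start) tuples, one per reported hit.
    """
    hit_counts = Counter(read_id for read_id, _chrom, _start in alignments)
    return {read_id for read_id, count in hit_counts.items() if count > max_hits}
```

This only works if the aligner was allowed to report many hits per read, which is why the alignment settings matter here.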

Another drawback is that miRTRAP takes a lot of RAM, and for large genomes it may require you to split it up into chromosomes.
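One way to split a large genome is to write each fasta entry to its own file. The Python sketch below does this with the standard library only; `split_fasta_by_record` is a hypothetical helper, and it assumes that the first word of each header is unique and safe to use as a file name.

```python
import os


def split_fasta_by_record(fasta_path, out_dir):
    """Write each fasta entry to '<out_dir>/<first header word>.fa'; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    if out is not None:
                        out.close()
                    # Name the output file after the first word of the header line.
                    parts = line[1:].split()
                    name = parts[0] if parts else f"record_{len(written) + 1}"
                    path = os.path.join(out_dir, f"{name}.fa")
                    out = open(path, "w")
                    written.append(path)
                if out is not None:
                    out.write(line)  # lines before the first header are skipped
    finally:
        if out is not None:
            out.close()
    return written
```

The reads and repeat regions for each run would then need to be restricted to the same chromosome; that step is not shown here.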

**jay2008** · 09-28-2012, 04:32 PM

I am trying to use miRTRAP. When I set "repeatRegionsFile" to an empty file, I got this error:
could not open 16714.
Is "repeatRegionsFile" necessary? For the human genome, how can I get a repeatRegionsFile?

thanks
Yu

**davehendrix** · 10-03-2012, 10:27 AM

It looks like for some reason, it thinks your repeat regions file is given by the number "16714". Can you paste some of your config file?

It isn't 100% necessary to filter out repeat regions, but I would strongly recommend it to avoid false positives. You can get the data for this from UCSC, for example here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/

or whatever works best for your preferred version of the genome. You may want to filter out simple repeats and transposon-associated repeats. The format for the repeat region file is just a simple tab-delimited file of chrom start stop:

<chrom> <start> <stop>

so you could map the repeat data from UCSC to such a format with a simple perl script.
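The same mapping can be sketched in Python instead of perl. The default column indices below assume the layout of the UCSC `rmsk.txt.gz` table, where the chromosome, start, and end are the sixth to eighth columns. For `simpleRepeat.txt.gz` they would be the second to fourth, so pass `chrom_col=1, start_col=2, end_col=3`. Both layouts are assumptions that should be checked against the `.sql` schema file next to each table, and `ucsc_table_to_regions` is a hypothetical helper name.

```python
import gzip


def ucsc_table_to_regions(table_path, out_path, chrom_col=5, start_col=6, end_col=7):
    """Copy chrom/start/stop columns from a UCSC table dump into a 3-column file.

    Column indices are 0-based; the defaults assume the rmsk table layout.
    Returns the number of regions written.
    """
    opener = gzip.open if table_path.endswith(".gz") else open
    needed = max(chrom_col, start_col, end_col)
    count = 0
    with opener(table_path, "rt") as table, open(out_path, "w") as out:
        for line in table:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) <= needed:
                continue  # skip short or empty lines
            # Coordinates are copied unchanged; UCSC starts are 0-based.
            out.write(f"{fields[chrom_col]}\t{fields[start_col]}\t{fields[end_col]}\n")
            count += 1
    return count
```

UCSC table starts are 0-based, and this thread does not say which convention miRTRAP expects, so the sketch leaves the coordinates as they are.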

**cwn5810** · 12-21-2012, 01:05 AM

Hi,
I don't understand how to convert my aligned reads (mine are from bowtie; which output format should I use?) into gff format, so I cannot proceed with the downstream scripts.

Also, I cannot produce soap2 output, because it skips all reads shorter than 28 nt.

Could anyone help?

Thanks very much!

Franklin
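As a rough starting point for the bowtie question, the sketch below converts one line of bowtie's default tab-separated output (read name, strand, reference name, 0-based offset, read sequence, and further columns) into a generic nine-column GFF line with 1-based inclusive coordinates. The thread does not describe which GFF fields or attributes miRTRAP reads, so the `source`, `feature`, and `ID=` attribute used here are assumptions to be checked against the miRTRAP instructions. `bowtie_line_to_gff` is a hypothetical helper.

```python
def bowtie_line_to_gff(line, source="bowtie", feature="read"):
    """Convert one line of bowtie (version 1) default output to a GFF line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-separated bowtie fields")
    read_name, strand, chrom, offset, seq = fields[:5]
    start = int(offset) + 1      # bowtie offsets are 0-based; GFF starts are 1-based
    end = start + len(seq) - 1   # GFF end coordinates are inclusive
    attributes = f"ID={read_name}"  # assumed attribute layout, not confirmed for miRTRAP
    return "\t".join(
        [chrom, source, feature, str(start), str(end), ".", strand, ".", attributes]
    )
```

Applied line by line to a bowtie output file, this yields one GFF record per reported alignment.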


problem with miRTRAP
