  • problem with miRTRAP

    I have a simple question about the usage of miRTRAP. I couldn't find anyone else on the internet facing the same problem. I hope someone can help.

    My recent work is to discover novel mouse miRNAs with miRTRAP. I've checked the "Usage Table of Contents" on the miRTRAP website, but I still have some problems with the input data. According to the website, "config.txt" and "reads.txt" should be prepared, and "config.txt" should include the following input data:

    1. "readListFile" - the aligned data in gff format (I've changed mine from Soap2 output to gff format)
    2. "genomeFile" - the whole mouse genome in fasta format
    3. "repeatRegionsFile" - What's the difference from genomeFile? (With mask?)

    My first attempt was to ignore the "repeatRegionsFile", but the output files of the command "printReadRegions.pl config.txt" are all 0 KB.

    I guess there might be some mistakes in my understanding.
    Could anyone help me?

    Thanks a lot!

  • #2
    Hi, Yushan Hsiao,
    I would like to ask whether the problem has been solved and whether the process has gone smoothly. If you are studying miRNA, do you know of any good software for finding mirtrons (one type of miRNA)? I have encountered some difficulties in this process and hope for your help. Thanks very much.

    Chong Chen



    • #3
      up

      I have the same question.



      • #4
        miRTRAP question

        Hi everyone, this is Dave Hendrix. miRTRAP is my software, and I am happy to answer any questions. You can email me directly (my email is in the manuscript as a corresponding author). A description of the steps of miRTRAP is at:



        These instructions have been updated to add more clarity. You can also download a more up-to-date version of the software.

        In general, there should be error messages printed out if things don't work with the program. You can post those messages to this thread for more detail. I will attempt to answer these questions one-by-one.

        1. "readListFile" - the aligned data in gff format (I've changed mine from Soap2 output to gff format)

        The readListFile is a tab-separated list of files, with a label and the file name on each line, like this:

        tissue1 tissue1_reads.gff
        tissue2 tissue2_reads.gff
        tissue3 tissue3_reads.gff

        where the reads are size-selected (around 17-25 nt) sequencing data in GFF format. The file names require a full path to the file if it is not in the directory that you are running the scripts from.
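The readListFile layout described above can be generated rather than typed by hand. The following is a minimal Python sketch, not part of miRTRAP: the function name `write_read_list` and the output file name are hypothetical, and it assumes only what the post states (one label and one file name per line, separated by a tab, with full paths when the files are elsewhere).

```python
import os


def write_read_list(samples, out_path):
    """Write a tab-separated list of (label, GFF file) pairs, one per line.

    `samples` is a list of (label, gff_path) tuples. This helper is
    hypothetical and only mirrors the layout described in the post.
    """
    lines = []
    for label, gff_path in samples:
        # Use absolute paths so the list does not depend on the
        # directory the miRTRAP scripts are run from.
        lines.append(f"{label}\t{os.path.abspath(gff_path)}")
    with open(out_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

For example, `write_read_list([("tissue1", "tissue1_reads.gff")], "reads.txt")` would produce a one-line list file. Writing the tab explicitly also avoids the common problem of an editor silently replacing tabs with spaces.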

        3. "repeatRegionsFile" - What's the difference from genomeFile? (With mask?)

        The genome file is the actual fasta file of the genome. Each chromosome/scaffold should be a separate entry of the fasta file. The repeatRegionsFile is a list of the genomic coordinates in the form (chrom start stop) separated by tabs as in:

        Scaffold_1631 1739 1818
        Scaffold_1631 2189 2258
        Scaffold_1631 4125 4178
        Scaffold_1631 4369 4415
        Scaffold_1631 4505 4588
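Because a malformed repeatRegionsFile can fail silently, it may help to check the file before running the pipeline. The sketch below is an illustration in Python, not miRTRAP code: the function name `parse_repeat_regions` is hypothetical, and the only format assumption is the one given above (chrom, start and stop separated by tabs).

```python
def parse_repeat_regions(lines):
    """Parse 'chrom<TAB>start<TAB>stop' lines into (chrom, start, stop) tuples.

    Raises ValueError on the first line that does not match the layout.
    Blank lines are skipped.
    """
    regions = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {number}: expected 3 tab-separated fields, got {len(fields)}"
            )
        chrom = fields[0]
        start, stop = int(fields[1]), int(fields[2])  # ValueError if not integers
        if start > stop:
            raise ValueError(f"line {number}: start {start} is greater than stop {stop}")
        regions.append((chrom, start, stop))
    return regions
```

It can be applied to an open file, for instance `parse_repeat_regions(open("repeats.txt"))`, where the file name is a placeholder. Lines separated by spaces instead of tabs are rejected, which is a frequent result of copying coordinates from a web page.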


        Please send any other questions my way, as I am interested in improving the explanation of the software. Also, in general it doesn't hurt to read through the main Perl module, miRTRAP.pm, to become more familiar with how it reads in files and processes them. Best wishes and good luck on your search for microRNAs.

        Dave



        • #5
          There are several tools to predict miRs, such as miRDeep and MIReNA. What are the advantages of the different miR prediction tools?

          Yu



          • #6
            Originally posted by jay2008
            There are several tools to predict miRs, such as miRDeep and MIReNA. What are the advantages of the different miR prediction tools?

            Yu
            There is an updated version of miRDeep called miRDeep2 that you should try. It is probably the most recent software of this type.

            I will say that miRTRAP takes a lot of information into account. You need to align the reads allowing many hits to the genome for each read, because the program uses this information in its prediction. Loci with reads that have many hits elsewhere in the genome (more than the maxHit parameter) are excluded. Furthermore, loci that are surrounded by such repetitive small RNAs are also filtered out. In my experience, miRTRAP has very few false negatives, but some false positives. miRDeep has very few false positives, but some false negatives. Depending on your purposes and your available data, either could work.
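To make the idea of a hit-count cutoff concrete, here is a small Python sketch. It is a simplification for illustration and not miRTRAP's actual implementation: it filters individual reads rather than loci, the tuple layout `(read_id, chrom, start, stop)` is assumed, and `max_hits` only stands in for the maxHit parameter mentioned above.

```python
from collections import Counter


def reads_within_max_hits(alignments, max_hits):
    """Return the IDs of reads that align to at most `max_hits` places.

    `alignments` is an iterable of (read_id, chrom, start, stop) tuples,
    one tuple per genomic hit. Hypothetical helper for illustration only.
    """
    # Count how many genomic hits each read has.
    hit_counts = Counter(read_id for read_id, *_ in alignments)
    # Keep reads whose hit count does not exceed the cutoff.
    return {read_id for read_id, count in hit_counts.items() if count <= max_hits}
```

This also shows why the aligner must be allowed to report many hits per read: if it reports only one location for each read, every count is 1 and no repetitive read can be recognized.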

            Another drawback is that miRTRAP takes a lot of RAM, and for large genomes you may need to split the genome into chromosomes.
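Splitting a genome into one file per chromosome can be done with a few lines of Python. This is a generic sketch, not a miRTRAP utility: the function name `split_fasta` and the `.fa` output naming are assumptions, and each output file is named after the first word of the FASTA header.

```python
import os


def split_fasta(fasta_path, out_dir):
    """Write each FASTA entry to its own file in `out_dir`.

    Returns the list of files written. Files are named after the first
    word of each header line (hypothetical naming convention).
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A new entry starts: close the previous file, open the next.
                if out is not None:
                    out.close()
                fields = line[1:].split()
                name = fields[0] if fields else f"entry_{len(written) + 1}"
                path = os.path.join(out_dir, f"{name}.fa")
                out = open(path, "w")
                written.append(path)
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return written
```

The file is streamed line by line, so memory use stays small even for a large genome. Header names containing characters that are not valid in file names would need extra handling.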



            • #7
              I am trying to use miRTRAP. When I set "repeatRegionsFile" to an empty file, I got this error:
              could not open 16714.
              Is "repeatRegionsFile" necessary? For the human genome, how can I get a repeatRegionsFile?

              thanks
              Yu



              • #8
                It looks like for some reason, it thinks your repeat regions file is given by the number "16714". Can you paste some of your config file?

                It isn't 100% necessary to filter out repeat regions, but I would strongly recommend it to avoid false positives. You can get the data for this at UCSC for example here:



                or whatever works best for your preferred version of the genome. You may want to filter out simple repeats and transposon-associated repeats. The format for the repeat region file is just a simple tab-delimited file of chrom start stop:

                <chrom> <start> <stop>

                so you could map the repeat data from UCSC to such a format with a simple Perl script.
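As a rough illustration of such a conversion, the sketch below reduces BED-style lines to chrom, start and stop. It is written in Python rather than Perl purely for illustration, and it rests on assumptions not stated in the thread: that the repeat annotation was exported in BED format, and that the coordinates can be passed through unchanged. BED coordinates are 0-based and half-open, so check which convention miRTRAP expects before relying on this. The function name `bed_to_repeat_regions` is hypothetical.

```python
def bed_to_repeat_regions(bed_lines):
    """Reduce BED lines to 'chrom<TAB>start<TAB>stop' strings.

    Header lines ('track', 'browser', '#') and blank lines are skipped.
    Coordinates are copied unchanged (assumption: no offset is needed).
    """
    rows = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # The first three BED columns are chrom, start and end.
        chrom, start, stop = line.split()[:3]
        rows.append(f"{chrom}\t{int(start)}\t{int(stop)}")
    return rows
```

The result can be written out with `"\n".join(rows)` to produce the repeatRegionsFile.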



                • #9
                  Hi,
                  I don't understand how to convert aligned read information (mine is from bowtie; which output format should I use?) into GFF format, so I cannot proceed with the downstream scripts.

                  Also, I cannot produce SOAP2 output; it skips all reads shorter than 28 nt.

                  Would anyone give some help?

                  Thanks very much!


                  Franklin
                  Last edited by cwn5810; 12-21-2012, 01:07 AM.
