  • cwn5810
    replied
    Hi,
    I don't understand how to convert the aligned read information (mine is from Bowtie; which output format should I use?) into GFF format, so I cannot proceed with the downstream scripts.

    I also cannot produce SOAP2 output: it skips all reads shorter than 28 nt.

    Could anyone give me some help?

    Thanks very much!


    Franklin
    Last edited by cwn5810; 12-21-2012, 01:07 AM.
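    For the Bowtie question, here is a minimal conversion sketch. It assumes Bowtie 1's default (non-SAM) output, whose first five tab-separated columns are read name, strand, reference name, 0-based leftmost offset and read sequence, and it emits one GFF line per alignment with 1-based inclusive coordinates. The source/feature labels and the `ID=...;hits=...` attribute layout are hypothetical; check miRTRAP.pm for which GFF columns the program actually reads before relying on this.

```python
from collections import Counter


def bowtie_to_gff(lines, source="bowtie", feature="read"):
    """Convert Bowtie 1 default-format alignments to GFF lines.

    Assumes the Bowtie default columns: read name, strand, reference name,
    0-based leftmost offset, read sequence, qualities, ... (extra columns ignored).
    """
    records = []
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        name, strand, ref, offset, seq = fields[:5]
        start = int(offset) + 1       # GFF coordinates are 1-based
        end = int(offset) + len(seq)  # inclusive end coordinate
        records.append((name, strand, ref, start, end))

    # Alignments reported per read; only meaningful if Bowtie reported
    # multiple hits per read (e.g. run with -a or -k).
    hits = Counter(rec[0] for rec in records)

    gff = []
    for name, strand, ref, start, end in records:
        # Hypothetical attribute layout; check which fields miRTRAP actually reads.
        attrs = f"ID={name};hits={hits[name]}"
        gff.append("\t".join([ref, source, feature, str(start), str(end),
                              ".", strand, ".", attrs]))
    return gff
```

    Usage would be along the lines of reading the Bowtie output file line by line, passing the lines to `bowtie_to_gff`, and writing the returned lines to a `.gff` file. Counting hits per read name only works if all alignments of a read are in the same file.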



  • davehendrix
    replied
    It looks like, for some reason, the program thinks your repeat regions file is named "16714". Can you paste part of your config file?

    It isn't strictly necessary to filter out repeat regions, but I would strongly recommend it to avoid false positives. You can get this data from UCSC, for example here:



    or from whichever source works best for your preferred genome assembly. You may want to filter out simple repeats and transposon-associated repeats. The repeat region file is just a simple tab-delimited file of chrom, start and stop:

    <chrom> <start> <stop>

    so you could convert the UCSC repeat data to this format with a simple Perl script.
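    A sketch of that conversion, written in Python rather than Perl, assuming the input is the UCSC RepeatMasker (`rmsk`) table with its usual 17 tab-separated columns, where `genoName`, `genoStart`, `genoEnd` and `repClass` are columns 6, 7, 8 and 12. The set of repeat classes kept (simple repeats, low-complexity regions and the transposon classes LINE, SINE, LTR and DNA) is an assumption meant to match the advice above; adjust it to taste. UCSC stores `genoStart` 0-based, and this sketch does not shift coordinates because the convention miRTRAP expects is not stated here.

```python
# Repeat classes to keep: an assumption matching "simple repeats and
# transposon-associated repeats"; edit as needed.
KEEP_CLASSES = {"Simple_repeat", "Low_complexity", "LINE", "SINE", "LTR", "DNA"}


def rmsk_to_repeat_regions(lines, keep_classes=KEEP_CLASSES):
    """Convert UCSC rmsk table rows to 'chrom<TAB>start<TAB>stop' lines."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith("#"):  # skip blanks and header
            continue
        f = line.rstrip("\n").split("\t")
        # rmsk columns: bin, swScore, milliDiv, milliDel, milliIns,
        # genoName, genoStart, genoEnd, genoLeft, strand, repName, repClass, ...
        chrom, start, end, rep_class = f[5], f[6], f[7], f[11]
        if rep_class in keep_classes:
            out.append(f"{chrom}\t{start}\t{end}")
    return out
```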



  • jay2008
    replied
    I am trying to use miRTRAP. When I set "repeatRegionsFile" to an empty file, I got this error:
    could not open 16714.
    Is "repeatRegionsFile" necessary? For the human genome, how can I get a repeatRegionsFile?

    Thanks,
    Yu



  • davehendrix
    replied
    Originally posted by jay2008
    There are several tools to predict miRs, such as miRDeep and MIReNA. What are the advantages of the different miR prediction tools?

    Yu
    There is a new updated version of miRDeep called miRDeep2 that you should try. This is probably the most recent piece of software of this type.

    I will say that miRTRAP takes a lot of information into account. You need to align the reads allowing many hits to the genome for each read, because the program uses this information in its prediction. Loci with reads that have many hits to other places in the genome (more than the maxHit parameter) are excluded. Furthermore, loci that are surrounded by such repetitive small RNAs are also filtered out. In my experience, miRDeep has very few false negatives, but some false positives. miRTRAP has very few false positives, but some false negatives. Depending on your purposes and your available data, either could work.
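    To make the maxHit idea concrete, here is a simplified sketch of that kind of filter. It is not miRTRAP's code: reads with more genomic hits than `max_hits` are treated as repetitive, and a candidate locus is dropped if any repetitive read overlaps it or lies within a hypothetical `flank` distance of it (the real notion of "surrounded" in miRTRAP may differ). Loci and reads are plain tuples with closed coordinate intervals.

```python
def filter_loci(loci, reads, max_hits, flank=0):
    """Drop loci that overlap, or lie within `flank` bp of, a repetitive read.

    loci:  list of (chrom, start, end)
    reads: list of (chrom, start, end, hits), hits = number of genomic alignments
    A read is treated as repetitive when hits > max_hits.
    """
    repetitive = [(c, s, e) for c, s, e, h in reads if h > max_hits]
    kept = []
    for chrom, start, end in loci:
        lo, hi = start - flank, end + flank  # locus extended by the flank
        near_repeat = any(c == chrom and s <= hi and e >= lo
                          for c, s, e in repetitive)
        if not near_repeat:
            kept.append((chrom, start, end))
    return kept
```

    This is also why the reads should be aligned reporting many hits per read: if the aligner reports only one hit, every read looks unique and nothing is filtered.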

    Another drawback is that miRTRAP takes a lot of RAM, and for large genomes it may require you to split it up into chromosomes.
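    If memory is the problem, one way to split a genome is to write each FASTA record to its own file. This is a minimal sketch under the assumption that the record name is the first word of each `>` header and that one output file per record (`<name>.fa`, a hypothetical naming scheme) is what you want; the reads and repeat region files would presumably need to be split by chromosome in the same way.

```python
import os


def split_fasta(lines):
    """Group a multi-FASTA into {name: [header, sequence lines...]} in file order."""
    records = {}
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = line[1:].split()[0]   # record name = first word of the header
            records[name] = [line]
        elif name is not None and line:
            records[name].append(line)
    return records


def write_per_chromosome(records, out_dir):
    """Write each record to <out_dir>/<name>.fa and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, record in records.items():
        path = os.path.join(out_dir, f"{name}.fa")
        with open(path, "w") as fh:
            fh.write("\n".join(record) + "\n")
        paths.append(path)
    return paths
```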



  • jay2008
    replied
    There are several tools to predict miRs, such as miRDeep and MIReNA. What are the advantages of the different miR prediction tools?

    Yu



  • davehendrix
    replied
    miRTRAP question

    Hi everyone, this is Dave Hendrix. miRTRAP is my software, and I am happy to answer any questions. You can email me directly (my email is in the manuscript as a corresponding author). A description of the steps of miRTRAP is at:



    These instructions have been updated to add more clarity. You can also download a more up-to-date version of the software.

    In general, there should be error messages printed out if things don't work with the program. You can post those messages to this thread for more detail. I will attempt to answer these questions one-by-one.

    1. "readListFile" - the aligned data in gff format (I've changed mine from Soap2 output to gff format)

    The readListFile is a tab-separated list of files, each line giving a label and a file name, like this:

    tissue1 tissue1_reads.gff
    tissue2 tissue2_reads.gff
    tissue3 tissue3_reads.gff

    where the reads are size-selected (around 17-25 nt) sequencing data in GFF format. Each file name must include the full path unless the file is in the directory you run the scripts from.
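    Since a wrong path is an easy way to end up with empty output, a quick sanity check of the readListFile may help. The sketch below assumes exactly one tab between label and file name on each line, resolves relative names against a chosen directory (by default the current one, matching the rule above), and reports files that do not exist. The function names are hypothetical helpers, not part of miRTRAP.

```python
import os


def parse_read_list(lines, base_dir="."):
    """Parse readListFile lines of the form 'label<TAB>filename'.

    Relative file names are resolved against base_dir (the directory the
    scripts are run from, by default).
    """
    entries = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'label<TAB>file', got {line!r}")
        label, path = fields
        if not os.path.isabs(path):
            path = os.path.join(base_dir, path)
        entries.append((label, path))
    return entries


def missing_files(entries):
    """Return the paths from parse_read_list that do not exist on disk."""
    return [path for _, path in entries if not os.path.isfile(path)]
```

    A line separated by spaces instead of a tab raises a `ValueError` here, which makes that mistake visible immediately instead of surfacing later as a confusing "could not open" message.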

    3. "repeatRegionsFile" - What's the difference from genomeFile? (With mask?)

    The genome file is the actual fasta file of the genome. Each chromosome/scaffold should be a separate entry of the fasta file. The repeatRegionsFile is a list of the genomic coordinates in the form (chrom start stop) separated by tabs as in:

    Scaffold_1631 1739 1818
    Scaffold_1631 2189 2258
    Scaffold_1631 4125 4178
    Scaffold_1631 4369 4415
    Scaffold_1631 4505 4588
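    To illustrate how such a file can be used, here is a sketch that loads the (chrom start stop) lines, merges overlapping intervals per chromosome, and answers "does this region overlap a repeat?" with a binary search. It treats intervals as closed, which is an assumption, and it is only an illustration of the idea, not how miRTRAP.pm reads the file.

```python
import bisect
from collections import defaultdict


def load_repeat_regions(lines):
    """Parse 'chrom<TAB>start<TAB>stop' lines into merged intervals per chromosome.

    Returns {chrom: (starts, ends)} with sorted, non-overlapping intervals.
    """
    raw = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        chrom, start, stop = line.split()[:3]
        raw[chrom].append((int(start), int(stop)))

    regions = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, stop in intervals[1:]:
            if start <= merged[-1][1]:          # overlaps the previous interval
                merged[-1][1] = max(merged[-1][1], stop)
            else:
                merged.append([start, stop])
        regions[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return regions


def overlaps_repeat(regions, chrom, start, stop):
    """True if the closed interval [start, stop] overlaps a repeat on chrom."""
    if chrom not in regions:
        return False
    starts, ends = regions[chrom]
    # Index of the last repeat whose start is <= stop.
    i = bisect.bisect_right(starts, stop) - 1
    # Merged intervals are disjoint and sorted, so only that one can reach start.
    return i >= 0 and ends[i] >= start
```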


    Please send any other questions my way, as I am interested in improving the explanation of the software. Also, in general it doesn't hurt to look at the main Perl module miRTRAP.pm and read through it to become more familiar with how it reads in files and processes them. Best wishes and good luck in your search for microRNAs.

    Dave



  • NicoBxl
    replied
    Bump.

    I have the same question.



  • tinacai
    replied
    Hi Yushan Hsiao,
    I would like to ask whether the problem has been solved and whether everything has gone smoothly since. If you are studying miRNAs, do you know any good software for finding mirtrons (a class of miRNA)? I have run into some difficulties with this and hope you can help. Thanks very much.

    Chong Chen



  • Yushan Hsiao
    started a topic problem with miRTRAP

    problem with miRTRAP

    I have a simple question about using miRTRAP. I couldn't find anyone else online facing the same problem. I hope someone can help.

    My current work is discovering novel mouse miRNAs with miRTRAP. I've checked the "Usage Table of Contents" on the miRTRAP website, but I still have some problems with the input data. According to the website, "config.txt" and "reads.txt" need to be prepared, and "config.txt" should include the following input data:

    1. "readListFile" - the aligned data in gff format (I've changed mine from Soap2 output to gff format)
    2. "genomeFile" - the whole mouse genome in fasta format
    3. "repeatRegionsFile" - What's the difference from genomeFile? (With mask?)

    My first attempt was to ignore the "repeatRegionsFile", but the output files of the command "printReadRegions.pl config.txt" are all 0 KB.

    I guess there might be some mistakes in my understanding.
    Could anyone help me?

    Thanks a lot!
