
  • miRDeep2 arf format error

    Hi,

    I'm using miRDeep2 to find novel miRNAs in the mouse genome. I have RNA-seq data from Illumina.

    I used mapper.pl to generate the .fa and .arf files, and now I'm running miRDeep2.pl to find novel miRNAs. However, it reports that the arf file is not in the correct format.

    Here are the commands I have used:

    Code:
    mapper.pl SampleC.fastq -e -h -p ~/refs/mm9 -s SC.fa -t SC.arf
    Code:
    miRDeep2.pl Sample.fa ~/refs/mm9.fa Sample.arf miRBase_mmu_v20.fa none none -t Mouse
    I get this error when I run miRDeep2.pl:

    #####################################
    # #
    # miRDeep2.0.0.5 #
    # #
    # last change: 25/06/2012 #
    # #
    #####################################

    miRDeep2 started at 13:57:10


    #Starting miRDeep2
    #Starting miRDeep2
    /opt/sam/miRDeep/2.0.5/miRDeep2.pl SampleD.fa
    /home/abc/refs/mm9.fa SampleD.arf ../miRBase_mmu_v20.fa none none -t Mouse

    miRDeep2 started at 13:57:10


    mkdir mirdeep_runs/run_07_05_2014_t_13_57_10

    Error: Mapping file SampleD.arf is not in arf format

    Each line of the mapping file must consist of the following fields
    readID_wo_whitespaces length start end read_sequence genomicID_wo_whitspaces length start end genomic_sequence strand #mismatches editstring
    The editstring is optional and must not be contained
    The readID must end with _xNumber and is not allowed to contain whitespaces.
    The genomeID is not allowed to contain whitespaces.

    This is the arf file of the sample:

    [abc mirdeep2]$ head Sample.arf
    seq_2 51 1 51 ggcaagggaagaagacttcaggccactgggaaatgcttggttctgttcagt chr4 51 133526049 133526099 ggcaagggaagaagacttcaggccactgggaaatgcttggttctgttcagt + 0 mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    seq_5 51 1 51 gccattcttgcgaaaatcacggcccaaggtctggatatagttattggagac chr2 51 84460474 84460524 gccattcttgcgaaaatcacggcccaaggtctggatatagttattggagac + 0 mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    seq_6 51 1 51 ccagcttttaatacacatatgtacatacatatgtgcataacaaattatagc chr12 51 45602556 45602606 ccagcttttaatacacatatgtacatacatatgtgcataacaaattatagc - 0 mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    seq_7 51 1 51 ttgagcttgaacgctttctttattggtggctgcttttaggcctacaatggt chr10 51 95540393 95540443 ttgagcttgaacgctttctttgttggtggctgcttttaggcctacaatggt + 1 mmmmmmmmmmmmmmmmmmmmmMmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    seq_8 51 1 51 ggtgtccctgggatactcatagcctcagactggttacagagttggggcttt chr15 51 102043274 102043324 ggtgtccctgggatactcatagcctcagactggttacagagttggggcttt - 0 mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    seq_9 51 1 51 accattgtcgtccagagctccgtccacaccatagctccatcccttgccaca chr5 51 53673136 53673186 accattgtcgtccagagctccgtccacaccatagctccatcccttgccaca - 0 mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    seq_11 51 1 51 ctggagtcttggaagcttgactaccctacgttctcctacaatggaccttga chr9 51 78023154 78023204 ctggagtcttggaagcttgactaccctacgttctcctacaatggaccttga + 0 mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    seq_12 51 1 51 tgtggggattgccctgatgctgacaaagtcagggcaggagtcagaagaaat chr15 51 8585628 8585678 tgtggggattgccctgatgctgacaaagtcagggcaggagtcagaagaaat + 0mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    seq_13 51 1 51 ctttctttacaatgaccaagttgagaacactgagattagcgtccacaatgc chrX 51 66859072 66859122 ctttctttacaatgaccaagttgagaacactgagattagcgtccacaatgc + 0 mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    seq_13 51 1 51 ctttctttacaatgaccaagttgagaacactgagattagcgtccacaatgc chr2 51 75029531 75029581 ctttctttacaatgaccaagttgagaacactgagattagcgtccacaatgc + 0 mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    I'm not sure what the error is. The arf file looks alright to me.
    Does anyone know what the issue is? Any help appreciated. Thanks!
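The rules quoted in the error message can be checked by hand before running miRDeep2.pl. Below is a minimal sketch, not part of miRDeep2: the function name `check_arf` is made up for illustration, and it only tests the two rules that are easy to verify (12 or 13 whitespace-separated fields, since the edit string is optional, and a readID that ends with `_x` followed by a number). It does not reproduce every check miRDeep2.pl itself performs.

```shell
# check_arf FILE: report the first line that breaks the arf rules quoted
# in the miRDeep2 error message. Hypothetical helper, not a miRDeep2 tool.
check_arf() {
    awk '
        # 12 fields without the optional edit string, 13 with it.
        NF != 12 && NF != 13 {
            printf "line %d: expected 12 or 13 fields, found %d\n", NR, NF
            bad = 1; exit
        }
        # The readID (first field) must end with _x followed by digits.
        $1 !~ /_x[0-9]+$/ {
            printf "line %d: readID \"%s\" does not end with _xNumber\n", NR, $1
            bad = 1; exit
        }
        # Exit status 1 if a problem was reported, 0 otherwise.
        END { exit bad }
    ' "$1"
}
```

Run as `check_arf Sample.arf` on the file shown above, this prints `line 1: readID "seq_2" does not end with _xNumber` and returns a non-zero exit status.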

  • #2
    Had the same problem

    I fixed it.

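    The problem is that the first field of your arf file (the readID) is seq_2, without the _xNumber suffix that the error message asks for. Every readID has to end with _x followed by a number (the number of identical reads collapsed into that sequence).

    When you run mapper.pl, add the -m option (collapse reads) to your command line; it produces files with the correct IDs.

    You should get something like this for collapsed.fa: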
    >seq_7064753_x43185
    GGCTGGTCCGATGGTAGTGGGTTATCAGAACT

    And for vs_genome.arf:

    seq_0_x3041305 22 1 22 tcctgtactgagctgccccgag chr8 22 41518002 41518023 tcctgtactgagctgccccgag - 0 mmmmmmmmmmmmmmmmmmmmmm
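To see where the `_xNumber` suffix comes from, here is a rough sketch of what collapsing reads means: identical sequences are merged into one record and the count goes into the ID. This is not mapper.pl's actual implementation. The function name `collapse_reads` is hypothetical, the input is assumed to be FASTA with one sequence line per record, and the running index used here (0, 1, 2, ...) is a simplification; the IDs in the real output above are clearly numbered differently.

```shell
# collapse_reads: read FASTA on stdin, write one record per distinct
# sequence, named seq_<index>_x<count>, most frequent sequence first.
# Illustrative only; mapper.pl -m does the real work.
collapse_reads() {
    awk '!/^>/ && NF' |          # keep non-empty sequence lines only
        sort | uniq -c |         # count identical sequences
        sort -k1,1nr -k2,2 |     # highest count first, ties by sequence
        awk '{ printf ">seq_%d_x%d\n%s\n", NR - 1, $1, $2 }'
}
```

For real data, rerunning mapper.pl with -m added to the original command is the simpler route; the sketch is only meant to show why the uncollapsed IDs (seq_2, seq_5, ...) lack the count suffix.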



    Good luck!
    Last edited by Hell-Panther; 06-09-2014, 03:56 AM.


    • #3
      Hi, thanks for the -m tip. I added it and ran mapper.pl again. Now when I run miRDeep2.pl I get a new error, this time about the reference file rather than the .arf file:

      Error: miRNA reference this species file ../miRBase/miRBase_mmu_v20.fa has not allowed whitespaces in its first identifier
      Do you know what it means?


      • #4
        The sequence names (header lines) in the miRBase file probably contain whitespace, something like: >hsa-let-7a-1 MI0000060
        Simply replace the whitespace with an underscore, so the name looks like this: >hsa-let-7a-1_MI0000060
        If you are on a Linux server, run this on the command line:
        Code:
        sed -i 's/ /_/g' filename.fa
        This replaces every space in the file with an underscore.
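A slightly more cautious variant is sketched below, assuming a POSIX shell with grep and sed. The first function only lists the offending header lines so you can see what will change. The second rewrites header lines only (those starting with `>`) and writes to a new file instead of editing in place, so the original miRBase file is kept. Both function names are made up for this example.

```shell
# list_bad_headers FILE: print FASTA header lines that contain whitespace,
# prefixed with their line numbers. Returns non-zero if there are none.
list_bad_headers() {
    grep -n '^>.*[[:space:]]' "$1"
}

# underscore_headers IN OUT: copy IN to OUT, replacing each run of
# whitespace with a single "_" on header lines; sequence lines are untouched.
underscore_headers() {
    sed '/^>/ s/[[:space:]][[:space:]]*/_/g' "$1" > "$2"
}
```

For example, `underscore_headers miRBase_mmu_v20.fa miRBase_mmu_v20.clean.fa` (the output name is arbitrary) leaves the input untouched, and `list_bad_headers miRBase_mmu_v20.clean.fa` should then print nothing.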
