BFAST match: quick question

    Hello all,

    I have 30 million RNA-Seq reads (Illumina), 75 bp long, and wish to map them to the chicken genome. I made the necessary index files (10) following the instructions in "7.1" and started the "bfast match" process to search the indexes. The problem is the output, which says, for the first index:
    Reads processed: 1
    Cleaning.....
    Complete.....
    Found 1 match.
    My question is: what is this 1 match? Did it find only 1 match out of 30 million reads, or is this just some kind of status output? I ask because I don't want to waste machine time. Kindly help.

  • #2
    That is what it means. You should save that output for posterity.

    Something may have gone wrong. Post the full commands and outputs (use gist or pastie to avoid flooding SA).

    Also, for testing, you may want to reduce your initial dataset to just a few hundred thousand reads.
    -drd
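One way to build such a test set is to keep only the first N records of the FASTQ file. Below is a minimal Python sketch, assuming an uncompressed FASTQ in which every record is exactly 4 lines (header, sequence, '+', quality) with no wrapped sequence lines; the function names are hypothetical helpers, not part of BFAST.

```python
import itertools
from typing import Iterable, Iterator


def head_fastq(lines: Iterable[str], n_reads: int) -> Iterator[str]:
    """Yield the lines of the first n_reads records of 4-line FASTQ text.

    Assumes each record is exactly 4 lines; wrapped (multi-line) sequences
    would break this simple line counting.
    """
    return itertools.islice(lines, 4 * n_reads)


def write_subset(src_path: str, dst_path: str, n_reads: int) -> None:
    """Copy the first n_reads records of src_path into dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(head_fastq(src, n_reads))
```

For example, `write_subset("1F_4L_pf.fastq", "test_200k.fastq", 200_000)` would write the first 200,000 reads to a (hypothetically named) smaller file that can be passed to `bfast match -r` instead of the full dataset. Taking the head of the file is the simplest choice; it is not a random sample.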

    • #3
      This is what the output looks like:

      /bfast+bwa-0.6.4e$ bfast match -f chicken.fa -n 10 -r 1F_4L_pf.fastq > bfast.matches.1F_4L_pf.bmf
      ************************************************************
      Checking input parameters supplied by the user ...
      Validating fastaFileName chicken.fa.
      Validating readsFileName 1F_4L_pf.fastq.
      Validating tmpDir path ./.
      **** Input arguments look good!
      ************************************************************
      ************************************************************
      Printing Program Parameters:
      programMode: [ExecuteProgram]
      fastaFileName: chicken.fa
      mainIndexes [Auto-recognizing]
      secondaryIndexes [Not Using]
      readsFileName: 1F_4L_pf.fastq
      offsets: [Using All]
      loadAllIndexes: [Not Using]
      compression: [Not Using]
      space: [NT Space]
      startReadNum: 1
      endReadNum: 2147483647
      keySize: [Not Using]
      maxKeyMatches: 8
      maxNumMatches: 384
      whichStrand: [Both Strands]
      numThreads: 10
      queueLength: 250000
      tmpDir: ./
      timing: [Not Using]
      ************************************************************
      Searching for main indexes...
      Found 10 index (10 total files).
      Not using secondary indexes.
      ************************************************************
      Reading in reference genome from chicken.fa.nt.brg.
      In total read 1 contigs for a total of 1100480441 bases
      ************************************************************
      Reading 1F_4L_pf.fastq into a temp file.
      Will process 1 reads.
      ************************************************************
      Searching index file 1/10 (index #1, bin #1)...
      Reading index from chicken.fa.nt.1.1.bif.
      Read index from chicken.fa.nt.1.1.bif.
      Reads processed: 1
      Cleaning up index.
      Searching index file 1/10 (index #1, bin #1) complete...
      Found 1 matches.
      ************************************************************
      Searching index file 2/10 (index #2, bin #1)...
      Reading index from chicken.fa.nt.2.1.bif.
      Read index from chicken.fa.nt.2.1.bif.
      Reads processed: 1
      Cleaning up index.
      Searching index file 2/10 (index #2, bin #1) complete...
      Found 1 matches.
      ************************************************************
      Searching index file 3/10 (index #3, bin #1)...
      Reading index from chicken.fa.nt.3.1.bif.
      Read index from chicken.fa.nt.3.1.bif.
      Reads processed: 0


      Thanks for the help.

      • #4
        Can you post the first few reads? That would be great.

        • #5
          Notice:

            Reads processed: 1

          against all the indexes. Most likely there is something wrong with your fastq file.
          Yes, can you post a few entries from your fastq?
          -drd

          • #6
            These are a few lines from the fastq file:

            @No name
            AAAGCCACGTGCAACCATCATCAAACCAGTTGGTGGAGATAAGAATGGAGGCAG
            +No name
            [email protected]@@?CCCCCCBCCCCCCCCCCCBCC;[email protected]:<<0<=;[email protected]>[email protected]@[email protected]
            @No name
            TTTCTAAGGTCACGTTAACTGTAAACCAGTTCAATATTGAACTTCCTTTTCAATTTGGTT
            +No name
            [email protected]@[email protected][email protected][email protected]@[email protected]?BBBBBBBBBB
            @No name
            GGCAAATACACCATAGACAAGGTTCAGCCAGAGGATGCAGGAAAATATGAGTGCACATT
            +No name
            [email protected]=:[email protected][email protected][email protected]@@[email protected]
            @No name
            TTTTAGGGGCAGACTCAGAAGAGCTGGATTCTGATGATCTGGATGAAGAGGAGGAGTTTA
            +No name
            [email protected]?CCCCCC?CCAAC>CCCCC>[email protected][email protected][email protected]<?>:?C
            @No name
            AGTCTCACACAACAGTTTGAGGAAAAAGCTGCTTCTTATGACAAACTGGAAAAAACCAAG
            +No name
            CCACCBCCCC><[email protected]><:B>B>>>7>DBB5>B
            @No name
            ACGTACAAATTCAGTATGTGTAAGTGACTTATGCTTCATTAAGGCAAAAGTAGATCATGC
            +No name
            [email protected]?B>[email protected]@@[email protected][email protected]@[email protected]<@BBBB<@AB>@A>>@B>[email protected]>>@@BBB
            @No name
            CCCTAAATGCAGCAACATCAAGCAGATATACTTCACAGATTGCTGCTGTGTATCTTTGTG
            +No name
            [email protected][email protected]@

            • #7
              The reads look odd: they all have the same name. I would imagine BFAST does not like this.
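A quick way to confirm this on the whole file is to count how often each header occurs. This is a minimal Python sketch, assuming plain 4-line FASTQ records; `duplicate_read_ids` is a hypothetical helper name. It compares the full header text after '@', so "@No name" is counted as "No name".

```python
from collections import Counter
from typing import Iterable


def duplicate_read_ids(lines: Iterable[str]) -> Counter:
    """Return a Counter of FASTQ header texts that occur more than once.

    Assumes 4-line records (header, sequence, '+', quality), so every
    line whose index is a multiple of 4 is a header line.
    """
    headers = Counter(
        line.rstrip("\r\n")[1:]  # drop the line ending and the leading '@'
        for i, line in enumerate(lines)
        if i % 4 == 0  # first line of each record
    )
    # Keep only headers seen more than once
    return Counter({name: n for name, n in headers.items() if n > 1})
```

Running it over the file, e.g. `duplicate_read_ids(open("1F_4L_pf.fastq")).most_common(5)`, should list "No name" with a count equal to the number of reads if every record carries that header. An empty result means every header is unique.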

              • #8
                Do you really have "No name" as the read ID for all the reads? Where is that data coming from? Try changing the read IDs on a few reads so that they are unique (test_#), then rerun match.
                -drd
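The renaming suggested above can be scripted. Below is a minimal Python sketch, assuming 4-line FASTQ records; `rename_reads` and the `test_` prefix are illustrative, not a BFAST tool. It also replaces the "+No name" separator line with a bare "+" (a valid FASTQ separator) so the old name cannot disagree with the new header.

```python
from typing import Iterable, Iterator


def rename_reads(lines: Iterable[str], prefix: str = "test_") -> Iterator[str]:
    """Give every record a unique ID: prefix + 1-based record number.

    Assumes 4-line records (header, sequence, '+', quality). Sequence and
    quality lines are passed through unchanged.
    """
    for i, line in enumerate(lines):
        pos = i % 4
        if pos == 0:
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1}: expected a '@' header line")
            yield f"@{prefix}{i // 4 + 1}\n"
        elif pos == 2:
            yield "+\n"  # drop any name after '+'
        else:
            yield line
```

To rewrite a whole file, pass an open input file to `rename_reads` and hand the result to `writelines` on an output file (file names of your choosing), then rerun `bfast match` on the renamed copy.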

                • #9
                  The data are Illumina reads from the sequencing center. I only know that they are 75 bp long and were sequenced on the GA system (I think). I am not sure what pipeline was used. I will give it a try (changing the names), but I have used this data with other aligners such as BWA and Bowtie and have not encountered any problems.

                  • #10
                    It turns out the header (@____) was the problem. I got the original fastq files with unique read names and the BFAST index worked. Thanks for all your help!
