  • Michael.James.Clark
    Senior Member
    • Apr 2009
    • 207

    BWA, mostly unmapped reads

    Hi all,
    I'm hoping one of the many BWA gurus here can spot right away what I did wrong. This is my first time using BWA. I've previously aligned this same exome-seq data with Novoalign with great success, mapping the majority of the reads, so I was surprised when, after running BWA, fewer than 1% of the reads mapped.

    The data in question are single lanes of HiSeq human exome-seq data.

    I indexed the reference genome:
    Code:
    bwa index -a bwtsw human_g1k_v37.fasta
    That created (in the same folder):
    Code:
    human_g1k_v37.fasta.amb
    human_g1k_v37.fasta.ann
    human_g1k_v37.fasta.pac
    human_g1k_v37.fasta.rpac
    (I also indexed for colorspace in the same directory since I have SOLiD data I need to align in a few days.)

    Then I ran BWA as follows:
    Code:
    $bwa aln -t 8 $ref $f1 > $out.aln_sa1.sai
    $bwa aln -t 8 $ref $f2 > $out.aln_sa2.sai
    $bwa sampe -r "$rg" $ref $out.aln_sa1.sai $out.aln_sa2.sai $f1 $f2 > $out.sam
    BWA ran without any errors; it just mapped almost nothing.
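    For anyone hitting the same symptom, one quick way to put a number on "mapped almost nothing" is to count the unmapped bit (0x4) of the FLAG column in the SAM output. This is a minimal sketch, assuming a plain-text SAM file such as $out.sam above; mapped_fraction is a hypothetical helper name, and samtools flagstat reports similar numbers if you have samtools installed.
    Code:
```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fraction of primary SAM records whose
# FLAG lacks the 0x4 (segment unmapped) bit. Uses integer division
# instead of bitwise operators so it works in any POSIX awk.
mapped_fraction() {
  awk -F '\t' '
    /^@/ { next }                       # skip header lines
    int($2 / 256) % 2 == 1 { next }     # skip secondary alignments (0x100)
    int($2 / 2048) % 2 == 1 { next }    # skip supplementary alignments (0x800)
    { total++; if (int($2 / 4) % 2 == 0) mapped++ }
    END {
      if (total == 0) print "0.0000"
      else printf "%.4f\n", mapped / total
    }
  ' "$1"
}
```
    Usage: mapped_fraction "$out.sam". A value near zero for data that aligned well with another aligner points at the reference or index rather than the reads.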

    Can anyone see a glaring problem in my commands that would lead to so many unmapped reads? Any help appreciated!
    Last edited by Michael.James.Clark; 03-02-2011, 12:45 PM.
    Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
    Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
    Projects: U87MG whole genome sequence [Website] [Paper]
  • Jon_Keats
    Senior Member
    • Mar 2010
    • 279

    #2
    Looks like some of the index files are missing. This is an example of what I have in my bwa index directory.

    Code:
    -rw-r--r-- 1 jkeats domainuser 3142044949 Feb  3 16:23 hg18.fasta
    -rw-r--r-- 1 jkeats domainuser       6152 Feb  3 16:23 hg18.fasta.amb
    -rw-r--r-- 1 jkeats domainuser        946 Feb  3 16:23 hg18.fasta.ann
    -rw-r--r-- 1 jkeats domainuser 1155163564 Feb  3 16:23 hg18.fasta.bwt
    -rw-r--r-- 1 jkeats domainuser  770109014 Feb  3 16:23 hg18.fasta.pac
    -rw-r--r-- 1 jkeats domainuser 1155163564 Feb  3 16:23 hg18.fasta.rbwt
    -rw-r--r-- 1 jkeats domainuser  770109014 Feb  3 16:23 hg18.fasta.rpac
    -rw-r--r-- 1 jkeats domainuser  385054532 Feb  3 16:23 hg18.fasta.rsa
    -rw-r--r-- 1 jkeats domainuser  385054532 Feb  3 16:23 hg18.fasta.sa
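    That listing doubles as a checklist. Here is a minimal sketch that reports which of those eight suffixes are missing or empty for a given prefix, assuming the BWA 0.5.x index layout shown above (other BWA versions may write a different set of files, so adjust the list); check_bwa_index is a hypothetical name.
    Code:
```shell
#!/usr/bin/env bash
# Hypothetical helper: list which index files from the listing above are
# missing or empty for a given prefix (e.g. hg18.fasta).
# Returns 1 if any file is missing, 0 otherwise.
check_bwa_index() {
  local prefix=$1 missing=0 suffix
  for suffix in amb ann bwt pac rbwt rpac rsa sa; do
    if [ ! -s "$prefix.$suffix" ]; then
      echo "missing or empty: $prefix.$suffix"
      missing=1
    fi
  done
  return "$missing"
}
```
    Running check_bwa_index "$ref" || exit 1 before bwa aln stops the pipeline early. Note that it only checks that the files exist; it cannot tell whether they were built from the right kind of sequence.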


    • Michael.James.Clark
      Senior Member
      • Apr 2009
      • 207

      #3
      Thanks Jon!

      I think I do have those. Here's my whole (top secret) reference folder:
      Code:
      lrwxrwxrwx 1 mjclark rpm   56 Jan 18 22:02 human_g1k_v37.dict -> ../GATK/human_g1k_v37.dict
      lrwxrwxrwx 1 mjclark rpm   57 Jan 18 22:02 human_g1k_v37.fasta -> ../GATK/human_g1k_v37.fasta
      -rw-r--r-- 1 mjclark rpm 6.5K Feb 28 20:15 human_g1k_v37.fasta.amb
      -rw-r--r-- 1 mjclark rpm 6.7K Feb 28 20:15 human_g1k_v37.fasta.ann
      -rw-r--r-- 1 mjclark rpm 1.1G Feb 28 21:04 human_g1k_v37.fasta.bwt
      lrwxrwxrwx 1 mjclark rpm   61 Jan 18 22:02 human_g1k_v37.fasta.fai -> ../GATK/human_g1k_v37.fasta.fai
      -rw-r--r-- 1 mjclark rpm 6.5K Feb 28 20:14 human_g1k_v37.fasta.nt.amb
      -rw-r--r-- 1 mjclark rpm 6.7K Feb 28 20:14 human_g1k_v37.fasta.nt.ann
      -rw-r--r-- 1 mjclark rpm 740M Feb 28 20:14 human_g1k_v37.fasta.nt.pac
      -rw-r--r-- 1 mjclark rpm 740M Feb 28 20:15 human_g1k_v37.fasta.pac
      -rw-r--r-- 1 mjclark rpm 1.1G Feb 28 21:05 human_g1k_v37.fasta.rbwt
      -rw-r--r-- 1 mjclark rpm 740M Feb 28 20:15 human_g1k_v37.fasta.rpac
      -rw-r--r-- 1 mjclark rpm 370M Feb 28 21:22 human_g1k_v37.fasta.rsa
      -rw-r--r-- 1 mjclark rpm 370M Feb 28 21:13 human_g1k_v37.fasta.sa
      -rwxr--r-- 1 mjclark rpm 6.1G Oct 15 16:35 human_g1k_v37.nix
      Maybe it's because I did the colorspace indexing in the same folder. In the absence of a better idea, I'll try redoing the two indexes separately from one another.
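      Comparing the two listings in this thread, the .nt.amb, .nt.ann and .nt.pac files appear only in the folder where colorspace indexing was also run, and not in Jon's base-space-only folder. Below is a rough heuristic sketch built on that observation; it assumes the .nt.* files are a reliable sign that a colorspace index shares the prefix, which is an inference from these listings rather than documented BWA behavior. warn_mixed_index is a hypothetical name.
      Code:
```shell
#!/usr/bin/env bash
# Hypothetical heuristic (assumption drawn from the listings in this thread):
# if a prefix has both ".nt.pac" and ".bwt" files, a colorspace index was
# probably built with the same prefix, so the .bwt/.sa may not be base-space.
warn_mixed_index() {
  local prefix=$1
  if [ -e "$prefix.nt.pac" ] && [ -e "$prefix.bwt" ]; then
    echo "warning: $prefix has .nt.* files; its .bwt/.sa may be colorspace" >&2
    return 1
  fi
  return 0
}
```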

      • Michael.James.Clark
        Senior Member
        • Apr 2009
        • 207

        #4
        Pretty straightforward answer, but I'll post it anyway in case anyone else encounters this in the future (not that there are that many people out there dealing with both Illumina and SOLiD, but here you go).

        It was indeed the indexes. The first time, I built the base-space and colorspace indexes in the same folder, colorspace second, using the default output names. As a result, some of the base-space index files were overwritten by colorspace ones, and of course colorspace indexes don't work with Illumina data.

        Second time around, I indexed them with different names (in different folders, actually), and now things are aligning beautifully.
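        A minimal sketch of that fix, assuming a BWA 0.5.x build where bwa index accepts -a (algorithm), -c (colorspace) and -p (output prefix). The basespace/colorspace directory names and the build_separate_indexes function are hypothetical; if your bwa lacks -p, symlinking the FASTA into each folder and indexing it there gives the same separation. Setting BWA=echo turns it into a dry run that just prints the commands.
        Code:
```shell
#!/usr/bin/env bash
# Sketch: build base-space and colorspace BWA indexes under separate
# prefixes so neither run overwrites the other's files.
# Usage: build_separate_indexes REF.fasta OUTDIR   (BWA=echo for a dry run)
build_separate_indexes() {
  local ref=$1 outdir=$2 name
  local bwa=${BWA:-bwa}
  name=$(basename "$ref")
  mkdir -p "$outdir/basespace" "$outdir/colorspace" || return 1
  # Base-space index for Illumina (HiSeq) reads
  "$bwa" index -a bwtsw -p "$outdir/basespace/$name" "$ref" || return 1
  # Colorspace index for SOLiD reads, written under a different prefix
  "$bwa" index -a bwtsw -c -p "$outdir/colorspace/$name" "$ref" || return 1
}
```
        For the HiSeq lanes, $ref in the bwa aln and bwa sampe commands would then point at $outdir/basespace/human_g1k_v37.fasta, and the SOLiD runs at the colorspace prefix.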

        • swbarnes2
          Senior Member
          • May 2008
          • 910

          #5
          Thanks for posting the answer on the thread.
