  • mediator
    Member
    • Nov 2010
    • 27

    mini tutorial: build reference mouse genome and SNP for GATK

    GATK is a standard tool for calling SNPs, but its authors do not provide reference genomes or reference SNPs for non-human organisms such as mouse. Here is a quick tutorial for building an mm10 mouse reference genome and a dbSNP reference SNP file from scratch. It's not automated, and I'd appreciate any input that makes this workflow more efficient.

    1. Build reference mm10 genome.
    1.1 Download the reference from http://ccb.jhu.edu/software/tophat/igenomes.shtml. Make sure you download the "Mus musculus UCSC MM10" reference.
    1.2 Untar the file and find the directory that contains the sequence of each individual chromosome. It looks like this: "Mus_musculus_UCSC_mm10/Mus_musculus/UCSC/mm10/Sequence/Chromosomes"
    Enter the directory.
    1.3 Change the chromosome headers (this strips the "chr" prefix, so ">chr1" becomes ">1"):
    sed -i -- "s/chr//g" *.fa
    1.4 Combine the chromosomes into a full genome:
    cat chr1.fa chr2.fa ... chrX.fa chrY.fa > mm10.fa
    #Make sure you combine the chromosomes in karyotypic order and do not include random or unmapped chromosomes.
    1.5 Index the genome and build the dictionary file:
    samtools faidx mm10.fa
    java -jar CreateSequenceDictionary.jar R=mm10.fa O=mm10.dict
    1.6 Create the BWA index:
    bwa index -a bwtsw mm10.fa
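
Steps 1.3 and 1.4 can also be done in one pass. The following is only a minimal sketch, not part of the original workflow: the function name build_genome is made up for illustration, and it assumes the per-chromosome files are named chr1.fa through chr19.fa, chrX.fa and chrY.fa and sit in a single directory. It writes the chromosomes in the order 1-19, X, Y and strips the "chr" prefix from each header line as it goes, leaving the original files untouched. Note that it uses a slightly stricter sed pattern than step 1.3, touching only header lines.

```shell
# Hypothetical helper: concatenate per-chromosome FASTA files in
# karyotypic order (1-19, X, Y) while renaming ">chrN" headers to ">N".
# Usage: build_genome <chromosome_dir> <output.fa>
build_genome() {
    dir=$1
    out=$2
    # Start from an empty output file.
    : > "$out"
    for c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y; do
        # Assumed file naming: chr1.fa ... chr19.fa, chrX.fa, chrY.fa.
        # Only lines starting with ">chr" are changed; sequence lines pass through.
        sed 's/^>chr/>/' "$dir/chr$c.fa" >> "$out" || return 1
    done
}

# Example call, run from the directory entered in step 1.2:
#   build_genome . mm10.fa
```

Because the chromosome list is spelled out explicitly, random and unmapped sequences are skipped automatically, even if their files are present in the directory.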

    2. Build reference mouse SNP
    2.1 Download VCF (reference mouse SNP)
    wget ftp://ftp.ncbi.nih.gov/snp/organisms...f_chr_*.vcf.gz
    #Discard the Un, MT and random chromosome files, then unzip the rest.
    #Remove the redundant header (delete the first 14 rows) from every file except chr1:
    sed -i "1,14d" chr2.vcf #repeat for every file except chr1.vcf
    #Merge all VCF files:
    cat chr1.vcf chr2.vcf ... chrX.vcf chrY.vcf > dbsnp.vcf
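
Deleting a fixed 14 rows only works if every file has exactly 14 header lines. As a hedged alternative, the sketch below keeps the first file whole and, from every later file, keeps only the lines that do not start with "#" (all VCF header lines start with "#"). The function name merge_vcfs is hypothetical, and the files are assumed to be already unzipped and passed in the desired chromosome order.

```shell
# Hypothetical helper: merge per-chromosome VCF files into one file,
# keeping the header of the first file only.
# Usage: merge_vcfs <out.vcf> <first.vcf> [more.vcf ...]
merge_vcfs() {
    out=$1
    shift
    # Copy the first file completely, header included.
    cat "$1" > "$out" || return 1
    shift
    for f in "$@"; do
        # From the remaining files keep only data lines (no leading '#').
        # grep exits non-zero when a file has no data lines; that is not an error here.
        grep -v '^#' "$f" >> "$out" || true
    done
}

# Example call, with file names as used in step 2:
#   merge_vcfs dbsnp.vcf chr1.vcf chr2.vcf chr3.vcf
```

Unlike the sed approach, this does not modify the per-chromosome files, so it can be rerun safely.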

    Now you can use BWA to align the raw reads first, and then use GATK to call the SNPs.
  • id0
    Senior Member
    • Sep 2012
    • 130

    #2
    I am not sure why you are editing the chromosome names or merging multiple files. iGenomes already comes with a combined genome FASTA file (Sequence/WholeGenomeFasta) that is already indexed.


    • mediator
      Member
      • Nov 2010
      • 27

      #3
      Originally posted by id0:
      I am not sure why you are editing the chromosome names or merging multiple files. iGenomes already comes with a combined genome FASTA file (Sequence/WholeGenomeFasta) that is already indexed.
      That genome is not sorted in karyotypic order. Its index lists chr10 through chr19 before chr1:
      chr10 130694993 7 50 51
      chr11 122082543 133308907 50 51
      chr12 120129022 257833108 50 51
      chr13 120421639 380364718 50 51
      chr14 124902244 503194797 50 51
      chr15 104043685 630595093 50 51
      chr16 98207768 736719659 50 51
      chr17 94987271 836891590 50 51
      chr18 90702639 933778614 50 51
      chr19 61431566 1026295313 50 51
      chr1 195471971 1088955517 50 51
      chr2 182113224 1288336934 50 51
      chr3 160039680 1474092429 50 51
      chr4 156508116 1637332909 50 51
      chr5 151834684 1796971194 50 51
      chr6 149736546 1951842578 50 51
      chr7 145441459 2104573861 50 51
      chr8 129401213 2252924156 50 51
      chr9 124595110 2384913400 50 51
      chrM 16299 2512000419 50 51
      chrX 171031299 2512017050 50 51
      chrY 91744698 2686468981 50 51
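
A quick way to check this on another copy of the genome is to look at the first column of its .fai index. The sketch below is an assumption-laden illustration rather than anything from the posts above: the function name autosomes_in_order is made up, and it only tests whether the numbered chromosomes appear in ascending numeric order, ignoring X, Y and M.

```shell
# Hypothetical helper: succeed (exit status 0) if the numbered
# chromosomes in a .fai index appear in ascending numeric order.
# Usage: autosomes_in_order <genome.fa.fai>
autosomes_in_order() {
    awk '{
            name = $1
            sub(/^chr/, "", name)          # accept both "chr1" and "1"
         }
         name ~ /^[0-9]+$/ {
            if (name + 0 < prev) bad = 1   # a smaller number after a larger one
            prev = name + 0
         }
         END { exit bad + 0 }' "$1"
}

# Example call:
#   autosomes_in_order genome.fa.fai && echo "numeric order" || echo "not in numeric order"
```

For the index listed above, the check fails because chr1 follows chr19.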
