  • Best way to detect SNV / InDels against reference genome?

    So I've done a bit of homework, but I'm still a little confused about the best way to go. I'm hoping someone can point me in the right direction.

    I recently sequenced 11 mutant strains and 1 reference strain of Bacillus (~4MB) by NextSeq Mid 2 x 75. I'm ultimately looking to compare the mutants to the reference to detect SNV/MNV as well as InDels.

    So far, my understanding is to start with a de novo assembly using something like SPAdes. Once I have the contigs, I can use OSLay to map these back to a reference strain, either the one I sequenced or a previously available (master) one, of which there are several. I still have a few questions though:

    1) What application should I be looking at for detecting the mutations? I imagine I'd essentially need a large alignment tool, though one that can take into account coverage and probability of mutation would be a great feature to have (assuming not all reads give the same SNV, etc).

    2) When detecting mutations, is it better to do so against the reference strain that I sequenced or against a downloaded one and just compare my reference to it as well, ignoring any commonalities?

    3) Are there better programs to use than the two I listed above? Are there any that are specifically built for my purpose, and that aren't CLC Genomics that would cost me $5k (grad student budget here)?

    Much appreciated everyone!

  • #2
    For #2, it depends on how distant the reference sequence is from your strains. It is usually best to do mapping instead of de novo assembly, and if appropriate I would do that first. BBMap, Bowtie2, or BWA followed by Samtools or GATK is a good way to call SNVs. BEDtools can be used for coverage maps, which will indicate longer deletions. Assembling unmapped reads can indicate longer insertions.
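    To illustrate the coverage-map idea, here is a minimal Python sketch that scans per-base depth and reports runs with no coverage, which are candidate deletions. It assumes three tab-separated columns (contig, 1-based position, depth), as written by `bedtools genomecov -d` or `samtools depth -a`. The function name and the `min_length` cutoff are made up for this example and would need tuning for real data.

```python
def low_coverage_regions(lines, max_depth=0, min_length=50):
    """Return (contig, start, end) runs, 1-based inclusive, with depth <= max_depth.

    `lines` is any iterable of "contig<TAB>pos<TAB>depth" strings.
    Runs shorter than `min_length` bases are dropped.
    """
    regions = []
    current = None  # [contig, start, end] of the run currently being extended

    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        contig, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)

        is_low = depth <= max_depth
        extends = (
            current is not None
            and current[0] == contig
            and pos == current[2] + 1
        )
        if is_low and extends:
            current[2] = pos
            continue

        # The open run (if any) ends here; keep it only if it is long enough.
        if current is not None and current[2] - current[1] + 1 >= min_length:
            regions.append(tuple(current))
        current = [contig, pos, pos] if is_low else None

    if current is not None and current[2] - current[1] + 1 >= min_length:
        regions.append(tuple(current))
    return regions


if __name__ == "__main__":
    demo = ["chr\t1\t10", "chr\t2\t0", "chr\t3\t0", "chr\t4\t0", "chr\t5\t7"]
    print(low_coverage_regions(demo, min_length=2))
```

    A zero-coverage run is only a hint: repeats and regions that are hard to map can also lose coverage, so it is worth checking each hit in a viewer.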

    There are lots of programs out there and the people on the forum may suggest other, and better, ones. But in general you should be able to do your analysis on a "grad student's budget"; i.e., not much cash but lots of time.


    • #3
      Originally posted by camhabib
      I recently sequenced 11 mutant strains and 1 reference strain of Bacillus (~4MB) by NextSeq Mid 2 x 75. I'm ultimately looking to compare the mutants to the reference to detect SNV/MNV as well as InDels.
      I assume that the 11 mutants are offspring of the "reference strain" and were generated by some mutagenic treatment. Now they show different phenotypes and you want to find the genetic basis for that.

      You should first assemble the genome of the parent. Unfortunately, 75-nt reads are suboptimal for this. Nevertheless, I would recommend assembling them with SPAdes. That will take about 5 to 10 minutes on a desktop PC, so just try it out. If you are lucky, the contigs will cover about 90 percent of the whole genome.

      Then you can map the reads of the mutants to the contigs of the parent. Inspect the mapping in a viewer like Tablet. It's always amazing to see how clearly SNPs differ from random sequencing errors.
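      As a rough sketch of the mapping step, the Python snippet below only builds the shell commands for one mutant and does not run them. It assumes BWA and Samtools (both suggested earlier in the thread) are the tools used. The file names, the `mapping_commands` helper, and the thread count are placeholders made up for this example.

```python
import shlex


def mapping_commands(reference, reads_1, reads_2, sample, threads=4):
    """Build shell commands that map paired-end reads and write a sorted, indexed BAM.

    Nothing is executed here; the strings are returned for inspection.
    The output name "<sample>.sorted.bam" is an arbitrary convention.
    """
    q = shlex.quote
    bam = f"{sample}.sorted.bam"
    return [
        # Index the parent contigs (needed once per reference).
        f"bwa index {q(reference)}",
        # Map the reads and pipe the alignments straight into a coordinate sort.
        f"bwa mem -t {threads} {q(reference)} {q(reads_1)} {q(reads_2)}"
        f" | samtools sort -o {q(bam)} -",
        # Index the BAM so viewers and variant callers can access it randomly.
        f"samtools index {q(bam)}",
    ]


if __name__ == "__main__":
    for cmd in mapping_commands(
        "parent_contigs.fasta", "mut01_R1.fastq.gz", "mut01_R2.fastq.gz", "mut01"
    ):
        print(cmd)
```

      The sorted, indexed BAM together with the contig FASTA is the kind of input that alignment viewers and variant callers generally expect.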

      To identify SNPs programmatically, you have to compute VCF files from your read mappings. A VCF file is a kind of human-readable ASCII table that lists all the SNPs. My favorite tool for generating VCF files is freebayes.
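      To show what working with such a table can look like, here is a minimal Python sketch that reads VCF lines, drops low-quality calls, and labels each variant by comparing REF and ALT lengths. It relies only on the standard fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL). The `min_qual` cutoff of 20 and the purely length-based labels are simplifying assumptions, not something any particular caller prescribes.

```python
def classify(ref, alt):
    """Label a variant by allele lengths only (a deliberate simplification)."""
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    return "insertion" if len(alt) > len(ref) else "deletion"


def read_variants(lines, min_qual=20.0):
    """Return (chrom, pos, ref, alt, kind) tuples for calls with QUAL >= min_qual.

    Header lines (starting with '#') are skipped. Records with a missing
    QUAL ('.') are kept, since there is nothing to filter on.
    """
    variants = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alts, qual = line.rstrip("\n").split("\t")[:6]
        if qual != "." and float(qual) < min_qual:
            continue
        # ALT may hold several comma-separated alleles; report each one.
        for alt in alts.split(","):
            variants.append((chrom, int(pos), ref, alt, classify(ref, alt)))
    return variants


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "contig1\t100\t.\tA\tG\t250\t.\t.",
        "contig1\t200\t.\tAT\tA\t180\t.\t.",
        "contig1\t300\t.\tC\tT\t3\t.\t.",
    ]
    for v in read_variants(demo):
        print(v)
```

      The low-quality call at position 300 is filtered out, which mirrors the visual impression that real SNPs stand out from random sequencing errors.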

      If there is a finished genome available for your parent strain or from a very closely related strain, then you should use that genome as recommended by Westerman.
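      If you map against a downloaded genome, the sequenced parent will carry its own differences from it. One way to ignore those shared differences is to call variants in the parent as well and subtract them from each mutant. The Python sketch below assumes each call has been reduced to a `(chrom, pos, ref, alt)` tuple; the function and sample names are made up for this example.

```python
def private_variants(mutant_calls, parent_calls):
    """Return, per mutant, the calls that are not also present in the parent.

    mutant_calls: dict mapping sample name -> iterable of (chrom, pos, ref, alt)
    parent_calls: iterable of (chrom, pos, ref, alt) called in the parent strain
    """
    parent_keys = set(parent_calls)
    return {
        sample: sorted(v for v in set(calls) if v not in parent_keys)
        for sample, calls in mutant_calls.items()
    }


if __name__ == "__main__":
    parent = [("chr", 100, "A", "G")]
    mutants = {
        "mut01": [("chr", 100, "A", "G"), ("chr", 2500, "C", "T")],
        "mut02": [("chr", 100, "A", "G")],
    }
    print(private_variants(mutants, parent))
```

      A call that is missing from the parent list may simply sit in a region where the parent had poor coverage, so candidates are worth confirming by looking at the parent's reads at that position.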
      Last edited by piet; 10-22-2015, 11:50 AM.