  • Best way to detect SNV / InDels against reference genome?

    So I've done a bit of homework but I'm still a little confused about the best way to go, and I'm hoping someone can point me in the right direction.

    I recently sequenced 11 mutant strains and 1 reference strain of Bacillus (~4 Mb) by NextSeq Mid 2 x 75. I'm ultimately looking to compare the mutants to the reference to detect SNV/MNV as well as InDels.

    So far, my understanding is to start with a de novo assembly using something like SPAdes. Once I have the contigs, I can use OSLay to map these back to a reference strain, either the one I sequenced or a previously available (master) one, of which there are several. I still have a few questions though:

    1) What application should I be looking at for detecting the mutations? I imagine I'd essentially need a large alignment tool, though one that can take into account coverage and probability of mutation would be a great feature to have (assuming not all reads give the same SNV, etc).

    2) When detecting mutations, is it better to do so against the reference strain that I sequenced or against a downloaded one and just compare my reference to it as well, ignoring any commonalities?

    3) Are there better programs to use than the two I listed above? Are there any that are specifically built for my purpose, other than CLC Genomics, which would cost me $5k (grad student budget here)?

    Much appreciated everyone!

  • #2
    For #2, it depends on how distant the reference sequence is from your strains. It is usually best to do mapping instead of de novo assembly, and if appropriate I would do that first. BBMap, Bowtie2 or BWA followed by Samtools or GATK is a good way to call SNVs. BEDtools can be used for coverage maps, which will indicate longer deletions. Assembling the unmapped reads can indicate longer insertions.
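
As a rough sketch of the coverage-map idea: assuming per-base depth has been exported as three tab-separated columns (contig, 1-based position, depth), which is the layout written by `samtools depth -a` or `bedtools genomecov -d`, runs of low depth can be listed as candidate deletions. The function name `low_coverage_runs`, the depth threshold and the minimum run length are illustrative choices, not something from this thread.

```python
from typing import Iterable, List, Tuple


def low_coverage_runs(
    depth_lines: Iterable[str],
    max_depth: int = 0,
    min_length: int = 50,
) -> List[Tuple[str, int, int]]:
    """Return (contig, start, end) for runs of positions with depth <= max_depth.

    Coordinates are 1-based and inclusive. The input is assumed to be sorted
    by contig and position and to include zero-depth positions.
    """
    runs: List[Tuple[str, int, int]] = []
    current = None  # [contig, start, end] of the run currently being extended

    def close_run() -> None:
        # Keep the open run only if it is long enough, then reset it.
        nonlocal current
        if current and current[2] - current[1] + 1 >= min_length:
            runs.append((current[0], current[1], current[2]))
        current = None

    for line in depth_lines:
        if not line.strip():
            continue
        contig, pos_text, depth_text = line.rstrip("\n").split("\t")[:3]
        pos, depth = int(pos_text), int(depth_text)
        if depth <= max_depth:
            if current and current[0] == contig and pos == current[2] + 1:
                current[2] = pos  # extend the open run by one base
            else:
                close_run()
                current = [contig, pos, pos]
        else:
            close_run()
    close_run()
    return runs
```

With a depth file on disk this could be called as `low_coverage_runs(open("mutant1.depth"))`, where `mutant1.depth` is a hypothetical file name. Regions flagged in a mutant but not in the parent are the ones worth inspecting in a viewer.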

    There are lots of programs out there and the people on the forum may suggest other, better ones. But in general you should be able to do your analysis on a "grad student's budget", i.e., not much cash but lots of time.


    • #3
      Originally posted by camhabib
      I recently sequenced 11 mutant strains and 1 reference strain of Bacillus (~4 Mb) by NextSeq Mid 2 x 75. I'm ultimately looking to compare the mutants to the reference to detect SNV/MNV as well as InDels.
      I assume that the 11 mutants are offspring of the "reference strain" and were generated by some mutagenic treatment. Now they show different phenotypes and you want to find the genetic basis for that.

      You should first assemble the genome of the parent. Unfortunately, 75-nt reads are suboptimal for this. Nevertheless, I would recommend assembling them with SPAdes. That will take about 5 to 10 minutes on a desktop PC, so just try it out. If you are lucky, the contigs will cover about 90 percent of the whole genome.

      Then you can map the reads of the mutants to the contigs of the parent. Inspect the mapping in a viewer like Tablet. It's always amazing to see how clearly SNPs differ from random sequencing errors.
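
A minimal sketch of the mapping step, assuming BWA (one of the mappers Westerman named) and samtools are installed and on the PATH. The helper names and all file names are hypothetical; the subcommands and options used (`bwa index`, `bwa mem -t`, `samtools sort -o`, `samtools index`) are standard ones, but check them against your installed versions.

```python
import subprocess
from typing import List


def build_mapping_commands(reference: str, reads_1: str, reads_2: str,
                           out_bam: str, threads: int = 4) -> List[List[str]]:
    """Return the argument lists for indexing, mapping, sorting and BAM indexing."""
    return [
        ["bwa", "index", reference],
        ["bwa", "mem", "-t", str(threads), reference, reads_1, reads_2],
        ["samtools", "sort", "-o", out_bam, "-"],  # "-" reads alignments from stdin
        ["samtools", "index", out_bam],
    ]


def run_mapping(reference: str, reads_1: str, reads_2: str,
                out_bam: str, threads: int = 4) -> None:
    """Map one paired-end sample and write a sorted, indexed BAM file."""
    index_cmd, mem_cmd, sort_cmd, bam_index_cmd = build_mapping_commands(
        reference, reads_1, reads_2, out_bam, threads)
    subprocess.run(index_cmd, check=True)
    # Pipe "bwa mem" straight into "samtools sort" to avoid a large SAM file.
    mem = subprocess.Popen(mem_cmd, stdout=subprocess.PIPE)
    sort = subprocess.Popen(sort_cmd, stdin=mem.stdout)
    mem.stdout.close()  # let bwa receive SIGPIPE if samtools exits early
    sort_rc = sort.wait()
    mem_rc = mem.wait()
    if sort_rc != 0 or mem_rc != 0:
        raise RuntimeError("mapping failed for " + out_bam)
    subprocess.run(bam_index_cmd, check=True)
```

For example, `run_mapping("parent_contigs.fasta", "mutant1_R1.fastq.gz", "mutant1_R2.fastq.gz", "mutant1.bam")` would produce a sorted BAM that can be loaded into a viewer together with the contigs. Building the reference index once outside the loop would be more efficient when processing all 11 mutants.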

      To identify SNPs programmatically, you have to compute VCF files from your read mappings. A VCF file is a kind of human-readable ASCII table that lists all the SNPs. My favorite tool for generating VCF files is freebayes.
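
Once VCF files exist (for a haploid genome, something like `freebayes -f parent_contigs.fasta -p 1 mutant1.bam > mutant1.vcf`, with hypothetical file names), the tab-separated table can be read with a few lines of Python. The sketch below also assumes the parent's own reads were mapped and called the same way, so that calls shared with the parent can be dropped, which is one way to "ignore commonalities" as asked in the original post. The function names and the quality cutoff of 20 are illustrative assumptions.

```python
from typing import Iterable, Set, Tuple

# A variant is identified by contig, 1-based position, reference and alternate allele.
Variant = Tuple[str, int, str, str]


def read_variants(vcf_lines: Iterable[str], min_qual: float = 20.0) -> Set[Variant]:
    """Collect variants from VCF text lines, skipping headers and low-quality calls."""
    variants: Set[Variant] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        chrom, pos, _id, ref, alts, qual = line.rstrip("\n").split("\t")[:6]
        if qual != "." and float(qual) < min_qual:
            continue
        for alt in alts.split(","):  # one record can list several alternate alleles
            if alt != ".":
                variants.add((chrom, int(pos), ref, alt))
    return variants


def mutant_specific(mutant: Set[Variant], parent: Set[Variant]) -> Set[Variant]:
    """Return the variants seen in the mutant but not in the parent."""
    return mutant - parent
```

Usage could look like `mutant_specific(read_variants(open("mutant1.vcf")), read_variants(open("parent.vcf")))`. Matching on exact position and alleles is a simplification: the same indel can be represented in more than one way, so normalizing the calls first is advisable before comparing them.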

      If there is a finished genome available for your parent strain or for a very closely related strain, then you should use that genome, as recommended by Westerman.
      Last edited by piet; 10-22-2015, 11:50 AM.
