Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Does a reliable consensus mean more reliable SNPs?

    Dear All,

    I'm relatively new to WGS analysis so please excuse any naivety on my part.

    Before getting the WGS sequences I have confirmed the presence or absence of certain oligonucleotides in various bacterial DNA samples. So I know I should see these sequences in the final consensus sequence.

    Is it true to think that if I can produce a more reliable consensus sequence then the SNP calls are also likely to be more reliable. I appreciate that there are many SNP quality filters etc that will also be applied that can lead to difference between a consensus and a SNP call, but I just wanted to get an idea of the overall correlation between the consensus and SNPs.

    If there is a high correlation between the two then surely if I make sure that my consensus sequences are as reliable as possible, when I come to calling the SNPs from the same mapped reads they will be more reliable???

    Apologies if I'm totally wrong about this.

    Best wishes lg36

  • #2
    Hi lg36,

    you're not wrong about this at all -- this is in fact a pretty important factor in SNP discovery.

    Your SNPs can only ever be as good as your reference and your mapping. If your reference contains errors, this will propagate right through into your SNP calls, and similarly if you mismap lots of reads you will also increase your false positive SNP rate.

    I routinely map the reads from the individual used to make the reference back to the reference before I do any mapping of other individuals onto that reference for SNP discovery. I then call SNPs on that mapping first, and I always get SNPs here.

    In a homozygous or haploid organism this will give you a list of positions where there reference most likely contains errors -- in an ideal case there should be zero SNPs when I map the reads back onto the reference that was made from the same reads. I don't know what you work with but I am fortunate in that I do a lot of work with cultivated barley which is essentially homozygous and that simplifies matters obviously.

    I then subtract the list of SNPs called there from any list of SNPs generated with reads from a different individual -- it's essentially a way of removing background noise. I guess if you have a heterozygous organism and it's well curated you could probably use a public, curated list of SNPs instead.

    This gives you much cleaner SNP sets and reduces the false positive rate but the caveat is that potentially you may be increasing your false negative rate (I don't have any data on this yet). It all depends on what your SNPs are for - if reliability is key, then this works well. You may also want to remove duplicates from the mapping -- that also reduces your FP rate.

    cheers

    Micha

    Comment

    Latest Articles

    Collapse

    • seqadmin
      Current Approaches to Protein Sequencing
      by seqadmin


      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
      04-04-2024, 04:25 PM
    • seqadmin
      Strategies for Sequencing Challenging Samples
      by seqadmin


      Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
      03-22-2024, 06:39 AM

    ad_right_rmr

    Collapse

    News

    Collapse

    Topics Statistics Last Post
    Started by seqadmin, Yesterday, 12:08 PM
    0 responses
    11 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 04-10-2024, 10:19 PM
    0 responses
    17 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 04-10-2024, 09:21 AM
    0 responses
    14 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 04-04-2024, 09:00 AM
    0 responses
    43 views
    0 likes
    Last Post seqadmin  
    Working...
    X