  • mapping quality scores in samtools view?

    Hello!

    I am mapping short reads (Phred+33 quality encoding) against a reference genome and using samtools view to filter the resulting alignments.
    The problem is that I don't know how to filter the mappings properly by quality, because I'm unsure how to choose a suitable value for the --min-MQ parameter (the minimum mapping quality).

    The reads I intend to map come from an organism of the same genus as the reference but a different species. Both species have relatively high rates of heterozygosity and hemizygosity, so I believe I need some flexibility when filtering the mappings. I need a quality threshold that is not too low, but also not so strict that it discards a large number of valid mappings.

    I'm not sure whether 20 or 30 would be a good --min-MQ value for this purpose.
    Could someone help?

    Thanks a lot.
    Last edited by ampsevilla; 05-18-2023, 03:23 AM.

  • #2
    Hello again ampsevilla,

    I'm a little rusty at samtools but I'll do my best to give my opinion. Your particular case sounds tricky because you have to balance between retaining enough valid mappings and filtering out potential false positives.

    I think a --min-MQ of 20 or 30 is a reasonable starting point. By the SAM definition, MAPQ is -10 log10 of the probability that the mapping position is wrong, so these thresholds correspond to nominal error rates of 1% and 0.1%, although how well MAPQ is calibrated varies between aligners. Still, it's always a good idea to assess the impact of different thresholds on your specific data. I would start by looking at the distribution of mapping qualities: use samtools view to extract the MAPQ field (the fifth column of each alignment line) and plot a histogram of the values. This shows you the range of qualities and how many reads fall into each bin.
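    As a rough sketch of that first step, the script below counts alignments per MAPQ value from SAM text and prints a text histogram in 5-point bins. The script name `mapq_hist.py`, the file `aligned.bam` and the bin width are hypothetical choices for illustration. MAPQ 255 is reported separately because the SAM specification reserves it for "mapping quality not available".

```python
import sys
from collections import Counter


def mapq_histogram(sam_lines):
    """Count alignments per MAPQ value in SAM text, e.g. `samtools view` output."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (present if -h was used)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record; ignore it
        counts[int(fields[4])] += 1  # MAPQ is the 5th mandatory SAM field
    return counts


def bin_counts(counts, bin_width=5):
    """Group MAPQ counts into fixed-width bins keyed by each bin's lower edge."""
    binned = Counter()
    for mapq, n in counts.items():
        if mapq == 255:
            continue  # the SAM spec reserves 255 for "mapping quality not available"
        binned[mapq // bin_width * bin_width] += n
    return dict(sorted(binned.items()))


def main(path, bin_width=5):
    # "-" means: read piped `samtools view` output from standard input
    if path == "-":
        counts = mapq_histogram(sys.stdin)
    else:
        with open(path) as fh:
            counts = mapq_histogram(fh)
    binned = bin_counts(counts, bin_width)
    total = sum(binned.values()) or 1
    for start, n in binned.items():
        bar = "#" * round(50 * n / total)  # text bar, at most 50 characters wide
        print(f"{start:>3}-{start + bin_width - 1:<3} {n:>10}  {bar}")
    if counts[255]:
        print(f"MAPQ 255 (not available): {counts[255]}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

    Usage might look like `samtools view -F 0x904 aligned.bam | python mapq_hist.py -`, where `-F 0x904` drops unmapped (0x4), secondary (0x100) and supplementary (0x800) records so each mapped read is counted once.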

    You also mentioned not wanting to be too restrictive and discard a large number of valid mappings. I would evaluate how changing the --min-MQ value affects the number of retained mappings: raise the threshold step by step and record how many mappings survive each step. Watch for the point where the number of retained mappings drops sharply, and decide whether that threshold is acceptable for your analysis.
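    A minimal sketch of that sweep, assuming you already have a table of MAPQ value to alignment count (for example from the histogram step). It computes what `samtools view -q t` would keep for each threshold, the nominal maximum error probability implied by the SAM definition, and the pair of thresholds with the largest loss. The threshold list and the function names are illustrative assumptions, not part of samtools.

```python
def retention_table(mapq_counts, thresholds=(0, 1, 5, 10, 20, 30, 40)):
    """Estimate what `samtools view -q t` keeps for each threshold t.

    mapq_counts maps MAPQ value -> number of alignments. samtools view -q
    (--min-MQ) skips alignments with MAPQ below t, so an alignment is kept
    when MAPQ >= t. MAPQ 255 ("not available") therefore passes every threshold.
    """
    total = sum(mapq_counts.values())
    rows = []
    for t in thresholds:
        kept = sum(n for q, n in mapq_counts.items() if q >= t)
        frac = kept / total if total else 0.0
        # SAM defines MAPQ = -10*log10(P(position is wrong)), so a threshold t
        # caps the nominal error probability of a kept alignment at 10^(-t/10).
        max_p_wrong = 10 ** (-t / 10)
        rows.append((t, kept, frac, max_p_wrong))
    return rows


def largest_drop(rows):
    """Return (lower t, upper t, alignments lost) for the largest step in the table."""
    steps = [
        (rows[i][1] - rows[i + 1][1], rows[i][0], rows[i + 1][0])
        for i in range(len(rows) - 1)
    ]
    lost, lower, upper = max(steps)
    return lower, upper, lost


def print_table(rows):
    """Print one line per threshold: count kept, percentage kept, nominal P(wrong)."""
    print(f"{'min-MQ':>6} {'kept':>10} {'kept %':>7} {'max P(wrong)':>13}")
    for t, kept, frac, p in rows:
        print(f"{t:>6} {kept:>10} {100 * frac:>6.1f}% {p:>13.4g}")
```

    The counts could come from something like `samtools view -F 0x904 aligned.bam | cut -f 5 | sort -n | uniq -c`. Computing retention from the counts avoids re-reading the BAM once per threshold; the "largest drop" depends on which thresholds you list, so it is a hint for where to look rather than an answer.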

    Another thing you should do is take into account the downstream analysis you plan to perform. Are you looking for highly confident mappings for variant calling or more exploratory analysis? The requirements of your downstream analysis will likely influence the threshold you choose.

    Lastly, I would try to validate against any known data. If you have access to a set of validated mappings, you can use it to evaluate the performance of different --min-MQ values. You can then compare the sensitivity and specificity of each threshold and pick the one that best balances retaining true positives against filtering out false positives.
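    One way to frame that comparison, sketched under loud assumptions: `truth` is a hypothetical dictionary of read name to validated (chromosome, position), alignments are hypothetical (read name, chromosome, position, MAPQ) tuples, and a placement within `tolerance` bases of the validated position counts as correct. Here a "positive" is a correctly placed read, so sensitivity is the share of correct placements a threshold keeps and specificity is the share of wrong placements it removes.

```python
from typing import Dict, List, Tuple

# Hypothetical record shapes used only for this sketch.
Alignment = Tuple[str, str, int, int]  # (read name, chromosome, 1-based position, MAPQ)
Truth = Dict[str, Tuple[str, int]]     # read name -> (chromosome, position)


def classify(alignments: List[Alignment], truth: Truth, tolerance: int = 5):
    """Label each scorable alignment as correctly placed (True) or not, with its MAPQ."""
    labelled = []
    for name, chrom, pos, mapq in alignments:
        if name not in truth:
            continue  # no validated placement for this read, so it cannot be scored
        true_chrom, true_pos = truth[name]
        correct = chrom == true_chrom and abs(pos - true_pos) <= tolerance
        labelled.append((correct, mapq))
    return labelled


def sens_spec(labelled, threshold: int):
    """Sensitivity and specificity of keeping alignments with MAPQ >= threshold."""
    tp = sum(1 for ok, q in labelled if ok and q >= threshold)      # correct, kept
    fn = sum(1 for ok, q in labelled if ok and q < threshold)       # correct, filtered
    fp = sum(1 for ok, q in labelled if not ok and q >= threshold)  # wrong, kept
    tn = sum(1 for ok, q in labelled if not ok and q < threshold)   # wrong, filtered
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```

    Looping `sens_spec` over the same thresholds used for the retention sweep gives a simple table to weigh the two error types against each other; how much weight each deserves depends on the downstream analysis.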

    Again, I'm not an expert in this type of analysis, so it wouldn't hurt to also ask people who have done this kind of work before.
