  • In SAM, how can I know a read is mapped ambiguously?

    In a .sam file, how can I tell whether an alignment is ambiguous (that is, the read maps to multiple places)?
    Forgive me if this can be found in the SAM spec.

    Flag 0x0100 indicates that the alignment is not primary, which seems to mean something different. Even if an alignment is primary, does that mean it is the only mapping, or could the read also map to other places?

    I haven't found any other field that provides this information. It should be there, right?

  • #2
    Oops, I think I posted in the wrong place.



    • #3
      No worries, moving it now.



      • #4
        In short, what you should look at is the MAPQ column, the mapping quality. So far as I know, it is the only universal way to define the reliability of a generic alignment. For a much longer answer, see here.
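
As a rough illustration of reading the MAPQ column, here is a minimal Python sketch. It assumes tab-separated SAM text with the standard mandatory columns (FLAG is column 2, MAPQ is column 5). The function names, the example records, and the `min_mapq=20` cutoff are hypothetical choices for illustration, not values recommended anywhere in this thread.

```python
def parse_sam_record(line):
    """Return (read name, FLAG, MAPQ) from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    # Mandatory SAM columns (1-based): 1 = QNAME, 2 = FLAG, 5 = MAPQ.
    return fields[0], int(fields[1]), int(fields[4])


def describe(line, min_mapq=20):
    """Give a coarse, aligner-agnostic label based on FLAG and MAPQ.

    min_mapq is an arbitrary illustrative threshold (an assumption).
    """
    _, flag, mapq = parse_sam_record(line)
    if flag & 0x4:          # 0x4: the read is unmapped
        return "unmapped"
    if mapq == 255:         # 255: mapping quality not available
        return "MAPQ unavailable"
    return "confident" if mapq >= min_mapq else "possibly ambiguous"


# Made-up records, for illustration only.
rec_a = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
rec_b = "read2\t16\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII"

print(describe(rec_a))
print(describe(rec_b))
```

How a low MAPQ should be interpreted still depends on the aligner, so the labels above are only a starting point.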



        • #5
          Thank you very much for the quick reply. This is actually more complicated than I thought.



          • #6
            I had a SAM file (generated by TopHat) with only four distinct MAPQ values (0, 1, 3 and 255). This seems strange to me. Why are there no mappings with value 2? The spec says 255 means the mapping quality is not available. Is there a way to infer the uniqueness of mappings?



            • #7
              Same problem as you, sorrychen.

              I'm working for a lab that needs to map some reads to the genome, but they are looking at short regions from gene deserts, so there are LOTS of repeats. I can't seem to find a reliable mapping quality formula. I have tried MIRA, BWA and plain BLAT; they give me pretty much the same alignments, but none of them gives me a probabilistically derived, understandable mapping quality index. Has anybody seen any papers on deriving mapping quality? I saw the PHRED one, but it doesn't seem to use suboptimal alignments, which doesn't make sense to me. I mean, I can have a perfect match on one chromosome, but if the same read also has a suboptimal hit with one mismatch, the read should get a really low mapping score, right? Has anybody seen any papers or formulas that derive mapping scores using the second-best alignment?



              • #8
                I started searching and I think I found the answer: the MAQ paper has a nice formula for calculating mapping quality. I just have to change a few things here and there to make it fit 454 data.
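
To make the idea concrete, here is a simplified Python sketch in the spirit of a posterior-based mapping quality. It is not the exact MAQ formula. It assumes that every candidate hit is known, that all hits are equally likely a priori, and that each hit's likelihood is driven only by the summed base qualities at its mismatching positions. The function name, the input format and the cap of 60 are hypothetical.

```python
import math


def mapping_quality(mismatch_qual_sums, cap=60):
    """Phred-scaled probability that the best candidate hit is wrong.

    mismatch_qual_sums: one number per candidate hit, the sum of Phred
    base qualities at that hit's mismatching positions (0 = perfect match).
    cap: illustrative upper bound, used when the error probability is 0.
    """
    # Likelihood of each hit: product of base error probabilities at
    # its mismatches, i.e. 10^(-sum of qualities / 10).
    likelihoods = [10 ** (-q / 10) for q in mismatch_qual_sums]
    # Posterior probability that the best hit is NOT the true origin,
    # under a uniform prior over the listed candidates.
    p_wrong = 1 - max(likelihoods) / sum(likelihoods)
    if p_wrong <= 0:
        return cap
    return min(cap, int(round(-10 * math.log10(p_wrong))))


print(mapping_quality([0]))       # a single perfect hit
print(mapping_quality([0, 0]))    # two equally good perfect hits
print(mapping_quality([0, 30]))   # perfect hit plus a hit with one Q30 mismatch
```

Under these assumptions the second-best hit lowers the score in proportion to how plausible it is: two identical perfect hits give a score of about 3, while a second hit whose only mismatch sits on a high-quality base reduces the score far less.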



                • #9
                  Hi all, I'm new to the forums. Do forgive me if this has been posted, but I couldn't find the answer.

                  Flag 0x100 means the alignment is not primary. What is a primary alignment? I'm finding some reads where all of the alignments have this flag, and some reads where all but one alignment have it. I'm quite confused.



                  • #10
                    A read may have multiple alignments given the sensitivity of the aligner. The primary is typically the first or best alignment (depends on the aligner) although this does not have to be the case. Again, depending on the aligner you may be able to iterate through all the hits if CC and CP are specified.
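
To check how an aligner uses the 0x100 flag, one option is to count, per read name, how many alignments exist and how many are flagged primary. The Python sketch below assumes single-end reads (paired-end mates share a read name and would need to be separated first) and tab-separated SAM text. The function name and example lines are hypothetical.

```python
from collections import defaultdict


def primary_counts(sam_lines):
    """Map read name -> [number of mapped alignments, number flagged primary]."""
    counts = defaultdict(lambda: [0, 0])
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:                # skip unmapped reads
            continue
        counts[qname][0] += 1
        # Only bit 0x100 (not primary) is checked here; later versions of
        # the spec also define 0x800 (supplementary), ignored in this sketch.
        if not flag & 0x100:
            counts[qname][1] += 1
    return dict(counts)


# Made-up lines, for illustration only.
lines = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t10\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t20\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t256\tchr1\t30\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
for name, (total, primary) in primary_counts(lines).items():
    print(name, total, primary)
```

A read whose primary count is 0 (like `r2` in the made-up input) is the situation described above, where every alignment of a read carries the flag.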



                    • #11
                      Thanks for the feedback!

                      I found a read with 10 alignments, all with the same alignment length but different numbers of mismatches. If I calculate the score for each alignment, I can find the one with the best score. Yet all the alignments have this flag, even the one with the best score. In addition, in the original map file, the best-scoring alignment is the first to be reported. Does this mean my aligner does not define 'primary alignment' in the same way?

                      Apparently I need to use this flag to filter reads when trying to reconstruct a GFF file from this SAM file for unique alignments (single alignments+multiple alignments fulfilling unique criteria).



                      • #12
                        I am curious, what aligner are you using? It may be good to give feedback to the aligner's developer(s).



                        • #13
                          Originally posted by lh3
                          In short, what you should look at is the MAPQ column, the mapping quality. So far as I know, it is the only universal way to define the reliability of a generic alignment. For a much longer answer, see here.
                          Heng Li's page has moved here



                          • #14
                            I am trying to do the same thing: extract unique alignments (single alignments+multiple alignments fulfilling unique criteria).

                            How exactly could I accomplish this using samtools view -f (or -F)? I have searched everywhere I could, but I am still confused.

                            Help please!
                            clariet

                            Originally posted by Haneko
                            Thanks for the feedback!

                            I found a read with 10 alignments, all with the same alignment length but different numbers of mismatches. If I calculate the score for each alignment, I can find the one with the best score. Yet all the alignments have this flag, even the one with the best score. In addition, in the original map file, the best-scoring alignment is the first to be reported. Does this mean my aligner does not define 'primary alignment' in the same way?

                            Apparently I need to use this flag to filter reads when trying to reconstruct a GFF file from this SAM file for unique alignments (single alignments+multiple alignments fulfilling unique criteria).
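
For reference, `samtools view -f INT` keeps only records that have all bits of INT set in FLAG, `-F INT` drops records that have any bit of INT set, and `-q INT` drops records with MAPQ below INT. So something like `samtools view -F 256 -q 1` would discard non-primary alignments and MAPQ 0 records. Whether the result counts as "unique" depends on how the aligner sets these fields, which is an assumption to verify for each aligner. The Python sketch below mimics that filtering logic on SAM text; the function names and example lines are hypothetical.

```python
def passes_flag_filter(flag, require=0, exclude=0):
    """Mimic samtools view -f (require all bits) and -F (exclude any bit)."""
    return (flag & require) == require and (flag & exclude) == 0


def filter_sam(lines, require=0, exclude=0x100, min_mapq=0):
    """Yield header lines plus alignment lines that pass the filters."""
    for line in lines:
        if line.startswith("@"):      # always keep header lines
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if passes_flag_filter(flag, require, exclude) and mapq >= min_mapq:
            yield line


# Made-up lines, for illustration only.
lines = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t10\t37\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr2\t20\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t30\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
# Roughly equivalent in spirit to: samtools view -h -F 256 -q 1
kept = list(filter_sam(lines, exclude=0x100, min_mapq=1))
print(len(kept))
```

Note that this does not help in the case reported above, where every alignment of a read carries 0x100: such reads would be dropped entirely by `-F 256`.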



                            • #15
                              MAPQ values in TopHat output

                              Originally posted by sorrychen
                              I had a SAM file (generated by TopHat) with only four distinct MAPQ values (0, 1, 3 and 255). This seems strange to me. Why are there no mappings with value 2? The spec says 255 means the mapping quality is not available. Is there a way to infer the uniqueness of mappings?

                              Can anyone explain the MAPQ output from TopHat (1.4.1)? When the accepted_hits.bam file is viewed in SAM format, the only MAPQ values are 0, 1, 3 and 255. I've looked through the whole file, and there are excellent perfect matches on genes, but no values above 3 (which isn't a good MAPQ score, from what I can determine). Any suggestions or pointers to where this might be described? I've tried all the manuals, etc.

                              Thanks!
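
One scheme that would produce exactly this set of values is to report 255 for a read with a single hit and the truncated value of -10*log10(1 - 1/n) for a read with n hits. That this TopHat version does so is an assumption here and is not confirmed anywhere in this thread; a scheme of this form is documented for at least one other RNA-seq aligner. The Python sketch below simply evaluates the formula; the function name is hypothetical.

```python
import math


def multimap_mapq(n_hits):
    """MAPQ under the assumed scheme: 255 if unique, else int(-10*log10(1 - 1/n))."""
    if n_hits == 1:
        return 255
    # int() truncates toward zero, which is why the value 2 never appears.
    return int(-10 * math.log10(1 - 1 / n_hits))


for n in range(1, 7):
    print(n, multimap_mapq(n))
```

If the assumption holds, the value 2 can never occur: two hits give 3, three or four hits give 1, and five or more hits give 0. In that reading, 255 would mark uniquely mapped reads rather than poor ones.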
