  • mhayes
    Member
    • Aug 2011
    • 11

    Inconsistency with SAM flag output?

    Hi all.

    I'm very confused about the output that I'm getting here.

    Consider the below SAM output. These are the mapping results of a single read pair. You can see that the first read maps to chromosome 6 at 5007, and the second maps to 5149.

    However, the SAM flags suggest that the first read is the *second* in the pair, and that the second read is the *first* in the pair. Also, the first read maps to the forward strand, while the second read maps to the reverse strand.

    -----------------
    k_2_6_11305011 163 chr6 5007 0 75M * 0 217 ATATAACTGCGAGATTAATCTCAGACAATGACACAAAATATAGCGAAGTTGGTAAGTTATTTAGTAAAGCTCATG BBB;CBBC4)7B8B=-BB;B?BB?2*;BB-BBBBBBBB?C-;B-@>AC8=B909BB0@4<8-B;-=B0B@+;C--
    MF:i:18 AM:i:0 SM:i:0 NM:i:2 UQ:i:21 H0:i:0 H1:i:0

    k_2_6_11305011 83 chr6 5149 0 75M * 0 -217 TTTATCTTTCAACAACTTGTGTGTTATATTTTGGAATACAGATACAAAGTTATTATGCTTTCAAAATATTCTTTT ?BB?BBB?BB8BBB0=-=BBBBB?==BB?BBB?B=B?-0?BBB8B--B8BBBBB-C8C=?=BBBB8?BBBCB=8B
    MF:i:18 AM:i:0 SM:i:0 NM:i:0 UQ:i:0 H0:i:4 H1:i:0

    -----------------

    The SAM flags suggest that this pair is 'everted' (i.e. the first read is on the reverse strand, while the second read is on the forward strand). However, this is not really the case.

    Am I interpreting this output correctly?
  • swbarnes2
    Senior Member
    • May 2008
    • 910

    #2
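    The read at 5007 has a flag of 163 = 128+32+2+1: second read in the pair, mate on the reverse strand, properly paired, paired. Bit 16 is not set, so this read itself is on the forward strand.
    The read at 5149 has a flag of 83 = 64+16+2+1: first read in the pair, on the reverse strand, properly paired, paired.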

    That's a proper pair; the two ends point in towards each other. There's nothing wrong here.
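
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the bit values are the ones defined in the SAM spec, but the table name `SAM_FLAG_BITS`, the label strings, and the helper `decode_flag` are made up for this example and do not come from any library.

```python
# SAM flag bits as defined in the SAM specification.
# The label strings are informal names chosen for this sketch.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# The two flags from the records in this thread.
print(163, decode_flag(163))
print(83, decode_flag(83))
```

For 163 this lists the paired, proper-pair, mate-reverse and second-in-pair bits; for 83 it lists paired, proper-pair, reverse-strand and first-in-pair.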

    I'm not sure how you determined that the read at 5007 was read 1. Does the sequence from the read 1 fastq BLAT to 5007? Maybe the files were given in the wrong order to the software that did the alignment or made the .sam. From the SAM output alone, the only way to know which read was first and which was second is to look at the flags.


    • mhayes
      Member
      • Aug 2011
      • 11

      #3
      My assumption was that the second read would be the one mapped to the more distal location.

      Per the output I provided, the "second" read is actually the one that comes first in mapping (at 5007). That's why I'm confused.


      • jay2008
        Member
        • Sep 2010
        • 44

        #4
        The flag in the SAM file is really confusing to me as well.
        What is the meaning of "second read"? Does it mean the read from the second fastq file?
        I am using TopHat.


        • swbarnes2
          Senior Member
          • May 2008
          • 910

          #5
          Yes, the second read comes from the second fastq you give to your mapping software, i.e. the reads originating at adaptor 2.

          It's not like the DNA molecules and the adaptor molecules know which end of the DNA is closer to what your reference has arbitrarily designated as the beginning of the DNA sequence. So of course there can't be a correlation between read 1 and the read closer to the beginning of your reference.
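
To make that point concrete, here is a minimal sketch that classifies a pair from mapping positions and the strand bit alone, ignoring the first/second-in-pair bits entirely. The function name `pair_orientation` and its return labels are hypothetical, and the sketch assumes both reads are mapped to the same chromosome and does not handle overlapping reads specially.

```python
REVERSE = 0x10  # SAM flag bit: this read is aligned to the reverse strand


def pair_orientation(pos_a, flag_a, pos_b, flag_b):
    """Classify a mapped pair using only position and strand.

    Which read is "first" or "second" (bits 0x40 / 0x80) is deliberately
    ignored, because it has no bearing on the orientation of the pair.
    """
    # Order the two alignments by leftmost mapping position.
    (_, left_flag), (_, right_flag) = sorted([(pos_a, flag_a), (pos_b, flag_b)])
    left_rev = bool(left_flag & REVERSE)
    right_rev = bool(right_flag & REVERSE)
    if not left_rev and right_rev:
        return "inward"   # ----> <----
    if left_rev and not right_rev:
        return "everted"  # <---- ---->
    return "same_strand"


# The pair from this thread: read 2 is leftmost, read 1 is rightmost.
print(pair_orientation(5007, 163, 5149, 83))
# The same geometry with the read 1 / read 2 labels swapped (flags 99 and 147).
print(pair_orientation(5007, 99, 5149, 147))
```

Both calls give the same answer, since swapping which read is labelled first changes only bits 0x40 and 0x80, not the strand bits.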


          • nilshomer
            Nils Homer
            • Nov 2008
            • 1283

            #6
            See http://picard.sourceforge.net/explain-flags.html to help translate flags. Also check out the SAM spec for more explanations.


            • jay2008
              Member
              • Sep 2010
              • 44

              #7
              If a pair is mapped to the genome as below,
              -------> <-------
               read1   read2
              does it mean the pair is located on the + strand?

              Otherwise, if a pair is mapped to the genome as below,
              -------> <-------
               read2   read1
              does it mean the pair is located on the - strand?


              • swbarnes2
                Senior Member
                • May 2008
                • 910

                #8
                Originally posted by jay2008:
                If a pair is mapped to the genome as below,
                -------> <-------
                 read1   read2
                does it mean the pair is located on the + strand?

                Otherwise, if a pair is mapped to the genome as below,
                -------> <-------
                 read2   read1
                does it mean the pair is located on the - strand?
                This question makes no sense.

                The DNA that went onto the flow cell is double stranded. If your two reads overlapped perfectly, one would be the reverse complement of the other, not simply its reverse.
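
A small sketch of the difference between reversing a sequence and reverse-complementing it, using the first ten bases of the read at 5007. The helper name `reverse_complement` is just for illustration and only handles A, C, G, T and N.

```python
# Translation table mapping each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


read = "ATATAACTGC"  # first 10 bases of the read mapped at 5007

print(read[::-1])                 # plain reverse:      CGTCAATATA
print(reverse_complement(read))   # reverse complement: GCAGTTATAT
```

Applying `reverse_complement` twice returns the original sequence, which is what you would expect for two reads covering the same stretch of a double-stranded fragment from opposite ends.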
