  • Identical QNAME in a SAM file

    Hi all,
    I have paired-end MiSeq data, and looking at the SAM file I got strange results.
    As I understand it, a QNAME should appear once (for an unpaired read) or twice (once for each mate of a pair), but not three times? Three rows from the data are below.

    Can anyone explain these results?


    M01015:3:000000000-A30BE:1:1112:17179:12693 99 chrI 173 60 25M = 230 82 CTCCGAACCACCATCCATCCCTCTA AAAAAAABDDDDDEEEGGGGGGIII RG:Z:3 XT:A:U NM:i:0 SM:i:23 AM:i:23 X0:i:1 X1:i:1 XM:i:0 XO:i:0 XG:i:0 MD:Z:25 XA:Z:chrII,+5961,25M,1;

    M01015:3:000000000-A30BE:1:1112:17179:12693 99 chrI 173 60 25M = 230 82 CTCCGAACCACCATCCATCCCTCTA AAAAAAABDDDDDEEEGGGGGGIII RG:Z:3 XT:A:U NM:i:0 SM:i:23 AM:i:23 X0:i:1 X1:i:1 XM:i:0 XO:i:0 XG:i:0 MD:Z:25 XA:Z:chrII,+5961,25M,1;

    M01015:3:000000000-A30BE:1:1112:17179:12693 147 chrI 230 60 25M = 173 82 TTACCCATATCCAACCCACTGCCAC HFCGGGGGGBDDDBDDDB?B????? RG:Z:3 XT:A:U NM:i:0 SM:i:37 AM:i:23 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:25

    The first two rows are identical; only the third is the mate of the previous two.

    Best,
    yishai

  • #2
    It's expected for the two ends of a pair to have exactly the same name. The FLAG tells you whether you are looking at read 1 or read 2: 99 means (among other things) read 1, and 147 means (among other things) read 2.
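The FLAG is a bit field, so its meaning can be read off by testing each bit. A minimal sketch in Python: the bit values are those defined in the SAM specification, while the name `FLAG_BITS`, the function `decode_flag` and the short descriptions are illustrative paraphrases, not an official API.

```python
# FLAG bit values as defined in the SAM specification; the descriptions are paraphrased.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair (read 1)",
    0x80: "second in pair (read 2)",
    0x100: "secondary alignment",
    0x200: "not passing quality controls",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list:
    """Return the description of every bit that is set in a SAM FLAG value."""
    return [desc for bit, desc in FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    for flag in (99, 147):
        print(flag, decode_flag(flag))
```

For 99 this lists paired, proper pair, mate reverse strand and read 1 (1 + 2 + 32 + 64); for 147 it lists paired, proper pair, read reverse strand and read 2 (1 + 2 + 16 + 128).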

    I don't know why read 1 is reported twice, aligned to exactly the same place. It also aligns almost as well to a site on ChrII (the XA tag lists chrII,+5961 with an edit distance of 1), so some software might have reported both sites, but reporting the same site twice is odd. It's almost as if the software reported both positions, then realized from the mate's mapping that the ChrI position was the right one, and instead of discarding the ChrII line, changed it to match the ChrI position, even though another line already had it.
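One way to check a file for exactly this symptom is to count primary records per read. The sketch below assumes plain-text SAM lines; `mate_label` and `repeated_primary_records` are hypothetical helper names. It skips secondary (0x100) and supplementary (0x800) records, since those may legitimately repeat a QNAME, and reports any (QNAME, read 1/read 2) combination that still occurs more than once.

```python
from collections import Counter

SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mate_label(flag: int) -> str:
    """Classify a record as read 1, read 2 or unpaired from its FLAG bits."""
    if flag & 0x40:
        return "read1"
    if flag & 0x80:
        return "read2"
    return "unpaired"


def repeated_primary_records(sam_lines):
    """Return {(QNAME, mate): count} for primary records that occur more than once."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        # QNAME and FLAG never contain whitespace, so splitting on any whitespace
        # works for both tab-separated files and space-separated pastes.
        qname, flag_text = line.split(None, 2)[:2]
        flag = int(flag_text)
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # these may legitimately repeat a QNAME
        counts[(qname, mate_label(flag))] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    name = "M01015:3:000000000-A30BE:1:1112:17179:12693"
    rows = [  # QNAME, FLAG, RNAME, POS of the three rows in the first post
        f"{name}\t99\tchrI\t173",
        f"{name}\t99\tchrI\t173",
        f"{name}\t147\tchrI\t230",
    ]
    print(repeated_primary_records(rows))
```

On the three rows from the first post it flags read 1 of M01015:3:000000000-A30BE:1:1112:17179:12693, because both FLAG 99 lines are primary records for the same read.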

    Reporting the command lines used always helps to troubleshoot problems.
    Last edited by swbarnes2; 03-05-2013, 11:25 AM.



    • #3
      Thanks, but I still don't understand

      Thank you for your reply and sorry for the time taken to answer.

      I expect that if MAPQ is 60, the sequence will be unique?
      If there are two regions in the genome with the same sequence, they should still get different QNAMEs.
      I assume this is an error in the SAM file I got?



      • #4
        Originally posted by yishai
        Thank you for your reply and sorry for the time taken to answer.

        I expect that if MAPQ is 60, the sequence will be unique?
        No, the presence of another chromosome and position in the XA tag suggests otherwise, as far as that sequence goes. However, the mate maps uniquely, and that allowed the software to be confident that read 1 really belongs on ChrI, not ChrII.
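The XA tag can be read mechanically to see where else a read could go. This is a sketch that assumes the BWA-style layout `chr,strand+position,CIGAR,edit distance;` with entries separated by semicolons; the tags in these records (XT, X0, X1, XA) look like BWA output, but the thread never confirms which aligner produced the file. `AltHit`, `parse_xa` and `alternative_hits` are hypothetical names.

```python
from typing import List, NamedTuple


class AltHit(NamedTuple):
    chrom: str
    strand: str
    pos: int
    cigar: str
    edit_distance: int


def parse_xa(value: str) -> List[AltHit]:
    """Parse a BWA-style XA:Z value such as 'chrII,+5961,25M,1;' (format assumed)."""
    hits = []
    for entry in value.split(";"):
        if not entry:
            continue  # the value ends with ';', which leaves an empty last piece
        chrom, signed_pos, cigar, nm = entry.split(",")
        hits.append(AltHit(chrom, signed_pos[0], int(signed_pos[1:]), cigar, int(nm)))
    return hits


def alternative_hits(sam_line: str) -> List[AltHit]:
    """Return the alternative hits listed in a record's XA:Z tag, or [] if it has none."""
    # Columns 12 onward (index 11+) hold the optional TAG:TYPE:VALUE fields.
    for field in sam_line.split()[11:]:
        if field.startswith("XA:Z:"):
            return parse_xa(field[len("XA:Z:"):])
    return []


if __name__ == "__main__":
    print(parse_xa("chrII,+5961,25M,1;"))
```

For the read 1 records above this yields a single alternative hit on chrII, forward strand, position 5961, edit distance 1, while the ChrI hit has NM:i:0. So the read sequence on its own is not unique, and the high MAPQ reflects the uniquely mapped mate.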

        If there are two regions in the genome with the same sequence, they should still get different QNAMEs.
        The QNAME belongs to the read, not the location in the genome. Your software is doing something a bit odd, but since you don't care to say what the software is, or what command line you used, you are on your own there.



        • #5
          Thank you swbarnes2,

          I am new to the field. I got the SAM file from someone else, who got a BAM file from the MiSeq that looks the same. It looked odd, as you say, and I thought the file might be faulty. From your answer I understand that you agree, so I will go back and find out why the MiSeq produced this kind of file.
          I will find out what software was used and which command lines.
          Thanks,
          Yishai