
  • identical QNAME in a SAM file

    Hi all,
    I have paired-end MiSeq data, and looking at the SAM file I see strange results.
    As I understand it, a QNAME should appear once (if the read has no mate) or twice (once for each read of a pair), but not three times. (I include three rows from the data.)

    Can anyone explain these results?


    M01015:3:000000000-A30BE:1:1112:17179:12693 99 chrI 173 60 25M = 230 82 CTCCGAACCACCATCCATCCCTCTA AAAAAAABDDDDDEEEGGGGGGIII RG:Z:3 XT:A:U NM:i:0 SM:i:23 AM:i:23 X0:i:1 X1:i:1 XM:i:0 XO:i:0 XG:i:0 MD:Z:25 XA:Z:chrII,+5961,25M,1;

    M01015:3:000000000-A30BE:1:1112:17179:12693 99 chrI 173 60 25M = 230 82 CTCCGAACCACCATCCATCCCTCTA AAAAAAABDDDDDEEEGGGGGGIII RG:Z:3 XT:A:U NM:i:0 SM:i:23 AM:i:23 X0:i:1 X1:i:1 XM:i:0 XO:i:0 XG:i:0 MD:Z:25 XA:Z:chrII,+5961,25M,1;

    M01015:3:000000000-A30BE:1:1112:17179:12693 147 chrI 230 60 25M = 173 82 TTACCCATATCCAACCCACTGCCAC HFCGGGGGGBDDDBDDDB?B????? RG:Z:3 XT:A:U NM:i:0 SM:i:37 AM:i:23 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:25

    The first two rows are identical, and only the third is the mate of the read in the first two.

    Best,
    yishai

  • #2
    It's not unusual for the two ends to have exactly the same name. The flag tells you if you are looking at read 1 or read 2. The 99 means (among other things) read 1, 147 means (among other things) read 2.
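For anyone who wants to check the "among other things" part, here is a minimal sketch that decodes a FLAG value into its bits. The bit values are the ones defined for the FLAG field in the SAM specification; the short labels and the `decode_flag` helper are my own names, not part of any tool.

```python
# Bit values of the SAM FLAG field; the labels are informal names chosen here.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


for value in (99, 147):
    print(value, decode_flag(value))
```

So 99 is a properly paired read 1 on the forward strand whose mate is on the reverse strand, and 147 is the matching read 2 on the reverse strand.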

    I don't know why read 1 is being reported twice aligning to the exact same place. The XA tag shows that it aligns nearly as well to a place on ChrII, and some software might have reported both sites, but reporting the same site twice is a little odd. It's almost as if the software reported both positions, then realized that the mapping of the other end meant that the ChrI position was the accurate one, and rather than get rid of the wrong position, it just changed it to match the right position, even though another line already had the correct position.
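To see how widespread the repeated lines are, something like the following rough sketch could help. It assumes a plain-text, tab-separated SAM file and treats two records as "the same alignment" when QNAME, FLAG, RNAME, POS and CIGAR all match; that key and the `find_repeated_alignments` name are my choices for illustration, not a standard definition.

```python
from collections import Counter


def find_repeated_alignments(sam_lines):
    """Return {(qname, flag, rname, pos, cigar): count} for records seen more than once."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, rname, pos, _mapq, cigar = fields[:6]
        counts[(qname, int(flag), rname, int(pos), cigar)] += 1
    return {key: n for key, n in counts.items() if n > 1}


# The three records from the first post, reduced to the 11 mandatory fields.
qname = "M01015:3:000000000-A30BE:1:1112:17179:12693"
read1 = "\t".join([qname, "99", "chrI", "173", "60", "25M", "=", "230", "82",
                   "CTCCGAACCACCATCCATCCCTCTA", "AAAAAAABDDDDDEEEGGGGGGIII"])
read2 = "\t".join([qname, "147", "chrI", "230", "60", "25M", "=", "173", "82",
                   "TTACCCATATCCAACCCACTGCCAC", "HFCGGGGGGBDDDBDDDB?B?????"])

repeated = find_repeated_alignments([read1, read1, read2])
for key, n in repeated.items():
    print(n, key)
```

On a real file you would pass an open file handle instead of the list; for a BAM file you would first need to convert it to SAM text.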

    Reporting the command lines used always helps to troubleshoot problems.
    Last edited by swbarnes2; 03-05-2013, 11:25 AM.



    • #3
      thanks, but I still don't understand

      Thank you for your reply and sorry for the time taken to answer.

      I expect that if MAPQ is 60, the sequence will be unique?
      If there are 2 areas in the genome with the same sequence, still they should get a different QNAME.
      I assume this is an error in the SAM file I got?



      • #4
        Originally posted by yishai
        Thank you for your reply and sorry for the time taken to answer.

        I expect that if MAPQ is 60, the sequence will be unique?
        No, the presence of another chromosome and position in the XA tag suggests otherwise, as far as that sequence goes. However, the mate maps uniquely, and that allowed the software to be confident that read one really belongs on ChrI, not ChrII.

        If there are 2 areas in the genome with the same sequence, still they should get a different QNAME.
        The QNAME belongs to the read, not the location in the genome. Your software is doing something a bit odd, but since you don't care to say what the software is, or what command line you used, you are on your own there.
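To make the XA point concrete, here is a small sketch that splits the XA tag from the first record into its alternative hits. The thread does not say which aligner produced the file, so this assumes the BWA-style layout `chrom,signed_pos,CIGAR,NM;` (which is what the tag in the post looks like); `parse_xa` is just a name I made up.

```python
def parse_xa(tag):
    """Split an XA:Z: tag (assumed BWA-style) into (chrom, strand, pos, cigar, nm) tuples."""
    prefix = "XA:Z:"
    if not tag.startswith(prefix):
        raise ValueError("not an XA:Z: tag: %r" % tag)
    hits = []
    for entry in tag[len(prefix):].split(";"):
        if not entry:
            continue  # the tag ends with ';', which leaves an empty last piece
        # Split from the right so only the last three commas are used.
        chrom, signed_pos, cigar, nm = entry.rsplit(",", 3)
        strand = "-" if signed_pos.startswith("-") else "+"
        hits.append((chrom, strand, abs(int(signed_pos)), cigar, int(nm)))
    return hits


hits = parse_xa("XA:Z:chrII,+5961,25M,1;")
print(hits)
```

Under that assumption, read 1 has one alternative hit on chrII at position 5961 with an edit distance of 1, while the reported chrI alignment has NM:i:0.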



        • #5
          Thank you swbarnes2,

          I am new to the field. I got the SAM file from someone else, who got a BAM file from the MiSeq that looks the same. It looked odd, as you say, and I thought that maybe the file was bad. From your answer I understand that you agree, so I will go back and find out why the MiSeq produced this kind of file.
          I will find out what software was used and which command lines.
          Thanks,
          Yishai
