
  • Remove multiple QNAME's?

    I'm trying to use a program called "strelka" to look for somatic mutations in two .bam files.
    Strelka raises an error if a .bam file contains multiple records with the same QNAME.

    How can I remove those extra entries with the same QNAME (with samtools)?

    Thanks.

  • #2
    SAM/BAM allows multiple lines with the same QNAME for several reasons, the most common being paired end reads, and to record alternative mapping positions. Is either of those the problem?



    • #3
      Originally posted by maubp View Post
      SAM/BAM allows multiple lines with the same QNAME for several reasons, the most common being paired end reads, and to record alternative mapping positions. Is either of those the problem?
      Yes, alternative mapping positions seem to be a problem for a third-party program (Strelka), although after looking through ~12 bam files, only one of them seems to have any.
      To make that application proceed, I converted the bam to sam, grep'ed out that QNAME, and then converted the sam back to bam.



      • #4
        Originally posted by lethalfang View Post
        To make that application proceed, I converted the bam to sam, grep'ed out that QNAME, and then converted the sam back to bam.
        Next time try this:

        Code:
        samtools view -bh -F 256 -o <output.bam> <input.bam>
        The '-F 256' flag tells samtools to not include any alignments that are not primary. Thus if a read has multiple alignments reported in the bam file this will keep the one, primary alignment and remove all others. Your method removed all alignments for the read.
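
        As an illustration of what '-F 256' tests, here is a minimal Python sketch of the bit check. The helper name `passes_F` is made up for this example and is not part of samtools; 256 (0x100) is the "not primary alignment" bit of the SAM FLAG field.

```python
SECONDARY = 0x100  # 256: the "not primary alignment" bit of the SAM FLAG field


def passes_F(flag: int, mask: int) -> bool:
    """Mimic the `-F mask` test (hypothetical helper, not a samtools API):
    keep a record only if none of the bits in `mask` are set in its FLAG."""
    return (flag & mask) == 0


# FLAG 99 has no 0x100 bit, so the record is kept.
print(passes_F(99, SECONDARY))        # True
# The same FLAG with the secondary bit added (99 + 256) is dropped.
print(passes_F(99 + 256, SECONDARY))  # False
```

        The filter only helps when the aligner actually sets that bit on the extra records.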



        • #5
          Originally posted by kmcarr View Post
          Next time try this:

          Code:
          samtools view -bh -F 256 -o <output.bam> <input.bam>
          The '-F 256' flag tells samtools to not include any alignments that are not primary. Thus if a read has multiple alignments reported in the bam file this will keep the one, primary alignment and remove all others. Your method removed all alignments for the read.
          Thanks. Definitely a much much better way.



          • #6
            Note that not all mapping tools follow this convention (setting the FLAG bit for secondary alignments), although it is the documented 'best practice' in the SAM/BAM format specification, so most should.



            • #7
              Originally posted by kmcarr View Post
              Next time try this:

              Code:
              samtools view -bh -F 256 -o <output.bam> <input.bam>
              The '-F 256' flag tells samtools to not include any alignments that are not primary. Thus if a read has multiple alignments reported in the bam file this will keep the one, primary alignment and remove all others. Your method removed all alignments for the read.
              I gave that flag a try..... well, it didn't quite work.

              The problem QNAMEs are the following. Wondering if you can "diagnose" if there's anything unusual with them? Thanks.


              Code:
              98_377_61	163	chr2	234255031	56	35M	=	234255094	137	TTTATTCCATGCTTACAGCTAAGGAAAGGTGAGTG	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJEED	XC:Z:AAA	RG:Z:Library2_2	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@946?..>@8@@:@@6@?@@N@/@@@@A@@@6-@@	CS:Z:T00033020131320311232302020020112211
              98_377_61	147	chr2	234255061	56	35M	=	234254897	-198	GAGTGAGCCTCTTGAATGTGGCCCTGATTTGTCCT	HJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library2_2	NH:i:1	CM:i:0	NM:i:0	CQ:Z:?@<@?@F?@@N@?@@@@/@@;@@2@@@?@;6=6@?	CS:Z:T32021100321200301113021022203221122
              98_377_61	83	chr2	234255094	56	75M	=	234255031	-137	CTATGCTCTGGGACCTTCTCCTCCAGCACAAAACCCTCTTTGAGTCTTTGCACATATCCCAAGCTCTCCTGCCGG	5>JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library2_2	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@<@@;@<@@@@@@@;@/@@@@>@@@?<8=@68>@8@@@@@;?<@2/N6@628N/2@N=?-<@=?0=/2@//	CS:Z:T203031202222320100233311131002212210022200100011132102202220201200122231332
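
              One way to start diagnosing records like these is to decode their FLAG values. The sketch below is only an illustration: the bit names are shorthand chosen for this example, and `decode_flag` is a hypothetical helper. For the three FLAGs shown above (163, 147, 83), none has the 0x100 bit set, which would be consistent with '-F 256' leaving all three in place, and two of them (163 and 147) are both marked as second in pair.

```python
# SAM FLAG bits; the names are shorthand chosen for this sketch.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the names of all bits set in a SAM FLAG, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# FLAG values taken from the three 98_377_61 records above.
for flag in (163, 147, 83):
    print(flag, decode_flag(flag))
```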



              • #8
                Do you have the original read pair pre-mapping for 98_377_61 (your example)? Which tool did you use for the mapping (and what options)?



                • #9
                  The bam was aligned and generated by LifeScope 2.5.1 from the SOLiD 5500's XSQ color space file.



                  • #10
                    Does anyone have a way to remove multiple conflicting alignments from a bam file?
                    LifeScope 2.5.1 is giving them to me again, and I had to remove them manually, which isn't very efficient.

                    The conflicting entries look like this:
                    Code:
                    21_1887_1875	99	chr3	63594795	56	75M	=	63594973	212	CTGTAGGACTTCATTGTCACTGTAGTCTTTTCGTTCTTGTGGTTGGTGGGTTCATCAGCTTGGGCAAAAGGACTG	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJG	XC:Z:AAA	RG:Z:Library16_16	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@/N@@@@@@@@@@@@@@@@@@=@@?@>=?@@@??6==@<@;@8=>	CS:Z:T221132021202130112112113212200023102201110101011001021321232010031000202121
                    21_1887_1875	163	chr3	63594816	56	35M	=	63594891	149	GTAGTCTTTTCGTTCTTGTGGTTGGTGGGTTCATC	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?>	XC:Z:AAA	RG:Z:Library15_15	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@/@	CS:Z:T11321220002310220111010101100102132
                    21_1887_1875	83	chr3	63594891	56	75M	=	63594816	-149	GATCCCAAGGAAGTATTGCTACTAATGTCAGCTTGCAAACAGCCATCCAACAAATGCATGGCTATCCACAGCCCT	CGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library15_15	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@>@/@@@@@@>@@?@;<<@@<@>?@;@<;@=@2<?><@@@@?<6<@??6<	CS:Z:T320032111023323013131300110102310321100131023212113032132310331202020100232
                    21_1887_1875	147	chr3	63594973	56	35M	=	63594795	-212	CAAGCTTAAACACTTCTGTGGTAAGTATTTTTCAT	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library16_16	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@/@@@@@@@@@@@@@@@@@@@8@>@@@@@@	CS:Z:T33120000331203101112202111003023201
                    
                    
                    293_1017_155	163	chr3	132207147	56	35M	=	132207228	155	GCTGACCTTTGACCCTATCCTTGTTGAGAAGGTTG	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library15_15	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@	CS:Z:T13212102001210023320201101222020101
                    293_1017_155	99	chr3	132207203	46	2H72M1H	=	132207335	166	TGCAAGATAACCCACAGTTACCCCGCCTTTATCTGAGTGGAGTATTTTTCTTTATGTTGATGTACACGGGTT	-DEJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHH=8FHJJJJH?66/0044	XC:Z:AAA	RG:Z:Library16_16	NH:i:1	CM:i:3	NM:i:3	CQ:Z:;22/?62/=;@@;;<@8?@N@=@>@@;/@N@=@?2==@@N@@@@N@6<<>6@/@/@6/@/@68@@@@@@;/@/<<	CS:Z:T210131022330100111210310003302003322122110221330000220033110123113111320100
                    293_1017_155	83	chr3	132207228	56	75M	=	132207147	-155	CCTTTATCTGAGTGGAGTATTTTTCTTTATCATGATGTACACAGGTTCCAATGTGCTTCCTGTTGCTCGGTAAGA	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library15_15	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@@@@@@@<@@@@@?@@@@@@@@@@@@@@<@@@@@@@?@8@@@@@@@@@@@@?@@@@<@@@@@@@6/<?>@	CS:Z:T022031032231011202023111301020102111131132131233002200003312201122122330020
                    Thanks.
                    Last edited by lethalfang; 11-22-2012, 03:30 PM.
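
                    A possible way to automate the manual removal is sketched below in Python, working on SAM text lines (for example the output of `samtools view -h`). It rests on an assumption that is not confirmed anywhere in this thread: a QNAME counts as "conflicting" when it has more than one primary record for the same read end (first or second in pair). The function names are made up for this example. Like the grep approach, it drops every record of an affected read, because the FLAGs give no basis for choosing which record to keep. The records in the example are shortened to the mandatory columns, with SEQ and QUAL replaced by `*`.

```python
from collections import Counter


def read_end(flag: int) -> int:
    """Return 1 for first in pair, 2 for second in pair, 0 otherwise."""
    if flag & 0x40:
        return 1
    if flag & 0x80:
        return 2
    return 0


def conflicting_qnames(sam_lines):
    """Collect QNAMEs with more than one primary record for the same read end.
    ("Conflicting" is defined this way only as an assumption of this sketch.)"""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):      # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:              # skip secondary / supplementary records
            continue
        counts[(fields[0], read_end(flag))] += 1
    return {qname for (qname, _end), n in counts.items() if n > 1}


def drop_qnames(sam_lines, bad):
    """Yield header lines and every record whose QNAME is not in `bad`."""
    for line in sam_lines:
        if line.startswith("@") or line.split("\t", 1)[0] not in bad:
            yield line


# FLAGs and positions for 98_377_61 come from post #7; "other_read" is invented.
records = [
    "@HD\tVN:1.0",
    "98_377_61\t163\tchr2\t234255031\t56\t35M\t=\t234255094\t137\t*\t*",
    "98_377_61\t147\tchr2\t234255061\t56\t35M\t=\t234254897\t-198\t*\t*",
    "98_377_61\t83\tchr2\t234255094\t56\t75M\t=\t234255031\t-137\t*\t*",
    "other_read\t99\tchr2\t100\t56\t35M\t=\t200\t135\t*\t*",
    "other_read\t147\tchr2\t200\t56\t35M\t=\t100\t-135\t*\t*",
]
bad = conflicting_qnames(records)
kept = list(drop_qnames(records, bad))
print(sorted(bad))
print(len(kept), "lines kept")
```

                    This takes two passes over the data, so on a real file the SAM text would have to be read twice or held in memory. In the records posted above, the entries sharing a QNAME carry different RG tags (Library15_15 and Library16_16), so adding the read group to the counting key might be worth trying before discarding reads; that is a guess based only on these few lines.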
