  • Remove multiple QNAMEs?

    I'm trying to use a program called Strelka to look for somatic mutations in two .bam files.
    Strelka raises an error if the same QNAME appears on more than one line of a .bam file.

    How can I remove those extra entries with the same QNAME (with samtools)?

    Thanks.

  • #2
    SAM/BAM allows multiple lines with the same QNAME for several reasons, the most common being paired end reads, and to record alternative mapping positions. Is either of those the problem?
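One way to tell these cases apart is to count, for every QNAME that appears on more than two lines, how many of its records are secondary (FLAG bit 0x100), supplementary (0x800, a bit defined in current versions of the SAM specification), or primary first/second segments (0x40/0x80). The sketch below is a hypothetical shell helper, `qname_summary`, whose name and output columns are assumptions for illustration; it reads SAM text (for example from `samtools view input.bam`) and does the bit tests with awk arithmetic rather than any samtools option.

```shell
# Hypothetical helper: summarise every QNAME that occurs on more than two
# lines of a SAM text stream (header lines are skipped).
# Output columns: QNAME, total records, secondary (0x100),
# supplementary (0x800), primary READ1 (0x40), primary READ2 (0x80).
qname_summary() {
  awk -F '\t' '
    /^@/ { next }
    {
      q = $1; f = $2
      total[q]++
      sec  = int(f / 256)  % 2      # bit 0x100: secondary alignment
      supp = int(f / 2048) % 2      # bit 0x800: supplementary alignment
      if (sec)  nsec[q]++
      if (supp) nsupp[q]++
      if (!sec && !supp) {
        if (int(f / 64)  % 2) r1[q]++   # bit 0x40: first segment
        if (int(f / 128) % 2) r2[q]++   # bit 0x80: last segment
      }
    }
    END {
      for (q in total)
        if (total[q] > 2)
          printf "%s\t%d\t%d\t%d\t%d\t%d\n", q, total[q], nsec[q], nsupp[q], r1[q], r2[q]
    }'
}

# Example: samtools view input.bam | qname_summary
```

A name whose extra lines are all counted as secondary is the alternative-mapping case; a name with more than one primary READ1 or READ2 record suggests the mapper did not mark its extra alignments as secondary.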

    • #3
      Originally posted by maubp
      SAM/BAM allows multiple lines with the same QNAME for several reasons, the most common being paired end reads, and to record alternative mapping positions. Is either of those the problem?
      Yes, alternative mapping positions seem to be a problem for a third-party tool (Strelka). However, after looking through ~12 BAM files, only one of them seems to have them.
      To let the application proceed, I converted the BAM to SAM, grepped out that QNAME, and then converted the SAM back to BAM.

      • #4
        Originally posted by lethalfang
        To let the application proceed, I converted the BAM to SAM, grepped out that QNAME, and then converted the SAM back to BAM.
        Next time try this:

        Code:
        samtools view -bh -F 256 -o <output.bam> <input.bam>
        The '-F 256' flag tells samtools to not include any alignments that are not primary. Thus if a read has multiple alignments reported in the bam file this will keep the one, primary alignment and remove all others. Your method removed all alignments for the read.
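The `-F` test is bitwise: samtools skips a record if any bit of the given mask is set in its FLAG. As an illustration only (in practice the samtools command above is the way to do it), the hypothetical awk-based `drop_secondary` function below applies the same test for bit 0x100 to SAM text. The comments also show `samtools view -c -f 256`, which counts secondary records before you filter, and a mask of 0x900 (256 + 2048), which additionally drops supplementary (0x800) records in current SAM versions; whether you want that depends on the downstream tool.

```shell
# Illustration only: the bit test that `samtools view -F 256` performs.
# Keeps header lines and every record whose FLAG does NOT have bit 0x100
# (256, secondary alignment) set. Reads and writes SAM text.
drop_secondary() {
  awk -F '\t' '/^@/ || int($2 / 256) % 2 == 0'
}

# Typical samtools equivalents:
#   samtools view -c -f 256 input.bam                      # count secondary records
#   samtools view -b -h -F 256 -o output.bam input.bam     # drop them
#   samtools view -b -h -F 0x900 -o output.bam input.bam   # also drop 0x800 records
# Pipeline using the helper instead:
#   samtools view -h input.bam | drop_secondary | samtools view -b -o output.bam -
```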

        • #5
          Originally posted by kmcarr
          Next time try this:

          Code:
          samtools view -bh -F 256 -o <output.bam> <input.bam>
          The '-F 256' flag tells samtools to not include any alignments that are not primary. Thus if a read has multiple alignments reported in the bam file this will keep the one, primary alignment and remove all others. Your method removed all alignments for the read.
          Thanks. Definitely a much much better way.

          • #6
            Note that not all mapping tools follow this convention (setting the FLAG bit for secondary alignments), although it is the documented 'best practice' in the SAM/BAM format specification, so most should.

            • #7
              Originally posted by kmcarr
              Next time try this:

              Code:
              samtools view -bh -F 256 -o <output.bam> <input.bam>
              The '-F 256' flag tells samtools to not include any alignments that are not primary. Thus if a read has multiple alignments reported in the bam file this will keep the one, primary alignment and remove all others. Your method removed all alignments for the read.
              I gave that flag a try... well, it didn't quite work.

              Here are the records for the problem QNAME. Can you "diagnose" whether there's anything unusual about them? Thanks.


              Code:
              98_377_61	163	chr2	234255031	56	35M	=	234255094	137	TTTATTCCATGCTTACAGCTAAGGAAAGGTGAGTG	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJEED	XC:Z:AAA	RG:Z:Library2_2	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@946?..>@8@@:@@6@?@@N@/@@@@A@@@6-@@	CS:Z:T00033020131320311232302020020112211
              98_377_61	147	chr2	234255061	56	35M	=	234254897	-198	GAGTGAGCCTCTTGAATGTGGCCCTGATTTGTCCT	HJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library2_2	NH:i:1	CM:i:0	NM:i:0	CQ:Z:?@<@?@F?@@N@?@@@@/@@;@@2@@@?@;6=6@?	CS:Z:T32021100321200301113021022203221122
              98_377_61	83	chr2	234255094	56	75M	=	234255031	-137	CTATGCTCTGGGACCTTCTCCTCCAGCACAAAACCCTCTTTGAGTCTTTGCACATATCCCAAGCTCTCCTGCCGG	5>JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library2_2	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@<@@;@<@@@@@@@;@/@@@@>@@@?<8=@68>@8@@@@@;?<@2/N6@628N/2@N=?-<@=?0=/2@//	CS:Z:T203031202222320100233311131002212210022200100011132102202220201200122231332
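Decoding the FLAG column of these three records shows why the filter had no effect: none of them has the secondary bit (0x100) set, so `-F 256` has nothing to remove, yet two of them (163 and 147) both claim to be the second read of the pair. The hypothetical `decode_flag` function below (bash/dash, using `local`) spells out the bits with the flag names samtools uses; recent samtools releases also offer a `samtools flags` subcommand for the same purpose.

```shell
# Hypothetical helper: print the names of the bits set in a SAM FLAG value.
# Bit order follows the SAM specification, from 0x1 up to 0x800.
decode_flag() {
  local flag=$1 bit=1 name out=""
  for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE \
              READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
    if [ $(( flag & bit )) -ne 0 ]; then
      out="${out:+$out }$name"
    fi
    bit=$(( bit * 2 ))
  done
  printf '%s\n' "$out"
}

# Example, using the FLAG values posted above:
#   decode_flag 163   # PAIRED PROPER_PAIR MREVERSE READ2
#   decode_flag 147   # PAIRED PROPER_PAIR REVERSE READ2
#   decode_flag 83    # PAIRED PROPER_PAIR REVERSE READ1
```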

              • #8
                Do you have the original read pair pre-mapping for 98_377_61 (your example)? Which tool did you use for the mapping (and what options)?

                • #9
                  The BAM was aligned and generated by LifeScope 2.5.1 from the SOLiD 5500's XSQ color-space file.

                  • #10
                    Does anyone have a way to remove multiple conflicting alignments from a BAM file?
                    LifeScope 2.5.1 is giving them to me again, and I had to remove them manually, which isn't very efficient.

                    The conflicting entries look like this:
                    Code:
                    21_1887_1875	99	chr3	63594795	56	75M	=	63594973	212	CTGTAGGACTTCATTGTCACTGTAGTCTTTTCGTTCTTGTGGTTGGTGGGTTCATCAGCTTGGGCAAAAGGACTG	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJG	XC:Z:AAA	RG:Z:Library16_16	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@/N@@@@@@@@@@@@@@@@@@=@@?@>=?@@@??6==@<@;@8=>	CS:Z:T221132021202130112112113212200023102201110101011001021321232010031000202121
                    21_1887_1875	163	chr3	63594816	56	35M	=	63594891	149	GTAGTCTTTTCGTTCTTGTGGTTGGTGGGTTCATC	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ?>	XC:Z:AAA	RG:Z:Library15_15	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@/@	CS:Z:T11321220002310220111010101100102132
                    21_1887_1875	83	chr3	63594891	56	75M	=	63594816	-149	GATCCCAAGGAAGTATTGCTACTAATGTCAGCTTGCAAACAGCCATCCAACAAATGCATGGCTATCCACAGCCCT	CGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library15_15	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@>@/@@@@@@>@@?@;<<@@<@>?@;@<;@=@2<?><@@@@?<6<@??6<	CS:Z:T320032111023323013131300110102310321100131023212113032132310331202020100232
                    21_1887_1875	147	chr3	63594973	56	35M	=	63594795	-212	CAAGCTTAAACACTTCTGTGGTAAGTATTTTTCAT	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library16_16	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@/@@@@@@@@@@@@@@@@@@@8@>@@@@@@	CS:Z:T33120000331203101112202111003023201
                    
                    
                    293_1017_155	163	chr3	132207147	56	35M	=	132207228	155	GCTGACCTTTGACCCTATCCTTGTTGAGAAGGTTG	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library15_15	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@	CS:Z:T13212102001210023320201101222020101
                    293_1017_155	99	chr3	132207203	46	2H72M1H	=	132207335	166	TGCAAGATAACCCACAGTTACCCCGCCTTTATCTGAGTGGAGTATTTTTCTTTATGTTGATGTACACGGGTT	-DEJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHH=8FHJJJJH?66/0044	XC:Z:AAA	RG:Z:Library16_16	NH:i:1	CM:i:3	NM:i:3	CQ:Z:;22/?62/=;@@;;<@8?@N@=@>@@;/@N@=@?2==@@N@@@@N@6<<>6@/@/@6/@/@68@@@@@@;/@/<<	CS:Z:T210131022330100111210310003302003322122110221330000220033110123113111320100
                    293_1017_155	83	chr3	132207228	56	75M	=	132207147	-155	CCTTTATCTGAGTGGAGTATTTTTCTTTATCATGATGTACACAGGTTCCAATGTGCTTCCTGTTGCTCGGTAAGA	JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ	XC:Z:AAA	RG:Z:Library15_15	NH:i:1	CM:i:0	NM:i:0	CQ:Z:@@@@@@@@@@@@<@@@@@?@@@@@@@@@@@@@@<@@@@@@@?@8@@@@@@@@@@@@?@@@@<@@@@@@@6/<?>@	CS:Z:T022031032231011202023111301020102111131132131233002200003312201122122330020
                    Thanks.
                    Last edited by lethalfang; 11-22-2012, 03:30 PM.
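Because these conflicting records all look primary, a FLAG filter alone cannot remove them. The sketch below automates the manual grep approach from earlier in the thread, under stated assumptions: the hypothetical `drop_conflicting_qnames` function reads a SAM file twice with awk, marks every QNAME that has more than one primary record (neither 0x100 nor 0x800 set) for the same segment (READ1, READ2, or an unpaired read), and then writes out everything except those reads. It assumes the SAM has first been written to a regular file, and, like the grep approach, it discards every record of an affected read rather than guessing which pairing is correct.

```shell
# Hypothetical helper: drop every record of any QNAME that has more than one
# primary record (0x100 and 0x800 both unset) for the same read segment.
# Usage: drop_conflicting_qnames input.sam > filtered.sam
# The file is read twice, so it must be a regular file, not a pipe.
drop_conflicting_qnames() {
  awk -F '\t' '
    # Segment of a primary record: 1 = READ1, 2 = READ2, 3 = unpaired.
    # Secondary or supplementary records return 0 and are ignored.
    function segment(f) {
      if (int(f / 256) % 2 || int(f / 2048) % 2) return 0
      if (int(f / 64) % 2)  return 1
      if (int(f / 128) % 2) return 2
      return 3
    }
    # Pass 1: count primary records per (QNAME, segment).
    FNR == NR {
      if (!/^@/) {
        s = segment($2)
        if (s && ++seen[$1, s] > 1) bad[$1] = 1
      }
      next
    }
    # Pass 2: keep header lines and records of unaffected QNAMEs.
    /^@/ || !($1 in bad)
  ' "$1" "$1"
}

# Example pipeline:
#   samtools view -h input.bam > input.sam
#   drop_conflicting_qnames input.sam | samtools view -b -o output.bam -
```

For the records above, both 21_1887_1875 (two READ1 and two READ2 records) and 293_1017_155 (two READ1 records) would be removed. Older samtools releases may need `-S` to read SAM from standard input.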
