Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • kjngo
    replied
    Originally posted by bioinfosm View Post
    what would a flag 0 imply in sam output? I used novo align and the only flags I see are 0 4 and 16. 4 is unmapped read. 16 is for the strand, but how to interpret 0.

    Also, How can I ascertain the reads that are *not* uniquely mapped. I read that the 5th column MAPQ should be of help to determine multiply-mapped reads. Is MAPQ=0 an indication that the read is multiply-mapped?

    Thanks
    I have a similar question. All I see are 0, 4, 16, and 20s. I do not understand how to interpret 20. I know the hexadecimal would be 14, which should mean this is a combination of both strand and not mapped. Please correct me if i'm wrong. I see a MAPQ score and also reference hit, so does it still mean it mapped? Thanks

    Leave a comment:


  • aurelielaugraud
    replied
    would you please explain what xt:a:n and xt:a:M means ?
    (I haven't found any xt:a:n yet but some xt:a:m ...
    I understand the xt:a:r and xt:a:u

    Leave a comment:


  • aurelielaugraud
    replied
    thanks a lot for the replies for the replies

    Leave a comment:


  • lh3
    replied
    as I understand the flag meaning for 69, the query is unmapped, but the mate is mapped
    This seems to be an unfixed bug.

    Moreover, I get some lines with MAPQ to 0 and flag to 99/147 with coherent insert size
    I am not sure why you think this is problematic.

    how can one pair have a MAPQ at 37 and the other at 0
    I have answered this question several times in the mailing list. I now put it in FAQ on the bwa homepage.

    Leave a comment:


  • aurelielaugraud
    replied
    Hello again,
    I have another question about this :
    HWI-EAS255_8282_FC30G79_PE:4:1:402:627 97 chr4 49179087 0 75M chr21 9663639 0 GAACTGGATTCAAGCCAGTTAACTACTGGATCAAATCTGATCCTGGGCCCTGTCCCGTTTCTGTCATAACTTCTA BBB?BB>6@B<;<BBBBBB?BCBB6?BB=BB;@<B?BBBB0B6BB=BBBCBB8BBBBB;CBBB;BBB=8;BB?C? XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:4 X1:i:7 XM:i:2 XO:i:0 XG:i:0 MD:Z:46A3G24
    HWI-EAS255_8282_FC30G79_PE:4:1:402:627 145 chr21 9663639 37 75M chr4 49179087 0 TGGAACAGAAAATTGCTCAAGGAAACTCAGAGCTCAAAACACAAATCCATGGAGCTATGAACTCCGAGAGAGAAT @BBB?BB8BBBBBBBBB;;;CCBB?@@AB;B<BBBBBBBBBB8B@9ABBBBBBB=??BB;6'3BBBB;0=BBBBB XT:A:U NM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:20A40A13


    how can one pair have a MAPQ at 37 and the other at 0 ..

    and I have different flag value for other pairs mapped on different chr.
    HWI-EAS255_8282_FC30G79_PE:4:1:24:1302 65 chr12 82786110 0 75M chr4 22777722 0 CACCTATGGGTGAGAATATGTGGTGTTTGGTTTTTTGTTCTTGTGATAGTTTACTGAGAATGATGATTTCCAATT BBBBBBBBBBBBB=BBA>ABBBBBBBBBBBBBCBBBBBBBCBBBBBBBBBBBBBB?+?=BBBBBBBBBBBBBBBB XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:71 XM:i:1 XO:i:0 XG:i:0 MD:Z:8A66
    HWI-EAS255_8282_FC30G79_PE:4:1:24:1302 129 chr4 22777722 0 75M chr12 82786110 0 TCCAAAAATGATAGACTGGATTAAGAAACTGTGGCACATATACACCATGGAATACTATGCAGCCATAAAAAATGA BCBBBBBBBBBBBBBBBBBBCCCBBBBBBBBBBBCCBBCCBCBBBBBBB?BBBBBCCBBBBBBBB;?2?BBBBBB XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:55 XM:i:1 XO:i:0 XG:i:0 MD:Z:28A46

    thanks
    Aurélie

    Leave a comment:


  • aurelielaugraud
    replied
    Hello, still on the flag meaning ...
    I am trying to understand those (by bwa):

    HWI-EAS255_8282_FC30G79_PE:4:1:1:1979 69 * 0 0 * * 0 0 TAGATGGCAAAATGAACTTCCATAAANNNNNCTGTATGCTCTGGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN BBBBB*2??9<BA@@BBB;BBB8@=;'''''<BCCB6+?-B??;3''''''''''''''''''''''''''''''
    HWI-EAS255_8282_FC30G79_PE:4:1:1:1979 133 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    ^C

    HWI-EAS255_8282_FC30G79_PE:4:1:1:1423 69 * 0 0 * * 0 0 TACACCAAATATATATTATATGCTGTACATAATATATTAAAGTCGACCAAATATATATTATATACTGTACATAAA BBBBBBBBBBBBBBBBBBBBBBCBBBB?BBBBBBBBBCBBBBBBBBBB9;ABBBBBBBB=B=>>>BBBB?;<BB=
    HWI-EAS255_8282_FC30G79_PE:4:1:1:1423 133 * 0 0 * * 0 0 TATACTTTGGGTATTTTGATATTTTATTTACAGTATATAATACATACTTTGGGTACTTCGATATTTTATGTGCAG BBBBCBCBBBBBBBCCBCBBBBCCBBBBBBBBBBBBCCBBCBBBBBBBBBBBBBBBB<*<BBBBBBBBB8BBBBB

    HWI-EAS255_8282_FC30G79_PE:4:1:1:1908 69 * 0 0 * * 0 0 ATATTTTCGTTGGAAAAAAGCTCTTGTACTACAGAACAGATCCAGTTTTACGGATACGTGGCTTGAGAGGTAGAT B909B-6<'.?A;A>;6><24BBCBBB@.?*4+?+<B=22=2?42+*9?-''326B'''')3)6,'66'.3.72?
    HWI-EAS255_8282_FC30G79_PE:4:1:1:1908 133 * 0 0 * * 0 0 AAGGGATCCATCCGACGTAGCGTCCGAAACAGTTAAGATGAGAGCACTTAGAAAACTCATCCAGCGAAGGATACA B92<?+<446'.9B;BB<4?7*4,1'?B6'39'.',<6),94;6,'=9'96=BBB.')3-<@+3),>49',B2*9

    as I understand the flag meaning for 69, the query is unmapped, but the mate is mapped ... ????????

    Moreover, I get some lines with MAPQ to 0 and flag to 99/147 with coherent insert size ... what could it be ??

    Many Thanks

    Aurélie

    Leave a comment:


  • Nix
    replied
    Yes I did try Picard's SAMFileReader but it has a major memory management problem when trying to parse RNA-Seq data that contains alignments to chromosomes and splice junctions. These alignment files contain headers with 2.3 million splice junction 'chromosomes' that cause the SAMFileReader to quickly throw an out of memory error. The SAM format requires one '@SQ' line for each splice junction. Ouch.

    Thus, I had to create a light weight file parser that would skip such header info. I also wanted to get a feel for whether sam could be used as a universal short read alignment format. I'm part of two projects (GenoViz and a $100M NHLBI cardiac project) that are looking to see if SAM could serve that function.

    I should say that now I understand bitwise flags, they are a pretty clever trick for compressing a bunch of boolean flags in a binary file. For SAM spec 2 though, they should be removed from the text format.

    Leave a comment:


  • nilshomer
    replied
    Originally posted by Nix View Post
    OK, 4hrs later and I can parse the strand.....

    Here's some java code so folks don't have to go through this again:

    <pre>
    /**
    * Index Index^2 MethodName Flag DescriptionFromSamSpec
    * 0 1 isReadPartOfAPairedAlignment 0x0001 the read is paired in sequencing, no matter whether it is mapped in a pair
    * 1 2 isReadAProperPairedAlignment 0x0002 the read is mapped in a proper pair (depends on the protocol, normally inferred during alignment) 1
    * 2 4 isQueryUnmapped 0x0004 the query sequence itself is unmapped
    * 3 8 isMateUnMapped 0x0008 the mate is unmapped
    * 4 16 isQueryReverseStrand 0x0010 strand of the query (false for forward; true for reverse strand)
    * 5 32 isMateReverseStrand 0x0020 strand of the mate
    * 6 64 isReadFirstPair 0x0040 the read is the first read in a pair
    * 7 128 isReadSecondPair 0x0080 the read is the second read in a pair
    * 8 256 isAlignmentNotPrimary 0x0100 the alignment is not primary (a read having split hits may have multiple primary alignment records)
    * 9 512 doesReadFailVendorQC 0x0200 the read fails platform/vendor quality checks
    * 10 1024 isReadADuplicate 0x0400 the read is either a PCR duplicate or an optical duplicate
    */
    public boolean testBitwiseFlag(int testValue){
    return ((flags & testValue) == testValue);
    }
    /**The read is paired in sequencing, no matter whether it is mapped in a pair*/
    public boolean isPartOfAPairedAlignment(){
    return testBitwiseFlag(1);
    }
    /**The read is mapped in a proper pair (depends on the protocol, normally inferred during alignment)*/
    public boolean isAProperPairedAlignment(){
    return testBitwiseFlag(2);
    }
    /**The query sequence itself is unmapped*/
    public boolean isUnmapped(){
    return testBitwiseFlag(4);
    }
    /**The mate is unmapped*/
    public boolean isMateUnMapped(){
    return testBitwiseFlag(8);
    }
    /**Strand of the query (false for forward; true for reverse strand)*/
    public boolean isReverseStrand(){
    return testBitwiseFlag(16);
    }
    /**Strand of the mate*/
    public boolean isMateReverseStrand(){
    return testBitwiseFlag(32);
    }
    /**The read is the first read in a pair*/
    public boolean isFirstPair(){
    return testBitwiseFlag(64);
    }
    /**The read is the second read in a pair*/
    public boolean isSecondPair(){
    return testBitwiseFlag(128);
    }
    /**The alignment is not primary (a read having split hits may have multiple primary alignment records)*/
    public boolean isNotAPrimaryAlignment(){
    return testBitwiseFlag(265);
    }
    /**The read fails platform/vendor quality checks*/
    public boolean failedQC(){
    return testBitwiseFlag(512);
    }
    /**The read is either a PCR duplicate or an optical duplicate*/
    public boolean isADuplicate(){
    return testBitwiseFlag(1024);
    }
    </pre>
    Have you tried picard for a Java API? Here's the link to the javadoc.

    Leave a comment:


  • Nix
    replied
    OK, 4hrs later and I can parse the strand.....

    Here's some java code so folks don't have to go through this again:

    <pre>
    /**
    * Index Index^2 MethodName Flag DescriptionFromSamSpec
    * 0 1 isReadPartOfAPairedAlignment 0x0001 the read is paired in sequencing, no matter whether it is mapped in a pair
    * 1 2 isReadAProperPairedAlignment 0x0002 the read is mapped in a proper pair (depends on the protocol, normally inferred during alignment) 1
    * 2 4 isQueryUnmapped 0x0004 the query sequence itself is unmapped
    * 3 8 isMateUnMapped 0x0008 the mate is unmapped
    * 4 16 isQueryReverseStrand 0x0010 strand of the query (false for forward; true for reverse strand)
    * 5 32 isMateReverseStrand 0x0020 strand of the mate
    * 6 64 isReadFirstPair 0x0040 the read is the first read in a pair
    * 7 128 isReadSecondPair 0x0080 the read is the second read in a pair
    * 8 256 isAlignmentNotPrimary 0x0100 the alignment is not primary (a read having split hits may have multiple primary alignment records)
    * 9 512 doesReadFailVendorQC 0x0200 the read fails platform/vendor quality checks
    * 10 1024 isReadADuplicate 0x0400 the read is either a PCR duplicate or an optical duplicate
    */
    public boolean testBitwiseFlag(int testValue){
    return ((flags & testValue) == testValue);
    }
    /**The read is paired in sequencing, no matter whether it is mapped in a pair*/
    public boolean isPartOfAPairedAlignment(){
    return testBitwiseFlag(1);
    }
    /**The read is mapped in a proper pair (depends on the protocol, normally inferred during alignment)*/
    public boolean isAProperPairedAlignment(){
    return testBitwiseFlag(2);
    }
    /**The query sequence itself is unmapped*/
    public boolean isUnmapped(){
    return testBitwiseFlag(4);
    }
    /**The mate is unmapped*/
    public boolean isMateUnMapped(){
    return testBitwiseFlag(8);
    }
    /**Strand of the query (false for forward; true for reverse strand)*/
    public boolean isReverseStrand(){
    return testBitwiseFlag(16);
    }
    /**Strand of the mate*/
    public boolean isMateReverseStrand(){
    return testBitwiseFlag(32);
    }
    /**The read is the first read in a pair*/
    public boolean isFirstPair(){
    return testBitwiseFlag(64);
    }
    /**The read is the second read in a pair*/
    public boolean isSecondPair(){
    return testBitwiseFlag(128);
    }
    /**The alignment is not primary (a read having split hits may have multiple primary alignment records)*/
    public boolean isNotAPrimaryAlignment(){
    return testBitwiseFlag(265);
    }
    /**The read fails platform/vendor quality checks*/
    public boolean failedQC(){
    return testBitwiseFlag(512);
    }
    /**The read is either a PCR duplicate or an optical duplicate*/
    public boolean isADuplicate(){
    return testBitwiseFlag(1024);
    }
    </pre>

    Leave a comment:


  • nilshomer
    replied
    Originally posted by lh3 View Post
    If SAM were designed by one person, it would not be widely accepted. There are pros and cons. As to the bit flag, have you tried "samtools view -X"?
    It is widely accepted because good tools (i.e. lh3's implementation in SAMTools and now Picard) were available early and was used by the first major sequencing efforts (aka 1000 Genomes). I no way mean to lambaste the creators of SAM, only to be honest about its development and implementation.

    Leave a comment:


  • drio
    replied
    I particularly like the TCGA BAM specification for the BAM headers. It uses most of the header types and their associated tags to capture things like: sequencing platform, reference contig details(MD5s, url..), samples, institution, library, etc...

    Leave a comment:


  • lh3
    replied
    If SAM were designed by one person, it would not be widely accepted. There are pros and cons. As to the bit flag, have you tried "samtools view -X"?

    Leave a comment:


  • nilshomer
    replied
    Originally posted by Nix View Post
    Wow, the more I dig into sam the more it looks like a horse put together by a committee. These bit flags are ridiculous for a human readable text alignment format.
    It was put together by a committee. The flag field, in my opinion, is left over from the internal C representation, which was created for compactness (then compressed into SAM). You will encounter the internal format if you program using the SAMTools C API but nonetheless the internal (compact) data structure should not have been shown to the user when viewing SAM (not BAM files). Just wait until you try to get anything clarified/changed/included in the specification.

    Nils

    Leave a comment:


  • Nix
    replied
    Wow, the more I dig into sam the more it looks like a horse put together by a committee. These bit flags are ridiculous for a human readable text alignment format.

    Leave a comment:


  • totalnew
    replied
    Currently in most of sam/bam files, the header is empty. However, this could potentially be a good place to store all the info people are interested. From the SAM format specification, HEADER SECTION has four default record types - @HD, @SQ, @RG, @RG.
    Can we define our own record type, such as @MN.... or it won't be recognized by samtools.

    In addition, each record type has predefined data field, such as VN, SO, GO in @HD. Can we add more data field?

    thanks

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM
  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
25 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
28 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 09:21 AM
0 responses
24 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-04-2024, 09:00 AM
0 responses
52 views
0 likes
Last Post seqadmin  
Working...
X