Originally posted by bioinfosm
View Post
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
-
would you please explain what xt:a:n and xt:a:M means ?
(I haven't found any xt:a:n yet but some xt:a:m ...
I understand the xt:a:r and xt:a:u
Leave a comment:
-
as I understand the flag meaning for 69, the query is unmapped, but the mate is mapped
Moreover, I get some lines with MAPQ to 0 and flag to 99/147 with coherent insert size
how can one pair have a MAPQ at 37 and the other at 0
Leave a comment:
-
Hello again,
I have another question about this :
HWI-EAS255_8282_FC30G79_PE:4:1:402:627 97 chr4 49179087 0 75M chr21 9663639 0 GAACTGGATTCAAGCCAGTTAACTACTGGATCAAATCTGATCCTGGGCCCTGTCCCGTTTCTGTCATAACTTCTA BBB?BB>6@B<;<BBBBBB?BCBB6?BB=BB;@<B?BBBB0B6BB=BBBCBB8BBBBB;CBBB;BBB=8;BB?C? XT:A:R NM:i:2 SM:i:0 AM:i:0 X0:i:4 X1:i:7 XM:i:2 XO:i:0 XG:i:0 MD:Z:46A3G24
HWI-EAS255_8282_FC30G79_PE:4:1:402:627 145 chr21 9663639 37 75M chr4 49179087 0 TGGAACAGAAAATTGCTCAAGGAAACTCAGAGCTCAAAACACAAATCCATGGAGCTATGAACTCCGAGAGAGAAT @BBB?BB8BBBBBBBBB;;;CCBB?@@AB;B<BBBBBBBBBB8B@9ABBBBBBB=??BB;6'3BBBB;0=BBBBB XT:A:U NM:i:2 SM:i:37 AM:i:0 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:20A40A13
how can one pair have a MAPQ at 37 and the other at 0 ..
and I have different flag value for other pairs mapped on different chr.
HWI-EAS255_8282_FC30G79_PE:4:1:24:1302 65 chr12 82786110 0 75M chr4 22777722 0 CACCTATGGGTGAGAATATGTGGTGTTTGGTTTTTTGTTCTTGTGATAGTTTACTGAGAATGATGATTTCCAATT BBBBBBBBBBBBB=BBA>ABBBBBBBBBBBBBCBBBBBBBCBBBBBBBBBBBBBB?+?=BBBBBBBBBBBBBBBB XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:71 XM:i:1 XO:i:0 XG:i:0 MD:Z:8A66
HWI-EAS255_8282_FC30G79_PE:4:1:24:1302 129 chr4 22777722 0 75M chr12 82786110 0 TCCAAAAATGATAGACTGGATTAAGAAACTGTGGCACATATACACCATGGAATACTATGCAGCCATAAAAAATGA BCBBBBBBBBBBBBBBBBBBCCCBBBBBBBBBBBCCBBCCBCBBBBBBB?BBBBBCCBBBBBBBB;?2?BBBBBB XT:A:R NM:i:1 SM:i:0 AM:i:0 X0:i:55 XM:i:1 XO:i:0 XG:i:0 MD:Z:28A46
thanks
Aurélie
Leave a comment:
-
Hello, still on the flag meaning ...
I am trying to understand those (by bwa):
HWI-EAS255_8282_FC30G79_PE:4:1:1:1979 69 * 0 0 * * 0 0 TAGATGGCAAAATGAACTTCCATAAANNNNNCTGTATGCTCTGGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN BBBBB*2??9<BA@@BBB;BBB8@=;'''''<BCCB6+?-B??;3''''''''''''''''''''''''''''''
HWI-EAS255_8282_FC30G79_PE:4:1:1:1979 133 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
^C
HWI-EAS255_8282_FC30G79_PE:4:1:1:1423 69 * 0 0 * * 0 0 TACACCAAATATATATTATATGCTGTACATAATATATTAAAGTCGACCAAATATATATTATATACTGTACATAAA BBBBBBBBBBBBBBBBBBBBBBCBBBB?BBBBBBBBBCBBBBBBBBBB9;ABBBBBBBB=B=>>>BBBB?;<BB=
HWI-EAS255_8282_FC30G79_PE:4:1:1:1423 133 * 0 0 * * 0 0 TATACTTTGGGTATTTTGATATTTTATTTACAGTATATAATACATACTTTGGGTACTTCGATATTTTATGTGCAG BBBBCBCBBBBBBBCCBCBBBBCCBBBBBBBBBBBBCCBBCBBBBBBBBBBBBBBBB<*<BBBBBBBBB8BBBBB
HWI-EAS255_8282_FC30G79_PE:4:1:1:1908 69 * 0 0 * * 0 0 ATATTTTCGTTGGAAAAAAGCTCTTGTACTACAGAACAGATCCAGTTTTACGGATACGTGGCTTGAGAGGTAGAT B909B-6<'.?A;A>;6><24BBCBBB@.?*4+?+<B=22=2?42+*9?-''326B'''')3)6,'66'.3.72?
HWI-EAS255_8282_FC30G79_PE:4:1:1:1908 133 * 0 0 * * 0 0 AAGGGATCCATCCGACGTAGCGTCCGAAACAGTTAAGATGAGAGCACTTAGAAAACTCATCCAGCGAAGGATACA B92<?+<446'.9B;BB<4?7*4,1'?B6'39'.',<6),94;6,'=9'96=BBB.')3-<@+3),>49',B2*9
as I understand the flag meaning for 69, the query is unmapped, but the mate is mapped ... ????????
Moreover, I get some lines with MAPQ to 0 and flag to 99/147 with coherent insert size ... what could it be ??
Many Thanks
Aurélie
Leave a comment:
-
Yes I did try Picard's SAMFileReader but it has a major memory management problem when trying to parse RNA-Seq data that contains alignments to chromosomes and splice junctions. These alignment files contain headers with 2.3 million splice junction 'chromosomes' that cause the SAMFileReader to quickly throw an out of memory error. The SAM format requires one '@SQ' line for each splice junction. Ouch.
Thus, I had to create a light weight file parser that would skip such header info. I also wanted to get a feel for whether sam could be used as a universal short read alignment format. I'm part of two projects (GenoViz and a $100M NHLBI cardiac project) that are looking to see if SAM could serve that function.
I should say that now I understand bitwise flags, they are a pretty clever trick for compressing a bunch of boolean flags in a binary file. For SAM spec 2 though, they should be removed from the text format.
Leave a comment:
-
Originally posted by Nix View PostOK, 4hrs later and I can parse the strand.....
Here's some java code so folks don't have to go through this again:
<pre>
/**
* Index Index^2 MethodName Flag DescriptionFromSamSpec
* 0 1 isReadPartOfAPairedAlignment 0x0001 the read is paired in sequencing, no matter whether it is mapped in a pair
* 1 2 isReadAProperPairedAlignment 0x0002 the read is mapped in a proper pair (depends on the protocol, normally inferred during alignment) 1
* 2 4 isQueryUnmapped 0x0004 the query sequence itself is unmapped
* 3 8 isMateUnMapped 0x0008 the mate is unmapped
* 4 16 isQueryReverseStrand 0x0010 strand of the query (false for forward; true for reverse strand)
* 5 32 isMateReverseStrand 0x0020 strand of the mate
* 6 64 isReadFirstPair 0x0040 the read is the first read in a pair
* 7 128 isReadSecondPair 0x0080 the read is the second read in a pair
* 8 256 isAlignmentNotPrimary 0x0100 the alignment is not primary (a read having split hits may have multiple primary alignment records)
* 9 512 doesReadFailVendorQC 0x0200 the read fails platform/vendor quality checks
* 10 1024 isReadADuplicate 0x0400 the read is either a PCR duplicate or an optical duplicate
*/
public boolean testBitwiseFlag(int testValue){
return ((flags & testValue) == testValue);
}
/**The read is paired in sequencing, no matter whether it is mapped in a pair*/
public boolean isPartOfAPairedAlignment(){
return testBitwiseFlag(1);
}
/**The read is mapped in a proper pair (depends on the protocol, normally inferred during alignment)*/
public boolean isAProperPairedAlignment(){
return testBitwiseFlag(2);
}
/**The query sequence itself is unmapped*/
public boolean isUnmapped(){
return testBitwiseFlag(4);
}
/**The mate is unmapped*/
public boolean isMateUnMapped(){
return testBitwiseFlag(8);
}
/**Strand of the query (false for forward; true for reverse strand)*/
public boolean isReverseStrand(){
return testBitwiseFlag(16);
}
/**Strand of the mate*/
public boolean isMateReverseStrand(){
return testBitwiseFlag(32);
}
/**The read is the first read in a pair*/
public boolean isFirstPair(){
return testBitwiseFlag(64);
}
/**The read is the second read in a pair*/
public boolean isSecondPair(){
return testBitwiseFlag(128);
}
/**The alignment is not primary (a read having split hits may have multiple primary alignment records)*/
public boolean isNotAPrimaryAlignment(){
return testBitwiseFlag(265);
}
/**The read fails platform/vendor quality checks*/
public boolean failedQC(){
return testBitwiseFlag(512);
}
/**The read is either a PCR duplicate or an optical duplicate*/
public boolean isADuplicate(){
return testBitwiseFlag(1024);
}
</pre>
Leave a comment:
-
OK, 4hrs later and I can parse the strand.....
Here's some java code so folks don't have to go through this again:
<pre>
/**
* Index Index^2 MethodName Flag DescriptionFromSamSpec
* 0 1 isReadPartOfAPairedAlignment 0x0001 the read is paired in sequencing, no matter whether it is mapped in a pair
* 1 2 isReadAProperPairedAlignment 0x0002 the read is mapped in a proper pair (depends on the protocol, normally inferred during alignment) 1
* 2 4 isQueryUnmapped 0x0004 the query sequence itself is unmapped
* 3 8 isMateUnMapped 0x0008 the mate is unmapped
* 4 16 isQueryReverseStrand 0x0010 strand of the query (false for forward; true for reverse strand)
* 5 32 isMateReverseStrand 0x0020 strand of the mate
* 6 64 isReadFirstPair 0x0040 the read is the first read in a pair
* 7 128 isReadSecondPair 0x0080 the read is the second read in a pair
* 8 256 isAlignmentNotPrimary 0x0100 the alignment is not primary (a read having split hits may have multiple primary alignment records)
* 9 512 doesReadFailVendorQC 0x0200 the read fails platform/vendor quality checks
* 10 1024 isReadADuplicate 0x0400 the read is either a PCR duplicate or an optical duplicate
*/
public boolean testBitwiseFlag(int testValue){
return ((flags & testValue) == testValue);
}
/**The read is paired in sequencing, no matter whether it is mapped in a pair*/
public boolean isPartOfAPairedAlignment(){
return testBitwiseFlag(1);
}
/**The read is mapped in a proper pair (depends on the protocol, normally inferred during alignment)*/
public boolean isAProperPairedAlignment(){
return testBitwiseFlag(2);
}
/**The query sequence itself is unmapped*/
public boolean isUnmapped(){
return testBitwiseFlag(4);
}
/**The mate is unmapped*/
public boolean isMateUnMapped(){
return testBitwiseFlag(8);
}
/**Strand of the query (false for forward; true for reverse strand)*/
public boolean isReverseStrand(){
return testBitwiseFlag(16);
}
/**Strand of the mate*/
public boolean isMateReverseStrand(){
return testBitwiseFlag(32);
}
/**The read is the first read in a pair*/
public boolean isFirstPair(){
return testBitwiseFlag(64);
}
/**The read is the second read in a pair*/
public boolean isSecondPair(){
return testBitwiseFlag(128);
}
/**The alignment is not primary (a read having split hits may have multiple primary alignment records)*/
public boolean isNotAPrimaryAlignment(){
return testBitwiseFlag(265);
}
/**The read fails platform/vendor quality checks*/
public boolean failedQC(){
return testBitwiseFlag(512);
}
/**The read is either a PCR duplicate or an optical duplicate*/
public boolean isADuplicate(){
return testBitwiseFlag(1024);
}
</pre>
Leave a comment:
-
Originally posted by lh3 View PostIf SAM were designed by one person, it would not be widely accepted. There are pros and cons. As to the bit flag, have you tried "samtools view -X"?
Leave a comment:
-
I particularly like the TCGA BAM specification for the BAM headers. It uses most of the header types and their associated tags to capture things like: sequencing platform, reference contig details(MD5s, url..), samples, institution, library, etc...
Leave a comment:
-
If SAM were designed by one person, it would not be widely accepted. There are pros and cons. As to the bit flag, have you tried "samtools view -X"?
Leave a comment:
-
Originally posted by Nix View PostWow, the more I dig into sam the more it looks like a horse put together by a committee. These bit flags are ridiculous for a human readable text alignment format.
Nils
Leave a comment:
-
Wow, the more I dig into sam the more it looks like a horse put together by a committee. These bit flags are ridiculous for a human readable text alignment format.
Leave a comment:
-
Currently in most of sam/bam files, the header is empty. However, this could potentially be a good place to store all the info people are interested. From the SAM format specification, HEADER SECTION has four default record types - @HD, @SQ, @RG, @RG.
Can we define our own record type, such as @MN.... or it won't be recognized by samtools.
In addition, each record type has predefined data field, such as VN, SO, GO in @HD. Can we add more data field?
thanks
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...-
Channel: Articles
Today, 07:01 AM -
-
by seqadmin
Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...-
Channel: Articles
04-04-2024, 04:25 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
37 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
||
Started by seqadmin, 04-10-2024, 10:19 PM
|
0 responses
41 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 10:19 PM
|
||
Started by seqadmin, 04-10-2024, 09:21 AM
|
0 responses
35 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 09:21 AM
|
||
Started by seqadmin, 04-04-2024, 09:00 AM
|
0 responses
54 views
0 likes
|
Last Post
by seqadmin
04-04-2024, 09:00 AM
|
Leave a comment: