So, I am using the following version of BWA
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.12-r1039</pre>
I got through a lot threads read answers and so on... still I am confused.
this is the command I am using:
and those are the explanation for every string I included:
What are your opinion on the -c and -r strings?
then I get to see the produced file and it has sequences that are marked with the optional field of XA:Z and SA:Z
XA: Alternative hits; format: (chr,pos,CIGAR,NM* in anny case I have failed to find the XT:A:U optional flag
SA: Z, not sure what it is but, it almost always coincide with the 256 flag = not primary alignment so I believe that this is NOT a unique aligned read.
The question is mostly on XA:. there is no XA:U if it exist it is XA:Z and is followed by 3 or more genomic coordinates, and that makes me believe that this is not a uniquely mapped read.
To wrap it up I remove from the SAM filewith the following:
-F 4 : remove not mapped
-F 256 : remove not primary aligned
-q 1: remove reads with MAPQ quality less than 1 (I thing that this is quite necessary)
QUESTION: Am, I missing something, Do I do something wrong?
here are some reads as an example:
those include both the SA: and the XA optional flag
those got only the XA: optional flag
Any help would be appreciated
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.12-r1039</pre>
I got through a lot threads read answers and so on... still I am confused.
this is the command I am using:
Code:
bwa mem -t 8 -M -k 19 -r 1 -c 1 $ref_genome $1/$base1".trimmed" > $3/$NOEX
and those are the explanation for every string I included:
#-c keep those that have unique alignment (Discard a MEM if it has more than INT occurence in the genome. This is an insensitive parameter. [10000])
#-r by reducing it to 1, I get bwa mem to work things out more intensively, like the --best command for bowtie (Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy)
#-k seed length (I thing that 32 is to big????)
#-M keep it compatible with picard tools
#-t threads to use
#-r by reducing it to 1, I get bwa mem to work things out more intensively, like the --best command for bowtie (Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy)
#-k seed length (I thing that 32 is to big????)
#-M keep it compatible with picard tools
#-t threads to use
then I get to see the produced file and it has sequences that are marked with the optional field of XA:Z and SA:Z
XA: Alternative hits; format: (chr,pos,CIGAR,NM* in anny case I have failed to find the XT:A:U optional flag
SA: Z, not sure what it is but, it almost always coincide with the 256 flag = not primary alignment so I believe that this is NOT a unique aligned read.
The question is mostly on XA:. there is no XA:U if it exist it is XA:Z and is followed by 3 or more genomic coordinates, and that makes me believe that this is not a uniquely mapped read.
To wrap it up I remove from the SAM filewith the following:
Code:
samtools view -h -q 1 -F 4 -F 256 $3/$NOEXT1.sam | grep -v "XA:Z" | grep -v "SA:Z" > $3/$NOEXT1_mem.sam;
-F 256 : remove not primary aligned
-q 1: remove reads with MAPQ quality less than 1 (I thing that this is quite necessary)
QUESTION: Am, I missing something, Do I do something wrong?
here are some reads as an example:
those include both the SA: and the XA optional flag
Code:
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2115:14837:88360 16 chr12 87311997 1 34M17S * 0 0 TCTATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA BBBBBBABB>A<B>B>A?7BBB?A=0BBA?A?BBBBBBBA=BBBBBBBB?B NM:i:1 MD:Z:0A33 AS:i:33 XS:i:31 SA:Z:chr2,161581931,+,32M19S,1,0; XA:Z:chr9,-104599379,51M,5;chr3,+170653467,51M,5; BIOMICS-HISEQHI:522:HCYM5ADXX:2:1102:3525:68438 16 chr12 87311999 0 32M17S * 0 0 TATAAACACCTCTAAGAAAATAAACTAGAAAATCCAGATGAAATGGATA AA==5.DB8BB8=/??*<D99D??9?D<BEBD@FDD????E<E?E<CAC NM:i:0 MD:Z:32 AS:i:32 XS:i:30 SA:Z:chr2,161581931,+,32M17S,0,0; XA:Z:chr9,-104599381,49M,4;chr3,+170653467,49M,4;chr12,+46991828,49M,5; BIOMICS-HISEQHI:522:HCYM5ADXX:2:1209:6904:71888 256 chr17 59076122 0 33M18H * 0 0 TTTTCCCATTCTGTAGATTGTCTGTTTACTCTG IGGIIIIIIIIGA@GGGHFIIIIIFFHEIHIHF NM:i:0 MD:Z:33 AS:i:33 XS:i:32 SA:Z:chr11,22585338,+,18S33M,0,0; XA:Z:chr4,-151801895,19S32M,0;
Code:
BIOMICS-HISEQHI:522:HCYM5ADXX:2:2104:11639:55952 0 chr1 8120580 1 15S31M5S * 0 0 TCTTGGGATGGACATGTGAGCTGAAATGGCGCCATTGCACTCCAGCTTGGG GIIIGCGFGCGHEGFHIICCGIHDD>HB<CHHGAHHIIHHEHHFFFFFEDD NM:i:0 MD:Z:31 AS:i:31 XS:i:29 XA:Z:chr8,+98304845,30M21S,1;chr20,+51943978,23S28M,0;chr16,+70407077,14S37M,2;chr8,+55962122,15S36M,2;chrX,-70173856,26M25S,0; BIOMICS-HISEQHI:522:HCYM5ADXX:2:2105:16548:32340 0 chr1 8120580 6 15S34M * 0 0 CCTTGGGATGGACATGTGAGCTGAAATGGCGCCATTGCACTCCAGCCTG HHIIHIEIBF1?DHGGDDDF<FB?BGIHI>FGI<EHC@CCHEFE>@BCC NM:i:0 MD:Z:34 AS:i:34 XS:i:30 XA:Z:chr8,+98304845,30M19S,0;chr16,+70407077,14S35M,1;chr8,+55962122,15S34M,1; BIOMICS-HISEQHI:522:HCYM5ADXX:2:2106:5931:90491 0 chr1 8120580 6 14S32M * 0 0 CTTGGGATGTACATGTGAGCTGAAATGGCGCCATTGCACTCCAGCC @H4EC381)*1*:?B:*?:0:?BD?G*?8'-<-5C4=;;D.?7A>E NM:i:0 MD:Z:32 AS:i:32 XS:i:28 XA:Z:chr16,+70407077,13S33M,1;chr8,+55962122,14S32M,1; BIOMICS-HISEQHI:522:HCYM5ADXX:2:2108:7261:70911 0 chr1 8120580 0 15S36M * 0 0 TCTTGGGATGGACATGTGAGCTGAAATGGCGCCATTGCACTCCAGTCTGGG :>33@@1(.=?1>99?3.=33.6=3:>)=;533-5=9;;3>9?;?8=?337 NM:i:2 MD:Z:30C3A1 AS:i:30 XS:i:29 XA:Z:chr8,+98304845,30M21S,1;chr1,-8936944,28M23S,0;chr16,+70407077,14S37M,2;chr8,+55962122,15S36M,2;chr10,-95309935,26M25S,0; BIOMICS-HISEQHI:522:HCYM5ADXX:2:2113:12753:98209 0 chr1 8120580 9 15S36M * 0 0 CCTTGGGATGGACATGTGAGCTGAAATGGCGCCATTGCACTCCAGCCTGAG ><6)<2:86-97;3:?>?==<?>?1=?3>75;?3=?;?;=???;?==>7;> NM:i:0 MD:Z:36 AS:i:36 XS:i:30 XA:Z:chr8,+98304845,30M21S,0;chr16,+70407077,14S37M,2