Hi,
I am quite new to bioinformatics, so this may be a basic question.
I am trying to generate a four-column file from a sorted BAM file, in which overlapping sequencing reads are merged into longer contigs. Each line needs to contain the following information in this format:
chr#: start position of alignment - stop position of alignment: strand (+/-)
Currently I am piping the samtools mpileup output through awk to obtain the information. My code is here:
samtools mpileup input.sorted.bam |\
cut -f 1,2 |\
awk -F '\t' 'BEGIN {chr="";start=-1;end=-1} {if(chr!=$1 || int($2)!=end+1) { if(chr!="") {printf("%s:%d-%d\n",chr,start,end);} chr=$1;start=int($2);end=int($2);} else { end=end+1;}} END {if(chr!="") {printf("%s:%d-%d\n",chr,start,end); } }'>output.txt
This works, except that it ignores strand information. If a read on the + strand overlaps a read on the - strand, the two are merged into one contig, which is not what I want. I want to assemble contigs only from reads on the same strand.
How can I improve my code so that it takes strand information into account and assembles contigs per strand?
Please help. Thank you very much.
Dadi Gao
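One possible direction, as a rough sketch rather than a tested solution: mpileup output has no per-read strand column, so the strand could instead be read from the SAM FLAG field (bit 0x10 marks a reverse-strand alignment) and the merging done per strand. The function name merge_by_strand is made up for illustration. The sketch assumes coordinate-sorted SAM text on stdin, treats each alignment as a single interval from POS to the end of its reference span (so spliced reads would bridge their gaps), and does not apply the read filters that mpileup applies by default.

```shell
# Sketch (hypothetical helper): merge same-strand alignments into
# chr:start-end:strand lines. Expects coordinate-sorted SAM records on stdin.
merge_by_strand() {
  awk -F '\t' '
    # Print and clear the open interval for one strand, if there is one.
    function emit(st) {
      if (st in lo) {
        printf("%s:%d-%d:%s\n", chr, lo[st], hi[st], st)
        delete lo[st]; delete hi[st]
      }
    }
    /^@/ { next }                        # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }        # FLAG bit 0x4: read is unmapped
    {
      # FLAG bit 0x10 set means the read aligned to the reverse strand.
      strand = (int($2 / 16) % 2 == 1) ? "-" : "+"

      # Reference span = sum of CIGAR operations that consume the reference.
      cigar = $6; span = 0
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) span += substr(cigar, 1, RLENGTH - 1)
        cigar = substr(cigar, RLENGTH + 1)
      }
      if (span == 0) next                # no usable CIGAR, e.g. "*"

      s = $4 + 0; e = s + span - 1       # POS is 1-based, interval is inclusive

      # A new chromosome closes the open intervals on both strands.
      if ($3 != chr) { emit("+"); emit("-"); chr = $3 }

      if (!(strand in lo)) { lo[strand] = s; hi[strand] = e }
      else if (s <= hi[strand] + 1) { if (e > hi[strand]) hi[strand] = e }
      else { emit(strand); lo[strand] = s; hi[strand] = e }
    }
    END { emit("+"); emit("-") }
  '
}

# Possible usage, assuming samtools is installed:
#   samtools view input.sorted.bam | merge_by_strand > output.txt
```

Overlapping or directly adjacent reads on the same strand are merged, which mirrors the end+1 test in the original awk script. Lines are written as each interval closes, so within a chromosome the + and - contigs may be interleaved rather than strictly ordered by start position; piping the result through sort would fix that if the order matters.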