Hello Everyone,
I'm new to seqanswers, this field, and samtools.
My Google-fu is failing me, and I have what I think will be an easy question.
I started with a reference sequence and some reads. I aligned those reads to the reference and created a contig of the longest continuous overlaps. I then used this position information to cut the reference sequence down to the region covered by this contig.
So now I have the portion of the reference sequence that is covered by the most reads. I go back and realign the reads to this contig and call SNPs using samtools (1.1) and bcftools (2.2.2). In the end I get my .vcf with a bunch of info. I am interested in the depth of coverage, and I see the DP field. Here is the question: I don't know how to determine the total number of reads that successfully aligned. Some reads will necessarily fail to align, since this is only a portion of the reference sequence, so I can't just do the math based on the total number of reads I started with. I would like to express the depth as a percentage of the reads that aligned.
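To make the number I am after concrete, here is a minimal sketch of the count, assuming the realigned reads are available as SAM text. It relies only on the SAM FLAG bits (0x4 = read unmapped, 0x100 = secondary alignment, 0x800 = supplementary alignment). The function name `percent_aligned` and the example records are hypothetical, made up for illustration, and are not output from my data.

```python
def percent_aligned(sam_lines):
    """Count primary SAM records and how many of them are mapped.

    Returns (mapped, total, percent). Secondary (0x100) and supplementary
    (0x800) records are skipped so each read is counted only once.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second column of a SAM record
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    percent = 100.0 * mapped / total if total else 0.0
    return mapped, total, percent


# Hypothetical records: two mapped reads, one unmapped read, and one
# secondary alignment that is ignored.
example = [
    "@SQ\tSN:contig1\tLN:500",
    "read1\t0\tcontig1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t16\tcontig1\t20\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "read4\t256\tcontig1\t30\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
]

mapped, total, percent = percent_aligned(example)
print(f"{mapped} of {total} reads aligned ({percent:.1f}%)")
```

With paired-end data each mate is its own record, so this counts reads rather than pairs. The DP at a given site could then be compared against `mapped` instead of against the number of reads I started with.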
It seems like Cufflinks may hold the answer to this question; however, I am using DNA rather than RNA, and I am unsure whether this functionality will work and whether it will be reliable.
Any help would be greatly appreciated. I have read the documentation; it's likely my lack of lingo is keeping me from seeing the answer when I read it.
Thanks in advance,
Mike