Hi!
1) I have a BAM file (sorted, with duplicates removed using samtools). Importantly, the reads cover only part of the reference genome.
2) I also have a VCF file called from this BAM file with bcftools (using the consensus calling option).
3) For the reference genome there is a GTF annotation file (from Ensembl).
I would like to find out some obvious things:
a) Which genes have been sequenced in this case (how many, and their names)?
b) I want to analyze the variants in these genes: how different they are from the reference versions, and whether there are any problems (stop codons, deletions, inversions, etc.).
Could you please give examples of specific commands and tell me which tools are best suited for this (bedtools? bedops?)
Note that my data is not from the human genome, so I am looking for a general solution rather than one that relies on human-specific resources.
In other words, how do I connect the variant-calling data for these reads with the gene annotation of the reference genome?
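To make the question concrete, here is a rough Python sketch of the kind of join I have in mind: take the `gene` records from the GTF, take the variant records from the VCF, and list the variants that fall inside each gene. Everything in it is an assumption for illustration only: the function names, the tiny in-memory demo records, and the expectation that the GTF has `gene` lines with a `gene_id` attribute (and optionally `gene_name`). It only assigns variants to genes; it does not predict effects such as stop codons.

```python
import re
from collections import defaultdict


def parse_gtf_genes(lines):
    """Yield (chrom, start, end, gene_id, gene_name) for 'gene' features."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        # Attributes look like: gene_id "G1"; gene_name "abc";
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        gene_id = attrs.get("gene_id", "")
        # Fall back to the gene_id when no gene_name is given.
        yield (fields[0], int(fields[3]), int(fields[4]),
               gene_id, attrs.get("gene_name", gene_id))


def parse_vcf_sites(lines):
    """Yield (chrom, pos, ref, alt) for records that have an ALT allele."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[4] == ".":
            continue  # skip malformed lines and reference-only sites
        yield fields[0], int(fields[1]), fields[3], fields[4]


def variants_per_gene(gtf_lines, vcf_lines):
    """Map (gene_id, gene_name) to the variants overlapping that gene."""
    genes_by_chrom = defaultdict(list)
    for chrom, start, end, gene_id, name in parse_gtf_genes(gtf_lines):
        genes_by_chrom[chrom].append((start, end, gene_id, name))

    hits = defaultdict(list)
    for chrom, pos, ref, alt in parse_vcf_sites(vcf_lines):
        # GTF and VCF are both 1-based; REF spans pos..pos+len(ref)-1.
        ref_end = pos + len(ref) - 1
        for start, end, gene_id, name in genes_by_chrom.get(chrom, []):
            if pos <= end and ref_end >= start:
                hits[(gene_id, name)].append((chrom, pos, ref, alt))
    return dict(hits)


# Made-up demo records, not real data.
GTF_DEMO = [
    '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_name "abc";',
    '1\tensembl\texon\t100\t150\t.\t+\t.\tgene_id "G1"; gene_name "abc";',
    '1\tensembl\tgene\t500\t600\t.\t-\t.\tgene_id "G2";',
]
VCF_DEMO = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t120\t.\tA\tG\t50\tPASS\t.",
    "1\t300\t.\tC\tT\t50\tPASS\t.",
    "1\t498\t.\tCTGA\tC\t50\tPASS\t.",
]

if __name__ == "__main__":
    for (gene_id, name), variants in variants_per_gene(GTF_DEMO, VCF_DEMO).items():
        print(gene_id, name, len(variants), variants)
```

The nested loop is fine for a toy example but would be slow on a whole genome, which is why I am asking whether bedtools, bedops or something similar is the proper way to do this.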
Thanks