Hello everyone,
I am extremely new to bioinformatics, genome sequencing and working with the output data, so please excuse any naive questions (I also only just learned to work in Linux for samtools/bcftools).
Our lab has recently sequenced the genome of a laboratory strain for which the type strain genome is known. The genome was sequenced using Illumina, and the output was already processed for us using the DRAGEN pipeline.
I have received all output from the sequencing, including .bam and .vcf files. I am starting to figure out what these files are, what kind of information they contain and how to work with them (yes, I am still at this level, sorry).
Our end goal is, first of all, to obtain a complete consensus sequence of the genome of our lab strain. Secondly, we would like to identify SNPs and their positions relative to the annotated genome of our reference strain.
I have already been able to use IGV: I loaded the genome of our reference strain and imported the vcf file to find the SNPs. I know there are 60 SNPs/indels. Is there some "easy" automated way to get a list of all variants without having to scroll through IGV and go over them one by one?
I also tried using bcftools to get a consensus sequence from a reference .fasta and the .bam file from the sequencing, but the sequence I get is much shorter than my genome. I followed this guide: http://samtools.github.io/bcftools/h...-sequence.html
Is there an easy, basic guide that explains the file formats, where they come from and how they are connected to each other? I think understanding this would help me get started with samtools/bcftools, since their tutorials assume knowledge of these things. Other good information sources concerning my problems and goals are always welcome.