Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • haploid variant calling in SAMtools

    We’re using the standard SAMtools SNP calling protocol (samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf) to look for sequence variants in bacterial genomes.

    I’m really new to this, but it sounds like the protocol is geared towards diploid organisms by default. I hoping someone might know if there’s anything that should be changed to use it on haploid genomes? The –s or –p options in SAMtools mpileup look like they might be useful, but I’ve no feel for how to apply them correctly for this. Any help would be really appreciated!

  • #2
    I use samtools on bacteria all the time, and it's not really a problem. Remember that just because your organism is haploid doesn't mean that you have been giving a clonal culture to analyze.

    Comment


    • #3
      We found that the following workflow works well for variant calling in haploids using samtools. To generate a consensus sequence from the realigned BAM file in FASTQ format, we used:

      samtools mpileup -uf ref.fasta realigned.bam | bcftools view -cg -s sample.txt - | perl vcfutils.pl vcf2fq > consensus.fq

      where sample.txt is a plain text file with the sample name in the first column and the ploidy level in the second column. For example,

      sample_name 1

      The columns should be tab-delimited.

      Comment


      • #4
        Keep in mind that vcfutils vcf2fq will not handle indels. It just creates a small window of lowercase letters around the putative indel.

        Comment


        • #5
          Why not use freebayes? It's designed to support haploids and polyploids (use the --ploidy option).

          Comment


          • #6
            Also, you can use vcfgeno2haplo (https://github.com/ekg/vcflib/blob/m...geno2haplo.cpp) with an arbitrarily large window size to generate a consensus fasta sequence from the output.

            If you input a file with one sample, the recorded consensus sequence will be the ALT in the single VCF record in the output.

            It works for indels as well as SNPs. It does not generate quality estimates.

            Comment


            • #7
              This paper (apologies for self-publicity) might be of interest to you:

              High-throughput microbial population genomics using the Cortex variation assembler. Z Iqbal, I Turner, G McVean, Bioinformatics 2012


              Fast highly specific and sensitive variant discovery and genotyping for microbial genomes by multi-sample de novo assembly, allowing direct genome comparison without using a reference. The paper contains examples with short and long reads, reproducing results from published papers (drug-resistant/susceptible S.aureus, in-host evolution) with a fraction of the time and effort.

              Comment


              • #8
                @Zam, this looks great! I'm looking forward to reading it.

                Comment


                • #9
                  Hi,

                  I've been reading through a number of threads and manuals about this and still don't understand how to identify if SNPs are homozygous or heterozygous when viewing the vcf file. Has anyone seen any issues with using the -s parameter in the bcftools view for setting the ploidy as mentioned above? Not sure if it matters or not, but my samples aren't bacteria, but are haploid males. Thanks!

                  Oh and yes, we are looking into Freebayes too.

                  Comment


                  • #10
                    Hi again. So I decided to do a test a merged sample (merged from three runs of the same sample) and got an error I'm not sure about.

                    Here is the command I used: samtools mpileup –Dsuf ref.fa Sample.bam | bcftools view –bcgv –s Sample.txt - > Sample.raw.bcf

                    Then got this:
                    [mpileup] 1 samples in 1 input file
                    <mpileup> set max per-file depth to 8000
                    <bcf_hdr_subsam> 1 samples in the list but not in BCF.

                    So it's something wrong with the text file I used for the -s parameter in bcftools for ploidy(I think?). The text file seems like it would be pretty straight forward, but I don't know what could cause the issue for the "samples in list but not in BCF" error. Any ideas? Thanks.

                    Comment


                    • #11
                      Hey kjm,
                      Assuming you haven't answered this on your own yet (or assuming someone else is having a similar error and needs the thread to not die at your question), I think I have an answer.

                      If you are piping it the way you are in your post, making the "Sample.txt" file simply a dash, tab, and 1 should cover it. That way it reads your file name as the piped filename and continues as usual. Got rid of my "<bcf_hdr_subsam> 1 samples in the list but not in BCF." error anyway... Hope that helps someone.

                      Sample.txt should read:
                      - 1

                      Comment


                      • #12
                        variant calling in SAMtools

                        Dear ZLounsberry,

                        I saw your answer , but I didn't understand how the text file should be .

                        I am running the following command:
                        samtools mpileup –uf ucsc.hg19.fasta child.bam father.bam mother.bam | bcftools view -bvcgT trioauto -s family.txt - > femvar.vcf &

                        My txt file is:

                        child.bam
                        father.bam
                        mother.bam


                        I get an error:
                        [mpileup] 3 samples in 3 input files
                        <mpileup> Set max per-file depth to 2666
                        <bcf_hdr_subsam> 3 samples in the list but not in BCF.


                        Do you have any suggestion how to fix the problem?
                        Thank you for your help,

                        Comment


                        • #13
                          Hello polpol,
                          Try changing your "family.txt" file to read:

                          - 1

                          instead of:

                          child.bam
                          father.bam
                          mother.bam

                          If that does not work, try:

                          - 3

                          (so a dash, a tab character, and a 1 or, if that doesn't work, maybe a 3). It's a bit off topic for this thread, so feel free to email me (my username AT gmail DOT com) and let me know if you need a hand.

                          Comment

                          Latest Articles

                          Collapse

                          • seqadmin
                            Best Practices for Single-Cell Sequencing Analysis
                            by seqadmin



                            While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
                            06-06-2024, 07:15 AM
                          • seqadmin
                            Latest Developments in Precision Medicine
                            by seqadmin



                            Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

                            Somatic Genomics
                            “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
                            05-24-2024, 01:16 PM

                          ad_right_rmr

                          Collapse

                          News

                          Collapse

                          Topics Statistics Last Post
                          Started by seqadmin, 06-14-2024, 07:24 AM
                          0 responses
                          12 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 06-13-2024, 08:58 AM
                          0 responses
                          13 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 06-12-2024, 02:20 PM
                          0 responses
                          17 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 06-07-2024, 06:58 AM
                          0 responses
                          184 views
                          0 likes
                          Last Post seqadmin  
                          Working...
                          X