  • Empty VCF file with bcftools call

    Hello,

    I am trying to produce a vcf file using bcftools call but it produces an empty vcf file containing only the header. In short, here is what I do:

    1. Alignment with BWA
    2. With samtools, make sorted.bam files
    3. With samtools, index the sorted.bam files
    4. Run samtools mpileup in the following way:
    samtools mpileup -C 50 -E -t SP -t DP -u -I -f /genome/refgenome.fa -b bam_list.txt > output.bcf
    5. Run bcftools call:
    bcftools call -v -c output.bcf > output.vcf

    I am using version 1.3.1 of samtools, bcftools and htslib. I tried reinstalling these programs, but that did not fix the issue. I also tried version 1.2, with the same result. As far as I can tell, the bcf file is fine: it contains lots of data and is 20 GB.

    I tried producing a basic vcf file using bcftools view (bcftools view output.bcf > output.vcf), and it works. The vcf file seems completely normal.
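
A quick way to confirm that a VCF really is header-only is to count its non-header lines. The following is a minimal sketch, assuming an uncompressed VCF; the function name `count_vcf_records` is hypothetical and not part of samtools or bcftools.

```shell
# Count variant records in an uncompressed VCF.
# Header lines start with '#'; blank lines are ignored.
# (count_vcf_records is an illustrative helper, not a bcftools command.)
count_vcf_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Usage (file name taken from the post above):
#   count_vcf_records output.vcf
```

A result of 0 means the file contains only the header, which matches the symptom described here.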

    Could anyone help me with this? Why would bcftools call produce an empty output?

    Thanks

  • #2
    For reference cross-posted: https://www.biostars.org/p/189996


    • #3
      Thanks, yes I also asked the question on biostars as it may hit more people. If it's not appropriate to cross post, I will remove it from seqanswers.


      • #4
        Originally posted by AP38:
        "Thanks, yes I also asked the question on biostars as it may hit more people. If it's not appropriate to cross post, I will remove it from seqanswers."

        It is OK to cross-post on SeqAnswers. I included a link to your post on Biostars for reference.

        If you get an answer over at Biostars, then please come back and indicate that here.


        • #5
          Shouldn't you have the -g option with mpileup to compute genotype likelihoods?
          Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


          • #6
            Yes indeed, if one wants to compute genotype likelihoods, the -g option is required. However, adding it should not make a difference to this problem.


            • #7
              Sorry... don't know what I was thinking!

              I tried your commands on a recent project and they also gave a header-only vcf file.

              This minimal pipeline worked fine and produced a normal vcf from the same data:
              samtools mpileup -gu -Q 10 -t DP,DPR -f ref.fasta -b samples.txt | bcftools call -cv - > test.vcf
              (it also worked without the -g!)

              So from that I would conclude it is not a problem with your versions, file list or reference.

              When I generate a bcf file using the minimal pipeline, it reports the reference allele. Your mpileup does not for some reason.
              Last edited by SNPsaurus; 05-05-2016, 10:35 AM.


              • #8
                Thanks! This is very good to know. It is possible that the -C 50 option is causing the issue, because it downgrades mapping quality for reads with excessive mismatches. I am working here with fairly low-coverage libraries, so I might want to remove that option. I'll try that and report back with the results.


                • #9
                  For some reason, an empty line had been added at the bottom of the genome index file. Removing it solved the problem.
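
For anyone hitting the same symptom, a stray blank line like this can be checked for and stripped with a couple of small shell helpers. This is a minimal sketch under one assumption: that the "genome index file" is the FASTA index (the .fai file that `samtools faidx` writes next to the reference). The function names are hypothetical.

```shell
# Count blank (empty or whitespace-only) lines in a text file,
# e.g. a FASTA index such as refgenome.fa.fai (assumed file name).
count_blank_lines() {
    awk 'NF == 0 { n++ } END { print n + 0 }' "$1"
}

# Print the file with blank lines removed. Redirect the output to a
# new file and inspect it before replacing the original index.
strip_blank_lines() {
    awk 'NF > 0' "$1"
}

# Usage (paths are placeholders):
#   count_blank_lines /genome/refgenome.fa.fai
#   strip_blank_lines /genome/refgenome.fa.fai > refgenome.fa.fai.clean
```

An alternative to editing the index by hand is to delete it and rebuild it from the reference with `samtools faidx`.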
