  • mpileup problem

    I'm trying to derive a vcf file from a bam using:

    samtools view -bS test.sam | samtools sort - test_sorted
    samtools index test_sorted.bam
    samtools mpileup -E -uf Ref.fna test_sorted.bam > test.pileup
    bcftools view -cg test.pileup > test.vcf

    The SAM and BAM files look fine, but the mpileup command runs too quickly and produces a small file with no sequence data.

    bcftools gives me:
    [bcf_sync] incorrect number of fields (6 != 5) at 0.0

    I've regenerated the Bowtie index from Ref.fna to make sure the files match, but that doesn't help. I'm using samtools 1.2 and bcftools 0.1.17.
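The reference check described above can also be done without rebuilding any index, by comparing the contig names in the BAM header with the FASTA headers. A minimal sketch, assuming the header was first saved with `samtools view -H test_sorted.bam > header.sam`; the function name `contig_mismatches` is hypothetical and only awk, sort and comm are used:

```shell
# Hypothetical helper: list contig names that appear in only one of the files.
# $1 = reference FASTA, $2 = SAM header text (from `samtools view -H`).
contig_mismatches() {
    local fasta=$1 header=$2 work
    work=$(mktemp -d) || return 1
    # FASTA: the contig name is the first word of each '>' line, minus the '>'
    awk '/^>/ { print substr($1, 2) }' "$fasta" | sort -u > "$work/fasta"
    # SAM header: @SQ lines carry the contig name in the SN: tag
    awk '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' \
        "$header" | sort -u > "$work/bam"
    # comm -3 hides names common to both files:
    # column 1 = only in the FASTA, tab-indented column 2 = only in the BAM header
    comm -3 "$work/fasta" "$work/bam"
    rm -rf "$work"
}
```

For example, `contig_mismatches Ref.fna header.sam` prints nothing when every contig matches.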

    Any idea what's wrong? It worked fine a couple of weeks ago.

  • #2
    It's advisable not to mix old and new versions of samtools and bcftools. You may want to look at the new "call" command in the current bcftools: http://www.htslib.org/doc/bcftools.html#call
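A minimal sketch of that replacement for `bcftools view -cg`, assuming samtools and bcftools are both 1.x and on PATH. The wrapper name `call_variants` is hypothetical; the flags are the ones documented for samtools 1.x `mpileup` and bcftools `call`:

```shell
# Hypothetical wrapper: pileup + calling with tools from the same 1.x release line.
# $1 = reference FASTA used for the alignment, $2 = sorted, indexed BAM, $3 = output VCF
call_variants() {
    local ref=$1 bam=$2 out=$3
    # samtools mpileup: -E recompute BAQ, -u uncompressed BCF2 on stdout, -f reference
    # bcftools call: -m multiallelic caller (-c selects the older consensus caller,
    # closer to "bcftools view -c"), -v variant sites only, -O v plain VCF, -o output
    samtools mpileup -E -u -f "$ref" "$bam" | bcftools call -m -v -O v -o "$out"
}
```

Usage: `call_variants Ref.fna test_sorted.bam test.vcf`. Piping straight into `bcftools call` also avoids an intermediate file whose name suggests a text pileup. In bash, `set -o pipefail` makes a failure in `samtools mpileup` show up in the exit status.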



    • #3
      Thanks, I'll upgrade bcftools.

      However, I think something is wrong with the mpileup output too.
      It starts out readable but then turns into binary, which doesn't look like any pileup-format example I've seen. Could this be causing the problem?

      BCF^B^Bf<^@^@##fileformat=VCFv4.2
      ##FILTER=<ID=PASS,Description="All filters passed",IDX=0>
      ##samtoolsVersion=1.2+htslib-1.2.1
      ##samtoolsCommand=samtools mpileup -E -uf Ref_new.fna test_sorted.bam
      ##reference=file://Ref_new.fna
      ##contig=<ID=comp39600_c0_seq2,length=1517,IDX=0>
      ##contig=<ID=comp39985_c0_seq4,length=1303,IDX=1>
      ##contig=<ID=comp40415_c0_seq2,length=873,IDX=2>

      .......


      ##contig=<ID=comp44856_c1_seq1,length=608,IDX=263>
      ##ALT=<ID=X,Description="Represents allele(s) other than observed.">
      ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.",IDX=1>
      ##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel",IDX=2>
      ##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel",IDX=3>
      ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth",IDX=4>
      ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3",IDX=5>
      ##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)",IDX=6>
      ##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)",IDX=7>
      ##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)",IDX=8>
      ##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)",IDX=9>
      ##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.",IDX=10>
      ##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)",IDX=11>
      ##INFO=<ID=I16,Number=16,Type=Float,Description="Auxiliary tag used for calling, see description of bcf_callret1_t in bam2bcf.h",IDX=12>
      ##INFO=<ID=QS,Number=R,Type=Float,Description="Auxiliary tag used for calling",IDX=13>
      ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods",IDX=14>
      #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT test_sorted.bam
      [binary record data, not human-readable]
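The pasted output begins with `BCF^B^B`, i.e. the bytes B, C, F, 0x02, 0x02, which is the BCF2 magic number. With `-u`, samtools 1.x `mpileup` writes uncompressed BCF2 rather than a text pileup, while bcftools 0.1.17 expects the older BCF1 layout, which is likely where the `bcf_sync` error comes from. A small heuristic sketch for checking which kind of file mpileup produced; the function name is hypothetical and the BCF1 magic (`BCF\4`) is an assumption based on samtools 0.1.x:

```shell
# Hypothetical helper: guess the format of an mpileup output file from its first bytes.
mpileup_output_kind() {
    local magic
    [ -r "$1" ] || { echo "unreadable: $1" >&2; return 1; }
    # First five bytes as one lowercase hex string, e.g. "4243460202" for BCF\2\2
    magic=$(od -An -tx1 -N5 "$1" | tr -d ' \n')
    case $magic in
        42434602*) echo "BCF2" ;;       # "BCF\2": htslib / samtools 1.x, needs bcftools 1.x
        42434604*) echo "BCF1" ;;       # "BCF\4": assumed samtools/bcftools 0.1.x format
        1f8b*)     echo "gzip/BGZF" ;;  # compressed: BCF of either version, or .vcf.gz
        *)         echo "text" ;;       # probably a text pileup or plain VCF
    esac
}
```

For the file above, `mpileup_output_kind test.pileup` should report `BCF2`. To read such a file as text, bcftools 1.x `bcftools view test.pileup` prints it as VCF; dropping `-u` from the mpileup command produces a classic text pileup instead.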
