Seqanswers Leaderboard Ad



No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools mpileup on multiple samples

    I have multiple samples from which I would like to call SNP. I've been trying either to call them separately or all at once

    samtools mpileup -uf hg19.fa sample1.bam | bcftools view -bvcg - > out.bcf
    samtools mpileup -uf hg19.fa sample1.bam sample2.bam | bcftools view -bvcg - > out.bcf

    In the first case, I can get coverage (DP) at each position where a SNP is encountered, but only for the sample that has the SNP. That is, I would like to get the coverage at that position in all samples, but I could not.

    In the second case, for SNP I know are present in both samples, I found only 1 value for DP, which I assume is the combined coverage of both samples (see below).
    Is there any way I can get DP and DP4 for all samples even if the SNP is only in one?

    chr1 15211 . T G 51 . DP=16;AF1=1;CI95=0.8333,1;DP4=3,0,10,3;MQ=30;FQ=-37.5;PV4=1,1.3e-05,1,1 GT:PL:GQ 1/1:46,9,0:21 1/1:23,9,0:21

    Second question, I've been having a hard time trying to annotate the SNP calls in the vcf file (produced by bcftools view) with its rsID. I tried vcftools

    >bgzip UCSC_SNP_v132
    >zcat input.vcf.gz | vcftools_0.1.5/bin/fill-rsIDs -r UCSC_SNP_v132.gz | tabix-0.2.5/bgzip -c > out

    but kept getting error below which suggests to me that I'm not using the right format for the input dbSNP. I used the BED format downloaded from UCSC. What am I doing wrong here?

    tabix ./UCSC_SNP_v132.gz chr1 2>&1 |: No such file or directory at vcftools_0.1.5/bin/fill-rsIDs line 20
    main::error('tabix ./UCSC_SNP_v132.gz chr1 2>&1 |: No such file or directory') called at vcftools_0.1.5/bin/fill-rsIDs line 93
    main::fill_rsids('HASH(0x169602a0)', './UCSC_SNP_v132.gz') called at vcftools_0.1.5/bin/fill-rsIDs line 11


  • #2
    Regarding your first question: The -D option keeps per sample read depth with mpileup.


    • #3
      Thanks, DerSeb!

      Regarding my 2nd question, I realized that error was because tabix (called inside fill-rsIDs) was not on my path.


      • #4
        how to annotate VCF files created by bcftools view?

        I have 1 big vcf files which compares 12 samples by using samtools mpileup. And I would like annotate them? my reference genome is Hg19. How can annotate VCf files ?

        Could you write commands and set up and where to download vcftools?

        Thanks in advance.

        Are you using only vcftools? What about ANNOVAR?


        • #5
          No, here's no way to run multiple sample .bams together and get the DP4 of each sample. I Which is a pity, since that would be useful. What you get instead are other stats, which can give you an idea of how good the SNP is in each sample.


          • #6
            For annotating the VCF file you can use the Ensembl Variant Effect Predictor. It is very helpful and can also produce SIFT, PolyPhen and Condel consensus.

            I would recommend using the Perl script with a pre-built cache.

            Here is the command that I use to annotate on a Mac:
            perl --species homo_sapiens -i SNP.vcf --format vcf -o SNP_effects.txt --sift b --polyphen b --condel b --gene --hgnc --canonical --regulatory --coding_only --no_intergenic --cache --compress gzcat --check_existing
            Variant Effect Predictor can also be used to check the SNPs against the 1000 Genomes and HapMap projects but I was not able to get it running. It was always throwing an error.

            Last edited by DineshCyanam; 12-28-2011, 01:26 PM.


            • #7
              Thank you. Sorry for my late. I will try them.


              Latest Articles


              • seqadmin
                Exploring the Dynamics of the Tumor Microenvironment
                by seqadmin

                The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
                07-08-2024, 03:19 PM
              • seqadmin
                Exploring Human Diversity Through Large-Scale Omics
                by seqadmin

                In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
                06-25-2024, 06:43 AM





              Topics Statistics Last Post
              Started by seqadmin, 07-19-2024, 07:20 AM
              0 responses
              Last Post seqadmin  
              Started by seqadmin, 07-16-2024, 05:49 AM
              0 responses
              Last Post seqadmin  
              Started by seqadmin, 07-15-2024, 06:53 AM
              0 responses
              Last Post seqadmin  
              Started by seqadmin, 07-10-2024, 07:30 AM
              0 responses
              Last Post seqadmin