Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • ramirob
    Member
    • Apr 2012
    • 16

    Samtools on Chromosomes Versus Genome

    Hi everyone,

    Does anybody have experience running samtools on CASAVA alignments?

    I am trying to call SNPs using samtools on a CASAVA/illumina alignment.

    CASAVA outputs an alignment of the type name_export.txt.gz

    and it automatically converts to BAM format when using their SNP caller.

    Now I want to call SNPs using Samtools.

    The alignment was done using each chromosome individually:

    genomes/Homo_sapiens/UCSC/hg18/Sequence/Chromosomes/Chr...

    However, the BAM alignment provided by CASAVA is for the whole genome, so then should I use the whole reference genome?:

    samtools mpileup -f ../genomes/Homo_sapiens/UCSC/hg18/Sequence/WholeGenomeFasta/genome.fa sorted.bam > sorted.mpileupOutput

    the problem is that, in that case the mpileup output doesn't seem to recognize it as I get:

    bash-3.2$ head -n 100 sorted.mpileupOutput
    chr1.fa 3309 N 1 ^NT C
    chr1.fa 3310 N 1 G C
    chr1.fa 3311 N 1 C F
    chr1.fa 3312 N 1 C F
    chr1.fa 3313 N 1 C F


    Any ideas??

    Thanks in advance,

    Ramiro
  • sdriscoll
    I like code
    • Sep 2009
    • 436

    #2
    try this. mpileup doesn't do too much. bcftools is the program that actually does the SNP calling.

    Code:
    samtools mpileup -uf ../genomes/Homo_sapiens/UCSC/hg18/Sequence/WholeGenomeFasta/genome.fa sorted.bam | bcftools view -bvcg - > sorted.bcf
    bcftools view sorted.bcf > sorted.vcf
    head sorted.vcf
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */

    Comment

    • bioinfosm
      Senior Member
      • Jan 2008
      • 483

      #3
      or you can try splitting the bam file by chromosome and then calling SNP on them separately.. you could also try starting from the fastq or qseq files, run alignment using something like bwa and then call variants
      --
      bioinfosm

      Comment

      • sdriscoll
        I like code
        • Sep 2009
        • 436

        #4
        indeed...i align data using BWA when i'm looking for SNPs.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */

        Comment

        • AllanLindh
          Junior Member
          • Apr 2012
          • 5

          #5
          How to combine .vcf files into whole genome

          This discussion is very useful, but when I run mpileup plus bcftools to produce .vcf files for individual Chrm, I have found no way to combine them into whole genome .vcf file for further processing. vcf-concat seems like it might do it, but it never runs due to diverse errors. Most recent is:
          ****************************
          The use of -1 for unknown number of values is deprecated, please use '.' instead.
          FORMAT=<ID=PL,Number=-1,Type=Integer,Description="List of Phred-scaled genotype likelihoods, number of values is (#ALT+1)*(#ALT+2)/2">
          ****************************
          I am running most recent version of samtools (0.1.18) -- has anyone else noticed a mismatch in the FORMAT specs between samtools and vcftools??

          Any suggestions at all would be very welcom, this is driving me nuts.

          Thanks
          Allan Lindh

          Comment

          • ramirob
            Member
            • Apr 2012
            • 16

            #6
            sdriscoll,

            It seems that mpileup looking like:

            chr1.fa 3309 N 1 ^NT C
            chr1.fa 3310 N 1 G C
            chr1.fa 3311 N 1 C F
            chr1.fa 3312 N 1 C F

            is a bad sign, it looks to me that it is not mapping to the ref. genome properly. I did as you suggested and everything was a SNP.

            Comment

            • swbarnes2
              Senior Member
              • May 2008
              • 910

              #7
              Double check that the chromosome names in your .bam file match the chromosome names in your fasta. And I'd remake the fa.fai with samtools faidx, see if that completes without an error.

              Comment

              • frc1230
                Member
                • Jul 2011
                • 10

                #8
                I am currently trying to deal with exactly this problem: CASAVA alignments that choke mpileup. We use mpileup output directly for analysis with our own SNP finders, and it works very well with output from a BWA-based aligner (stampy). But with CASAVA output, as stated in another response, mpileup thinks everything is a SNP, all reads with phred scores of 0. This is not the case, fortunately, as the same read data run through stampy produces an alignment file that behaves well with mpileup.

                Comment

                • ramirob
                  Member
                  • Apr 2012
                  • 16

                  #9
                  I must admit I just gave up on Casava alignments, just use BWA and I am exploring other aligners.

                  Comment

                  • frc1230
                    Member
                    • Jul 2011
                    • 10

                    #10
                    I like stampy-BWA, but it does take a lot of cpu time

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      Pathogen Surveillance with Advanced Genomic Tools
                      by seqadmin




                      The COVID-19 pandemic highlighted the need for proactive pathogen surveillance systems. As ongoing threats like avian influenza and newly emerging infections continue to pose risks, researchers are working to improve how quickly and accurately pathogens can be identified and tracked. In a recent SEQanswers webinar, two experts discussed how next-generation sequencing (NGS) and machine learning are shaping efforts to monitor viral variation and trace the origins of infectious...
                      Yesterday, 11:48 AM
                    • seqadmin
                      New Genomics Tools and Methods Shared at AGBT 2025
                      by seqadmin


                      This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                      The Headliner
                      The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                      03-03-2025, 01:39 PM
                    • seqadmin
                      Investigating the Gut Microbiome Through Diet and Spatial Biology
                      by seqadmin




                      The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
                      02-24-2025, 06:31 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, 03-20-2025, 05:03 AM
                    0 responses
                    26 views
                    0 reactions
                    Last Post seqadmin  
                    Started by seqadmin, 03-19-2025, 07:27 AM
                    0 responses
                    33 views
                    0 reactions
                    Last Post seqadmin  
                    Started by seqadmin, 03-18-2025, 12:50 PM
                    0 responses
                    25 views
                    0 reactions
                    Last Post seqadmin  
                    Started by seqadmin, 03-03-2025, 01:15 PM
                    0 responses
                    190 views
                    0 reactions
                    Last Post seqadmin  
                    Working...