Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools on Chromosomes Versus Genome

    Hi everyone,

    Does anybody have experience running samtools on CASAVA alignments?

    I am trying to call SNPs using samtools on a CASAVA/illumina alignment.

    CASAVA outputs an alignment of the type name_export.txt.gz

    and it automatically converts to BAM format when using their SNP caller.

    Now I want to call SNPs using Samtools.

    The alignment was done using each chromosome individually:

    genomes/Homo_sapiens/UCSC/hg18/Sequence/Chromosomes/Chr...

    However, the BAM alignment provided by CASAVA is for the whole genome, so then should I use the whole reference genome?:

    samtools mpileup -f ../genomes/Homo_sapiens/UCSC/hg18/Sequence/WholeGenomeFasta/genome.fa sorted.bam > sorted.mpileupOutput

    the problem is that, in that case the mpileup output doesn't seem to recognize it as I get:

    bash-3.2$ head -n 100 sorted.mpileupOutput
    chr1.fa 3309 N 1 ^NT C
    chr1.fa 3310 N 1 G C
    chr1.fa 3311 N 1 C F
    chr1.fa 3312 N 1 C F
    chr1.fa 3313 N 1 C F


    Any ideas??

    Thanks in advance,

    Ramiro

  • #2
    try this. mpileup doesn't do too much. bcftools is the program that actually does the SNP calling.

    Code:
    samtools mpileup -uf ../genomes/Homo_sapiens/UCSC/hg18/Sequence/WholeGenomeFasta/genome.fa sorted.bam | bcftools view -bvcg - > sorted.bcf
    bcftools view sorted.bcf > sorted.vcf
    head sorted.vcf
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */

    Comment


    • #3
      or you can try splitting the bam file by chromosome and then calling SNP on them separately.. you could also try starting from the fastq or qseq files, run alignment using something like bwa and then call variants
      --
      bioinfosm

      Comment


      • #4
        indeed...i align data using BWA when i'm looking for SNPs.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */

        Comment


        • #5
          How to combine .vcf files into whole genome

          This discussion is very useful, but when I run mpileup plus bcftools to produce .vcf files for individual Chrm, I have found no way to combine them into whole genome .vcf file for further processing. vcf-concat seems like it might do it, but it never runs due to diverse errors. Most recent is:
          ****************************
          The use of -1 for unknown number of values is deprecated, please use '.' instead.
          FORMAT=<ID=PL,Number=-1,Type=Integer,Description="List of Phred-scaled genotype likelihoods, number of values is (#ALT+1)*(#ALT+2)/2">
          ****************************
          I am running most recent version of samtools (0.1.18) -- has anyone else noticed a mismatch in the FORMAT specs between samtools and vcftools??

          Any suggestions at all would be very welcom, this is driving me nuts.

          Thanks
          Allan Lindh

          Comment


          • #6
            sdriscoll,

            It seems that mpileup looking like:

            chr1.fa 3309 N 1 ^NT C
            chr1.fa 3310 N 1 G C
            chr1.fa 3311 N 1 C F
            chr1.fa 3312 N 1 C F

            is a bad sign, it looks to me that it is not mapping to the ref. genome properly. I did as you suggested and everything was a SNP.

            Comment


            • #7
              Double check that the chromosome names in your .bam file match the chromosome names in your fasta. And I'd remake the fa.fai with samtools faidx, see if that completes without an error.

              Comment


              • #8
                I am currently trying to deal with exactly this problem: CASAVA alignments that choke mpileup. We use mpileup output directly for analysis with our own SNP finders, and it works very well with output from a BWA-based aligner (stampy). But with CASAVA output, as stated in another response, mpileup thinks everything is a SNP, all reads with phred scores of 0. This is not the case, fortunately, as the same read data run through stampy produces an alignment file that behaves well with mpileup.

                Comment


                • #9
                  I must admit I just gave up on Casava alignments, just use BWA and I am exploring other aligners.

                  Comment


                  • #10
                    I like stampy-BWA, but it does take a lot of cpu time

                    Comment

                    Latest Articles

                    Collapse

                    • seqadmin
                      The Impact of AI in Genomic Medicine
                      by seqadmin



                      Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                      02-26-2024, 02:07 PM
                    • seqadmin
                      Multiomics Techniques Advancing Disease Research
                      by seqadmin


                      New and advanced multiomics tools and technologies have opened new avenues of research and markedly enhanced various disciplines such as disease research and precision medicine1. The practice of merging diverse data from various ‘omes increasingly provides a more holistic understanding of biological systems. As Maddison Masaeli, Co-Founder and CEO at Deepcell, aptly noted, “You can't explain biology in its complex form with one modality.”

                      A major leap in the field has
                      ...
                      02-08-2024, 06:33 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by seqadmin, Yesterday, 06:12 AM
                    0 responses
                    16 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 02-23-2024, 04:11 PM
                    0 responses
                    67 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 02-21-2024, 08:52 AM
                    0 responses
                    73 views
                    0 likes
                    Last Post seqadmin  
                    Started by seqadmin, 02-20-2024, 08:57 AM
                    0 responses
                    62 views
                    0 likes
                    Last Post seqadmin  
                    Working...
                    X