Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • problem with samtools and SNP calling

    I am trying to do some SNP calling with samtools following this:


    The issue is that I don't get any SNPs or indels. To check what happens, I modify this:

    samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf

    and split it into 2 parts:

    samtools mpileup -uf hg19.fa file.bam > file.tmp
    and
    bcftools view -bvcg file.tmp > var.raw.bcf

    There is a lot of data in file.tmp, but var.raw.bcf seems too small (3k). Then, as suggested, I do:

    bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

    the resulting var.flt.vcf is just headers.

    I guess I am missing something obvious but can't figure it now. Any ideas what I am doing wrong?

    Edit: or any ideas for finding SNPs based on tophat .bam output?
    Last edited by lpn; 03-01-2012, 09:36 AM.

  • #2
    try letting samtools use anomalous pairs with the -A flag.

    Comment


    • #3
      Originally posted by peromhc View Post
      try letting samtools use anomalous pairs with the -A flag.
      Same result

      Any ideas for finding SNPs based on tophat .bam output?

      Comment


      • #4
        Okay, the obvious things to try... stop filtering with vcfutils. Just look at the raw vcf.

        Second, make a pileup file, and look at it, and see if it looks right. Is coverage right? Does mpileup have the right reference sequences?

        Comment


        • #5
          Originally posted by swbarnes2 View Post
          Okay, the obvious things to try... stop filtering with vcfutils. Just look at the raw vcf.
          I looked at the var.raw.bcf; seems too small (3k). The raw vcf can't be large either.

          Originally posted by swbarnes2 View Post
          Second, make a pileup file, and look at it, and see if it looks right. Is coverage right? Does mpileup have the right reference sequences?
          How do I look at it? seems like a binary file, but of a good size, several GB.

          seems like something happens in:
          bcftools view -bvcg file.tmp > var.raw.bcf

          where file.tmp is the pileup file
          Last edited by lpn; 03-01-2012, 09:57 AM.

          Comment


          • #6
            Use the following to command to create a readable vcf file:

            bcftools view -vcg file.tmp > var.raw.vcf

            Since you are using the -v option it will only output variants. You could also try creating a vcf only using -cg and check for coverage on all the other regions.

            Edit: Of course , that last option will generate a huge file.
            Last edited by vv85; 03-01-2012, 10:10 AM.

            Comment


            • #7
              mpileup pileup files are perfectly human readable.

              mpileup -f reference.fasta > pileup.txt

              Comment


              • #8
                Originally posted by vv85 View Post
                Use the following to command to create a readable vcf file:

                bcftools view -vcg file.tmp > var.raw.vcf
                Same results

                Originally posted by vv85 View Post
                Since you are using the -v option it will only output variants. You could also try creating a vcf only using -cg and check for coverage on all the other regions.

                Edit: Of course , that last option will generate a huge file.
                Yes, I stopped it; it produces something.

                Comment


                • #9
                  Totally true. I just started from the first step you mentioned. The file.tmp file was generated with -u so it's not human readable.

                  Comment


                  • #10
                    Originally posted by swbarnes2 View Post
                    mpileup pileup files are perfectly human readable.
                    yes, converted that to readable; seems like a pileup file.

                    Comment


                    • #11
                      do as swbarnes2 suggested and check coverage and reference names on the pileup file.
                      also what aligment parameters did you use?

                      Comment


                      • #12
                        Originally posted by vv85 View Post
                        do as swbarnes2 suggested and check coverage and reference names on the pileup file.
                        also what aligment parameters did you use?
                        In the pileup file I get things like:
                        chr2 218208 G 30 ....................,......... +(+++)++++!#+++!+)+++!*****(#(

                        In the reference the names are like chr2, so they are the same.

                        The tophat parameters are:
                        -g 1 -r 150 --coverage-search --no-novel-indels --microexon-search --solexa1.3-quals

                        (I know, these are not optimal, but should not affect SNPs, I think)


                        and the from the tophat output, one chromosome is chosen as:

                        samtools index accepted_hits.bam
                        samtools view -b accepted_hits.bam chr2 > accepted_hits.chr2.bam

                        samtools sort accepted_hits.chr2.bam file.bam
                        Last edited by lpn; 03-01-2012, 11:41 AM.

                        Comment


                        • #13
                          This is RNA-seq data (you are using Tophat)? How many reads?

                          Comment


                          • #14
                            Originally posted by chadn737 View Post
                            This is RNA-seq data (you are using Tophat)? How many reads?
                            Yes, RNAseq. GAII: 30-40M reads, PE 2x100.
                            There should be at least some SNPs detected with such data.

                            Comment


                            • #15
                              another program for variant search is GATK but I don't think that is the problem. By the way, a quick inspection of the pileup file shows variants? are there any known variants that you could directly check in the pileup file?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Latest Developments in Precision Medicine
                                by seqadmin



                                Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

                                Somatic Genomics
                                “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
                                05-24-2024, 01:16 PM
                              • seqadmin
                                Recent Advances in Sequencing Analysis Tools
                                by seqadmin


                                The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                                05-06-2024, 07:48 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 05-24-2024, 07:15 AM
                              0 responses
                              195 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 05-23-2024, 10:28 AM
                              0 responses
                              218 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 05-23-2024, 07:35 AM
                              0 responses
                              221 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 05-22-2024, 02:06 PM
                              0 responses
                              12 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X