Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • lpn
    Member
    • May 2011
    • 17

    problem with samtools and SNP calling

    I am trying to do some SNP calling with samtools following this:


    The issue is that I don't get any SNPs or indels. To check what happens, I modify this:

    samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf

    and split it into 2 parts:

    samtools mpileup -uf hg19.fa file.bam > file.tmp
    and
    bcftools view -bvcg file.tmp > var.raw.bcf

    There is a lot of data in file.tmp, but var.raw.bcf seems too small (3k). Then, as suggested, I do:

    bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

    the resulting var.flt.vcf is just headers.

    I guess I am missing something obvious but can't figure it now. Any ideas what I am doing wrong?

    Edit: or any ideas for finding SNPs based on tophat .bam output?
    Last edited by lpn; 03-01-2012, 09:36 AM.
  • peromhc
    Senior Member
    • Sep 2009
    • 108

    #2
    try letting samtools use anomalous pairs with the -A flag.

    Comment

    • lpn
      Member
      • May 2011
      • 17

      #3
      Originally posted by peromhc View Post
      try letting samtools use anomalous pairs with the -A flag.
      Same result

      Any ideas for finding SNPs based on tophat .bam output?

      Comment

      • swbarnes2
        Senior Member
        • May 2008
        • 910

        #4
        Okay, the obvious things to try... stop filtering with vcfutils. Just look at the raw vcf.

        Second, make a pileup file, and look at it, and see if it looks right. Is coverage right? Does mpileup have the right reference sequences?

        Comment

        • lpn
          Member
          • May 2011
          • 17

          #5
          Originally posted by swbarnes2 View Post
          Okay, the obvious things to try... stop filtering with vcfutils. Just look at the raw vcf.
          I looked at the var.raw.bcf; seems too small (3k). The raw vcf can't be large either.

          Originally posted by swbarnes2 View Post
          Second, make a pileup file, and look at it, and see if it looks right. Is coverage right? Does mpileup have the right reference sequences?
          How do I look at it? seems like a binary file, but of a good size, several GB.

          seems like something happens in:
          bcftools view -bvcg file.tmp > var.raw.bcf

          where file.tmp is the pileup file
          Last edited by lpn; 03-01-2012, 09:57 AM.

          Comment

          • vv85
            Member
            • Feb 2011
            • 17

            #6
            Use the following to command to create a readable vcf file:

            bcftools view -vcg file.tmp > var.raw.vcf

            Since you are using the -v option it will only output variants. You could also try creating a vcf only using -cg and check for coverage on all the other regions.

            Edit: Of course , that last option will generate a huge file.
            Last edited by vv85; 03-01-2012, 10:10 AM.

            Comment

            • swbarnes2
              Senior Member
              • May 2008
              • 910

              #7
              mpileup pileup files are perfectly human readable.

              mpileup -f reference.fasta > pileup.txt

              Comment

              • lpn
                Member
                • May 2011
                • 17

                #8
                Originally posted by vv85 View Post
                Use the following to command to create a readable vcf file:

                bcftools view -vcg file.tmp > var.raw.vcf
                Same results

                Originally posted by vv85 View Post
                Since you are using the -v option it will only output variants. You could also try creating a vcf only using -cg and check for coverage on all the other regions.

                Edit: Of course , that last option will generate a huge file.
                Yes, I stopped it; it produces something.

                Comment

                • vv85
                  Member
                  • Feb 2011
                  • 17

                  #9
                  Totally true. I just started from the first step you mentioned. The file.tmp file was generated with -u so it's not human readable.

                  Comment

                  • lpn
                    Member
                    • May 2011
                    • 17

                    #10
                    Originally posted by swbarnes2 View Post
                    mpileup pileup files are perfectly human readable.
                    yes, converted that to readable; seems like a pileup file.

                    Comment

                    • vv85
                      Member
                      • Feb 2011
                      • 17

                      #11
                      do as swbarnes2 suggested and check coverage and reference names on the pileup file.
                      also what aligment parameters did you use?

                      Comment

                      • lpn
                        Member
                        • May 2011
                        • 17

                        #12
                        Originally posted by vv85 View Post
                        do as swbarnes2 suggested and check coverage and reference names on the pileup file.
                        also what aligment parameters did you use?
                        In the pileup file I get things like:
                        chr2 218208 G 30 ....................,......... +(+++)++++!#+++!+)+++!*****(#(

                        In the reference the names are like chr2, so they are the same.

                        The tophat parameters are:
                        -g 1 -r 150 --coverage-search --no-novel-indels --microexon-search --solexa1.3-quals

                        (I know, these are not optimal, but should not affect SNPs, I think)


                        and the from the tophat output, one chromosome is chosen as:

                        samtools index accepted_hits.bam
                        samtools view -b accepted_hits.bam chr2 > accepted_hits.chr2.bam

                        samtools sort accepted_hits.chr2.bam file.bam
                        Last edited by lpn; 03-01-2012, 11:41 AM.

                        Comment

                        • chadn737
                          Senior Member
                          • Jan 2009
                          • 392

                          #13
                          This is RNA-seq data (you are using Tophat)? How many reads?

                          Comment

                          • lpn
                            Member
                            • May 2011
                            • 17

                            #14
                            Originally posted by chadn737 View Post
                            This is RNA-seq data (you are using Tophat)? How many reads?
                            Yes, RNAseq. GAII: 30-40M reads, PE 2x100.
                            There should be at least some SNPs detected with such data.

                            Comment

                            • vv85
                              Member
                              • Feb 2011
                              • 17

                              #15
                              another program for variant search is GATK but I don't think that is the problem. By the way, a quick inspection of the pileup file shows variants? are there any known variants that you could directly check in the pileup file?

                              Comment

                              Latest Articles

                              Collapse

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Yesterday, 11:58 AM
                              0 responses
                              13 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              25 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              36 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 12:03 PM
                              0 responses
                              60 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...