Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • problem with samtools and SNP calling

    I am trying to do some SNP calling with samtools following this:


    The issue is that I don't get any SNPs or indels. To check what happens, I modify this:

    samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf

    and split it into 2 parts:

    samtools mpileup -uf hg19.fa file.bam > file.tmp
    and
    bcftools view -bvcg file.tmp > var.raw.bcf

    There is a lot of data in file.tmp, but var.raw.bcf seems too small (3k). Then, as suggested, I do:

    bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

    the resulting var.flt.vcf is just headers.

    I guess I am missing something obvious but can't figure it now. Any ideas what I am doing wrong?

    Edit: or any ideas for finding SNPs based on tophat .bam output?
    Last edited by lpn; 03-01-2012, 09:36 AM.

  • #2
    try letting samtools use anomalous pairs with the -A flag.

    Comment


    • #3
      Originally posted by peromhc View Post
      try letting samtools use anomalous pairs with the -A flag.
      Same result

      Any ideas for finding SNPs based on tophat .bam output?

      Comment


      • #4
        Okay, the obvious things to try... stop filtering with vcfutils. Just look at the raw vcf.

        Second, make a pileup file, and look at it, and see if it looks right. Is coverage right? Does mpileup have the right reference sequences?

        Comment


        • #5
          Originally posted by swbarnes2 View Post
          Okay, the obvious things to try... stop filtering with vcfutils. Just look at the raw vcf.
          I looked at the var.raw.bcf; seems too small (3k). The raw vcf can't be large either.

          Originally posted by swbarnes2 View Post
          Second, make a pileup file, and look at it, and see if it looks right. Is coverage right? Does mpileup have the right reference sequences?
          How do I look at it? seems like a binary file, but of a good size, several GB.

          seems like something happens in:
          bcftools view -bvcg file.tmp > var.raw.bcf

          where file.tmp is the pileup file
          Last edited by lpn; 03-01-2012, 09:57 AM.

          Comment


          • #6
            Use the following to command to create a readable vcf file:

            bcftools view -vcg file.tmp > var.raw.vcf

            Since you are using the -v option it will only output variants. You could also try creating a vcf only using -cg and check for coverage on all the other regions.

            Edit: Of course , that last option will generate a huge file.
            Last edited by vv85; 03-01-2012, 10:10 AM.

            Comment


            • #7
              mpileup pileup files are perfectly human readable.

              mpileup -f reference.fasta > pileup.txt

              Comment


              • #8
                Originally posted by vv85 View Post
                Use the following to command to create a readable vcf file:

                bcftools view -vcg file.tmp > var.raw.vcf
                Same results

                Originally posted by vv85 View Post
                Since you are using the -v option it will only output variants. You could also try creating a vcf only using -cg and check for coverage on all the other regions.

                Edit: Of course , that last option will generate a huge file.
                Yes, I stopped it; it produces something.

                Comment


                • #9
                  Totally true. I just started from the first step you mentioned. The file.tmp file was generated with -u so it's not human readable.

                  Comment


                  • #10
                    Originally posted by swbarnes2 View Post
                    mpileup pileup files are perfectly human readable.
                    yes, converted that to readable; seems like a pileup file.

                    Comment


                    • #11
                      do as swbarnes2 suggested and check coverage and reference names on the pileup file.
                      also what aligment parameters did you use?

                      Comment


                      • #12
                        Originally posted by vv85 View Post
                        do as swbarnes2 suggested and check coverage and reference names on the pileup file.
                        also what aligment parameters did you use?
                        In the pileup file I get things like:
                        chr2 218208 G 30 ....................,......... +(+++)++++!#+++!+)+++!*****(#(

                        In the reference the names are like chr2, so they are the same.

                        The tophat parameters are:
                        -g 1 -r 150 --coverage-search --no-novel-indels --microexon-search --solexa1.3-quals

                        (I know, these are not optimal, but should not affect SNPs, I think)


                        and the from the tophat output, one chromosome is chosen as:

                        samtools index accepted_hits.bam
                        samtools view -b accepted_hits.bam chr2 > accepted_hits.chr2.bam

                        samtools sort accepted_hits.chr2.bam file.bam
                        Last edited by lpn; 03-01-2012, 11:41 AM.

                        Comment


                        • #13
                          This is RNA-seq data (you are using Tophat)? How many reads?

                          Comment


                          • #14
                            Originally posted by chadn737 View Post
                            This is RNA-seq data (you are using Tophat)? How many reads?
                            Yes, RNAseq. GAII: 30-40M reads, PE 2x100.
                            There should be at least some SNPs detected with such data.

                            Comment


                            • #15
                              another program for variant search is GATK but I don't think that is the problem. By the way, a quick inspection of the pileup file shows variants? are there any known variants that you could directly check in the pileup file?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              66 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X