Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • vschulz
    Junior Member
    • Apr 2009
    • 8

    samtools mpileup

    Hi,

    When I use samtools mpileup, I get bases with base qualities lower than the -Q cutoff. For example my command

    $samtoolsdir/samtools mpileup -f $resdir/human_g1k_v37.fasta my_reads.recal.bam > junk.pileup

    should have the default cutoff of 13, but I see bases with quality 2=# like
    1 10257 A 1 . #

    Setting -Q to something stupid like 200 seems to have no effect. I am using samtools-0.1.12a . Is there a way to filter output using quality in mpileup?

    Thanks,

    Vince
  • peromhc
    Senior Member
    • Sep 2009
    • 108

    #2
    any reason not to update to the latest version of samtools.. you're using a pretty old version. That alone may solve your problem

    Also, you can use awk/sed/grep to filter results based on quality..

    for instance:
    Code:
    awk '$6<200{next}1' snp.hits > snp.filtered.hits

    Comment

    • vschulz
      Junior Member
      • Apr 2009
      • 8

      #3
      samtools mpileup

      The latest samtools-0.1.18 also does the same thing. I also used different flags

      -B -Q 43

      to disable base recalibration, and -Q43 which should be a more sensible cutoff--the default is maybe phred not sanger based:
      -Q INT Minimum base quality for a base to be considered [13]

      None of these changes fixed the problem. I still wind up with lines like

      1 610235 G 10 ,$......... ?HHHHHI#JK

      and it will not be so easy to fix this using awk/sed/grep to something like

      1 610235 G 9 ,$......... ?HHHHHIJK

      Thanks,

      Vince

      Comment

      • vschulz
        Junior Member
        • Apr 2009
        • 8

        #4
        samtools mpileup

        For the record, problem solved by using latest git version of samtools, see


        Vince

        Comment

        • vyellapa
          Member
          • Oct 2011
          • 59

          #5
          I was running into the same problem and was trying to download the git version. However I do not see a repository.



          The source forge page shows a last modified date of 2011-09-02 for 1.18 version.

          What is the solution to filter reads based on base quality?

          Comment

          • nilshomer
            Nils Homer
            • Nov 2008
            • 1283

            #6
            Tools (written in C using htslib) for manipulating next-generation sequencing data - samtools/samtools

            Comment

            • vyellapa
              Member
              • Oct 2011
              • 59

              #7
              Thank you.That solved the -Q problem but I see another problem.I do not see all the pileup reads in some cases

              Code:
              samtools mpileup -A -B -C 0 -r 1:14600-14600 -q 0 -Q 0 -M 256 sorted309.bam
              [mpileup] 1 samples in 1 input files
              <mpileup> Set max per-file depth to 8000
              1	14600	N	4	CcCC	@.IJ
              While I see 4 reads in mpileup, viewing it through IGV has 11 reads piled up at that particular position.
              When I use pysam "fetch" the pileup has 11 rows too(see output below). I tried looking for patterns (strand direction, spliced reads, etc..) but I do not see any specific pattern.

              Code:
              python sampy.py sorted309.bam 1 14600
              HWI-ST388-W7D:269:B020WACXX:1:2307:17129:160814	417	0	14520	3	[(0, 101)]	0	14755	101	CGCCCCGGCTGTGTGGCCTCAAGCCAGCCTTCCGCTCCTTGAAGCTAGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCA	:?@:=?)<8:8C;CDH3):C@EHHGGE=GG?FBH8GHHG)=FFEEH)BDEH?E@>@DDF@@@CAC;>AC8@<@8A9@<0<8<CD<<2><((4(:(:?@A>:	[('NM', 2), ('NH', 2), ('CC', '15'), ('CP', 102516544.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:1101:9762:35190	355	0	14537	3	[(0, 101)]	0	14572	101	CTCAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGT	@C@FFDDFDHHHHJJIGGIIJIJJJIJDHGGJJJJJJJGIJJB@FHII?FGGHGIIIIJDHGFEF@@BBCCD-;AC>>A?CCC3<2>:CB>@>>98?BA34	[('NM', 0), ('NH', 2), ('CC', '15'), ('CP', 102516528.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:2302:13220:150064	329	0	14544	3	[(0, 101)]	-1	-1	101	CAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTC	@@@FFFFFHHHHHJJJJJJJJJJJIIJJJJJIIJJFGHJJJIJJJIIJJJFGIJJJJHHHHE@@BBECCACAC@CDDDDDCC?C>;>><52?C@4>ABC34	[('NM', 0), ('NH', 2), ('CC', '15'), ('CP', 102516520.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:1:1306:10005:27456	97	0	14551	3	[(0, 101)]	0	14808	101	CCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCACGTCAGAGCAA	@;?DADDDDHDFFI@DEF<DBA<?<CBG49:9DA:C@3@9?@G<D:@;@;@CHIIG>=77=C=?CC@C?BABCCC?CCAC3(<?BCCCCC###########	[('NM', 1), ('NH', 2), ('CC', '15'), ('CP', 102516512.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:2204:12310:69039	153	0	14564	3	[(0, 101)]	-1	-1	101	CTGGTCTCCACACCGTGCGGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTG	#####################B>50C<5'/83356.;?AEAGAE;A@F=3=)49BEIGF?BF?B8DFB?D:11*C?*>C9FA9EHFAGHB?DH>DD=:<B@	[('NM', 2), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:1304:12406:19285	99	0	14568	3	[(0, 101)]	0	14731	101	TCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATTTCAGAGCAACGGCCCACGTCTGGTTC	@@@?BA++AD>DB+ACBEHBA@?CBGHCFEIIFF>?DBDEF?DBFDBFEC@3@F@AEE@DEE;7A);.7;@@@############################	[('NM', 3), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:1:2103:16843:14609	355	0	14569	3	[(0, 101)]	0	14726	101	CTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCT	@@<DDEFDDDHHFHCGFGHEHI9B@FBGA@AGHE<FHEGF9BDB?B=CC@FCCDFFECDGHECC>;(77?@;BCE@C:5@5>3(862;@B?BCCCACD?##	[('NM', 0), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:2305:7247:113327	329	0	14570	3	[(0, 101)]	-1	-1	101	TCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGTGCTGAGCAGCTTGTCCTGGCTGCGTCCATGGCAGAGCAACGGCGCAAGTCTGGGTCTG	11?;;B;D+2ADDC+<C+<?EE<B++:??D)???BD9*?##############################################################	[('NM', 5), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:1104:5276:115086	97	0	14571	3	[(0, 101)]	0	14789	101	CCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTCTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGG	@B@FFFFFHFHHHIJJJJJIJIJJIEIIJIJDEGGCAEFHEHDGHIJ@DEHGIIIIJIIJACHGHHEFFFFFFDE@@>65ABD;=B9?BBDCEDCC93<A9	[('NM', 1), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:1101:9762:35190	403	0	14572	3	[(0, 101)]	0	14537	101	CACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGC	?ADCCCDDBBA?-@B<DCB><BB@BBADCEECEFFFFFFHHHFFGIEGIIHGIJIHEGF;CGCDDCJJJIJJJJHIIIIIJJJJIIJIHFHDFFFFFFC@@	[('NM', 1), ('NH', 2), ('CC', '15'), ('CP', 102516496.0), ('HI', 0)]
              HWI-ST388-W7D:269:B020WACXX:4:2205:10161:121730	339	0	14587	3	[(0, 101)]	0	14460	101	CCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGNAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATG	9@>A5)>9BB?<B?CCCDDCCDC@<:(28<(,#DDB<5(A?B??A>ACEFFFFFGFHHEEA<D;GIHHGGHFDIEECFJIGIIEGHGIHHGHDFFEFF@CC	[('NM', 1), ('NH', 2), ('CC', '15'), ('CP', 102516480.0), ('HI', 0)]
              Last edited by vyellapa; 06-04-2012, 11:17 PM.

              Comment

              • evonne16
                Junior Member
                • Nov 2011
                • 7

                #8
                GATK :Failed to load reference dictionary

                Hi,
                When I use DepthOfCoverage of GATK, the error is 'Failed to load reference dictionary' .
                I can't get why.
                My command as below:
                $java -jar GenomeAnalysisTK.jar \ -T DepthOfCoverage \ -R refall/allRef.fa \ -I mapping/DNAnew.sorted.bam \ -o mapping/dcDNA.targetCoverage

                Thanks

                Comment

                • vyellapa
                  Member
                  • Oct 2011
                  • 59

                  #9
                  Do you have the reference ".dict" file and all the indicies for the reference you are using at the same location where your reference is?

                  Comment

                  • evonne16
                    Junior Member
                    • Nov 2011
                    • 7

                    #10
                    Hi,
                    I try the CreatSequenceDictionary to get .dict file of reference,but unsuccessful. My reference is some DNA sequence get from NCBI not hg19 or chromosome reference. I just wanna get the coverage of reads on the reference . I don't know how to use CoverageBySample of GATK.
                    Please give some hints, thank you so much!

                    cheng

                    Comment

                    Latest Articles

                    Collapse

                    • SEQadmin2
                      Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                      by SEQadmin2


                      I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                      Here are nine questions we think about, in roughly the order they matter, before...
                      06-18-2026, 07:11 AM
                    • SEQadmin2
                      From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                      by SEQadmin2


                      Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                      The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                      ...
                      06-02-2026, 10:05 AM

                    ad_right_rmr

                    Collapse

                    News

                    Collapse

                    Topics Statistics Last Post
                    Started by SEQadmin2, 06-17-2026, 06:09 AM
                    0 responses
                    31 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-09-2026, 11:58 AM
                    0 responses
                    96 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-05-2026, 10:09 AM
                    0 responses
                    117 views
                    0 reactions
                    Last Post SEQadmin2  
                    Started by SEQadmin2, 06-04-2026, 08:59 AM
                    0 responses
                    109 views
                    0 reactions
                    Last Post SEQadmin2  
                    Working...