Header Leaderboard Ad


SAMtools pileup command



No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • SAMtools pileup command

    I am having trouble with the samtools pileup command and would appreciate any advice. I'm probably doing something wrong, but I have read the documentation and can't figure out what. Originally I was starting with a sam file produced by BWA. But that wasn't working so I switched to a sam file generated by using bowtie2sam.pl on a bowtie file.

    I used the following command to create the unsorted bam file:
    samtools view -buSt chrM.fa -o file.bam file.sam

    And then the following to sort the file:
    samtools sort file.bam file.bam.sorted

    Then I tried a variety of things to produce the pileup file.

    samtools pileup -f chrM.fa file.bam.sorted.bam
    --> no output, no error

    samtools pileup -t chrM.fa.fai file.bam.sorted.bam
    samtools pileup -f chrM.fa -t chrM.fa.fai file.bam.sorted.bam
    [sam_header_read2] 1 sequences loaded.
    [sam_read1] reference '$(((((((((((((((((((((((((NMCX1CMDZ7N0N27' is recognized as '*'.
    [sam_read1] reference '(((((((((((((((((((((((((((((NMCX1CMDZ7N0N27' is recognized as '*'.
    Parse error at line 2: invalid CIGAR character
    Abort trap

    samtools pileup -f chrM.fa -S file.sam
    [samopen] no @SQ lines in the header.
    [sam_read1] missing header? Abort!

    samtools pileup -f chrM.fa file.sam
    [bam_pileup] fail to read the header: non-exisiting file or wrong format.

    The only way I could get pileup output was as follows:
    samtools pileup -t chrM.fa.fai file.sam
    samtools pileup -t chrM.fa.fai -S file.sam
    --> note there is no -S flag in the first one and yet it produced correct output

    I want to be able to use the pileup command on the bam file because I am assuming it will be much faster on large files. Can I do that?

    And I was told that it's better to use -f than -t--is that true, or does it not matter?

  • #2
    I have same issues here when I use the conversion tools. I mean, if I perform the alignment with bwa (which produces SAM output) everything works fine, like this:

    bwa aln genome file.fq > file.sai
    bwa samse file.sai | samtools view -bt genome.fai -o results.bam -

    samtools sort results.bam sorted && mv sorted.bam results.bam
    samtools index results.bam

    samtools pileup [-f genome] results.bam

    I've tried export2sam.pl for old runs but it has some issues which I believe are related to the fact export files have the chromosome (or genome) file name in the 'chromosome' column. This is reported in SAM from the converter but there's no, say, chr5.fa chromosome in the reference used for samtools. The worst happens when I convert eland_export files aligned against a multifasta reference: the expected column for chromosome contains only, say, genome.fa... the export2sam.pl places all the reads as they were aligned to chromosome 'genome.fa':

    11II-08F1-GA_1:6:1:0:1237 81 genome.fa 138895 275 36M = 138757 -174 CGAATTAGTAGATTGTTTGTAAAAATCGGTATTGTN <:<997<;<:<<::;9999;;;:;9<<:
    <<;:;;/& MD:Z:C35 SM:i:133
    11II-08F1-GA_1:6:1:0:1237 161 genome.fa 138757 275 36M = 138895 174 ACTTGTTTTAAATAAACAGATTTTCCACTCATGTTG @@@@[email protected][email protected]=5>@[email protected]@[email protected]
    [email protected]@?<@@ MD:Z:36 SM:i:142
    11II-08F1-GA_1:6:1:0:399 97 genome.fa 179127 275 36M = 179285 194 NATATCATAACCATAAAAAAAAAAGCACAATTCAAC &1<<<<<<<<<<<<<<<<<<<<<8579:
    :<<:::98 MD:Z:A35 SM:i:133

    I don't have bowtie ouputs here at the moment, you may check if the bowtie chromosome column is the name of the file or the chromosome...




    • #3
      * character in pileup output

      Here is an example of line in pileup output which contains * characters
      chrIII 8884091 T W 95 126 60 12 ,,*****,*aA^]A aab`a^aHa^aa

      I do not believe this character is documented. Would be nice to hear back what it means.


      • #4
        How can I pileup all the sites of reference sequence , but not only the sites covered by at least 1 ?


        • #5
          Originally posted by baohua100 View Post
          How can I pileup all the sites of reference sequence , but not only the sites covered by at least 1 ?
          You can get a bioinformatician or computer scientist to write a quick perl script to do this for you. We need jobs!


          Latest Articles


          • seqadmin
            A Brief Overview and Common Challenges in Single-cell Sequencing Analysis
            by seqadmin

            ​​​​​​The introduction of single-cell sequencing has advanced the ability to study cell-to-cell heterogeneity. Its use has improved our understanding of somatic mutations1, cell lineages2, cellular diversity and regulation3, and development in multicellular organisms4. Single-cell sequencing encompasses hundreds of techniques with different approaches to studying the genomes, transcriptomes, epigenomes, and other omics of individual cells. The analysis of single-cell sequencing data i...

            01-24-2023, 01:19 PM
          • seqadmin
            Introduction to Single-Cell Sequencing
            by seqadmin
            Single-cell sequencing is a technique used to investigate the genome, transcriptome, epigenome, and other omics of individual cells using high-throughput sequencing. This technology has provided many scientific breakthroughs and continues to be applied across many fields, including microbiology, oncology, immunology, neurobiology, precision medicine, and stem cell research.

            The advancement of single-cell sequencing began in 2009 when Tang et al. investigated the single-cell transcriptomes
            01-09-2023, 03:10 PM