Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • bfantinatti
    Member
    • Aug 2012
    • 22

    Mpileup output

    Hello!
    I'm using "mpileup -p reference.fasta my_file.bam" to generate a file that gives me the coverage for each base. This command generates a file with 6 columns but only the column 1, 2 and 4 (scaffold, position and coverage, respectively) are useful for me in this case. A can easily delete the useless columns using awk after, but in order to save disk space (around 50% for each file) I need some alternative to generate the mpileup report whitout the useless columns already. I need only the 1st, 2nd and 4th columns only. Is there some parameter that give-me this? (sorry about my poor english).
  • swbarnes2
    Senior Member
    • May 2008
    • 910

    #2
    Pipe the mpileup output to awk (or cut would work too), and then as it gets each line, it will cut out the parts you want, and only output that.

    Comment

    • westerman
      Rick Westerman
      • Jun 2008
      • 1104

      #3
      'cut' is so much more simple than 'awk' (in my opinion, of course). To use 'cut' just do:

      mpileup -p reference.fasta my_file.bam | cut -f 1,2,4

      Comment

      • bfantinatti
        Member
        • Aug 2012
        • 22

        #4
        Hello

        I used pipe awk and worked very well, thank you for the two answers.
        Despite generated the files as I wanted, I detected a little problem in results.
        The columns of the coverage shows some wrong nombers compared to the grafical view of the assembly.
        I'm using Tablet (http://bioinf.scri.ac.uk/tablet/) and IGV (http://www.broadinstitute.org/igv/) for visualize the reads. And looking to both visual and mpileup output, some positions shows a different coverage number, for example: The positions 1-4 is has exactly the same coverage in Tablet, IGV and mpileup. But the position 5-8 shows me one coverage point more in tablet and IGV than in mpileup (was this clear for you?).
        Is there some error in Tablet and IGV or in mpileup output? Or is the mpileup disregarding some reads because some quality problem or other stuff, resulting in diferences in coverage value?
        Thank you

        Comment

        • swbarnes2
          Senior Member
          • May 2008
          • 910

          #5
          The two softwares might differ in how they treat anamalous reads, or reads with zero mapq. For instance, the default on mpileup is to ignore anamlous pairs, and you change that with the command line option -A. I bet IGV counts them all.

          Comment

          • bfantinatti
            Member
            • Aug 2012
            • 22

            #6
            Hello

            I tried using -A and worked very well Thank you for the answer.
            One more question:
            When I have some gap on assembly, mpileup jumps directly to the next position presenting a coverage:
            scaffold_0 1 4
            scaffold_0 2 4
            scaffold_0 3 4
            scaffold_0 7 8
            scaffold_0 8 8
            scaffold_0 9 8

            I need the positions with 0 coverage also be included on mpileup output. Something like this:
            scaffold_0 1 4
            scaffold_0 2 4
            scaffold_0 3 4
            scaffold_0 4 0
            scaffold_0 5 0
            scaffold_0 6 0
            scaffold_0 7 8
            scaffold_0 8 8
            scaffold_0 9 8

            Comment

            • dnusol
              Senior Member
              • Jul 2009
              • 136

              #7
              hi bfantinatti,

              did you manage to get positions with 0 coverage in your mpileup output?

              cheers,

              D.

              Comment

              • bfantinatti
                Member
                • Aug 2012
                • 22

                #8
                Yes

                Hello dnusol, yes i did. Sorry, I forgot to post the solution here. I got the solution on annother forum related to bash issues.
                The solution was to apply the following code:

                awk '($2-p2)>1{
                for(i=p2+1;i<$2;i++)
                print $1,i,0
                }
                {p2=$2}1' file

                This will add lines where its lacks, keeping the sequence of the second column, and adding 0 on the respective 3rd column.

                Comment

                Latest Articles

                Collapse

                • SEQadmin2
                  Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                  by SEQadmin2


                  I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                  Here are nine questions we think about, in roughly the order they matter, before...
                  Today, 07:11 AM
                • SEQadmin2
                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                  by SEQadmin2


                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                  ...
                  06-02-2026, 10:05 AM
                • SEQadmin2
                  Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                  by SEQadmin2


                  With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                  Introduction

                  Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                  05-22-2026, 06:42 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by SEQadmin2, Yesterday, 06:09 AM
                0 responses
                16 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-09-2026, 11:58 AM
                0 responses
                34 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-05-2026, 10:09 AM
                0 responses
                41 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-04-2026, 08:59 AM
                0 responses
                48 views
                0 reactions
                Last Post SEQadmin2  
                Working...