Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • medalofhonour
    Member
    • Jul 2011
    • 18

    Plotting per chromosome coverage for BAM files

    Hi,

    I wish to plot coverage profiles for each of the chromosomes from my BAM files which I obtained after aligning exome sequencing data (hg19).

    I had initially thought of using samtools to generate pileup files, and then plot them however, that would take a long time and I guess that would be an inefficient way to go about it.

    Are there any R/other packages that can quickly help me get these plots ?

    Any suggestions would be much appreciated.
  • jimmybee
    Senior Member
    • Sep 2010
    • 119

    #2
    Bedtools can do it straight from the bam (genomeCoveragebed). Use the -d option for per-base coverage

    Comment

    • David.Shine
      Junior Member
      • Apr 2013
      • 1

      #3
      samtools depth can also generate per-base coverage directly. the result simply consist of chromose coordinates and its depth, it is not too hard to draw a coverage plot with R in my opioin

      Comment

      • simonandrews
        Simon Andrews
        • May 2009
        • 870

        #4
        SeqMonk can make these kinds of plots very quickly if you're happy with a more manual solution (see attached).
        Attached Files

        Comment

        • suryasaha
          Member
          • Mar 2011
          • 27

          #5
          I'd recommend bedtools genomeCoverage over samtools depth for this purpose. See http://www.biostars.org/p/67579/ for discussion. Samtools skips secondary alignments and aberrant read pairs. Plots can, of course, be generated in R.

          Comment

          • kadircaner
            Member
            • Jun 2011
            • 13

            #6
            Simon, could you please explain little further how we can generate such a plot in SeqMonk? I tried to find in documentation but couldnt find anything... Thanks!

            Comment

            • simonandrews
              Simon Andrews
              • May 2009
              • 870

              #7
              Originally posted by kadircaner View Post
              Simon, could you please explain little further how we can generate such a plot in SeqMonk? I tried to find in documentation but couldnt find anything... Thanks!
              The quickest way would be to simply load your data and then run Data > Quantitation Pipelines > Wiggle Plot

              In the options you'd set the region to cover to be the whole genome, and if you want real coverage values then turn off the "total count correction" option. Depending on the scale of your data you could try with or without the log transformation.

              Once you've done the quantitation you can select any dataset in the data view (top left) and it will be shown in the genome view, as in the plot I attached before. You can make the colours fixed by pressing the 4th button from the left in the toolbar, and you can save the image by going to File > Export Current View > Genome View.

              Hope this helps

              Comment

              • kadircaner
                Member
                • Jun 2011
                • 13

                #8
                Thanks Simon!!!

                Comment

                • zhoujiayi
                  Member
                  • Sep 2013
                  • 13

                  #9
                  Originally posted by simonandrews View Post
                  The quickest way would be to simply load your data and then run Data > Quantitation Pipelines > Wiggle Plot

                  In the options you'd set the region to cover to be the whole genome, and if you want real coverage values then turn off the "total count correction" option. Depending on the scale of your data you could try with or without the log transformation.

                  Once you've done the quantitation you can select any dataset in the data view (top left) and it will be shown in the genome view, as in the plot I attached before. You can make the colours fixed by pressing the 4th button from the left in the toolbar, and you can save the image by going to File > Export Current View > Genome View.

                  Hope this helps
                  Hi, Simon. I am new to SeqMonk and now I am using your method to get the genome coverage of my alignment file. The problem is for SeqMonk, I can only download genomes as GRCH37/NCBI34/35/36. However, the reference I used to do the alignment is HG19. Is it possible to load the genomes from my reference HG19 fa file? Or it doesn't matter, I can just choose any genome files, SeqMonk will give me the correct coverage?

                  Comment

                  • simonandrews
                    Simon Andrews
                    • May 2009
                    • 870

                    #10
                    Originally posted by zhoujiayi View Post
                    Hi, Simon. I am new to SeqMonk and now I am using your method to get the genome coverage of my alignment file. The problem is for SeqMonk, I can only download genomes as GRCH37/NCBI34/35/36. However, the reference I used to do the alignment is HG19. Is it possible to load the genomes from my reference HG19 fa file? Or it doesn't matter, I can just choose any genome files, SeqMonk will give me the correct coverage?
                    The NCBIXX/GRChXX vs hgXX all refer to exactly the same assemblies, it's just a nomenclature difference between different centres. As long as you pick the correct equivalent name they are completely interchangeable. NCBI36 is hg18 and GRCh37 is hg19.

                    Comment

                    • dariober
                      Senior Member
                      • May 2010
                      • 311

                      #11
                      Originally posted by medalofhonour View Post
                      Hi,

                      I wish to plot coverage profiles for each of the chromosomes from my BAM files which I obtained after aligning exome sequencing data (hg19).

                      I had initially thought of using samtools to generate pileup files, and then plot them however, that would take a long time and I guess that would be an inefficient way to go about it.

                      Are there any R/other packages that can quickly help me get these plots ?

                      Any suggestions would be much appreciated.
                      Hello- As an alternative solution... This one is command-line based and doesn't require to plot millions of datapoints. First, divide each chromosome in the human (or whatever) genome in windows of n bases, then count reads in each window. The output file is pretty small and easy to plot with R or anything else:

                      Code:
                      ## Get human chromosomes, if you don't have them already
                      mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
                      	"select chrom, size from hg19.chromInfo" > hg19.genome
                      
                      ## Make windows and count reads
                      bedtools makewindows -g hg19.genome -w 100000 \
                          | coverageBed -abam myaln.bam -b - -counts \
                          | sort -k 1,1 -k2,2n -k3,3n > myaln.counts.txt
                      
                      
                      ## Output example:
                      cat myaln.counts.txt
                      chr1    0       100000  17
                      chr1    100000  200000  10
                      chr1    200000  300000  16
                      chr1    300000  400000  0
                      chr1    400000  500000  0
                      chr1    500000  600000  33
                      chr1    600000  700000  32
                      ...
                      Dario

                      Comment

                      • sindrle
                        Senior Member
                        • Aug 2013
                        • 266

                        #12
                        When loading BAM into seqmonk I get 401980 warnings like this:

                        Reading position 195567590 was 95619bp beyond the end of chr1 (195471971)


                        What does that mean?


                        I trying to re-build my own genome GRCm38, just to check, it was the one I used for alignment.
                        Last edited by sindrle; 03-05-2014, 08:02 PM.

                        Comment

                        • Brian Bushnell
                          Super Moderator
                          • Jan 2014
                          • 2709

                          #13
                          Originally posted by medalofhonour View Post
                          Hi,

                          I wish to plot coverage profiles for each of the chromosomes from my BAM files which I obtained after aligning exome sequencing data (hg19).

                          I had initially thought of using samtools to generate pileup files, and then plot them however, that would take a long time and I guess that would be an inefficient way to go about it.

                          Are there any R/other packages that can quickly help me get these plots ?

                          Any suggestions would be much appreciated.
                          Creating a bam, then sorting, then indexing it, then doing what you want is inefficient. If you have enough memory to store the complete genome (around 3 bytes per reference base), I recommend using pileup.sh (from the BBMap package). You can use bbmap.sh and pipe output to pileup.sh, which will generate a coverage file as the only output, skipping sam/bam files, and requiring no sorting (at the expense of using more memory - around 4 bytes per reference base). You can also use already generated sam files (and bam files if you have samtools installed).

                          For example:
                          pileup.sh in=samfile.sam out=coverage.txt
                          Please run pileup.sh with no arguments to display more options. The default gives the output that people at JGI usually want.

                          Comment

                          • simonandrews
                            Simon Andrews
                            • May 2009
                            • 870

                            #14
                            Originally posted by sindrle View Post
                            When loading BAM into seqmonk I get 401980 warnings like this:

                            Reading position 195567590 was 95619bp beyond the end of chr1 (195471971)


                            What does that mean?


                            I trying to re-build my own genome GRCm38, just to check, it was the one I used for alignment.
                            Those errors almost certainly mean that you've used a different version of the genome assembly to do your mapping to the one you've used to create your project. You should go back and check. If you have samtools installed on your machine then you can run samtools view -H some.bam to look at the headers, which will include the command used to generate the file, which should in turn contain the name of the assembly.

                            Comment

                            Latest Articles

                            Collapse

                            • SEQadmin2
                              Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                              by SEQadmin2


                              I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                              Here are nine questions we think about, in roughly the order they matter, before...
                              06-18-2026, 07:11 AM
                            • SEQadmin2
                              From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                              by SEQadmin2


                              Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                              The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                              ...
                              06-02-2026, 10:05 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by SEQadmin2, Today, 11:10 AM
                            0 responses
                            6 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-17-2026, 06:09 AM
                            0 responses
                            42 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-09-2026, 11:58 AM
                            0 responses
                            102 views
                            0 reactions
                            Last Post SEQadmin2  
                            Started by SEQadmin2, 06-05-2026, 10:09 AM
                            0 responses
                            124 views
                            0 reactions
                            Last Post SEQadmin2  
                            Working...