  • zlu
    Member
    • Nov 2008
    • 34

    overlaying coverage plots

    I'm trying to overlay coverage plots of individual chromosomes from different experiments to get a quick overview of probable CNVs. I've tried a simple xy plot and the ggplot and plotrix packages in R (I'm a real novice in R), but it seems that my Linux machine with 64 GB of memory is unable to handle the task. I've also reduced my file size by putting only the coordinates and the coverage values derived from samtools pileup into a single file.

    Can someone comment on this and suggest a better, more memory-efficient way of doing it? Thank you.
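    One memory-friendly option is to bin the coverage before plotting at all. A minimal sketch, assuming the reduced file is whitespace-separated with the position in the first column and one coverage column per experiment (a hypothetical layout; the function name is also made up for illustration): it streams the file once and keeps only a running sum per bin, so memory grows with the number of bins rather than the number of bases. Because pileup output skips uncovered positions, the sketch divides by the full bin width and so treats missing bases as zero coverage.

```python
from typing import Dict, Iterable, List, Tuple


def bin_mean_coverage(lines: Iterable[str], bin_size: int = 10_000) -> List[Tuple[int, List[float]]]:
    """Mean coverage per fixed-size bin, computed in a single streaming pass.

    Assumes each line is `position<TAB>cov_1<TAB>cov_2...` (hypothetical layout).
    Returns (bin_start, [mean_cov_1, mean_cov_2, ...]) with 1-based bin starts.
    """
    sums: Dict[int, List[float]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        pos = int(fields[0])
        b = (pos - 1) // bin_size  # pileup positions are 1-based
        acc = sums.setdefault(b, [0.0] * (len(fields) - 1))
        for i, value in enumerate(fields[1:]):
            acc[i] += float(value)
    # Divide by the full bin width: bases absent from the pileup count as zero coverage.
    return [(b * bin_size + 1, [s / bin_size for s in sums[b]]) for b in sorted(sums)]
```

    Usage would be something like `with open("chr1_cov.txt") as fh: bins = bin_mean_coverage(fh)` (file name hypothetical); a few thousand bins per chromosome are trivial for any R plotting function.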
  • robsyme
    Junior Member
    • Jan 2009
    • 6

    #2
    I've found Hilbert plots very helpful for seeing chromosome coverage at a glance. Try http://www.bioconductor.org/packages...ilbertVis.html
    Bioconductor can store coverage compactly using run-length encoding, so your 64 GB of memory will be more than enough.
    -r
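    To show why run-length encoding saves so much memory, here is a small Python illustration of the idea (not Bioconductor's own code, which provides this as the Rle class): long stretches of identical per-base coverage collapse into single (value, length) pairs, and the original vector can be rebuilt exactly. The function names are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def run_length_encode(coverage: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse per-base coverage into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(coverage)]


def run_length_decode(runs: Sequence[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into per-base coverage."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out
```

    For example, `[0, 0, 0, 5, 5, 7, 0, 0]` becomes `[(0, 3), (5, 2), (7, 1), (0, 2)]`; on real coverage, with long zero or flat stretches, the savings are far larger.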


    • henry.wood
      Member
      • Apr 2010
      • 63

      #3
      Are you doing the whole thing in R? I've been doing similar things, and I've found it's a lot quicker to get the data ready in Python or Perl first before using R's plotting functions.
      I extract a simple list of the start position of each read from the SAM file and sort them by chromosome and position. Then I split the genome into windows of either 50/100/500 kb etc. or 50/100/200 reads, and make a file with one line per window and columns for chromosome, start, end, number of test reads and number of normal reads. I then import this file into R, and the plotting is much less painful.
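      The fixed-size-window variant of this could look roughly like the following Python sketch. It assumes read starts have already been pulled out of the SAM files as (chromosome, 1-based position) pairs and that chromosome lengths are known; the function name and argument layout are hypothetical, not Henry's actual script. Each output row matches the columns described above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, int]  # (chromosome, 1-based read start), as extracted from a SAM file


def window_counts(test_reads: Iterable[Read], normal_reads: Iterable[Read],
                  chrom_lengths: Dict[str, int],
                  window: int = 100_000) -> List[Tuple[str, int, int, int, int]]:
    """Count read starts per fixed-size window for a test and a normal sample.

    Returns one row per window: (chromosome, start, end, test_count, normal_count),
    with 1-based inclusive coordinates, in the order chromosomes appear in chrom_lengths.
    """
    def tally(reads: Iterable[Read]) -> Counter:
        # Key each read by (chromosome, window index).
        return Counter((chrom, (pos - 1) // window) for chrom, pos in reads)

    test, normal = tally(test_reads), tally(normal_reads)
    rows = []
    for chrom, length in chrom_lengths.items():
        for w in range((length + window - 1) // window):
            start = w * window + 1
            end = min((w + 1) * window, length)  # last window may be shorter
            # Counter returns 0 for windows with no reads.
            rows.append((chrom, start, end, test[(chrom, w)], normal[(chrom, w)]))
    return rows
```

      Writing the rows out with `csv.writer(fh, delimiter="\t")` gives a small table that R's `read.table` can load directly.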


      • zlu
        Member
        • Nov 2008
        • 34

        #4
        robsyme, thanks for suggesting the HilbertVis package. It seems to plot the coverage without any problem. However, do you know how to adjust the scale on the y-axis? I have some exceptionally high coverage which skews the whole plot, and ylim is not working.
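        Whatever HilbertVis itself offers for scaling, one generic workaround is to cap extreme values before handing the vector to any plotting function, so a handful of very deep positions cannot stretch the colour or y scale. A minimal Python sketch using a nearest-rank quantile (the function name and the 0.99 default are illustrative assumptions, not part of HilbertVis):

```python
import math
from typing import List, Sequence


def cap_at_quantile(values: Sequence[float], q: float = 0.99) -> List[float]:
    """Clip values above the q-th quantile (nearest-rank) so that a few
    extreme positions do not dominate the plot's scale."""
    if not values:
        return []
    ordered = sorted(values)
    # Nearest-rank quantile: the ceil(q * n)-th smallest value.
    rank = max(1, math.ceil(q * len(ordered)))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]
```

        Capping changes only the display; the original coverage should still be used for any calls about copy number.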

        Henry, how do you decide on the best window size to use? Would you mind sharing your script? I was actually thinking about doing something similar to maq cns2win.

        By the way, I think overlaying is probably not the right word; what I really want to do is superimpose one plot on top of another.
        Last edited by zlu; 08-14-2010, 06:12 AM.


        • adamdeluca
          Member
          • Jul 2010
          • 95

          #5
          For a quick overview you can always upload a WIG/bigWig file to the UCSC Genome Browser.
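          If the coverage is already binned into equal windows, producing a fixedStep WIG track is just a header line plus one value per window. A minimal Python sketch (function name and track name are illustrative assumptions); `start` is 1-based as the WIG format expects, and `span` equals `step` so windows do not overlap:

```python
from typing import Iterable


def fixedstep_wig(chrom: str, values: Iterable[float], step: int,
                  name: str = "coverage", start: int = 1) -> str:
    """Render per-window values as a fixedStep WIG track.

    `start` is 1-based; each value covers `step` bases (span == step,
    i.e. adjacent, non-overlapping windows).
    """
    lines = [
        f'track type=wiggle_0 name="{name}"',
        f"fixedStep chrom={chrom} start={start} step={step} span={step}",
    ]
    lines.extend(f"{v:g}" for v in values)  # one value per window
    return "\n".join(lines) + "\n"
```

          Writing one track per experiment and uploading them together gives stacked tracks in the browser; for large files, UCSC's wigToBigWig utility can convert the WIG to bigWig first.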


          • henry.wood
            Member
            • Apr 2010
            • 63

            #6
            The best window size is one of those 'how long is a piece of string' questions. It's a signal-versus-noise trade-off. If I want beautiful plots to put into a talk or impress my boss, I use windows of 400 reads. If I want to see small deletions or amplifications, I go down to 200 or 100, but the graphs look a bit messier. I tend to feed the data into the DNAcopy package from Bioconductor. On simulated data, it can pick up events using windows of 20 reads, even though the actual graph looks like a random mass of dots.
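            The exact construction of the read-count windows isn't spelled out here; one plausible reading, sketched in Python under that assumption, is to let each window hold a fixed number of normal-sample reads and count the test reads falling in the same interval. Smaller `reads_per_window` gives finer resolution but noisier ratios, which is the trade-off described above. The function name, the half-open intervals and the 0.5 pseudocount are all illustrative choices; the resulting log2 ratios are the kind of per-window values one might pass to a segmentation tool such as DNAcopy.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def read_count_windows(normal_starts: Sequence[int], test_starts: Sequence[int],
                       reads_per_window: int = 200) -> List[Tuple[int, int, int, int, float]]:
    """Split one chromosome into windows that each hold `reads_per_window` normal reads.

    Inputs are sorted read start positions. Returns
    (start, end_exclusive, normal_count, test_count, log2_ratio) per window;
    leftover normal reads that do not fill a final window are dropped.
    """
    if not normal_starts or not test_starts:
        return []
    # Library-size normalisation: a window with no copy-number change should give ~0.
    scale = len(normal_starts) / len(test_starts)
    rows = []
    n = reads_per_window
    for i in range(0, len(normal_starts) - n + 1, n):
        lo = normal_starts[i]
        j = i + n
        hi = normal_starts[j] if j < len(normal_starts) else normal_starts[-1] + 1
        # Count test reads whose start lies in the half-open interval [lo, hi).
        n_test = bisect.bisect_left(test_starts, hi) - bisect.bisect_left(test_starts, lo)
        # A 0.5 pseudocount keeps log2 finite when a window has no test reads.
        ratio = (n_test + 0.5) * scale / (n + 0.5)
        rows.append((lo, hi, n, n_test, math.log2(ratio)))
    return rows
```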
            My script is currently embarrassing. It's the first one I ever wrote and is a bit of an unholy mess. It has to be installed manually on a computer to work, and it uses my wife's birthday to know when to stop because I didn't know how to end a for loop. If there is any part of it you need, I might try to tidy it up. I have a colleague who is currently preparing a proper statistical package to do all this better than I ever could.

