
**tonybolger** · 05-23-2011, 09:16 AM

Originally posted by ForeignMan

Recently, I've been looking at two coverage plots from the same human material, sequenced twice at an average depth of 8x (run1) and 30x (run2). I noticed (only by eye) a significant difference in the read distribution: run2 shows high peaks and a somewhat messy picture, while the read distribution in run1 looks very "nice" and pretty flat. Is there a problem in the data, or is this the usual picture when you deal with data of very high sequencing depth (> 30x)? Is there maybe some kind of "exponential" gain in particular genomic regions (GC-rich or GC-poor, repetitive regions, etc.) that becomes more and more significant the higher the sequencing depth gets?

Was it the same library used in each run or were two different libraries prepared?

Assuming the latter, at a guess, i'd say the PCR step during the second library prep has biased the result.

**Richard Finney** · 05-23-2011, 09:23 AM

How about posting a picture of it? ("worth a thousand words")

What *seq is it? Whole? rnaseq? chipseq?

If your output is BAM, try the rmdup command on the BAM and take a look at the rmdup'ed output BAM file.

**ForeignMan** · 05-24-2011, 12:42 AM

Thanks for your answers, and sorry that I left out so much information. I was hoping it might be a fairly general or even normal effect.
So, the whole genome has been (paired-end) sequenced two times (100 bp per read). For each run a new library was prepared. Additionally, the first run comes from Illumina's GA II, the second one from the new HiSeq2000.
I have the alignment (done with BWA) in BAM format and removed duplicates with Picard.
I attached an example image of chromosome 1 (run1 is grey, run2 yellow; the y-axis runs from 0 to 40; coverage has been computed over 100,000 bp windows). It does not look that bad, but I was wondering whether these deviations, the ups and downs, can be explained purely by the different sequencing conditions (library, technology) or whether you have to expect this in data with high sequence depth. I'm also interested in doing a copy number analysis with this sequencing data, and was asking myself whether this is a common effect that can be reduced by normalization (by GC content, mappability, etc.) or whether the data is really biased.
Thank you for your help and interest.
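For readers who want to reproduce this kind of profile, here is a minimal sketch of per-window coverage as described above (fixed 100,000 bp windows, 100 bp reads). The function name `windowed_depth` and its parameters are illustrative and not part of BWA or Picard. It assumes 0-based read start positions have already been extracted from the BAM, and it counts each read in the window containing its start, which is only an approximation for reads spanning a window boundary.

```python
from collections import Counter


def windowed_depth(read_starts, read_length, chrom_length, window=100_000):
    """Approximate mean depth per fixed-size window from read start positions."""
    n_windows = -(-chrom_length // window)  # ceiling division
    # Count reads by the index of the window that contains their start.
    counts = Counter(start // window for start in read_starts)
    depths = []
    for i in range(n_windows):
        # The last window may be shorter than `window`.
        width = min(window, chrom_length - i * window)
        depths.append(counts.get(i, 0) * read_length / width)
    return depths


if __name__ == "__main__":
    # Hypothetical data: one 100 bp read starting every 10 bp over 200 kb.
    starts = range(0, 200_000, 10)
    print(windowed_depth(starts, read_length=100, chrom_length=200_000))
```

With evenly spaced reads like these, every window comes out at the same depth, which is the "flat" profile the thread is asking about.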

**Richard Finney** · 05-24-2011, 07:51 AM

I can't see http://imiblinux05.uni-muenster.de/~...s_coverage.jpg

Error message is : 404

Not Found

The requested URL /~c_bart07/sc_circos_coverage.jpg was not found on this server.

**ForeignMan** · 05-24-2011, 11:59 PM

Strange ... I can see the image here and it has a completely different URL.
Maybe this direct link works:
http://s2.postimage.org/tt5qlhll7/sc...s_coverage.jpg

**Richard Finney** · 05-25-2011, 04:27 AM

Yes, it should be flatter (or "rounder", in your image). Good example: http://postimage.org/image/1ohvgyx6s/ You can see tumor copy number changes. My image is log scaled; an unlogged version would look even flatter.

Note the anomaly of high coverage next to centromere on short arm (a frequent occurrence near repeated regions near the centromeres). The "low coverage" has it, the high doesn't. It should be flatter.

I do not know what is wrong and can only recommend some desperate measures: 1) don't remove duplicates and see what happens; 2) take a random sample of reads and check that they're lining up as reported (that you're not displaying hg18 alignments on an hg19 display, for instance).

Otherwise, it's not flat (or flatter) but should be.

Attached Files

xx.jpg (42.8 KB, 55 views)

**tonybolger** · 05-25-2011, 07:39 AM

Originally posted by ForeignMan

Strange ... I can see the image here and it has a completely different URL.
Maybe this direct link works:
http://s2.postimage.org/tt5qlhll7/sc...s_coverage.jpg

Agreed - it's a bit odd. Then again, even the first one isn't exactly flat, it has a broadly similar profile except lower.

Incidentally, how did you do the alignment? All reads against all chromosomes - i assume the material wasn't pre-separated per chromosome? And what did you do with ambiguous reads?

**Richard Finney** · 05-25-2011, 07:57 AM

These are TCGA reads from various sources. You can view the bigwig (zoomable wiggle files) coverage tracks for various diseases at cgwb.nci.nih.gov . Check the various NG tracks. I'm sure the various TCGA research institutions align whole genome reads against all chromosomes (or at least chr1-22,X,Y,M , not sure about the "random" or "unattached" genomic chunks), with no chromosome separation. BWA is the weapon of mass alignment used in most TCGA samples (all TCGA bams? ... I'm not sure). BWA assigns ambiguous reads randomly, i.e. it just picks one of the alignments. SNP calling in ambiguous regions is hard.

I'm wondering if there's some sort of "accordion effect" going on in your circular view. Imagine taking an accordion and wrapping it around into an O shape: the inner edge is the same length as the original flat length, but the outer edge is wavy and longer. There may be an exaggeration effect.

There is some vague resemblance between the high "mountain ranges" and GC content, I must admit.

Another desperate check: did you align all the whole-genome reads against one chromosome only? Probably not, I hope.

**ForeignMan** · 05-25-2011, 08:28 AM

Thanks a lot for your comments! And for the link to cgwb.nci.nih.gov.

Your guess was right, Richard. I used BWA for the alignment and ambiguous reads were assigned randomly. And, of course, I aligned to the complete human genome, not only to chromosome 1. I chose this one only as an example to save some space, and because no copy number change is expected for this chromosome. The profile looks very similar over the complete genome.

I don't think that this "accordion effect" should be very significant here, although I really like the image. Apart from the radius, this effect applies equally to all coverage profiles. Of course, one has to be careful analysing such dense plots, but I think it works for a quick comparison since all datasets were plotted under the same conditions. If the coverage profile had been good, you would definitely see it here.

But I agree with tony's note on the similarity to the lower profile. That's also why I got the idea of some kind of stronger deviations at a higher sequence depth (like, getting very naive now, having four times the deviation when having four times the depth), and of this being correlated with specific regions of the genome. Richard's plots and the browser on cgwb.nci.nih.gov, though, look very nice and much like what I'd expected in my case.

I did a copy number analysis on this "wavy" data (using FREEC) and the copy number profile looked quite OK, similar to the other "good" one, although with a few more (but not very many) artificial gains and losses. The normalization seems to take effect. I was asking myself if there is a common tool that performs only some kind of normalization on alignment data.

Thanks again for all your help and ideas!
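As an illustration of what a standalone GC normalization step could look like, here is a minimal sketch. It is not FREEC's method, only a generic median-per-GC-bin rescaling. The names `gc_normalize` and `bin_width` are hypothetical, and the per-window GC fractions are assumed to have been computed from the reference beforehand.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(depths, gc_fractions, bin_width=0.01):
    """Rescale each window's depth by the median depth of windows with similar GC."""
    # Group window depths by GC bin (e.g. 0.40 and 0.404 share a bin at width 0.01).
    by_bin = defaultdict(list)
    for depth, gc in zip(depths, gc_fractions):
        by_bin[round(gc / bin_width)].append(depth)
    bin_median = {b: median(v) for b, v in by_bin.items()}
    overall = median(depths)

    normalized = []
    for depth, gc in zip(depths, gc_fractions):
        m = bin_median[round(gc / bin_width)]
        # Multiply by the overall median so the result stays in depth units;
        # windows in a bin whose median is zero are left unchanged.
        normalized.append(depth if m == 0 else depth / m * overall)
    return normalized


if __name__ == "__main__":
    # Hypothetical windows where high-GC windows have twice the depth.
    print(gc_normalize([10, 10, 20, 20], [0.40, 0.40, 0.60, 0.60]))
```

After normalization, a difference that is explained purely by GC content disappears, while a real copy number change within a GC bin would remain visible.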

**tonybolger** · 05-26-2011, 03:23 AM

Originally posted by ForeignMan

That's also why I got the idea of some kind of stronger deviations at a higher sequence depth (like, getting very naive now, having four times the deviation when having four times the depth).

Not sure i understand you.

I would expect that 4x the coverage will have very close to 4x the deviation from the mean of the coverage (so about the same coefficient of variation) - over a 100K window, Poisson noise should be negligible - and every other source of bias should just scale up.
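A small sketch of both points, using made-up window depths. The helper names are illustrative, and the Poisson estimate assumes 100 bp reads, 100,000 bp windows, and reads that are sampled independently (counting read pairs as single fragments would make the noise somewhat larger, but still small).

```python
from math import sqrt
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def poisson_relative_noise(depth, window=100_000, read_length=100):
    """Relative Poisson noise, 1/sqrt(N), for N expected reads per window."""
    expected_reads = depth * window / read_length
    return 1 / sqrt(expected_reads)


if __name__ == "__main__":
    low = [6.0, 8.0, 10.0, 7.0, 9.0]  # hypothetical window depths
    high = [4 * d for d in low]       # the same bias at 4x the depth

    # The standard deviation scales by 4, the coefficient of variation does not.
    print(pstdev(low), pstdev(high))
    print(coefficient_of_variation(low), coefficient_of_variation(high))

    # Poisson noise per window at 8x and 30x.
    print(poisson_relative_noise(8), poisson_relative_noise(30))
```

Under these assumptions a window holds about 8,000 reads at 8x and 30,000 reads at 30x, so the relative Poisson noise is roughly 1.1% and 0.6%. That is small next to the swings visible in the plots, which is why they are more plausibly systematic bias than sampling noise.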

**ForeignMan** · 05-26-2011, 03:50 AM

"Coefficient of variation" is exactly what I meant. Thanks tony! I was not aware of this measure, and it confirms (a bit) that the two runs are not so very different and that the deviations and bias scale up. Although it's still quite extreme and not very usual in this case, it helps me understand the results. I know that the whole experiment is a bit biased, so I guess I had to expect this kind of image.


Read distribution at high sequence depth
