Mapping with high error rates is not an issue, but there could be edge effects if you are mapping to short contigs. Is this using a short read de novo assembly?
Looking at GC content of the original reads without mapping can be problematic if read trimming isn't working well. I believe it is working somewhat well with the latest software release.
-mark
-
Well... if the GC distribution of the reads looks identical to the GC distribution of the reference, that implies no bias, assuming all of the reads originated from that reference. Mapping will give you a much better indication of GC bias, but then you no longer know whether the bias came from the raw reads or from the mapping. So with PacBio reads, which are hard to map due to their high error rates, I'd just compare the reference GC distribution to the read GC distribution without mapping.
Note that the reference could be wrong, too. High- and low-GC areas have lower complexity and are thus more likely to be repetitive and to be collapsed by an assembler. So if you have a poor assembly or a highly repetitive organism, it's possible that the seemingly higher coverage of extreme-GC areas is actually due to them being collapsed repeats.
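A minimal sketch of that mapping-free comparison, in case it helps. It is only one way to do it: cutting both the reads and the reference into fixed-size windows (so that read length does not distort the histogram), the window size, the number of bins, and all function names are my assumptions, and the reader only handles plain FASTA.

```python
from collections import Counter


def gc_fraction(seq):
    """GC fraction over unambiguous bases; None if there are no A/C/G/T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else None


def read_fasta(path):
    """Yield each sequence from a plain-text FASTA file (multi-line allowed)."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks)
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks)


def windows(seq, size):
    """Yield non-overlapping windows of exactly `size` bases; the tail is dropped."""
    for start in range(0, len(seq) - size + 1, size):
        yield seq[start:start + size]


def gc_histogram(seqs, bins=20):
    """Return the fraction of sequences falling into each GC bin."""
    counts = Counter()
    for seq in seqs:
        frac = gc_fraction(seq)
        if frac is None:
            continue
        # A GC fraction of exactly 1.0 goes into the last bin.
        counts[min(int(frac * bins), bins - 1)] += 1
    total = sum(counts.values())
    return [counts[b] / total if total else 0.0 for b in range(bins)]


def windowed_gc_histogram(path, size=500, bins=20):
    """GC histogram over fixed-size windows of every sequence in a FASTA file."""
    return gc_histogram(
        (w for seq in read_fasta(path) for w in windows(seq, size)), bins
    )
```

Running `windowed_gc_histogram` once on the reads and once on the reference (both file names being whatever you have) gives two histograms to plot on top of each other. If the reads really come from that reference without bias, the two curves should look roughly the same.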
-
Originally posted by Brian Bushnell:
"There's still an issue of blasr being used. Can you just plot the raw gc dist of the raw reads and the raw fasta?"
But that doesn't really reveal gc-related coverage-bias?
-
There's still the issue of BLASR being used. Can you just plot the GC distribution of the raw reads and of the raw FASTA?
-
Originally posted by Brian Bushnell:
"There is no noticeable GC-bias in the raw reads. The error-corrected reads are a different matter; that would depend on many factors like repeat composition of the organism and biases in the algorithm. I would expect neutral-GC areas to error-correct better than high or low, so that graph does seem odd, but it could also be a problem with the way the data was normalized, or mapped. I would rather see a GC-content profile of the raw and error-corrected reads; the graph you displayed, in my opinion, has too much processing to be able to correlate it well with the bias of the reads themselves."
To see if this was the case, I ran BLASR on all my filtered reads (polymerase read quality > 0.75, read length > 50), which are not error-corrected.
BLASR was run the same way as before, and I again plotted the GC bias with Picard.
The graph looks almost the same: http://imgur.com/d5pdxsC
Could it be a problem with Picard and the longer reads, or do I really have a bias like that?
-
There is no noticeable GC-bias in the raw reads. The error-corrected reads are a different matter; that would depend on many factors like repeat composition of the organism and biases in the algorithm. I would expect neutral-GC areas to error-correct better than high or low, so that graph does seem odd, but it could also be a problem with the way the data was normalized, or mapped. I would rather see a GC-content profile of the raw and error-corrected reads; the graph you displayed, in my opinion, has too much processing to be able to correlate it well with the bias of the reads themselves.
-
GC-bias
Hello!
As far as I know, PacBio sequencing shouldn't have any GC bias at all, or at least very little.
When I was comparing GC bias in a couple of samples sequenced with different technologies (using Picard's CollectGcBiasMetrics), I noticed that the PacBio graph looked particularly strange: http://i57.tinypic.com/2rxbgc0.png
It is basically an inverted normal distribution rather than the flat line I would expect if there were no bias.
The GC bias is calculated from a BAM file in which the PBcR (error-corrected) reads are aligned with BLASR back to the scaffolds produced in an assembly.
The graphs I produced for the other technologies look more or less as expected.
Does anyone know what is going on, or am I doing something terribly wrong?
Could it be due to the fact that the reads are error-corrected?
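For readers wondering why "no bias" should show up as a flat line: a GC-bias plot of this kind shows coverage per GC bin divided by the overall mean coverage, so unbiased data sits at 1.0 everywhere. The sketch below illustrates that general idea only. It is not Picard's code, and the function name and the input format (one `(gc_percent, read_starts)` pair per reference window) are assumptions made for illustration.

```python
from collections import defaultdict


def normalized_coverage_by_gc(windows):
    """Return {gc_percent: normalized coverage}; 1.0 means the overall mean.

    `windows` is an iterable of (gc_percent, read_starts) pairs, one per
    reference window (hypothetical input format).
    """
    n_windows = defaultdict(int)
    n_reads = defaultdict(int)
    for gc, reads in windows:
        n_windows[gc] += 1
        n_reads[gc] += reads

    total_windows = sum(n_windows.values())
    total_reads = sum(n_reads.values())
    if total_windows == 0 or total_reads == 0:
        return {}

    # Mean number of read starts per window across the whole reference.
    mean = total_reads / total_windows
    return {
        gc: (n_reads[gc] / n_windows[gc]) / mean
        for gc in sorted(n_windows)
    }


# Unbiased toy data: every GC bin gets the same number of reads per window.
flat = normalized_coverage_by_gc([(30, 10), (30, 10), (50, 10), (70, 10)])

# Toy data with extra reads at the GC extremes, giving a dip in the middle.
dipped = normalized_coverage_by_gc([(20, 30), (50, 10), (80, 20)])
```

Note that the numerator counts reads per reference window, so anything that inflates read counts in particular windows (collapsed repeats in the assembly, for example) will show up in the plot just like a sequencing bias would.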