Two peaks on FastQC plot "Per sequence GC content"
As mentioned above, the two peaks could very well be a sign of a mixed sample (contamination).
You could remove all the high-GC reads and see whether this improves the assembly.
BBTools (BBDuk?) has a GC content filter.
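Before filtering, it may help to see where the two peaks sit and where the dip between them is. Below is a minimal Python sketch, not FastQC's or BBTools' own code: it bins reads by GC percentage, reports bins that are higher than both neighbours, and picks the lowest bin between two peaks as a candidate cutoff. The function names, the 5% bin width, and the synthetic reads are all made up for illustration.

```python
from collections import Counter

def gc_percent(seq):
    """Return the GC percentage (0-100) of one read, ignoring N bases."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called

def gc_histogram(seqs, bin_width=5):
    """Count reads per GC bin; each key is the lower edge of a bin."""
    hist = Counter()
    for s in seqs:
        b = min(int(gc_percent(s) // bin_width) * bin_width, 100 - bin_width)
        hist[b] += 1
    return hist

def local_peaks(hist, bin_width=5):
    """Return bins whose count is higher than both neighbouring bins."""
    peaks = []
    for b in range(0, 100, bin_width):
        here = hist.get(b, 0)
        if here > hist.get(b - bin_width, 0) and here > hist.get(b + bin_width, 0):
            peaks.append(b)
    return peaks

def valley_between(hist, left, right, bin_width=5):
    """Return the emptiest bin strictly between two peak bins."""
    bins = range(left + bin_width, right, bin_width)
    return min(bins, key=lambda b: hist.get(b, 0))

# Synthetic 20 bp reads: keys are the number of G bases, values are read counts.
counts = {6: 1, 7: 3, 8: 5, 9: 3, 10: 2, 11: 1, 12: 2, 13: 4, 14: 6, 15: 3, 16: 1}
reads = ["G" * gc + "A" * (20 - gc) for gc, n in counts.items() for _ in range(n)]

hist = gc_histogram(reads)
peaks = local_peaks(hist)
valley = valley_between(hist, peaks[0], peaks[-1])
print(peaks, valley)
```

On real data the histogram is noisier, so a finer bin width plus some smoothing would probably be needed before the peak test is meaningful.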
-
My FASTQ GC content report has two peaks. Can anyone help me with how to assemble this type of data?
-
Unfortunately, it looks like that tool does not merge reads with insert size shorter than read length, which was the point of the exercise. But from the graph I can infer that maybe 30% of the reads are indeed in that category, so there are a few possibilities:
1) The twin peaks are indeed from exon-capture bias, though I kind of doubt that, as it does not explain why trimming the reads would reduce it; and I would have expected such a bias to shift the peak center rather than creating a bimodal distribution, but of course it depends on the bait design.
2) There is an exonic and intronic peak, or gene and non-gene peak. The GC content of a gene changes markedly once you get just outside of its bounds. For example, just upstream of the gene, it becomes very AT-rich, IIRC. But, I don't really like that explanation either.
3) The adapter-trimming is unsuccessful or incomplete. From your GC content by base position, it looks fairly flat across the read, aside from the first 20 bp... so that doesn't make much sense either. Still, it wouldn't hurt to confirm. What was the total percentage of reads and of bases trimmed during adapter-trimming? I would expect something like 30% of the reads and maybe 5-10% of the bases. If you are using Nextera adapters, be sure you use those sequences for trimming.
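If the trimming tool does not report these two numbers directly, they can be worked out from read lengths before and after trimming. A minimal Python sketch follows; it assumes (my assumption, not something stated in the thread) that no reads were discarded, so both lists have one entry per read in the same order. The function name and the example lengths are hypothetical.

```python
def trimming_summary(lengths_before, lengths_after):
    """Return (percent of reads shortened, percent of bases removed)."""
    if len(lengths_before) != len(lengths_after):
        raise ValueError("need one length per read, in the same order")
    reads_trimmed = sum(1 for b, a in zip(lengths_before, lengths_after) if a < b)
    bases_before = sum(lengths_before)
    bases_removed = bases_before - sum(lengths_after)
    return (100.0 * reads_trimmed / len(lengths_before),
            100.0 * bases_removed / bases_before)

# Ten 150 bp reads, three of which lost adapter sequence at the 3' end.
before = [150] * 10
after = [150, 120, 150, 150, 100, 150, 150, 110, 150, 150]
pct_reads, pct_bases = trimming_summary(before, after)
print(pct_reads, pct_bases)
```

In this toy case 30% of reads and 8% of bases are trimmed, which falls in the range described above.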
I suggest that you bin some of your reads by GC - just split them into pairs with GC<50% and GC>50%. Map both to human and look at the mapping rates (ideally, forcing unclipped global alignments). If they are equivalent, then the issue is not caused by contamination or adapter sequence, and it's probably safe to ignore.
You can split the reads by GC content with my reformat tool:
reformat.sh in1=read1.fq in2=read2.fq out1=low1.fq out2=low2.fq maxgc=0.5
reformat.sh in1=read1.fq in2=read2.fq out1=high1.fq out2=high2.fq mingc=0.5
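For anyone without BBTools at hand, the binning idea can be sketched in plain Python. This is an illustration only, not a reimplementation of reformat.sh: it scores each pair on the combined GC fraction of both mates and sends pairs at exactly the threshold to the low bin, and both of those choices are my assumptions rather than documented reformat.sh behaviour. The function names and the tiny in-memory FASTQ records are hypothetical.

```python
import io

def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def gc_fraction(seq):
    """Return the GC fraction (0-1) of a sequence, ignoring N bases."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / called if called else 0.0

def split_pairs_by_gc(fq1, fq2, threshold=0.5):
    """Split read pairs into (low, high) lists by combined GC of both mates."""
    low, high = [], []
    for r1, r2 in zip(read_fastq(fq1), read_fastq(fq2)):
        gc = gc_fraction(r1[1] + r2[1])
        (low if gc <= threshold else high).append((r1, r2))
    return low, high

# Two toy pairs held in memory instead of read1.fq / read2.fq.
fq1 = io.StringIO("@p1/1\nATATATAT\n+\nIIIIIIII\n@p2/1\nGCGCGCGC\n+\nIIIIIIII\n")
fq2 = io.StringIO("@p1/2\nATATGCAT\n+\nIIIIIIII\n@p2/2\nGCGCATGC\n+\nIIIIIIII\n")
low, high = split_pairs_by_gc(fq1, fq2)
print([pair[0][0] for pair in low], [pair[0][0] for pair in high])
```

Keeping mates together in one bin matters here, because the two bins are meant to be mapped as pairs afterwards.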
-
Thanks for your response. I first have to mention that I don't have a very strong background in bioinformatics and am using the CLC Genomics Workbench (ver. 7.5), which has a GUI and runs on Windows. I have used the Workbench's 'Merge Overlapping Pairs' function to generate the histogram below (I'm guessing it's similar to the BBMerge mentioned by Brian). I also haven't used FastQC, but rather the native QC check in the Workbench. I'm attaching the output here. As you can see, there is no severe drop in quality along the reads and, besides the peaks in GC content observed at the end of the read (as I understand it, typical for Illumina data), the GC content along the read length is around 45%. And the samples are human.
-
Would you be able to post all of the FastQC output plots for comparison with other runs? For now, I would mention that exome capture does not sample the genome randomly, so it is not unusual to see what you are reporting.
-
This is sometimes a sign of contamination, though if trimming the reads reduces it, that's a bit odd. Is this supposed to be human data? Human should peak around 50%, which does not correspond to either of your peaks. The most important question is what organism this is supposed to be, and what its average GC% is.
Also, please post an insert-size histogram, which will help determine if the problem is caused by short inserts. You can get one quickly using BBMerge:
bbmerge.sh in1=read1.fq in2=read2.fq ihist=ihist.txt
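Once the histogram file exists, the number of interest is the share of pairs whose insert is shorter than the read length. Here is a minimal Python sketch for that; it assumes the file has `#`-prefixed header lines followed by whitespace-separated "insert size, count" rows, which is my understanding of the ihist layout and should be checked against the actual ihist.txt. The function name and the example rows are made up.

```python
def short_insert_fraction(ihist_lines, read_length):
    """Fraction of counted pairs whose insert is shorter than the read length."""
    total = short = 0
    for line in ihist_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/summary lines
        size, count = line.split()[:2]
        size, count = int(size), int(count)
        total += count
        if size < read_length:
            short += count
    return short / total if total else 0.0

# Made-up histogram rows; with a real file use: open("ihist.txt")
example = [
    "#Mean\t180.0",
    "#InsertSize\tCount",
    "100\t10",
    "140\t20",
    "150\t30",
    "200\t40",
]
fraction = short_insert_fraction(example, read_length=150)
print(fraction)
```

Note that the result only describes pairs that appear in the histogram, so pairs that could not be merged are not part of the denominator.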
-
Hi!
I have two problems: one is two peaks in the per sequence GC-content and another is a weird profile which I'm attaching here.
We're trying out Agilent's SureSelect enrichment protocol for Exome-Seq and have just concluded our first run on samples that were already done before using Illumina's Nextera kit (so we have another run with which to compare our results). The first run was sequenced on the Illumina HiSeq while this run was done on a MiSeq. Also, the first run was a 100bp paired-end run while this was a 150bp paired-end run. Anyway, upon running a QC on the FASTQ files I got this weird profile for the per-sequence GC content. I had already removed the low-quality reads and trimmed the adaptors but that didn't change anything. The only thing that helped was trimming 25 nucleotides from each end of the reads. Since we lose a lot of information that way, I'd prefer not to do this and want to ask if anyone has seen anything like this. I have no idea what might cause this.
Last edited by Khillo81; 10-14-2014, 04:55 AM.
-
Originally posted by simonandrews:
The per base GC plot was removed in the latest version since it mostly replicated information which was in the per base composition plot. You should still be able to see the biased positions as a deviation in the composition of C or G content at the same positions, but it's possible it's not enough of a deviation to trigger a warning.
I assume everything is OK then, since the GC content of the species is around 40%.
It was a Nextera MiSeq bacterial genome sequencing experiment.
Thank you very much for your help.
Last edited by chariko; 08-19-2014, 01:40 AM.
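To make the point about the per base composition plot concrete, here is a minimal Python sketch (not FastQC's code) that tabulates base percentages at each read position and lists positions where A differs from T, or G from C, by more than a threshold. The default of 10 percentage points follows my reading of the FastQC documentation's warning criterion; treat it, the function names, and the toy reads as assumptions.

```python
def per_base_composition(seqs):
    """Return one dict of base percentages per read position."""
    length = max(len(s) for s in seqs)
    table = []
    for i in range(length):
        column = [s[i].upper() for s in seqs if i < len(s)]
        table.append({b: 100.0 * column.count(b) / len(column) for b in "ACGT"})
    return table

def biased_positions(table, threshold=10.0):
    """1-based positions where |A-T| or |G-C| exceeds the threshold."""
    return [i + 1 for i, p in enumerate(table)
            if abs(p["A"] - p["T"]) > threshold or abs(p["G"] - p["C"]) > threshold]

# Toy reads: position 1 is always G, positions 2-4 are perfectly balanced.
reads = ["GACT", "GTGA", "GGTC", "GCAG"]
table = per_base_composition(reads)
flagged = biased_positions(table)
print(flagged)
```

Only position 1 is flagged in this example, which is the kind of start-of-read deviation the quoted post refers to.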
-
Originally posted by chariko:
I updated FastQC to the 11.2 version and my error disappeared. I wonder if it was an old-version problem...
-
Originally posted by nucacidhunter:
I think it will be helpful if you could provide more information such as library type, input material, kit used for library prep and graphs from new version of FastQC.
-
I think it will be helpful if you could provide more information such as library type, input material, kit used for library prep and graphs from new version of FastQC.
-
Originally posted by simonandrews:
They might show some effects. If you have adapter dimers then you'll see the adapter sequence superimposed on the sequence content graphs. If your adapters have markedly different GC content than your library in general then you might also see an overall effect on the GC level.
In the latest fastqc release there is a graph specifically to measure adapter content which will show exactly what proportion of the library is composed of read-through adapter which will illustrate this much better than trying to use sequence content plots.
[PASS] Basic Statistics
[PASS] Per base sequence quality
[PASS] Per sequence quality scores
[FAIL] Per base sequence content
[FAIL] Per base GC content
[WARNING] Per sequence GC content
[PASS] Per base N content
[WARNING] Sequence Length Distribution
[WARNING] Sequence Duplication Levels
[WARNING] Overrepresented sequences
[WARNING] Kmer Content
Oversequencing is probably not the problem because in fact I obtained fewer reads than expected. Could it be due to an adaptor problem? Any clue would be really appreciated.
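One rough way to check the adaptor question directly is to measure how many reads contain the start of the adapter and where it first appears, in the spirit of the adapter content graph mentioned in the quoted post. The Python sketch below is an illustration, not FastQC's implementation. The 12-mer used is what I understand to be the start of the Nextera transposase read-through sequence; please confirm it against the kit documentation before relying on it. The function name and toy reads are hypothetical.

```python
def adapter_content(seqs, adapter_kmer):
    """Cumulative % of reads in which the k-mer has appeared by each start position."""
    longest = max(len(s) for s in seqs)
    k = len(adapter_kmer)
    first_seen = [0] * max(longest - k + 1, 0)
    for s in seqs:
        pos = s.upper().find(adapter_kmer)  # first occurrence only, -1 if absent
        if pos != -1:
            first_seen[pos] += 1
    cumulative, running = [], 0
    for n in first_seen:
        running += n
        cumulative.append(100.0 * running / len(seqs))
    return cumulative

NEXTERA_KMER = "CTGTCTCTTATA"  # assumed adapter start; verify for your kit

# Four toy 20 bp reads: two contain the k-mer, two do not.
reads = [
    "ACGTACGT" + NEXTERA_KMER,
    "ACGT" + NEXTERA_KMER + "ACGT",
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGT",
]
curve = adapter_content(reads, NEXTERA_KMER)
print(curve)
```

A curve that climbs towards the 3' end of the reads would point to read-through adapter from short inserts, whereas a flat line near zero suggests the adaptor is not the explanation.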