Dealing with Random Hexamer Bias?
Originally posted by lletourn: The Illumina RNA protocol uses random hexamers to amplify the RNA. The thing is they are not 100% random, so the base composition at the beginning of the reads looks skewed, but that's because of the amplification.
For mapping it's no problem. For assembly it might confuse some assemblers. (When assembling I would trim the 5' end of the RNA reads; for mapping I would not.)
(I am sorry if this question has been answered thoroughly elsewhere in the forum... I have only just joined, and despite trying to navigate the posts with the "Search" tool, I have not yet come across an answer).
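The 5' trimming suggested in the quoted post for assembly could be sketched roughly as below. This is only an illustration, not a recommended pipeline: the function names `parse_fastq` and `trim_five_prime` and the parameter `n_bases` are hypothetical, and the thread does not say how many bases to remove, so that number is left to the caller.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) for one read; a hypothetical helper type.
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from plain 4-line FASTQ text (no wrapped sequences)."""
    # Drop blank lines and line endings, then read four lines at a time.
    it = (ln.rstrip("\r\n") for ln in lines if ln.strip())
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


def trim_five_prime(records: Iterable[FastqRecord],
                    n_bases: int) -> Iterator[FastqRecord]:
    """Remove the first n_bases from each read and its quality string."""
    for header, seq, qual in records:
        # Sequence and quality must be trimmed together to stay in sync.
        yield header, seq[n_bases:], qual[n_bases:]


if __name__ == "__main__":
    example = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIHHHH\n"]
    for header, seq, qual in trim_five_prime(parse_fastq(example), 3):
        print(header, seq, qual, sep="\n")
```

As the quoted post notes, this would only be of interest before assembly; for mapping the reads can be left untrimmed.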
Comment
-
Originally posted by boilermaker: Our group recently used the HiSeq 2000 platform to generate transcriptome data (single-end, 50 bp reads). I have noticed that Illumina transcriptome sequencing typically yields these errors ("Per base sequence content" and "Per base GC content") during FastQC analysis. You suggest that these can safely be ignored when mapping to a genome? I wasn't sure if there was a "best practices" approach to dealing with these biases.
Is there a mapping algorithm that is preferred among those who are dealing with Illumina transcriptome sequencing data?
Comment
-
Originally posted by Brian Bushnell: It would probably be better to ignore them than try to correct them, though if you posted the FastQC graphs it would be easier to say.
I have attached "typical" FastQC graphs (per base GC content, per base sequence content) from one of my datasets (most have profiles like this example).
Comment
-
The base composition bias you are seeing is very typical for RNA-Seq, and has indeed been mentioned on numerous posts here. The most relevant publication about this can be found here: http://nar.oxfordjournals.org/content/38/12/e131.full.
In essence, the bias is normally introduced by the random priming step in the RNA-Seq library preparation which is not quite as random as you would hope it was. Trimming the first positions of every read wouldn't make any difference since the sequence would still align to the very same position.
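To make the bias concrete, the per-position base composition that a "Per base sequence content" style plot summarises can be approximated with a few lines of Python. This is a minimal sketch and not FastQC's own code; the function name `per_base_composition` is hypothetical, and it assumes the read sequences are already available as plain strings.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_base_composition(seqs: Iterable[str]) -> List[Dict[str, float]]:
    """Return, for each read position, the fraction of A, C, G, T and N."""
    counts: List[Counter] = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())  # first read reaching this position
            counts[i][base] += 1

    composition = []
    for position_counts in counts:
        total = sum(position_counts.values())
        # Any other symbols count toward the total but are not reported.
        composition.append({b: position_counts[b] / total for b in "ACGTN"})
    return composition


if __name__ == "__main__":
    reads = ["ACGT", "AGGT", "AC"]
    for pos, fractions in enumerate(per_base_composition(reads), start=1):
        print(pos, fractions)
```

With random-hexamer-primed libraries one would expect the fractions at the first positions to differ from those further along the read, which is the pattern being discussed in this thread.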
Comment
-
Originally posted by fkrueger: The base composition bias you are seeing is very typical for RNA-Seq, and has indeed been mentioned on numerous posts here. The most relevant publication about this can be found here: http://nar.oxfordjournals.org/content/38/12/e131.full.
In essence, the bias is normally introduced by the random priming step in the RNA-Seq library preparation which is not quite as random as you would hope it was. Trimming the first positions of every read wouldn't make any difference since the sequence would still align to the very same position.
You (or Simon) really should make a sticky post with a few example plots about this (and k-mers). It would save many people a lot of time and worry. Another suggestion would be to put a note on the FastQC page itself, so it would be visible to anyone downloading the software.
Last edited by GenoMax; 04-30-2014, 04:10 PM.
Comment
-
Originally posted by GenoMax: If I had a penny for every time this question has been asked (and answered)
You (or Simon) really should make a sticky post with a few example plots about this (and k-mers). It would save many people a lot of time and worry. Another suggestion would be to put a note on the FastQC page itself, so it would be visible to anyone downloading the software.
I've added some information about this topic to the FastQC help so at least there will be a bit more guidance in the next release. I'll also have to make a new video for that so I'll make sure to mention it there. This is something we talk about at some length in the RNA-Seq analysis courses I run since it is a true technical bias but just one we tend to ignore (mostly because of not having any other option).
It's maybe also worth noting that a similar bias now seems to be appearing in transposase fragmented libraries, so it's not just RNA-Seq libraries which see this.
Comment
-
Originally posted by simonandrews: I don't think we, as normal users, can create sticky posts, can we?
Originally posted by simonandrews: I've added some information about this topic to the FastQC help so at least there will be a bit more guidance in the next release. I'll also have to make a new video for that so I'll make sure to mention it there.
Comment
-
Hi All,
Has anybody by chance figured out a pipeline that would automatically produce the FastQC report as a single file, or convert it into one, perhaps as a single PDF?
In my eyes it would be great if one could run FastQC from the command line and not have to deal with an HTML folder and multiple files.
Thanks in advance.
Comment
-
After a very long gestation I've finally released a new version of FastQC (v0.11.1). This is now available from the project web site.
This is a major release of the software which introduces a load of new features. Some of the big changes are:
- Added configurable warn/fail thresholds for all modules
- Allow modules to be selectively turned off
- Added a per-tile quality plot for Illumina libraries
- Added an adapter content plot
- Improved the duplication plot
- Improved the Kmer module
- Used embedded graphics in the HTML output so you can distribute a single file
- Added the ability to read data from stdin
- Changed how base grouping works to better accommodate long reads
- Dropped support for Solexa64 format (NB not Phred 64 which is still supported) to avoid mis-detection errors
We've done a fair bit of testing on the new version but I'm aware that there's a lot of new code in there so please report any problems either directly into our bug tracking system or via email to [email protected]
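For readers wondering how an HTML report can be distributed as a single file, the usual technique behind "embedded graphics" is to inline each image as a base64 data URI instead of linking to a separate file. The sketch below illustrates that general idea only; it is not taken from FastQC's source, and the function name `embed_image_html` and its parameters are hypothetical.

```python
import base64
import html


def embed_image_html(image_bytes: bytes, title: str,
                     mime: str = "image/png") -> str:
    """Return a standalone HTML page with the image inlined as a data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    safe_title = html.escape(title)
    return (
        "<html><head><title>{t}</title></head><body>"
        '<img alt="{t}" src="data:{m};base64,{d}">'
        "</body></html>"
    ).format(t=safe_title, m=mime, d=encoded)


if __name__ == "__main__":
    # Placeholder bytes stand in for a real PNG plot here.
    page = embed_image_html(b"abc", "Per base sequence content")
    print(page)
```

Because the image data travels inside the page itself, the resulting file can be emailed or archived without an accompanying folder of images.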
Comment