This topic is closed.
  • simonandrews
    Simon Andrews
    • May 2009
    • 870

    Originally posted by F_KVH View Post
    Dear Simon,
    Thanks for your help. I installed again & now it works. Do you think I need to install xcode app?
    Best regards
    No, you'd only need something like xcode if you wanted to edit the source code. It wouldn't be needed just to run it.

    Comment

    • boilermaker
      Junior Member
      • Apr 2014
      • 5

      Dealing with Random Hexamer Bias?

      Originally posted by lletourn View Post
      The Illumina RNA protocol uses random hexamers to amplify the RNA. The thing is, they are not 100% random, so the beginning of the reads looks skewed for base composition, but that's because of the amplification.

      For mapping it's no problem. For assembly it might confuse some assemblers. (When assembling I would trim the 5' end of the RNA reads, but not for mapping.)
      Our group recently used the HiSeq 2000 platform to generate transcriptome data (single-end, 50 bp reads). I have noticed that Illumina transcriptome sequencing typically yields these errors ("Per base sequence content" and "Per base GC content") during FastQC analysis. Are you suggesting that these can safely be ignored when mapping to a genome? I wasn't sure if there was a "best practices" approach to dealing with these biases. Is there a mapping algorithm that is preferred among those who are dealing with Illumina transcriptome sequencing data?

      (I am sorry if this question has been answered thoroughly elsewhere in the forum... I have only just joined, and despite trying to navigate the posts with the "Search" tool, I have not yet come across an answer).

      Comment

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        Originally posted by boilermaker View Post
        Our group recently used the HiSeq 2000 platform to generate transcriptome data (single-end, 50 bp reads). I have noticed that Illumina transcriptome sequencing typically yields these errors ("Per base sequence content" and "Per base GC content") during FastQC analysis. Are you suggesting that these can safely be ignored when mapping to a genome? I wasn't sure if there was a "best practices" approach to dealing with these biases.
        It would probably be better to ignore them than try to correct them, though if you posted the fastqc graphs it would be easier to say.
        Is there a mapping algorithm that is preferred among those who are dealing with Illumina transcriptome sequencing data?
        I prefer BBMap when mapping Illumina RNA-seq data. It's more robust to errors than other RNA-seq aligners, and doesn't require an annotation file. Oh, and I wrote it, but that's not why.

        Comment

        • boilermaker
          Junior Member
          • Apr 2014
          • 5

          Originally posted by Brian Bushnell View Post
          It would probably be better to ignore them than try to correct them, though if you posted the fastqc graphs it would be easier to say.
          Thank you Brian. I will certainly give BBMap a try (and thank you very much for scripting it!)

          I have attached "typical" FastQC graphs (per base GC content, per base sequence content) from one of my datasets (most have profiles like this example).
          Attached Files

          Comment

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            Doesn't look ideal, but I can't think of a good way to improve it, assuming you've already trimmed adapters (which can alter the base composition).

            Comment

            • fkrueger
              Senior Member
              • Sep 2009
              • 627

              The base composition bias you are seeing is very typical for RNA-Seq, and has indeed been mentioned on numerous posts here. The most relevant publication about this can be found here: http://nar.oxfordjournals.org/content/38/12/e131.full.

              In essence, the bias is normally introduced by the random priming step in the RNA-Seq library preparation which is not quite as random as you would hope it was. Trimming the first positions of every read wouldn't make any difference since the sequence would still align to the very same position.
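
For anyone wondering what the "Per base sequence content" plot is actually measuring, here is a minimal Python sketch of the idea: count the bases seen at each read position and turn the counts into fractions. This is not FastQC's own code; the function name `per_base_composition` and the toy reads are made up for illustration. With random-hexamer priming you would expect the first few positions to deviate from the flat profile seen further along the read.

```python
from collections import Counter


def per_base_composition(reads):
    """Return one dict per read position with the fraction of A, C, G, T and N."""
    if not reads:
        return []
    longest = max(len(read) for read in reads)
    counts = [Counter() for _ in range(longest)]
    for read in reads:
        for pos, base in enumerate(read.upper()):
            counts[pos][base] += 1
    profile = []
    for counter in counts:
        total = sum(counter.values())  # reads that reach this position
        profile.append({base: counter[base] / total for base in "ACGTN"})
    return profile


# Toy reads (hypothetical): position 0 is always G, position 1 is balanced.
toy_reads = ["GACT", "GCGA", "GTAC", "GGTT"]
profile = per_base_composition(toy_reads)
for pos, fractions in enumerate(profile, start=1):
    print(pos, fractions)
```

Trimming the skewed positions would flatten this profile, but as noted above it would not change where the rest of the read aligns.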

              Comment

              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                Originally posted by fkrueger View Post
                The base composition bias you are seeing is very typical for RNA-Seq, and has indeed been mentioned on numerous posts here. The most relevant publication about this can be found here: http://nar.oxfordjournals.org/content/38/12/e131.full.

                In essence, the bias is normally introduced by the random priming step in the RNA-Seq library preparation which is not quite as random as you would hope it was. Trimming the first positions of every read wouldn't make any difference since the sequence would still align to the very same position.
                If I had a penny for every time this question has been asked (and answered)

                You (or Simon) really should make a sticky post with a few example plots about this (and k-mers). It would save many people a bunch of time and worry. Another suggestion would be to put a note on the FastQC page itself, so it would be visible to anyone downloading the software.
                Last edited by GenoMax; 04-30-2014, 04:10 PM.

                Comment

                • simonandrews
                  Simon Andrews
                  • May 2009
                  • 870

                  Originally posted by GenoMax View Post
                  If I had a penny for every time this question has been asked (and answered)

                  You (or Simon) really should make a sticky post with a few example plots about this (and k-mers). It would save many people a bunch of time and worry. Another suggestion would be to put a note on the FastQC page itself, so it would be visible to anyone downloading the software.
                  I don't think we, as normal users, can create sticky posts, can we?

                  I've added some information about this topic to the FastQC help so at least there will be a bit more guidance in the next release. I'll also have to make a new video for that so I'll make sure to mention it there. This is something we talk about at some length in the RNA-Seq analysis courses I run since it is a true technical bias but just one we tend to ignore (mostly because of not having any other option).

                  It's maybe also worth noting that a similar bias now seems to be appearing in transposase fragmented libraries, so it's not just RNA-Seq libraries which see this.

                  Comment

                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    Originally posted by simonandrews View Post
                    I don't think we, as normal users, can create sticky posts can we?
                    I was thinking that you can create a post and then PM ECO (or one of the other moderators) to see if they can make it sticky.

                    Originally posted by simonandrews View Post
                    I've added some information about this topic to the FastQC help so at least there will be a bit more guidance in the next release. I'll also have to make a new video for that so I'll make sure to mention it there.
                    Would it be useful to make this information available in the "Documentation" section of the main FastQC page where you have the Good/bad data set examples? Something along the lines of "this type of nucleotide distribution is normal if you have RNA-Seq data". Creating a video is a good idea, but having the information available on a page that people can glance through (while their FastQC download is ongoing?) could have a greater impact.

                    Comment

                    • simonandrews
                      Simon Andrews
                      • May 2009
                      • 870

                      Originally posted by GenoMax View Post
                      Would it be useful to make this information available in the "Documentation" section of the main FastQC page where you have the Good/bad data set examples?
                      That's exactly what I did. When the package updates to the next version it will be there.

                      Comment

                      • luc
                        Senior Member
                        • Dec 2010
                        • 469

                        Hi All,

                        Has anybody by chance figured out a pipeline that would automatically produce/convert the FastQC report as a single file, perhaps as a single PDF file?
                        In my eyes it would be great if one could run FastQC from the command line and not have to deal with an HTML folder and multiple files.
                        Thanks in advance.

                        Comment

                        • simonandrews
                          Simon Andrews
                          • May 2009
                          • 870

                          Originally posted by luc View Post
                          Hi All,

                          Has anybody by chance figured out a pipeline that would automatically produce/convert the FastQC report as a single file, perhaps as a single PDF file?
                          In my eyes it would be great if one could run FastQC from the command line and not have to deal with an HTML folder and multiple files.
                          Thanks in advance.
                          The next release of the program (which will be out by the end of next week if it kills me!) creates a single HTML file with embedded graphics so you will then be able to distribute just that file instead of having to keep the existing folder structure.
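
To illustrate the general idea of a single HTML file with embedded graphics, here is a rough Python sketch that inlines an image as a base64 data URI instead of linking to a separate file. It is only an illustration of the technique, not how FastQC itself (a Java program) builds its report; the function name `embed_image_html` and the placeholder bytes are hypothetical.

```python
import base64
from html import escape


def embed_image_html(png_bytes, title):
    """Build a standalone HTML page with the PNG inlined as a data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    safe_title = escape(title)
    return (
        "<html><body>"
        f"<h2>{safe_title}</h2>"
        f'<img alt="{safe_title}" src="data:image/png;base64,{encoded}">'
        "</body></html>"
    )


# Placeholder bytes (just the PNG signature), standing in for a real plot.
fake_png = b"\x89PNG\r\n\x1a\n"
report = embed_image_html(fake_png, "Per base sequence content")
print(report)
```

Because the image data travels inside the page, the resulting file can be emailed or copied on its own without the accompanying folder.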

                          Comment

                          • luc
                            Senior Member
                            • Dec 2010
                            • 469

                            Hi Simon,

                            thanks a lot. That sounds very practical.

                            Comment

                            • simonandrews
                              Simon Andrews
                              • May 2009
                              • 870

                              After a very long gestation I've finally released a new version of FastQC (v0.11.1). This is now available from the project web site.

                              This is a major release of the software which introduces a load of new features. Some of the big changes are:
                              • Added configurable warn/fail thresholds for all modules
                              • Allow modules to be selectively turned off
                              • Added a per-tile quality plot for Illumina libraries
                              • Added an adapter content plot
                              • Improved the duplication plot
                              • Improved the Kmer module
                              • Used embedded graphics in the HTML output so you can distribute a single file
                              • Added the ability to read data from stdin
                              • Changed how base grouping works to better accommodate long reads
                              • Dropped support for Solexa64 format (NB not Phred 64 which is still supported) to avoid mis-detection errors
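
As background on the last point, Solexa and Phred scores are defined differently even though both can be stored with an ASCII offset of 64: Phred uses Q = -10 log10(p), while Solexa uses Q = -10 log10(p / (1 - p)). The Python sketch below shows the two definitions and the conversion between them. The function names are made up for illustration, and this is not FastQC's detection logic.

```python
import math


def phred_to_error_prob(q):
    """Phred: Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def solexa_to_error_prob(q):
    """Solexa: Q = -10 * log10(p / (1 - p)), so p = odds / (1 + odds)."""
    odds = 10 ** (-q / 10)
    return odds / (1 + odds)


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def decode_offset64(quality_string):
    """Decode a quality string stored with an ASCII offset of 64."""
    return [ord(ch) - 64 for ch in quality_string]


print(decode_offset64("@h"))   # raw scores for the characters '@' and 'h'
print(solexa_to_phred(0))      # a Solexa score of 0 expressed as Phred
```

The two scales agree closely at high quality and diverge at the low end, where Solexa scores can go negative.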


                              We've done a fair bit of testing on the new version but I'm aware that there's a lot of new code in there so please report any problems either directly into our bug tracking system or via email to [email protected]

                              Comment

                              • GenoMax
                                Senior Member
                                • Feb 2008
                                • 7142

                                2-6-13: Version 0.11.1 released
                                Shouldn't that be

                                2-6-14: Version 0.11.1 released
                                Thank you for adding the example reports on the FastQC page!

                                Comment
