  • Thanks for this nice tool!
    I have a question about FastQC output. I would like to see the base quality statistics for each base position over my read length, but the output I'm getting is in bins. Are there any options to control this output?

    The bins from the run on my data are individual bases from 1-9, then a bin for 10-14, 15-19, etc. The text output is below... where the column shows "#Base", but clearly this corresponds to the bin number (the bins are shown in the image produced) and not the base number.

    >>Per base sequence quality pass
    #Base  Mean                Median  Lower Quartile  Upper Quartile  10th Percentile  90th Percentile
    1      37.91416393670764   39.0    38.0            39.0            36.0             39.0
    2      37.7802692498972    39.0    38.0            39.0            36.0             39.0
    3      37.75980052781196   39.0    38.0            39.0            36.0             39.0
    4      37.72458631487738   39.0    38.0            39.0            36.0             39.0
    5      37.64205081404644   39.0    37.0            39.0            35.0             39.0
    6      37.70586245218606   39.0    38.0            39.0            36.0             39.0
    7      37.66601699162634   39.0    37.0            39.0            35.0             39.0
    8      37.64294997657492   39.0    37.0            39.0            35.0             39.0
    9      37.62989555748825   39.0    37.0            39.0            35.0             39.0
    10     35.65410371880296   37.4    34.6            39.0            29.2             39.0
    11     27.368080692355903  27.8    22.6            33.0            17.2             36.6
    12     32.93512849552606   35.6    29.8            38.2            23.0             38.6
    13     36.13160789430458   38.0    35.4            39.0            31.2             39.0
    14     36.36371473707043   38.0    36.0            39.0            32.4             39.0
    15     36.05634409485578   38.0    36.0            39.0            31.2             39.0
    16     35.925524253742545  37.8    36.0            39.0            31.2             39.0
    17     35.6997651658535    37.2    35.4            39.0            30.6             39.0
    18     35.28466949479758   37.0    35.0            39.0            29.8             39.0
    19     34.77657154343021   37.0    34.4            39.0            28.6             39.0
    20     34.148071746563744  36.8    33.8            39.0            27.0             39.0
    21     33.42794421987793   36.0    33.0            38.8            24.8             39.0
    22     32.38145771412453   35.6    31.8            37.2            21.6             39.0
    23     32.798688106416485  36.6    32.6            38.4            21.8             39.0
    24     32.71342747797581   36.8    32.6            38.8            20.0             39.0
    25     31.64278511180337   36.0    31.2            38.4            7.8              39.0
    26     30.307687847194188  35.6    29.4            37.8            2.0              39.0
    27     28.901251293483632  35.0    27.4            37.0            2.0              39.0
    28     27.45705418439078   34.0    24.0            37.0            2.0              39.0


    Because I only have data for bins, it makes it hard to decide what base to trim my data to when there is a problem.
    What do you think?
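A rough Python sketch of how the per-base quality rows could be pulled out of the text report to find the first row that drops below a cutoff. It assumes the module looks like the output quoted above: a `>>Per base sequence quality` line, a `#` header line, then seven whitespace-separated columns, ending at the next line that starts with `>>`. The function names, the dictionary keys and the cutoff of 20 are illustrative choices and are not part of FastQC.

```python
def parse_per_base_quality(text):
    """Return (label, stats) rows from the 'Per base sequence quality' module."""
    keys = ("mean", "median", "lower_quartile", "upper_quartile", "p10", "p90")
    rows = []
    in_module = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">>Per base sequence quality"):
            in_module = True
            continue
        if not in_module:
            continue
        if line.startswith(">>"):  # start of the next module or an end marker
            break
        if not line or line.startswith("#"):  # blank line or column header
            continue
        fields = line.split()
        # The first column is kept as text: it may be a number or a range.
        rows.append((fields[0], dict(zip(keys, map(float, fields[1:7])))))
    return rows


def first_row_below(rows, key="p10", threshold=20.0):
    """Label of the first row whose statistic is below threshold, else None."""
    for label, stats in rows:
        if stats[key] < threshold:
            return label
    return None


if __name__ == "__main__":
    # Three rows copied from the output quoted above.
    sample = """>>Per base sequence quality pass
#Base Mean Median Lower Quartile Upper Quartile 10th Percentile 90th Percentile
24 32.71342747797581 36.8 32.6 38.8 20.0 39.0
25 31.64278511180337 36.0 31.2 38.4 7.8 39.0
26 30.307687847194188 35.6 29.4 37.8 2.0 39.0
"""
    print(first_row_below(parse_per_base_quality(sample)))
```

The returned label is whatever the first column holds, so if that column is a bin index rather than a base position it still has to be mapped back to positions before choosing a trim point.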



    • Originally posted by Jeremy37
      Thanks for this nice tool!
      I have a question about FastQC output. I would like to see the base quality statistics for each base position over my read length, but the output I'm getting is in bins. Are there any options to control this output?
      Yes, if you've got the latest release and are running the command line version then you can add --nogroup to the command line options to disable the grouping behaviour (but be prepared for some pretty wide graphs if you're analysing really long reads). If you really want to do this in the interactive application you can add -Dfastqc.nogroup=true to the startup command but the graphs drawn inside the application will probably be too cramped to be of any use.

      Originally posted by Jeremy37
      The text output is below... where the column shows "#Base", but clearly this corresponds to the bin number (the bins are shown in the image produced) and not the base number.
      Thanks for spotting that - this is a bug which I've just fixed in the development branch. This will work properly in the next release.
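For anyone scripting this, here is a minimal Python sketch of calling the command line version with `--nogroup`. It assumes a `fastqc` executable is on the PATH; the helper names and the file name `sample_1.fastq` are made up for illustration.

```python
import shlex
import subprocess


def fastqc_command(fastq_files, nogroup=True, executable="fastqc"):
    """Build the argument list for a FastQC run, optionally without grouping."""
    cmd = [executable]
    if nogroup:
        cmd.append("--nogroup")  # option named in the reply above
    cmd.extend(fastq_files)
    return cmd


def run_fastqc(fastq_files, **options):
    """Run FastQC and raise CalledProcessError if it exits with an error."""
    return subprocess.run(fastqc_command(fastq_files, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_fastqc() to actually execute it.
    print(shlex.join(fastqc_command(["sample_1.fastq"])))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.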



      • I recently started working with NGS data, and found this program to be an excellent tool for QC. Thanks!

        Originally posted by simonandrews
        We analyse our first and second read data separately. Although they come from the same insert, there could easily be a problem which affected only the first or second read, and which would be difficult to spot if you concatenated the two files.
        I'm seeing this a lot in my data. The second read files (*_2.fastq) are generally of much lower quality than the first ones (*_1.fastq). The Per base sequence quality graph shows that the base quality drops much sooner for the second reads, sometimes right from the beginning. Is there something in particular that could be causing this, so that I could pinpoint something to the technicians generating the data (who are also just getting started with NGS)? The data is whole genome without enrichment, and paired-end Illumina.



        • Originally posted by ilari.scheinin
          I'm seeing this a lot in my data. The second read files (*_2.fastq) are generally of much lower quality than the first ones (*_1.fastq). The Per base sequence quality graph shows that the base quality drops much sooner for the second reads, sometimes right from the beginning. Is there something in particular that could be causing this, so that I could pinpoint something to the technicians generating the data (who are also just getting started with NGS)?
          It's pretty normal to have lower qualities in your second read than your first. The major cause of this seems to be the strength of signal you get after doing your first base incorporation. The strength of signal will drop during your first read with an attendant loss of quality. When you strip and rehybridise your second primer for the second read the signal intensity will increase again, but not quite back to the levels at the start of your first read (maybe 80% of the initial level).

          Sequence quality doesn't really drop too much until the signal intensity gets to quite a low level. If you're seeing loss of quality much before 50bp on your first read then this would suggest that your first base incorporation wasn't great, and this will lead to you getting much worse qualities in your second read as the intensity degrades further.
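To put a number on where quality starts to fall in each read file, something like the following Python sketch could be run on the first and second read files separately. It assumes four-line FASTQ records (no wrapped sequence lines) and a Phred+33 quality encoding; older Illumina pipelines used an offset of 64, so `offset` may need changing. The function names and the threshold of 30 are illustrative only.

```python
import io


def mean_quality_by_position(handle, offset=33):
    """Mean Phred score at each read position of a four-line-per-record FASTQ."""
    sums = []
    counts = []
    for line_number, line in enumerate(handle):
        if line_number % 4 != 3:  # the quality string is every fourth line
            continue
        for pos, char in enumerate(line.rstrip("\r\n")):
            if pos == len(sums):  # first time a read reaches this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(sums, counts)]


def first_position_below(means, threshold):
    """1-based position of the first mean below threshold, or None."""
    for pos, value in enumerate(means, start=1):
        if value < threshold:
            return pos
    return None


if __name__ == "__main__":
    # Tiny made-up records standing in for real *_1.fastq and *_2.fastq files.
    read1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n")
    read2 = io.StringIO("@r1/2\nACGT\n+\nII55\n@r2/2\nACGT\n+\nII55\n")
    for name, handle in (("read 1", read1), ("read 2", read2)):
        means = mean_quality_by_position(handle)
        print(name, means, first_position_below(means, threshold=30))
```

With real data the `io.StringIO` objects would be replaced by opened files. Comparing the position returned for each read file gives a quick check on whether the first read is already losing quality well before 50bp.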



          • Phil Ewels at Babraham has been doing some work on the CSS used on the FastQC reports and has come up with an improved layout which we're considering using by default in new releases of FastQC (although this will still be user configurable by editing the templates).

            We've tested it on a number of browsers here and it seems to work OK, but I'd appreciate wider testing and feedback to root out any problems which may be lurking.

            If you could look at the example reports below and let me know if you encounter any problems or usability issues with them I'd be very grateful.

            Example report 1

            Example report 2

            Thanks



            • Originally posted by simonandrews
              If you could look at the example reports below and let me know if you encounter any problems or usability issues with them I'd be very grateful.
              The layout looks good on a computer browser (Chromium on Linux), but doesn't seem to work fully on an iPad. Scrolling doesn't work, but the reports are accessible through the links, as long as they completely fit on the screen.

              Tablets are naturally not the first priority in NGS applications, but I could see the potential use case of keeping an eye on the QC of your pipeline while on the road.

              And thanks for the previous answer regarding the quality difference between first and second reads.



              • Originally posted by ilari.scheinin
                The layout looks good on a computer browser (Chromium on Linux), but doesn't seem to work fully on an iPad. Scrolling doesn't work, but the reports are accessible through the links, as long as they completely fit on the screen.
                Thanks for the feedback. Looking into this, it seems that mobile Safari has a limitation which doesn't allow scrolling in an overflowed container (which is what the new layout uses). The Android browser does allow this, so it's not a limitation of the platform, just the way Apple implements it.

                There is a workaround implemented in JavaScript which enables scrolling on iOS, but as I don't have an iPad to test on I'm not going to attempt to add it. It might be easier to leave it to the sites this affects to implement it themselves (any site can customise the default CSS template shipped with FastQC).
                Last edited by simonandrews; 05-25-2011, 05:26 AM. Reason: Added forgotten link.



                • Originally posted by simonandrews
                  Thanks for the feedback. Looking into this, it seems that mobile Safari has a limitation which doesn't allow scrolling in an overflowed container (which is what the new layout uses). The Android browser does allow this, so it's not a limitation of the platform, just the way Apple implements it.

                  There is a workaround implemented in JavaScript which enables scrolling on iOS, but as I don't have an iPad to test on I'm not going to attempt to add it. It might be easier to leave it to the sites this affects to implement it themselves (any site can customise the default CSS template shipped with FastQC).
                  To scroll in an overflowed container in mobile Safari you need to use "two-fingered" scrolling. Swipe up and down (left and right) using two fingers touching the screen within the container. I tested it with your sample pages and it works. Of course it is not as intuitive, the scrolling is not as smooth as normal scrolling and unless you know why the page isn't scrolling normally you wouldn't think to do it.

                  I love my iPad but I don't expect to do much NGS stuff on it.



                  • FastQC v0.9.3 is now out. This includes the updated CSS theme discussed in the last few posts as well as bzip2 compression support when reading FastQ files.

                    Many thanks to all of those who provided feedback on the new theme.



                    • FastQC v0.9.3 is now out.
                      Looking forward to seeing it in action!



                      • diversity measure of FASTQ reads

                        We are keen to get feedback from other sites - in particular we'd like to know:
                        Are there other tests you think would be useful
                        I have found FastQC very useful in diagnosing the quality of my FASTQ and BAM files. Thank you for making it available. I have been working on a test of "diversity" of sequencing reads. Information about it can be found here, and download is available here. I have found this diversity plot to be useful in conjunction with the base-composition plot and k-mer plot in FastQC. It takes a page from the biodiversity guys and gals, and measures the evenness and richness of k-mers in your data. I would be interested to know if others find this useful.

                        thanks,
                        Justin
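To make "richness" and "evenness" of k-mers concrete, here is a small Python sketch. It is not the linked tool and may not match its definitions: it assumes richness is the number of distinct k-mers observed (out of a possible 4^k) and evenness is Pielou's index, the Shannon entropy of the k-mer frequencies divided by the log of the richness. All function names are illustrative.

```python
import math
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer made only of A, C, G and T across the reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                counts[kmer] += 1
    return counts


def richness(counts):
    """Number of distinct k-mers observed (at most 4**k)."""
    return len(counts)


def shannon_evenness(counts):
    """Pielou's evenness: Shannon entropy divided by log(richness)."""
    observed = len(counts)
    if observed < 2:
        return 0.0  # evenness is not meaningful with fewer than two k-mers
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(observed)


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "AAAAAAAAAA", "ACGTNNACGT"]  # made-up reads
    for k in (1, 2, 3):
        counts = kmer_counts(reads, k)
        print(k, richness(counts), 4 ** k, round(shannon_evenness(counts), 3))
```

An evenness near 1 means the observed k-mers occur at similar frequencies, while a value near 0 means a few k-mers dominate.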



                        • Justin,

                          Thanks for pointing out that diversity analysis. I think that could be a really good addition to the existing plots and doesn't look like it would be too hard to add. I'll take a look at this to see if we can accommodate it in a future release.

                          Simon.



                          • Hi Simon,

                            Thanks for taking the time to look at this. I am glad you think it could be of use. I have been using it with the k-mer plot in FastQC. Since it is impractical to show all 4^k k-mer profiles (unless it is like the 1-mer case, which would give the 4 profiles of the base composition plot), showing a summary statistic of the k-mers via the diversity plot and the top enriched k-mers available in FastQC is useful. Let me know if I can help out in any way. Thanks again for the neat tool.

                            Justin



                            • Hi Simon (& all),

                              Thanks very much for this software, very useful!
                              I've been trying it out and occasionally it will fail to process a file, and I can't work out why. It just terminates with something like:
                              Failed to process file sequences.fastq

                              The fastq files themselves look ok to me. My best guess is that they are too small for some of the statistics, as it seems to happen more often with smaller files and particularly when I was just testing a pipeline with quick little snippets of data.

                              Have others encountered this? Do you know what the cause actually is?

                              I think we have v0.9.1 of fastqc installed. Thanks!



                              • Clare,

                                There was a problem with older versions of FastQC where the offline application wouldn't produce a complete error report telling you exactly why a file had failed to be processed.

                                I've just put up a new version of the program (v0.9.4) which fixes this. If you run your files with this version then they'll still fail but you should see a proper error message saying why. If there's a problem which needs to be fixed then can you post the error message here (or email it to me), and I'll take a look.

                                [If you can't see v0.9.4 on our site, press shift+refresh in your browser to force our cache to update]
