**Jeremy37** · 04-22-2011, 08:49 AM

Thanks for this nice tool!
I have a question about FastQC output. I would like to see the base quality statistics for each base position over my read length, but the output I'm getting is in bins. Are there any options to control this output?

The bins from the run on my data are individual bases from 1-9, then a bin for 10-14, 15-19, etc. The text output is below... where the column shows "#Base", but clearly this corresponds to the bin number (the bins are shown in the image produced) and not the base number.

>>Per base sequence quality pass
#Base Mean Median Lower Quartile Upper Quartile 10th Percentile 90th Percentile
1 37.91416393670764 39.0 38.0 39.0 36.0 39.0
2 37.7802692498972 39.0 38.0 39.0 36.0 39.0
3 37.75980052781196 39.0 38.0 39.0 36.0 39.0
4 37.72458631487738 39.0 38.0 39.0 36.0 39.0
5 37.64205081404644 39.0 37.0 39.0 35.0 39.0
6 37.70586245218606 39.0 38.0 39.0 36.0 39.0
7 37.66601699162634 39.0 37.0 39.0 35.0 39.0
8 37.64294997657492 39.0 37.0 39.0 35.0 39.0
9 37.62989555748825 39.0 37.0 39.0 35.0 39.0
10 35.65410371880296 37.4 34.6 39.0 29.2 39.0
11 27.368080692355903 27.8 22.6 33.0 17.2 36.6
12 32.93512849552606 35.6 29.8 38.2 23.0 38.6
13 36.13160789430458 38.0 35.4 39.0 31.2 39.0
14 36.36371473707043 38.0 36.0 39.0 32.4 39.0
15 36.05634409485578 38.0 36.0 39.0 31.2 39.0
16 35.925524253742545 37.8 36.0 39.0 31.2 39.0
17 35.6997651658535 37.2 35.4 39.0 30.6 39.0
18 35.28466949479758 37.0 35.0 39.0 29.8 39.0
19 34.77657154343021 37.0 34.4 39.0 28.6 39.0
20 34.148071746563744 36.8 33.8 39.0 27.0 39.0
21 33.42794421987793 36.0 33.0 38.8 24.8 39.0
22 32.38145771412453 35.6 31.8 37.2 21.6 39.0
23 32.798688106416485 36.6 32.6 38.4 21.8 39.0
24 32.71342747797581 36.8 32.6 38.8 20.0 39.0
25 31.64278511180337 36.0 31.2 38.4 7.8 39.0
26 30.307687847194188 35.6 29.4 37.8 2.0 39.0
27 28.901251293483632 35.0 27.4 37.0 2.0 39.0
28 27.45705418439078 34.0 24.0 37.0 2.0 39.0

Because I only have data for bins, it is hard to decide which base to trim my data to when there is a problem.
What do you think?
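For readers who want to turn the text output above into a trimming decision, here is a minimal sketch in Python. It assumes the whitespace-separated seven-column layout shown in the post; the names `parse_per_base_quality` and `last_good_row`, and the lower-quartile cutoff of 30, are illustrative choices and not part of FastQC. Note that without per-base output the returned label identifies a bin rather than a base.

```python
from typing import List, NamedTuple, Optional


class BaseQuality(NamedTuple):
    base: str              # row label exactly as printed in the first column
    mean: float
    median: float
    lower_quartile: float
    upper_quartile: float
    pct10: float
    pct90: float


def parse_per_base_quality(text: str) -> List[BaseQuality]:
    """Parse the data rows of a 'Per base sequence quality' text block."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the ">>" module line and the "#Base ..." header.
        if not line or line.startswith((">>", "#")):
            continue
        fields = line.split()
        if len(fields) != 7:
            continue  # not a data row in the assumed layout
        rows.append(BaseQuality(fields[0], *map(float, fields[1:])))
    return rows


def last_good_row(rows: List[BaseQuality],
                  min_lower_quartile: float = 30.0) -> Optional[str]:
    """Label of the last row before the low-quality tail at the 3' end.

    Walks backwards from the final row while the lower quartile is below
    the cutoff, so an isolated dip early in the read is ignored.
    """
    keep = len(rows)
    while keep > 0 and rows[keep - 1].lower_quartile < min_lower_quartile:
        keep -= 1
    return rows[keep - 1].base if keep > 0 else None
```

Scanning from the end rather than from the start is a design choice: the dip at row 11 in the data above would otherwise be reported as the trim point.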

**simonandrews** · 04-22-2011, 10:59 AM

Originally posted by Jeremy37

Thanks for this nice tool!
I have a question about FastQC output. I would like to see the base quality statistics for each base position over my read length, but the output I'm getting is in bins. Are there any options to control this output?

Yes, if you've got the latest release and are running the command line version then you can add --nogroup to the command line options to disable the grouping behaviour (but be prepared for some pretty wide graphs if you're analysing really long reads). If you really want to do this in the interactive application you can add -Dfastqc.nogroup=true to the startup command but the graphs drawn inside the application will probably be too cramped to be of any use.
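As an illustration of the command-line option described above, the following Python sketch builds and runs a FastQC command with `--nogroup`. It assumes a `fastqc` executable is on the PATH and that the installed release supports the flag; the helper names and the file names in the usage note are hypothetical.

```python
import subprocess


def fastqc_command(fastq_files, nogroup=True, executable="fastqc"):
    """Build the argument list for a command-line FastQC run."""
    cmd = [executable]
    if nogroup:
        # Disable grouping of base positions, as described in the post.
        cmd.append("--nogroup")
    cmd.extend(fastq_files)
    return cmd


def run_fastqc(fastq_files, nogroup=True):
    """Run FastQC; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(fastqc_command(fastq_files, nogroup), check=True)
```

For example, `run_fastqc(["sample_1.fastq", "sample_2.fastq"])` would analyse both files with ungrouped per-base output, provided FastQC is installed.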

Originally posted by Jeremy37

The text output is below... where the column shows "#Base", but clearly this corresponds to the bin number (the bins are shown in the image produced) and not the base number.

Thanks for spotting that - this is a bug which I've just fixed in the development branch. This will work properly in the next release.

**ilari.scheinin** · 05-05-2011, 06:20 AM

I recently started working with NGS data, and found this program to be an excellent tool for QC. Thanks!

Originally posted by simonandrews

We analyse our first and second read data separately. Although they come from the same insert there could easily be a problem which affected only the first or second read, and which would be difficult to spot if you concatenated the two files.

I'm seeing this a lot in my data. The second read files ( *_2.fastq) are generally of much lower quality than the first ones (*_1.fastq). The Per base sequence quality graph shows that the base quality drops much sooner for the second reads, sometimes right from the beginning. Is there something in particular that could be causing this, so that I could pinpoint something to the technicians generating the data (who are also just getting started with NGS)? The data is whole genome without enrichment, and paired-end Illumina.

**simonandrews** · 05-05-2011, 11:20 PM

Originally posted by ilari.scheinin

I'm seeing this a lot in my data. The second read files ( *_2.fastq) are generally of much lower quality than the first ones (*_1.fastq). The Per base sequence quality graph shows that the base quality drops much sooner for the second reads, sometimes right from the beginning. Is there something in particular that could be causing this, so that I could pinpoint something to the technicians generating the data (who are also just getting started with NGS)?

It's pretty normal to have lower qualities in your second read than your first. The major cause of this seems to be the strength of signal you get after doing your first base incorporation. The strength of signal will drop during your first read with an attendant loss of quality. When you strip and rehybridise your second primer for the second read the signal intensity will increase again, but not quite back to the levels at the start of your first read (maybe 80% of the initial level).

Sequence quality doesn't really drop too much until the signal intensity gets to quite a low level. If you're seeing loss of quality much before 50bp on your first read then this would suggest that your first base incorporation wasn't great, and this will lead to you getting much worse qualities in your second read as the intensity degrades further.
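To see how far the second read lags behind the first at each position, independently of the FastQC plots, one could compute per-position mean qualities directly from the two FASTQ files. The sketch below assumes unwrapped four-line FASTQ records and a Phred offset of 33; older Illumina pipelines used an offset of 64, so `offset` must be set to match the data. The function names are illustrative.

```python
def mean_quality_per_position(handle, offset=33):
    """Return the mean Phred score at each read position of a FASTQ stream."""
    totals, counts = [], []
    for line_number, line in enumerate(handle):
        if line_number % 4 != 3:
            continue  # the quality string is the 4th line of each record
        for pos, char in enumerate(line.rstrip("\r\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def quality_gap(read1_means, read2_means):
    """Per-position difference (read 1 minus read 2) over the shared length."""
    return [a - b for a, b in zip(read1_means, read2_means)]
```

Passing open file handles for the `*_1.fastq` and `*_2.fastq` files of a pair gives two lists whose difference shows where the second read falls behind.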

**simonandrews** · 05-24-2011, 08:30 AM

Phil Ewels at Babraham has been doing some work on the CSS used on the FastQC reports and has come up with an improved layout which we're considering using by default in new releases of FastQC (although this will still be user configurable by editing the templates).

We've tested it on a number of browsers here and it seems to work OK, but I'd appreciate wider testing and feedback to root out any problems which may be lurking.

If you could look at the example reports below and let me know if you encounter any problems or usability issues with them I'd be very grateful.

Example report 1

Example report 2

Thanks

**ilari.scheinin** · 05-25-2011, 05:07 AM

Originally posted by simonandrews

If you could look at the example reports below and let me know if you encounter any problems or usability issues with them I'd be very grateful.

The layout looks good on a computer browser (Chromium on Linux), but doesn't seem to work fully on an iPad. Scrolling doesn't work, but the reports are accessible through the links, as long as they completely fit on the screen.

Tablets are perhaps not the first priority in NGS applications, but I could see the potential use case of keeping an eye on the QC of your pipeline while on the road.

And thanks for the previous answer regarding the quality difference between first and second reads.

**simonandrews** · 05-25-2011, 05:25 AM

Originally posted by ilari.scheinin

The layout looks good on a computer browser (Chromium on Linux), but doesn't seem to work fully on an iPad. Scrolling doesn't work, but the reports are accessible through the links, as long as they completely fit on the screen.

Thanks for the feedback. Looking into this, it seems that there is a limitation in mobile Safari which doesn't allow scrolling in an overflowed container (which is what the new layout uses). The Android browser does allow this, so it's not a limitation of the platform, just the way Apple implements it.

There is a workaround implemented in JavaScript which enables scrolling for iOS, but as I don't have an iPad to test on I'm not going to attempt to add this. It might be easier to leave it to the sites this affects to implement it themselves (any site can customise the default CSS template shipped with FastQC).

**kmcarr** · 05-25-2011, 06:06 AM

Originally posted by simonandrews

Thanks for the feedback. Looking into this, it seems that there is a limitation in mobile Safari which doesn't allow scrolling in an overflowed container (which is what the new layout uses). The Android browser does allow this, so it's not a limitation of the platform, just the way Apple implements it.

There is a workaround implemented in JavaScript which enables scrolling for iOS, but as I don't have an iPad to test on I'm not going to attempt to add this. It might be easier to leave it to the sites this affects to implement it themselves (any site can customise the default CSS template shipped with FastQC).

To scroll in an overflowed container in mobile Safari you need to use "two-fingered" scrolling. Swipe up and down (left and right) using two fingers touching the screen within the container. I tested it with your sample pages and it works. Of course it is not as intuitive, the scrolling is not as smooth as normal scrolling and unless you know why the page isn't scrolling normally you wouldn't think to do it.

I love my iPad but I don't expect to do much NGS stuff on it.

**simonandrews** · 06-16-2011, 02:46 AM

FastQC v0.9.3 is now out. This includes the updated CSS theme discussed in the last few posts as well as bzip2 compression support when reading FastQ files.

Many thanks to all of those who provided feedback on the new theme.

**ewels** · 06-16-2011, 02:55 AM

FastQC v0.9.3 is now out.

Looking forward to seeing it in action!

**BAMseek** · 07-06-2011, 06:27 AM

diversity measure of FASTQ reads

We are keen to get feedback from other sites - in particular we'd like to know:
Are there other tests you think would be useful

I have found FastQC very useful in diagnosing the quality of my FASTQ and BAM files. Thank you for making it available. I have been working on a test of "diversity" of sequencing reads. Information about it can be found here, and download is available here. I have found this diversity plot to be useful in conjunction with the base-composition plot and k-mer plot in FastQC. It takes a page from the biodiversity guys and gals, and measures the evenness and richness of k-mers in your data. I would be interested to know if others find this useful.

thanks,
Justin
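To make "richness" and "evenness" of k-mers concrete, here is a small Python sketch. It is not necessarily the measure used by the tool mentioned above: it assumes richness is the fraction of the 4^k possible k-mers that are observed, and evenness is Pielou's index (Shannon entropy divided by the log of the number of observed k-mers), both borrowed from standard biodiversity practice. The function names are illustrative.

```python
import math
from collections import Counter


def kmer_counts(reads, k):
    """Count k-mers made only of A, C, G, T across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                counts[kmer] += 1
    return counts


def richness_and_evenness(counts, k):
    """Return (richness, evenness) for a table of k-mer counts.

    richness: observed distinct k-mers / 4**k
    evenness: Shannon entropy / ln(observed distinct k-mers), in [0, 1]
    """
    total = sum(counts.values())
    observed = len(counts)
    if total == 0:
        return 0.0, 0.0
    richness = observed / 4 ** k
    if observed < 2:
        return richness, 0.0  # evenness is undefined for one k-mer; report 0
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return richness, entropy / math.log(observed)
```

Computing these two numbers for a range of k values summarises the k-mer distribution without plotting every individual k-mer profile.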

**simonandrews** · 07-06-2011, 11:16 PM

Justin,

Thanks for pointing out that diversity analysis. I think that could be a really good addition to the existing plots and doesn't look like it would be too hard to add. I'll take a look at this to see if we can accommodate it in a future release.

Simon.

**BAMseek** · 07-07-2011, 06:41 AM

Hi Simon,

Thanks for taking the time to look at this. I am glad you think it could be of use. I have been using it with the k-mer plot in FastQC. Since it is impractical to show all 4^k k-mer profiles (unless it is like the 1-mer case, which would give the 4 profiles of the base composition plot), showing a summary statistic of the k-mers via the diversity plot and the top enriched k-mers available in FastQC is useful. Let me know if I can help out in any way. Thanks again for the neat tool.

Justin

**Clare S** · 07-12-2011, 10:52 PM

Hi Simon (& all),

Thanks very much for this software, very useful!
I've been trying it out and occasionally it will fail to process a file, and I can't work out why. It just terminates with something like:
Failed to process file sequences.fastq

The fastq files themselves look ok to me. My best guess is that they are too small for some of the statistics, as it seems to happen more often with smaller files and particularly when I was just testing a pipeline with quick little snippets of data.

Have others encountered this? Do you know what the cause actually is?

I think we have v0.9.1 of fastqc installed. Thanks!

**simonandrews** · 07-13-2011, 03:14 AM

Clare,

There was a problem with older versions of FastQC where the offline application wouldn't produce a complete error report telling you exactly why a file had failed to be processed.

I've just put up a new version of the program (v0.9.4) which fixes this. If you run your files with this version then they'll still fail but you should see a proper error message saying why. If there's a problem which needs to be fixed then can you post the error message here (or email it to me), and I'll take a look.

[If you can't see v0.9.4 on our site, press shift+refresh in your browser to force our cache to update]
