
**maubp** · 07-11-2012, 01:15 AM

Personally, I would first try re-downloading the FASTQ file in case it was corrupted in transit, and if applicable repeat the decompression as well, again just in case there is a bad sector on your drive or similar. It might also be worth testing your RAM (e.g. with a tool such as memtest86) to make sure it is working properly, because faulty memory can cause problems too, e.g. bases flipping as in http://mira-assembler.sourceforge.ne...onus_part.html
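If the data provider publishes checksums, comparing them is a quick first check before re-running any aligner. The following is a minimal Python sketch, assuming a gzip-compressed FASTQ and a published MD5 to compare against (both are assumptions, not details from this thread). The helper names `md5sum` and `gzip_is_intact` are hypothetical:

```python
import gzip
import hashlib
import zlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to keep memory low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole file and report whether it ended cleanly.

    A truncated download usually shows up as EOFError (stream cut short);
    a corrupted one as a CRC or format error (OSError or zlib.error).
    """
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

Decompressing the entire file is what catches a truncated download, because the gzip module only notices a missing end-of-stream marker or a bad CRC once it reaches the end.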

**SillyPoint** · 07-11-2012, 03:22 AM

How did it get that way?

Assuming the bowtie2 error messages speak the truth (which you can verify by examining the relevant fastq lines), I'd sure recommend tracing the problem back to its source, rather than trying to clean up the data after the fact.

Are the bad reads interspersed with good ones, or do they fall at the end of the fastq file? In the latter case, you may have filled up your disk.
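One way to answer both questions at once is to scan the file and list which records have a quality string of the wrong length. This is a sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); `find_length_mismatches` is a hypothetical helper, not part of bowtie2:

```python
def find_length_mismatches(lines):
    """Yield (record_number, read_id, seq_len, qual_len) for every FASTQ
    record whose quality string length differs from its sequence length.

    `lines` may be an open text file or any iterable of lines.
    Assumes plain 4-line records; raises ValueError if the layout breaks.
    """
    it = iter(lines)
    record = 0
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines, e.g. at the end of the file
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "").rstrip("\r\n")
        qual = next(it, "").rstrip("\r\n")
        record += 1
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {record} does not look like 4-line FASTQ: {header!r}")
        if len(seq) != len(qual):
            read_id = header[1:].split(maxsplit=1)[:1]
            yield record, (read_id[0] if read_id else ""), len(seq), len(qual)
```

If the reported record numbers all sit at the very end of the file, truncation (e.g. a full disk) is the likely cause; if they are scattered, the problem is probably further upstream.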

What is the output of the instrument -- .bcl files? How do you turn that into fastq files?

Do the reads which bowtie2 does NOT complain about look plausible? E.g., quality characters in the correct range?
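A rough plausibility check along these lines is sketched below. By default it assumes base-space reads and Sanger/Phred+33 qualities ('!' to '~'); other data need different alphabets passed in. The name `implausible_records` is hypothetical:

```python
def implausible_records(lines, seq_alphabet="ACGTN", qual_range=("!", "~")):
    """Yield (record_number, reason) for 4-line FASTQ records containing
    characters outside the expected sequence alphabet or quality range.

    Defaults assume base-space reads with Phred+33 qualities; for
    colour-space data pass e.g. seq_alphabet="ACGT0123." instead.
    """
    allowed = set(seq_alphabet)
    lo, hi = qual_range
    it = iter(lines)
    record = 0
    for header in it:
        if not header.strip():
            continue  # skip blank lines
        seq = next(it, "").rstrip("\r\n")
        next(it, "")  # the '+' separator line is not checked here
        qual = next(it, "").rstrip("\r\n")
        record += 1
        odd_bases = set(seq.upper()) - allowed
        if odd_bases:
            yield record, "unexpected sequence characters " + "".join(sorted(odd_bases))
        odd_quals = sorted({q for q in qual if not lo <= q <= hi})
        if odd_quals:
            yield record, "quality characters out of range " + "".join(odd_quals)
```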

If all else fails, post a few (6) reads here, showing a bad read in context with other 'good' ones.

--SP
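To pull out a suspect read together with a few neighbours for posting, something like the following could work. It is a sketch assuming four-line records and that the read ID is the first word of the header line; `records_around` is a hypothetical helper:

```python
from collections import deque


def records_around(lines, read_id, before=3, after=2):
    """Return the FASTQ record whose ID is `read_id`, plus up to `before`
    preceding and `after` following records (six in total with the defaults).

    Each record is returned as a list of its four lines without newlines.
    Returns [] if the ID is never found. Assumes plain 4-line records.
    """
    it = iter(lines)
    window = deque(maxlen=before)  # rolling buffer of the preceding records
    result = None
    remaining = after
    for header in it:
        if not header.strip():
            continue  # skip blank lines
        record = [header.rstrip("\r\n")] + [next(it, "").rstrip("\r\n") for _ in range(3)]
        if result is None:
            if record[0][1:].split(maxsplit=1)[:1] == [read_id]:
                result = list(window) + [record]
            else:
                window.append(record)
                continue
        else:
            result.append(record)
            remaining -= 1
        if remaining == 0:
            return result
    return result or []
```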

**kz26** · 07-24-2012, 12:56 PM

I too am running into this issue with quite a few datasets, using bowtie2 (2.0.0-beta6).

Examples:

```
@SRR387921.488948 0303_20110429_2_SL_AWG_TG_NA11829_4_2pA_01003434289_1_4_41_117/1
T10331232322220002110220221110022020211222032021222
+
!%85117+****&7(&=,'%%).%'((4).)61)%.,(&''7='10%-&,)

@SRR096575.4651 VAB_0513_20101119_1_SP_ANG_TG_NA11830_3_1sA_01003380693_2853_102_63/1
T322003021302112201213211122322210023002300122221.1
+
!9,%.7%6-/9.)975+%),8+(<.(*19*%+&%%*2%<)'*5*&.)%(!&

@SRR096590.1165 VAB_0510_20101117_2_SP_ANG_TG_NA11831_5_1sA_01003380706_11279_16_41/1
T300321021030023001031320311113312212333223222232.3
+
!557*.7;6925=46+:>-9:-690>;%(3-2-&5)/&'5)%8%&)*(%!2
```

Strangely enough, each of these is the first read in its respective file, and all of them appear to be correct (i.e. they have the same number of quality values as read characters).

**maubp** · 07-24-2012, 01:08 PM

Originally posted by kz26

I too am running into this issue with quite a few datasets, using bowtie2 (2.0.0-beta6).

...

Strangely enough, each of these is the first read in its respective file, and all of them appear to be correct (i.e. they have the same number of quality values as read characters).

Those are colour-space FASTQ reads, and frustratingly there seem to be two schools of thought on how many quality scores are needed: specifically, whether there should be a score for the adapter base or not.
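The two conventions can be told apart per record by comparing lengths, treating the first sequence character as the adapter base. A minimal sketch; the function name `quality_layout` and its labels are illustrative only:

```python
def quality_layout(seq, qual):
    """Classify how a colour-space FASTQ record's qualities line up with its
    sequence, assuming the first sequence character is the adapter base.

    "with adapter"    -> one quality per character, adapter included
    "without adapter" -> one quality per colour call only
    "inconsistent"    -> neither; the record itself is probably broken
    """
    if len(qual) == len(seq):
        return "with adapter"
    if len(qual) == len(seq) - 1:
        return "without adapter"
    return "inconsistent"
```

Applied to kz26's first read (an adapter base plus 50 colour calls, with 51 quality characters), this returns "with adapter".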

**afadda** · 08-09-2012, 08:27 AM

maubp, what does that mean? I have the same problem as kz26. Help please!

**maubp** · 08-09-2012, 08:46 AM

I mean that some sources include a quality score for the adapter. For example, here we have an adapter base plus 50 colour-space calls. Should there be 51 qualities or just 50?

Code:

```
@SRR387921.488948 0303_20110429_2_SL_AWG_TG_NA11829_4_2pA_01003434289_1_4_41_117/1
T10331232322220002110220221110022020211222032021222
+
!%85117+****&7(&=,'%%).%'((4).)61)%.,(&''7='10%-&,)
```

That record has 51 quality scores, including one for the adapter. Some tools do not expect a quality score for the adapter, so if we remove the "!" for the adapter "T" in this case we get:

Code:

```
@SRR387921.488948 0303_20110429_2_SL_AWG_TG_NA11829_4_2pA_01003434289_1_4_41_117/1
T10331232322220002110220221110022020211222032021222
+
%85117+****&7(&=,'%%).%'((4).)61)%.,(&''7='10%-&,)
```
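Converting a whole file from the 51-quality layout to the 50-quality layout could look like the sketch below. It assumes four-line colour-space records whose first sequence character is the adapter base; `drop_adapter_quality` is hypothetical, and which layout a particular tool expects should be checked in that tool's own documentation:

```python
def drop_adapter_quality(lines):
    """Yield 4-line colour-space FASTQ records with the quality score for the
    leading adapter base removed, leaving one quality per colour call.

    Records that already have one fewer quality than sequence characters are
    passed through unchanged; any other length mismatch raises ValueError.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines
        head = header.rstrip("\r\n")
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "").rstrip("\r\n")
        qual = next(it, "").rstrip("\r\n")
        if len(qual) == len(seq):
            qual = qual[1:]  # drop the score that belongs to the adapter base
        elif len(qual) != len(seq) - 1:
            raise ValueError(f"unexpected quality length for {head}")
        yield f"{head}\n{seq}\n{plus}\n{qual}\n"
```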

I don't work with colour space, so I haven't researched this issue; this is just my observation and a guess at the apparent problem.

**afadda** · 08-10-2012, 04:02 AM

What I have is this:

```
@HWI-ST1146:66:C0YHCACXX:7:1101:2909:2074 1:N:0:ATCACG
CCACTAGCTTTCCTGGCAC
+
JJEHIJIIJJJHEHFHFFF
```

So the number of letters is the same for the read and the quality. I'm using Bowtie 0.12.7, and I've used it dozens of times before, but with output from older machines. This new one is from a HiSeq.

**maubp** · 08-10-2012, 04:29 AM

Originally posted by afadda

What I have is this:

```
@HWI-ST1146:66:C0YHCACXX:7:1101:2909:2074 1:N:0:ATCACG
CCACTAGCTTTCCTGGCAC
+
JJEHIJIIJJJHEHFHFFF
```

So the number of letters is the same for the read and the quality. I'm using Bowtie 0.12.7, and I've used it dozens of times before, but with output from older machines. This new one is from a HiSeq.

Is there an error message? Recent Illumina pipelines use the original Sanger FASTQ encoding for quality scores; perhaps you are using an option specific to the obsolete Illumina-specific FASTQ encoding?
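Sanger-style qualities use an ASCII offset of 33 (Phred+33), while the older Illumina 1.3 to 1.7 encoding uses an offset of 64. The following heuristic sketch guesses the offset from the quality characters alone. The thresholds reflect common conventions rather than anything stated in this thread, and the function names are hypothetical:

```python
def guess_quality_offset(qual_strings):
    """Heuristically guess the ASCII offset of FASTQ quality strings.

    Phred+33 data can use characters below ';' (ASCII 59), which the +64
    encodings never produce. Illumina 1.8+ Phred+33 tops out around 'J'
    (ASCII 74), whereas Phred+64 data usually reaches well above it.
    Returns 33, 64, or None when the sample fits either scheme.
    """
    codes = [ord(c) for q in qual_strings for c in q]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None  # ambiguous: every character fits both encodings


def phred_scores(qual, offset=33):
    """Convert a quality string to numeric Phred scores for a given offset."""
    return [ord(c) - offset for c in qual]
```

The single 19-base read posted above is ambiguous on its own (its characters run from 'E' to 'J'), so in practice the instrument and pipeline version decide the question. Read as Phred+33, 'J' corresponds to Q41.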

**afadda** · 08-10-2012, 04:33 AM

Yes. The message is:

```
Too few quality values for read: HWI-ST1146:66:C0YHCACXX:7:1101:8166:5424 1:N:0:ACTTGA
are you sure this is a FASTQ-int file?
```

My command line is:

```
bowtie -S -a --best --strata -v2 -m14 $reference $seqfile > $samfile --un $unalignfile
```

**maubp** · 08-10-2012, 04:42 AM

OK, so what does that read look like in the FASTQ input file? You showed a different read (which was only 19 bases long and, as expected, had 19 matching quality scores).

**afadda** · 08-10-2012, 04:54 AM

You're absolutely right. It was a programming mistake on my side when I was trimming the reads, so the read in the error message had a quality string of a different length.
Thanks for troubleshooting!
(Should never program when sleepy.)
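The thread does not show the trimming code, so the following is only a hypothetical illustration of how to avoid this class of bug: compute a single cut point and apply it to both the sequence and the quality string. The 3' quality-trimming rule, the threshold of 20, and the Phred+33 offset are all assumptions, and `trim_low_quality_tail` is a hypothetical name:

```python
def trim_low_quality_tail(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below `min_q`.

    The same cut point is applied to the sequence and to the quality string,
    so the two can never end up with different lengths.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality differ in length before trimming")
    cut = len(qual)
    while cut > 0 and ord(qual[cut - 1]) - offset < min_q:
        cut -= 1
    # One index, two slices: the read and its qualities stay in step.
    return seq[:cut], qual[:cut]
```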

Problem with quality in fastq file

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News