**mcnelson.phd** · 12-10-2013, 05:29 AM

Hi Daniel,

You're correct in that the actual read length of an Illumina run is always N+1 because the extra base is used for phasing/pre-phasing analysis. Ideally that last base should be trimmed off because it's not properly quality checked.
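If you want to drop that extra base yourself, something along these lines would do it. This is only a sketch: the function names (`read_fastq`, `trim_to_length`, `write_fastq`) are made up for illustration, and it assumes plain, uncompressed FASTQ with exactly four lines per record and no blank lines. Reads already shorter than the limit are left alone.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file (assumes no blank lines between records)
            return
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, qual


def trim_to_length(records, max_len):
    """Truncate each read and its quality string to at most max_len bases."""
    for header, seq, qual in records:
        yield header, seq[:max_len], qual[:max_len]


def write_fastq(records, handle):
    """Write records back out in 4-line FASTQ form."""
    for header, seq, qual in records:
        handle.write(f"{header}\n{seq}\n+\n{qual}\n")


# Tiny in-memory demo: a 5-base read stands in for a 251-base one,
# and max_len=4 stands in for 250.
source = io.StringIO("@read1\nACGTA\n+\nIIIII\n@read2\nACG\n+\nIII\n")
trimmed = io.StringIO()
write_fastq(trim_to_length(read_fastq(source), max_len=4), trimmed)
```

For real data you would open the R1 and R2 files instead of the `StringIO` objects and pass `max_len=250`.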

As for reads < 250bp, if you're using a Nextera kit and had Trim Adapters checked in the sample sheet, then you're correct about why you have shorter reads. If you're seeing that the two reads of a pair aren't the same length, then you're also probably seeing that read 1 is shorter than read 2. This would be an issue with the trimming where the base quality of read 2 dropped low enough that the adapter sequence wasn't properly called and thus couldn't be recognized to be trimmed. Some third-party apps can do a much better job of trimming so you may want to try those.

**dsobral** · 12-10-2013, 05:33 AM

Thanks for the reply. Very useful information.

The only thing I'm still puzzled about is why some reads are 250 bp and others are 251 bp.

Daniel

**GenoMax** · 12-10-2013, 05:40 AM

Some facilities set up a run as (n+1) depending on the number of bases (n) you had asked to be sequenced.

If you did not set this run up yourself, it is possible that the original run was set up as 250 x 251 bp. That would fit if one read is consistently 250 bp or less and the other is 251 bp or less, depending on trimming.

**dsobral** · 12-10-2013, 05:54 AM

I would understand if there were some obvious consistency.
What I observe is that, within the same run, read1 OR read2 can be either 250 or 251 bp (and sometimes 249 bp!) with no apparent pattern. I suspect the behaviour comes from adapter trimming.

| Counts | Read1 | Read2 |
|---|---|---|
| 4223 | 250 | 248 |
| 7940 | 250 | 249 |
| 58517 | 250 | 250 |
| 130842 | 250 | 251 |
| 10571 | 251 | 248 |
| 21321 | 251 | 249 |
| 145959 | 251 | 250 |
| 331396 | 251 | 251 |
...
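For anyone wanting to produce the same kind of tally, here is a rough sketch of one way to count (read1 length, read2 length) combinations. It is not necessarily how the numbers above were generated. The names `read_lengths` and `length_pair_counts` are hypothetical, and it assumes uncompressed 4-line FASTQ with the two files listing mates in the same order.

```python
import io
from collections import Counter


def read_lengths(handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record is the sequence
            yield len(line.rstrip("\r\n"))


def length_pair_counts(r1_handle, r2_handle):
    """Count how often each (read1 length, read2 length) combination occurs.

    Assumes both files hold the mates in the same order; zip() stops at the
    shorter file, so a mismatch in record count is not detected here.
    """
    return Counter(zip(read_lengths(r1_handle), read_lengths(r2_handle)))


# Tiny in-memory demo with two made-up pairs.
r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nACGTA\n+\nIIIII\n")
r2 = io.StringIO("@a/2\nACGTA\n+\nIIIII\n@b/2\nACGTA\n+\nIIIII\n")
counts = length_pair_counts(r1, r2)
for (len1, len2), n in sorted(counts.items()):
    print(f"{n} | {len1} | {len2}")
```

With real data, replace the `StringIO` objects with the opened R1 and R2 files.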

**GenoMax** · 12-10-2013, 05:58 AM

Adapter trimming can't be the cause of it (unless this was set up as a longer run originally than 250 bp).

Did you run this yourself? If not, you should ask the facility that ran it how the original run was set up.

**dsobral** · 12-10-2013, 06:11 AM

I didn't run it myself, but it was done with Nextera V2, 250x250.
Adapter trimming was on (I guess by default).

Thanks,
Daniel

**dsobral** · 12-10-2013, 06:13 AM

PS: although the data have these peculiarities, I used them for de novo assembly of a bacterium, and it gave good results...

I just noticed because when I tried Edena on the full data, it complained about the sizes...

I was just wondering what to make of it.

Thanks

MiSeq fastq output: 250-251 bp reads
