
**flxlex** · 01-13-2012, 12:34 AM

Without knowing for sure, perhaps the paired reads are split into two pair halves, thereby increasing the number of reads?

**Ole** · 01-13-2012, 09:20 AM

I think Lex is onto the right answer here. Count the number of {LKG records in the .frg file (this is the number of pairs you have); you can then find the number of input reads with: total fragments - number of links = input reads.

From your example you would have 699213 - 584072 = 115141 {LKG records. There is a fair number of shotgun reads in any paired-end 454 library. Here's an example from one of our libraries:

INPUT
numReadsInSFF 10178511

LENGTH
too short 226135
ok 9952376
trimmed by N 0
too long 0
-------
10178511

LINKER
not examined 993719
none detected 1304337
inconsistent 183024
partial 3798009
good 3899422
-------
10178511

OUTCOME
fragment 5102346
mate pair 3899422
deleted inconsistent 183024
deleted duplicate 767584
deleted too short 226135
deleted N not allowed 0
-------
10178511

Of the total good reads, more than half are fragment (shotgun) reads, while only 3899422 are pairs, which should give us 1949711 pairs in total.
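The counting rule above (total fragments - number of links = input reads) can be checked with a few lines of Python. This is only a rough sketch: it assumes each record in the .frg file opens with `{FRG` or `{LKG` on a line of its own, and the function names and the toy text are made up for illustration (the toy text is a stripped-down stand-in, not a complete valid .frg file).

```python
import io

def count_frg_records(lines):
    """Count lines that open {FRG and {LKG records.

    Assumption: every record starts with its tag alone on a line.
    """
    n_frg = 0
    n_lkg = 0
    for line in lines:
        tag = line.strip()
        if tag == "{FRG":
            n_frg += 1
        elif tag == "{LKG":
            n_lkg += 1
    return n_frg, n_lkg

def input_reads(n_frg, n_lkg):
    """Each LKG joins two FRGs that came from one input read."""
    return n_frg - n_lkg

# Toy stand-in: one shotgun read plus one paired read split in two.
toy_frg = """{FRG
acc:readA
}
{FRG
acc:readBa
}
{FRG
acc:readBb
}
{LKG
frg:readBa
frg:readBb
}
"""

n_frg, n_lkg = count_frg_records(io.StringIO(toy_frg))
print(n_frg, n_lkg, input_reads(n_frg, n_lkg))

# Numbers from this thread: 699213 fragments and 115141 links.
print(input_reads(699213, 115141))
```

For a real file you would pass an open file handle instead of the `io.StringIO` wrapper, e.g. `with open(path) as fh: count_frg_records(fh)`.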

**fabferre** · 01-16-2012, 03:23 AM

Thanks to both of you. I didn't notice the LKG record, which in effect corresponds to the number of pairs. My follow-up question is: is this information already in the original sff files, or is it something that sffToCA detects?

**Ole** · 01-16-2012, 04:22 AM

It's both. The two mates in an sff file are separated by a linker sequence, which can be, for example:
linker GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC (FLX)
linker TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG (Titanium)
(Again from the .stats file.)

sffToCA will detect these sequences, and create two FRGs, one for the left part and one for the right part of the sequence. In addition, it will create a LKG with reference to these two new FRGs.
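To make the splitting step concrete, here is a toy Python sketch of the idea. It is not how sffToCA is implemented: it only looks for an exact copy of the linker, whereas the real tool has to cope with sequencing errors and partial linkers (see the "partial" and "inconsistent" counts in the stats above). The function names, the "a"/"b" suffixes, and the example read are all invented for illustration.

```python
# Linker sequences as quoted from the .stats file above.
FLX_LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"
TITANIUM_LINKER = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"

def split_at_linker(read, linker):
    """Return (left, right) around the first exact linker match, or None."""
    pos = read.find(linker)
    if pos < 0:
        return None
    return read[:pos], read[pos + len(linker):]

def to_records(read_id, read, linker):
    """Turn one input read into FRG-like and LKG-like tuples.

    No linker: one fragment, no link.
    Linker found: two fragments plus one link that names both.
    """
    halves = split_at_linker(read, linker)
    if halves is None:
        return [(read_id, read)], []
    left, right = halves
    frgs = [(read_id + "a", left), (read_id + "b", right)]
    lkgs = [(read_id + "a", read_id + "b")]
    return frgs, lkgs

# Made-up read: short flank, Titanium linker, short flank.
paired = "ACGTACGT" + TITANIUM_LINKER + "TTGGCCAA"
frgs, lkgs = to_records("read1", paired, TITANIUM_LINKER)
print(frgs)
print(lkgs)

# A read without the linker stays a single shotgun fragment.
print(to_records("read2", "ACGTACGTTTGGCCAA", TITANIUM_LINKER))
```

One input read with a linker becomes two fragments and one link, which is why the fragment count in the .frg file is higher than the read count in the sff file.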

**fabferre** · 01-16-2012, 04:28 AM

Perfectly clear now. Thank you very much


sffToCA frg output
