  • fabferre
    Junior Member
    • Jun 2010
    • 5

    sffToCA frg output

    We used sffToCA to convert sff files from a number of mate pair libraries into .frg files. We were wondering why the number of input reads reported in the sffToCA .stats output file (in the field numReadsInSFF, which is also the total of each filtering/analysis section) differs so much from the number of fragments in the generated .frg files. For one library, the .stats file reports 584072 input reads, while the .frg file contains 699213 fragments.
    We counted the fragments in the .frg files either by counting FRG records (i.e. blocks starting with {FRG and ending in }) or by grepping for "seq:" in the file.
    Which filtering procedure accounts for this difference?
    Thanks
  • flxlex
    Moderator
    • Nov 2008
    • 412

    #2
    I don't know for sure, but perhaps each paired read is split into its two halves, thereby increasing the number of reads?


    • Ole
      Member
      • Oct 2011
      • 17

      #3
      I think Lex is onto the right answer here. Count the number of {LKG records in the .frg file (which is how many pairs you have); you can then find the number of input reads as: total fragments - number of links = input reads.
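To make that arithmetic concrete, here is a minimal Python sketch. It assumes only what is described in this thread: each record in the .frg file opens with a line such as {FRG or {LKG. The helper names and the sample text are made up for illustration, and the subtraction does not account for reads that sffToCA deleted, so treat the result as an estimate.

```python
from collections import Counter


def count_records(lines):
    """Tally records by the tag on their opening line, e.g. '{FRG' -> 'FRG'."""
    counts = Counter()
    for line in lines:
        if line.startswith("{"):
            counts[line[1:].strip()] += 1
    return counts


def reads_behind_fragments(n_frg, n_lkg):
    """Apply the rule from the post: total fragments - number of links."""
    return n_frg - n_lkg


# Tiny made-up .frg-like text: one mate pair (two FRGs plus one LKG) and one
# unpaired fragment. The field names inside the records are placeholders.
sample = """{FRG
acc:read1a
}
{FRG
acc:read1b
}
{LKG
frg:read1a
frg:read1b
}
{FRG
acc:read2
}
"""

counts = count_records(sample.splitlines())
print(counts["FRG"], counts["LKG"])  # 3 1
print(reads_behind_fragments(counts["FRG"], counts["LKG"]))  # 2

# The numbers from the first post: fragments minus input reads gives the
# number of {LKG records one would expect to find.
print(699213 - 584072)  # 115141
```

For a real file, something like `count_records(open("library.frg"))` should work the same way, where library.frg is a placeholder name.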

      From your example you would have 699213 - 584072 = 115141 {LKG records. There's a fair number of shotgun reads in any paired-end 454 library. Here's an example from one of our libraries:

      INPUT
      numReadsInSFF 10178511

      LENGTH
      too short 226135
      ok 9952376
      trimmed by N 0
      too long 0
      -------
      10178511

      LINKER
      not examined 993719
      none detected 1304337
      inconsistent 183024
      partial 3798009
      good 3899422
      -------
      10178511

      OUTCOME
      fragment 5102346
      mate pair 3899422
      deleted inconsistent 183024
      deleted duplicate 767584
      deleted too short 226135
      deleted N not allowed 0
      -------
      10178511

      Of the total good reads, more than half are fragment (shotgun) reads, while only 3899422 are pairs, which should give us 1949711 pairs in total.
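As a quick sanity check on the table above, this Python sketch re-adds each section and computes the share of fragment reads. The numbers are copied from the .stats excerpt; treating "good reads" as fragment + mate pair is my reading of the post, not something the .stats file states.

```python
num_reads_in_sff = 10178511

# Values copied from the .stats excerpt above.
sections = {
    "LENGTH": {"too short": 226135, "ok": 9952376,
               "trimmed by N": 0, "too long": 0},
    "LINKER": {"not examined": 993719, "none detected": 1304337,
               "inconsistent": 183024, "partial": 3798009, "good": 3899422},
    "OUTCOME": {"fragment": 5102346, "mate pair": 3899422,
                "deleted inconsistent": 183024, "deleted duplicate": 767584,
                "deleted too short": 226135, "deleted N not allowed": 0},
}

# Every section should account for all input reads.
for name, rows in sections.items():
    assert sum(rows.values()) == num_reads_in_sff, name

outcome = sections["OUTCOME"]
# Assumption: "good reads" means the reads that were not deleted.
kept = outcome["fragment"] + outcome["mate pair"]
fragment_share = outcome["fragment"] / kept

print(kept)  # 9001768
print(fragment_share > 0.5)  # True
```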


      • fabferre
        Junior Member
        • Jun 2010
        • 5

        #4
        Thanks to both of you. I hadn't noticed the LKG record, which indeed corresponds to the number of pairs. A follow-up question: is this information already in the original sff files, or is it something that sffToCA detects?


        • Ole
          Member
          • Oct 2011
          • 17

          #5
          It's both. The two mates in an sff file are separated by a linker sequence, which can for example be:
          linker GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC (FLX)
          linker TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG (Titanium)
          (Again from the .stats file.)

          sffToCA will detect these sequences and create two FRGs, one for the left part and one for the right part of the sequence. In addition, it will create an LKG that references these two new FRGs.
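The splitting step can be sketched in a few lines of Python. This is only an illustration of the idea, not how sffToCA is implemented: it uses exact string matching for the linker (the real tool presumably tolerates sequencing errors), it ignores orientation and quality trimming, and the "a"/"b" naming of the two halves is made up.

```python
# Linker sequences as listed in the .stats file quoted above.
FLX_LINKER = "GTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAAC"
TITANIUM_LINKER = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"


def split_at_linker(read_id, seq, linkers=(FLX_LINKER, TITANIUM_LINKER)):
    """Return (fragments, link) for one read.

    fragments is a list of (id, sequence) tuples; link is a pair of
    fragment ids, or None when no linker with sequence on both sides
    is found.
    """
    for linker in linkers:
        pos = seq.find(linker)
        end = pos + len(linker)
        # Require some sequence on both sides of the linker.
        if pos > 0 and end < len(seq):
            left_id, right_id = read_id + "a", read_id + "b"  # hypothetical naming
            fragments = [(left_id, seq[:pos]), (right_id, seq[end:])]
            return fragments, (left_id, right_id)
    # No usable linker: keep the read as a single unpaired fragment.
    return [(read_id, seq)], None


paired = "ACGTACGT" + FLX_LINKER + "TTGGCCAA"
frags, link = split_at_linker("read1", paired)
print(len(frags), link)  # 2 ('read1a', 'read1b')

frags, link = split_at_linker("read2", "ACGTACGTTTGGCCAA")
print(len(frags), link)  # 1 None
```

So one input read with a good linker ends up as two FRG records plus one LKG record, which is why the .frg file contains more fragments than the sff file contains reads.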


          • fabferre
            Junior Member
            • Jun 2010
            • 5

            #6
            Perfectly clear now. Thank you very much.
