  • 454 re-scoring

    Hi there!

    I am a little confused about the re-scoring option of sfffile. The manual for v2.6 of the Data Analysis package (2011) says:

    sfffile -r : This option re-generates the phred-based quality scores for each of the input reads using the current quality scoring table, and overwrites the existing quality scores with these new quality scores in the output file.

    But the manual for the Data Processing software, v2.3 (2009), Section 6.6, states the following (not a verbatim quote, sorry; this is from my own notes):

    For GS 20, GS FLX and GS FLX Titanium, different training sets were used to build the lookup tables, since they show slightly different error tendencies.

    So my question is: how do I know when to use this sfffile option? How many different scoring tables exist? For which chemistries should I use this option, and can I use it with old .sff files from the NCBI SRA archive?

    This is SRR000001.sra processed with SRA toolkit fastq-dump:

    @SRR000001.1 EM7LVYS01C1LWG length=255
    TCAGGGGGGAGCTTAAATTTGAAACTAGAAAAATTTTGAACAAAATAATCATAATTGTTAGCTGATGAAAAACTAGAAAAGATTTTCTGAGTGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACGGTATCCCGTAGTGTGCATTCATCCCTGCTCTGGATACAGTCAGCTCCCAAATTCCATAAACAACTCCTTTGTAAGTAACCTCCTTTTGACAGGGGGTACTGAGCGGGCTGGCAAGGCN
    +SRR000001.1 EM7LVYS01C1LWG length=255
    =;8GC91*#==<C=EA.EA/<B=(<<:=HC90'FB5&;B:<GC6(=D=<<==C=C==B<=<<<=;<<GC8.#<<9=FB4%<8EA4%87:<<8=B;C<@8>5=C?*A<&A<&<=49/2A='@;#A<&<A9C=@9B::B:<;=C?+<<;<===<=;C<==<FB0=<=<<<D=9=;;=<=<=<;=FB2FB2C<C<;=FB0<C==;C<D@-<=B:<=C=C;<C=GD7*=;:=HD90'==<<=<=:FB0<<C<;C=C=<!

    And this is the same read, after "sfffile -r ... ; sff2fastq ... ":

    @EM7LVYS01C1LWG
    TCAGGGGGGAGCTTAAATTTGAAACTAGAAAAATTTTGAACAAAATAATCATAATTGTTAGCTGATGAAAAACTAGAAAAGATTTTCTGAGTGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACGGTATCCCGTAGTGTGCATTCATCCCTGCTCTGGATACAGTCAGCTCCCAAATTCCATAAACAACTCCTTTGTAAGTAACCTCCTTTTGACAGGGGGTA
    +EM7LVYS01C1LWG
    FFFDEFGGGFFFEEEEFFFFD;;;FFFE55555BBCCEEFHFFFGIHGIFFFFFFFEEEEFFFFFFD77777FFCC1111CA7777@AEFFFFFFFFDDAAC?33444444=??7774444444443?FAAEEEEFFFFEEDDDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEAAA===EEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEEEBBBEE@@@@AEE

    Both programs use Phred+33 encoding. SRR000001 is from a GS FLX experiment done in 2007. From http://sourceforge.net/apps/mediawik...454_Platforms:

    "Recent versions (since 2008) produce better QV's than early versions. Our SFF parser detects the software version by searching for the XML element "<qualityScoreVersion>1.1.03</qualityScoreVersion>" in the SFF manifest. The parser will complain "WARNING: Fragments not rescored!" if this XML element is not found."

    But when I ran "sffinfo -mnf" I got "No manifest found". So this is all very confusing.
    Which FASTQ quality values should I trust?
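
To make the difference between the two quality strings above concrete, here is a minimal Python sketch that decodes them, assuming both are Phred+33 as stated. The function names are my own (hypothetical) and are not part of any 454 or SRA tool; only short prefixes of the two strings are used.

```python
def phred33_to_scores(quality):
    """Decode a Phred+33 quality string into integer Phred scores."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q):
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def summarize(quality):
    """Return length, min, max and mean Phred score of a quality string."""
    scores = phred33_to_scores(quality)
    return {
        "length": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": sum(scores) / len(scores),
    }


# First 11 characters of each quality line quoted above.
fastq_dump_prefix = "=;8GC91*#=="   # SRR000001.1 via fastq-dump
rescored_prefix = "FFFDEFGGGFF"     # same read after sfffile -r and sff2fastq

print(summarize(fastq_dump_prefix))
print(summarize(rescored_prefix))
```

With the full strings, the same functions give the per-base scores for both versions of the read. The two records differ in length (the second sequence is a prefix of the first), so a position-by-position comparison only makes sense over the shared bases.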

    Some final, perhaps broader questions:
    How much has the scoring in 454 changed from the GS 20 until now?
    Is it still the same homopolymer-based scoring?
    Is the current algorithm better?
    Does it matter which Data Analysis package version I use, given the 454 chemistry the reads came from?

    Thank you,
    Carlos
    Last edited by CPCantalapiedra; 03-20-2012, 09:59 AM.

  • #2
    I have always thought that the SRA 454 files are incomplete, and your missing manifest may support that suspicion. What if you run 'sffinfo' without parameters on your sff file: do you see anything metadata-like?
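
If a manifest does turn up, one rough way to look for the rescoring marker quoted in the first post is to search the dumped text for the qualityScoreVersion element. This is only a sketch: it assumes the manifest has already been saved as text (for example from sffinfo output), the function name is hypothetical, and the example fragment is made up for illustration.

```python
import re

# Matches the element quoted in the first post; the version is captured, not assumed.
QSV_PATTERN = re.compile(
    r"<qualityScoreVersion>\s*([^<]+?)\s*</qualityScoreVersion>"
)


def quality_score_version(manifest_text):
    """Return the qualityScoreVersion string, or None if the element is absent."""
    match = QSV_PATTERN.search(manifest_text)
    return match.group(1) if match else None


# Hypothetical manifest fragment, for illustration only.
example = "<manifest><qualityScoreVersion>1.1.03</qualityScoreVersion></manifest>"

print(quality_score_version(example))
print(quality_score_version("No manifest found"))
```

A result of None only says the element was not found in the text you passed in; it does not by itself tell you which scoring table produced the qualities.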

    As far as I know, any older sff file benefits from rescoring. The latest scoring is supposed to be better than the older ones, even for older data. Also, the newest data analysis software should work best regardless of sequencing chemistry (at least that is what 454 intended, I believe).
