  • v_kisand
    Member
    • Jan 2009
    • 38

    454 /NCBI SRA & traceinfo

    Are SFF files for 454 projects available anywhere in the SRA? For recent submissions I can find only FASTQ, but I am also looking for the traceinfo XML that belongs to particular short reads. I seem to remember that XML files were available earlier.

    v.
  • v_kisand
    Member
    • Jan 2009
    • 38

    #2
    OK, I have found TraceDB again (it has been some time since I last tried to retrieve such data):
    ftp://ftp.ncbi.nlm.nih.gov/pub/TraceDB

    However, I cannot find any organisms in TraceDB that correspond to the SRR accession numbers.

    v.


    • kmcarr
      Senior Member
      • May 2008
      • 1181

      #3
      V.

      The NCBI Trace Archive (TA) and the Short Read Archive (now renamed the Sequence Read Archive, or SRA) are two separate databases with separate missions. The TA was designed to store traces, sequences and metadata generated by Sanger sequencing, primarily from WGS projects. When next-gen sequencing came on the scene, NCBI recognized that the TA design was not a good fit for this new type of massively parallel sequencing, so they designed the SRA. The SRA does not use or have traceinfo.xml files. And while data from 454 experiments are uploaded to the SRA as SFF files, you cannot download those SFF files. The SRA makes only the sequences and quality scores available for download, in the form of FASTQ files.


      • v_kisand
        Member
        • Jan 2009
        • 38

        #4
        Right, now I remember that the TA was down for a while because of next-generation data (?) and it was not possible to get data, but I did not follow the developments there... Are these FASTQ reads cleaned of adaptor sequences (454 reads)? It should be a known issue that the Roche software does not clean properly ...

        I think I have found some scripts to do adaptor clipping; I'll try them soon. Anyway, it seems it would be much easier to run clipping on SFF files, which is not a problem with your own data.

        v.


        • kmcarr
          Senior Member
          • May 2008
          • 1181

          #5
          The SFF file definition includes the full flowgram and base calls plus left (5') and right (3') clipping points. The 5' end of the read is clipped for the key sequence (TCAG). The 3' end of the read has a number of trimming filters applied, including one which identifies the 454-B adapter sequence. The downloaded FASTQ is the trimmed sequence only.

          Originally posted by v_kisand
          It should be a known issue that the Roche software does not clean properly ...
          I'm not sure what you mean by this. I've never seen the 454 filter failing to remove the 454 adapter sequence. I suppose this is possible if the quality of the read was so degraded that it could not recognize the sequence, but in that case the signal/quality based filters would trim off that portion of the read.
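To illustrate how the clipping points relate to the trimmed sequence, here is a minimal Python sketch. It assumes the SFF convention of 1-based, inclusive clip positions for quality and adapter clips, with 0 meaning "not set". The function names and the example read are made up for illustration and are not part of any Roche or NCBI tool.

```python
def trimmed_range(n_bases, clip_qual_left, clip_qual_right,
                  clip_adapter_left, clip_adapter_right):
    """Return (first, last), the 1-based inclusive span kept after clipping.

    A clip value of 0 is treated as "not set".
    """
    # The left boundary is the most restrictive (largest) of the left clips.
    first = max(1, clip_qual_left, clip_adapter_left)
    # The right boundary is the most restrictive (smallest) of the right
    # clips; an unset clip falls back to the full read length.
    last = min(clip_qual_right or n_bases, clip_adapter_right or n_bases)
    return first, last


def apply_clips(bases, clip_qual_left=0, clip_qual_right=0,
                clip_adapter_left=0, clip_adapter_right=0):
    """Return the trimmed bases, as they would appear in a trimmed FASTQ."""
    first, last = trimmed_range(len(bases), clip_qual_left, clip_qual_right,
                                clip_adapter_left, clip_adapter_right)
    return bases[first - 1:last]


# Hypothetical read: TCAG key, 10 bases of insert, 4 low-quality bases.
raw = "TCAG" + "ACGTACGTAC" + "NNNN"
trimmed = apply_clips(raw, clip_qual_left=5, clip_qual_right=14)
```

With a left clip of 5, the four key bases are dropped; the right clip of 14 drops the low-quality tail. The untrimmed bases stay in the SFF record, which is why re-clipping is easier from SFF than from an already trimmed FASTQ.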


          • v_kisand
            Member
            • Jan 2009
            • 38

            #6
            Originally posted by kmcarr
            The SFF file definition includes the full flowgram and base calls plus left (5') and right (3') clipping points. The 5' end of the read is clipped for the key sequence (TCAG). The 3' end of the read has a number of trimming filters applied, including one which identifies the 454-B adapter sequence. The downloaded FASTQ is the trimmed sequence only.

            I'm not sure what you mean by this. I've never seen the 454 filter failing to remove the 454 adapter sequence. I suppose this is possible if the quality of the read was so degraded that it could not recognize the sequence, but in that case the signal/quality based filters would trim off that portion of the read.
            Yes, that's why I am looking for SFF files.
            Roche's software seems not to be the best at clipping, or at least it used not to be. Why, I do not know; see for example the discussion in:


            • kmcarr
              Senior Member
              • May 2008
              • 1181

              #7
              Originally posted by v_kisand
              Yes, that's why I am looking for SFF files.
              Roche's software seems not to be the best at clipping, or at least it used not to be. Why, I do not know; see for example the discussion in:
              http://www.freelists.org/post/mira_t...aptor-clipping
              The thread you linked to is discussing clipping of adapters introduced for cDNA synthesis, specifically the SMART cDNA construction adapters. The Roche signal processing pipeline, which outputs the SFF files, was never intended to remove cloning/adapter sequences introduced by the end user; it only removes the primer from the 454 library construction which it does just fine. The Roche assembly programs (gsAssembler, gsMapper) can trim other adapter sequences provided by the user as part of their assembly or mapping process. If you are using third party software (like MIRA) then of course you will have to trim any non-Roche adapters yourself.
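For anyone trimming user-introduced adapters themselves, a minimal Python sketch of the idea follows. It is only an assumption-laden illustration: it uses exact matching, looks for the adapter at the 5' end and its reverse complement at the 3' end, and the adapter sequence shown is a made-up placeholder, not the real SMART adapter. Real reads with sequencing errors would need an error-tolerant aligner or a dedicated trimming tool.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def trim_adapters(read, adapters):
    """Strip each adapter from the 5' end and its reverse complement
    from the 3' end of the read. Exact matches only."""
    for adapter in adapters:
        if not adapter:
            continue  # skip empty entries so the slicing below stays safe
        if read.startswith(adapter):
            read = read[len(adapter):]
        rc = reverse_complement(adapter)
        if read.endswith(rc):
            read = read[:-len(rc)]
    return read


# Hypothetical adapter and read, for illustration only.
adapter = "GGATCCAA"
read = adapter + "ACGTACGT" + reverse_complement(adapter)
clean = trim_adapters(read, [adapter])
```

The list of adapters is supplied by the user, which mirrors the point above: only the person who built the library knows which non-Roche sequences were added.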


              • v_kisand
                Member
                • Jan 2009
                • 38

                #8
                Originally posted by kmcarr
                The thread you linked to is discussing clipping of adapters introduced for cDNA synthesis, specifically the SMART cDNA construction adapters. The Roche signal processing pipeline, which outputs the SFF files, was never intended to remove cloning/adapter sequences introduced by the end user; it only removes the primer from the 454 library construction which it does just fine. The Roche assembly programs (gsAssembler, gsMapper) can trim other adapter sequences provided by the user as part of their assembly or mapping process. If you are using third party software (like MIRA) then of course you will have to trim any non-Roche adapters yourself.
                Thanks for clarifying, but what about
                http://chevreux.org/uploads/media/mi...tml#section_27 ?

                Maybe this TCTCCGTC is a custom adapter.

                Maybe I am wrong to expect the Roche processing pipeline to take care of it, but then it is the sequence provider's problem and data in NCBI may contain adaptors, right?

                I started this discussion because I downloaded the quite recent SRR029264 for testing various assemblers, as these data should be quite similar to the data I will get soon, and I see CCGGCCAC in it. Should the SFF file contain information about such adaptors? Anyway, getting rid of these 8 bp is not a big problem, but as I am not too deep into the topic yet: can NCBI short reads contain more of this type of thing? Do uploaded data need to be cleaned, or is it OK for the database to hold them without auxiliary information (i.e. traceinfo)?

                v.
                Last edited by v_kisand; 12-28-2009, 02:16 AM.
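One way to check how widespread a suspected leftover such as CCGGCCAC is in a downloaded FASTQ would be to count the reads that start with it. The Python sketch below is a rough illustration only: it assumes simple four-line FASTQ records, stops at the first blank header line, and uses two invented reads rather than real SRR029264 data. The function names are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a blank line, treated as the end)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].strip(), seq, qual


def prefix_fraction(handle, prefix):
    """Return the fraction of reads whose sequence starts with prefix."""
    total = hits = 0
    for _name, seq, _qual in read_fastq(handle):
        total += 1
        hits += seq.startswith(prefix)
    return hits / total if total else 0.0


# Two invented reads, only one of which starts with the suspected adaptor.
example = io.StringIO(
    "@read1\nCCGGCCACACGTAC\n+\nIIIIIIIIIIIIII\n"
    "@read2\nACGTACGTACGTAC\n+\nIIIIIIIIIIIIII\n"
)
fraction = prefix_fraction(example, "CCGGCCAC")
```

A fraction close to 1 would suggest a library-construction sequence left in by the submitter rather than genuine genomic sequence, though that interpretation is an assumption and would need checking against the submitter's protocol.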
