
  • format of AB SOLiD 4 System sequencing output

    To validate one of my hypotheses, I downloaded some public data from the EMBL-EBI ENA (European Nucleotide Archive) (http://www.ebi.ac.uk/ena/).
    The data are from a paper published in Nature Structural & Molecular Biology in 2011 and were generated on an AB SOLiD 4 System.
    As described in the ENA entry for this data set, the FASTQ files are available via both FTP and Galaxy.
    The problem is that the FASTQ file I downloaded looks very weird,
    and I have never come across this before. Details are shown below.

    ###Eg. 09_public_data$ less ERR042386.fastq

    @ERR042386.1 solid0032_385_1_4_20100830_FRAG
    T32120132000132211310023202201202002303130332322311
    +
    !@%62B8?=A690@>><->8=51%:==5521=582<@9>9><,6785.>4&

    Generally, in a classic FASTQ file, the first line of each record begins with "@", the 2nd line is the read sequence, the 3rd line is a "+" and the 4th line is the quality string.
    However, in these FASTQ files the read sequences consist of digits ("0,1,2,3") after a single leading letter. I really have no idea what this means ...
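
    For reference, the four-line record layout described above can be checked with a short script. This is only a sketch: `read_fastq` is a hypothetical helper, not part of any library, and the example record is the first read shown above with the sequence and quality truncated to five characters for brevity.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) from a simple four-line-per-record FASTQ."""
    while True:
        # Take the next four lines; fewer than four means end of file.
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


# Truncated copy of the first record in ERR042386.fastq (illustration only).
example = io.StringIO(
    "@ERR042386.1 solid0032_385_1_4_20100830_FRAG\n"
    "T3212\n"
    "+\n"
    "!@%62\n"
)

for name, seq, qual in read_fastq(example):
    # Report which characters occur in the sequence line.
    print(name, sorted(set(seq)), len(seq), len(qual))
```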

    Do the digits ("0,1,2,3") represent ("A,G,C,T") respectively?
    Or is this a format unique to ABI SOLiD sequencing output?

    Does anyone have experience dealing with this kind of data?
    All suggestions are appreciated ...


  • #2
    format of AB SOLiD 4 System sequencing output

    It's a format unique to SOLiD; what you're seeing is the sequence in colorspace.

    SOLiD uses a dibase encoding system, where each color (0-3) encodes a pair of adjacent bases rather than a single base, so the digits do not map one-to-one onto A, C, G and T.
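
    As a rough illustration of the dibase encoding: if the bases are given 2-bit codes (A=0, C=1, G=2, T=3), each color is the XOR of the codes of two adjacent bases, so a read can be decoded by starting from the leading primer base and applying one color at a time. The sketch below assumes that standard color table; `decode_colorspace` is a hypothetical name, not a function from any SOLiD software. Because every base depends on the one before it, a single wrong color call corrupts all bases after it, so treat this as a way to understand the format rather than a recommended conversion step.

```python
BASES = "ACGT"  # assumed 2-bit codes: A=0, C=1, G=2, T=3


def decode_colorspace(read: str) -> str:
    """Naively translate a colorspace read (primer base + digits) into bases.

    The leading primer base is used only as the starting point and is not
    included in the returned sequence.
    """
    primer, colors = read[0].upper(), read[1:]
    if primer not in BASES:
        raise ValueError(f"unexpected primer base: {primer!r}")
    code = BASES.index(primer)
    decoded = []
    for i, color in enumerate(colors):
        if color not in "0123":
            # A missing color call (often '.') leaves every later base unknown.
            decoded.append("N" * (len(colors) - i))
            break
        code ^= int(color)  # color = previous base code XOR next base code
        decoded.append(BASES[code])
    return "".join(decoded)


# First five characters of the read shown in the question.
print(decode_colorspace("T3212"))  # AGTC
```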

    Have a look at some of the manuals on the Life Technologies website.



    • #3
      Thanks a lot ...
      I know SOLiD uses dinucleotide encoding for the sequence.
      However, what I downloaded is already a FASTQ file, so I expected it to have been converted to A/C/G/T at least.
      I have analyzed SOLiD data before, and in that case the FASTQ file contained normal base sequences.
      Anyway, thanks a lot ... Do you know of any tools to do the conversion?



      • #4
        With SOLiD data you need to do the mapping in color space, not convert the FASTQ to bases and map that. If you're not familiar with it, I would recommend tracking down an expert.

        I don't know if it is still available, but I think the Life Tech software was called BioScope. They now have LifeScope, but I don't know whether it works well with data from v4 machines.
