  • All sequence bases have the same quality score.

    Hi all,
    I am doing some analysis on the dataset here:



    Some basic info on the data, in case you don't want to follow the link above:
    ----
    Illumina Genome Analyzer IIx paired end sequencing
    shotgun sequencing
    WGS
    Pseudomonas fluorescens
    Paired-end
    ----

    When I searched for 'Genome Analyzer IIx', I could find information on its quality encoding. However, I noticed that the quality score for every base is '?', e.g.:

    @ERR1363506.14 226/1
    GTCCACTACAGGTCGAAGCCGAAGGCGACGAGTTGCGTGTTTACGCGCCCAATCGTTTTGTTCTCGACTGGGTCAACGAGAAGTACCTGAGCCGCGTGCT
    +
    ????????????????????????????????????????????????????????????????????????????????????????????????????

    My question is:
    Is it normal for all bases to have an identical quality score?
    When I analyze the data, some bioinformatics tools report errors saying they cannot detect the quality offset or quality encoding. Is the uniform quality the cause of these errors?

    Thanks.
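A quick way to see why encoding detection can fail here: under the common Phred+33 encoding, '?' (ASCII 63) decodes to Q30, while under Phred+64 it would decode to Q-1, which is not a valid Phred score. The sketch below decodes the string both ways and applies a simple range-based guess. The thresholds (';' = 59 and '@' = 64) and the function names are illustrative assumptions, not the detection logic of any specific tool.

```python
from __future__ import annotations


def decode_qualities(qual: str, offset: int) -> list[int]:
    """Convert a FASTQ quality string to integer scores using an ASCII offset."""
    return [ord(c) - offset for c in qual]


def guess_offset(quality_lines: list[str]) -> str:
    """Naive offset guess from the lowest quality character seen.

    Assumed heuristic for illustration only:
      - anything below ';' (ASCII 59) can only be Phred+33
      - if everything is at or above '@' (ASCII 64), Phred+64 is plausible
      - otherwise the data fits more than one encoding
    """
    lowest = min(ord(c) for line in quality_lines for c in line)
    if lowest < 59:
        return "phred33"
    if lowest >= 64:
        return "phred64"
    return "ambiguous"


qual = "?" * 100  # the all-'?' quality line from the record above
print(decode_qualities(qual, 33)[0])  # 30
print(decode_qualities(qual, 64)[0])  # -1
print(guess_offset([qual]))           # ambiguous
```

Because every character is '?', the lowest value sits in the range where more than one encoding is possible, which is consistent with tools refusing to auto-detect. Setting the offset explicitly avoids the guess entirely.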

  • #2
    This is an odd dataset.

    First of all there are three files for a PE dataset (I thought one was a file for the barcode/tags, but that does not appear to be the case). The fastq headers are non-standard and then there is that issue of every Q-score set to ? for the entire dataset in all three files.

    You should try to find out more information (directly from the submitter, if you can) before spending time analyzing this data.
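One way to confirm the "every Q-score is ?" observation across a whole file is to tally the characters on every quality line. This minimal sketch assumes plain, unwrapped 4-line FASTQ records; the function name is hypothetical.

```python
import io
from collections import Counter
from typing import Iterable


def quality_char_counts(lines: Iterable[str]) -> Counter:
    """Tally quality characters, assuming strict 4-line FASTQ records."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 3:  # the 4th line of each record holds the qualities
            counts.update(line.rstrip("\n"))
    return counts


# Two toy records standing in for a real file
demo = io.StringIO(
    "@r1\nACGT\n+\n????\n"
    "@r2\nGGCA\n+\n????\n"
)
print(quality_char_counts(demo))  # Counter({'?': 8})
```

On a real file, pass an open file handle instead of the StringIO; a result with a single key means every base in that file carries the same quality character.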



    • #3
      Originally posted by GenoMax:
      This is an odd dataset.

      First of all there are three files for a PE dataset (I thought one was a file for the barcode/tags, but that does not appear to be the case). The fastq headers are non-standard and then there is that issue of every Q-score set to ? for the entire dataset in all three files.

      You should try to find out more information (directly from the submitter, if you can) before spending time analyzing this data.

      Thanks for your answer.

      This data can be found in DRASearch, NCBI SRA, and EBI.
      The data from all of these sources have the same strange quality values.
      I wasn't able to find the submitter's contact info, but I emailed the EBI helpdesk and got the following reply:

      CRAM files are compressed NGS read files. The sequences can be retrieved by using the reference, but quality scores are quantised into a smaller range in order to use less space. It looks like the compression on this CRAM file is such that all quality scores average into the same value. These are probably low-value quality scores, or the quality scores were not available in the first place.
      I would either just leave this data, or set --offset=33 for the tool, just to get through the analysis.
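To illustrate the quantisation described in the EBI reply, here is a minimal sketch of quality-score binning: each score is replaced by a representative value for its bin. The bin edges and representative values are made-up assumptions, not the scheme used by CRAM or any particular encoder. With a single bin, every score collapses to one value, and under Phred+33 a value of 30 is written as '?'.

```python
from __future__ import annotations

from bisect import bisect_right


def quantise(scores: list[int], edges: list[int], reps: list[int]) -> list[int]:
    """Replace each score by its bin's representative value.

    edges: sorted lower bounds of every bin except the first (hypothetical).
    reps:  one representative value per bin, so len(reps) == len(edges) + 1.
    """
    if len(reps) != len(edges) + 1:
        raise ValueError("need exactly one representative per bin")
    return [reps[bisect_right(edges, q)] for q in scores]


def to_phred33(scores: list[int]) -> str:
    """Encode integer scores as a Phred+33 FASTQ quality string."""
    return "".join(chr(q + 33) for q in scores)


original = [2, 15, 28, 37, 40]
print(quantise(original, [10, 20, 30], [6, 15, 25, 37]))  # [6, 15, 25, 37, 37]
one_bin = quantise(original, [], [30])  # everything falls into one bin
print(to_phred33(one_bin))  # ?????
```

The trade-off is the one the reply mentions: fewer distinct values compress better, and in the extreme single-bin case the per-base quality information is gone entirely.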



      • #4
        Ok. So we have an explanation for the Q-scores but what about the presence of 3 files, all of which have the same length sequence data?

        Edit: I think the third file likely contains single reads whose mates were discarded during trimming. You can check that possibility by confirming that the headers in the third file are not present in the _1 or _2 file.
        Last edited by GenoMax; 06-24-2016, 07:37 AM.
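The check suggested above could look something like this minimal sketch. It assumes unwrapped 4-line FASTQ records and that mates share the first whitespace-delimited header token once an optional /1 or /2 suffix is removed; given the non-standard headers in this dataset, that assumption needs verifying first. The function names are hypothetical.

```python
import io
from typing import Iterable


def read_ids(lines: Iterable[str]) -> set:
    """Collect read IDs from the header line of each 4-line FASTQ record."""
    ids = set()
    for i, line in enumerate(lines):
        if i % 4 == 0:
            token = line[1:].split()[0]  # drop '@', keep the first token
            if token.endswith(("/1", "/2")):
                token = token[:-2]  # treat both mates as the same read
            ids.add(token)
    return ids


def shared_with_pairs(single, r1, r2) -> set:
    """IDs in the single-read file that also appear in _1 or _2."""
    return read_ids(single) & (read_ids(r1) | read_ids(r2))


# Toy data: r3 exists only in the single-read file
single = io.StringIO("@r3\nACGT\n+\n????\n")
r1 = io.StringIO("@r1/1\nACGT\n+\n????\n@r2/1\nTTGA\n+\n????\n")
r2 = io.StringIO("@r1/2\nACGT\n+\n????\n@r2/2\nTTGA\n+\n????\n")
print(shared_with_pairs(single, r1, r2))  # set()
```

With real data, pass the three open FASTQ files instead. An empty result means none of the reads in the third file has a mate in _1 or _2, which fits the orphaned-reads explanation.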



        • #5
          Originally posted by GenoMax:
          Ok. So we have an explanation for the Q-scores but what about the presence of 3 files, all of which have the same length sequence data?
          Usually, when splitting .sra files of paired-end reads with fastq-dump from the SRA Toolkit, the --split-3 parameter is used, which does this:


          Legacy 3-file splitting for mate-pairs: First 2 biological reads satisfying dumping conditions are placed in files *_1.fastq and *_2.fastq. If only 1 biological read is dumpable - it is placed in *.fastq.

          So the smaller file usually contains unpaired (sometimes loosely called unmapped) sequences: reads whose mate cannot be found.


          (Source: SRA Tools, ncbi/sra-tools on GitHub.)
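The routing rule quoted above can be modelled in a few lines. This is a toy model of the documented behaviour, not fastq-dump's actual code: each spot is represented as a hypothetical (spot_id, reads) pair in which `reads` lists only the biological reads that passed the dumping conditions.

```python
def split3(spots):
    """Route reads the way the --split-3 description says.

    spots: iterable of (spot_id, reads) pairs, where reads holds only the
    dumpable biological reads of that spot (hypothetical input format).
    Returns the contents of *_1.fastq, *_2.fastq and *.fastq as lists.
    """
    file1, file2, single = [], [], []
    for spot_id, reads in spots:
        if len(reads) >= 2:
            # the first two dumpable reads go to _1 and _2
            file1.append((spot_id, reads[0]))
            file2.append((spot_id, reads[1]))
        elif len(reads) == 1:
            # a lone dumpable read goes to the unsuffixed file
            single.append((spot_id, reads[0]))
    return file1, file2, single


spots = [
    ("1", ["ACGT", "TTGA"]),  # both mates dumpable
    ("2", ["GGCC"]),          # only one mate dumpable
    ("3", []),                # nothing dumpable
]
f1, f2, f3 = split3(spots)
print(len(f1), len(f2), len(f3))  # 1 1 1
```

This also shows why the unsuffixed file is usually the smallest: it only collects spots where exactly one read was dumpable.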



          • #6
            See the edit I just made to the post above.



            • #7
              Originally posted by GenoMax:
              See the edit I just made to the post above.
              Saw it.
              I don't think any trimming was involved at or before that stage. The third file is just a collection of the unloved ones (reads without a mate).
