  • All sequence bases have the same quality score.

    Hi all,
    I am doing some analysis on the dataset here:



    Some basic info for the data without looking into above link:
    ----
    Illumina Genome Analyzer IIx paired end sequencing
    shotgun sequencing
    WGS
    Pseudomonas fluorescens
    Paired-end
    ----

    When I search for 'Genome Analyzer IIx', I can find the quality encoding information. However, I have seen that the quality scores for all bases are '?', e.g.

    @ERR1363506.14 226/1
    GTCCACTACAGGTCGAAGCCGAAGGCGACGAGTTGCGTGTTTACGCGCCCAATCGTTTTGTTCTCGACTGGGTCAACGAGAAGTACCTGAGCCGCGTGCT
    +
    ????????????????????????????????????????????????????????????????????????????????????????????????????

    My question is:
    Is it normal to have an identical quality score for all bases?
    When I analyze the data, some bioinformatics tools report errors saying that they cannot detect the quality offset or quality encoding. Is the above the cause of the errors?

    Thanks.
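
One way to confirm the observation, and to see why offset auto-detection can fail, is to collect the set of quality characters that actually occur. The sketch below assumes plain four-line FASTQ records; `quality_chars` and `candidate_encodings` are illustrative helper names, the ASCII ranges are the commonly cited ones rather than what any particular tool uses, and the read is shortened from the example above.

```python
import io

# Commonly cited ASCII ranges for FASTQ quality encodings (an assumption
# of this sketch; individual tools may apply narrower ranges).
ENCODING_RANGES = {
    "Sanger / Illumina 1.8+ (Phred+33)": (33, 126),
    "Solexa (Solexa+64)": (59, 126),
    "Illumina 1.3+ (Phred+64)": (64, 126),
}

def quality_chars(handle):
    """Return the set of characters seen on the quality lines of a 4-line FASTQ."""
    seen = set()
    for i, line in enumerate(handle):
        if i % 4 == 3:  # every fourth line is the quality string
            seen.update(line.strip())
    return seen

def candidate_encodings(chars):
    """List the encodings whose ASCII range contains every observed character."""
    codes = [ord(c) for c in chars]
    return [name for name, (low, high) in ENCODING_RANGES.items()
            if codes and low <= min(codes) and max(codes) <= high]

# Shortened version of the record shown above (sequence truncated to 20 bases)
record = (
    "@ERR1363506.14 226/1\n"
    "GTCCACTACAGGTCGAAGCC\n"
    "+\n"
    "????????????????????\n"
)
chars = quality_chars(io.StringIO(record))
print(chars)                       # a single character: {'?'}
print(candidate_encodings(chars))  # more than one encoding remains possible
```

With only '?' (ASCII 63) present, the observed range is a single point, so a tool that guesses the offset from the spread of characters has very little to go on. That would be consistent with the detection errors described above.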

  • #2
    This is an odd dataset.

    First of all there are three files for a PE dataset (I thought one was a file for the barcode/tags, but that does not appear to be the case). The fastq headers are non-standard and then there is that issue of every Q-score set to ? for the entire dataset in all three files.

    You should try to find out more information (directly from the submitter, if you can) before spending time analyzing this data.

    • #3
      Originally posted by GenoMax:
      This is an odd dataset.

      First of all there are three files for a PE dataset (I thought one was a file for the barcode/tags, but that does not appear to be the case). The fastq headers are non-standard and then there is that issue of every Q-score set to ? for the entire dataset in all three files.

      You should try to find out more information (directly from the submitter, if you can) before spending time analyzing this data.

      Thanks for your answer.

      This data can be found through DRASearch, NCBI SRA, and EBI.
      All of these sources show the same strange quality values.
      I wasn't able to find the contact info of the submitter, but I emailed the EBI helpdesk and got the following reply:

      CRAM files are compressed NGS read files. The sequences can be retrieved by
      using the reference, but quality scores are quantised into a smaller range in
      order to use less space. It looks like the compression on this CRAM file is such
      that all quality scores average into the same value. These are probably low
      value quality scores, or the quality scores were not available in the first
      place.
      I would just leave the data, or set the --offset =33 for the tool, just to pass the analysis.
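
To make the effect of that offset concrete, here is a minimal sketch of decoding a quality string by hand. It assumes an offset of 33 as suggested above; `phred_scores` and `error_probability` are illustrative names, not functions from any particular tool.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in quality]

def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

scores = phred_scores("????")
print(scores)                        # [30, 30, 30, 30]
print(error_probability(scores[0]))  # about 0.001
print(phred_scores("?", offset=64))  # [-1]
```

Under offset 33, every base is reported as Q30. Since the value is identical for every base in the dataset, it carries no per-base information and is best treated as a placeholder rather than a measured quality.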

      • #4
        Ok. So we have an explanation for the Q-scores but what about the presence of 3 files, all of which have the same length sequence data?

        Edit: I think the third file is likely made up of single reads that had their mate discarded during trimming. You can check that possibility by seeing whether the headers there are absent from the _1 and _2 files.
        Last edited by GenoMax; 06-24-2016, 07:37 AM.
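
The header check suggested in the edit could be sketched as follows. This assumes four-line FASTQ records and takes the read ID to be the first whitespace-separated token of the header (e.g. `ERR1363506.14`); the file names and read IDs in the demo are made up for illustration.

```python
import os
import tempfile

def read_ids(path):
    """Collect read IDs (first header token, without '@') from a 4-line FASTQ."""
    ids = set()
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and line.strip():
                ids.add(line[1:].split()[0])
    return ids

def shared_with_pairs(single_path, pair_paths):
    """Return IDs from the single-read file that also occur in any paired file."""
    paired = set()
    for path in pair_paths:
        paired |= read_ids(path)
    return read_ids(single_path) & paired

# Tiny made-up files, only to show the call pattern
tmp = tempfile.mkdtemp()

def write_fastq(name, ids):
    path = os.path.join(tmp, name)
    with open(path, "w") as out:
        for rid in ids:
            out.write(f"@{rid} x/1\nACGT\n+\n????\n")
    return path

r1 = write_fastq("sample_1.fastq", ["R.1", "R.2"])
r2 = write_fastq("sample_2.fastq", ["R.1", "R.2"])
single = write_fastq("sample.fastq", ["R.3"])

overlap = shared_with_pairs(single, [r1, r2])
print(overlap)  # empty if the third file holds only reads without a mate
```

An empty result supports the idea that the third file contains only unpaired reads. For full-size files the ID sets can get large, so a streaming or sorted comparison may be preferable.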

        • #5
          Originally posted by GenoMax:
          Ok. So we have an explanation for the Q-scores but what about the presence of 3 files, all of which have the same length sequence data?
          Usually, when splitting the .sra files of paired-end reads with fastq-dump from the SRA Toolkit, the --split-3 parameter is used. Its description reads:


          Legacy 3-file splitting for mate-pairs: First 2 biological reads satisfying dumping conditions are placed in files *_1.fastq and *_2.fastq If only 1 biological read is dumpable - it is placed in *.fastq.

          So the smaller file, usually called the unmapped sequences, contains the reads for which the mate sequence cannot be found.
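
As a toy model of the routing described in that help text (not the sra-tools implementation), the sketch below sends the first two dumpable reads of a spot to the `_1` and `_2` outputs and a lone read to the third output. The function name `split3`, the dictionary keys, and the demo spots are all hypothetical.

```python
def split3(spots):
    """Route reads the way the quoted --split-3 description reads.

    `spots` maps a spot name to its list of dumpable biological reads.
    """
    files = {"_1.fastq": [], "_2.fastq": [], ".fastq": []}
    for name, reads in spots.items():
        if len(reads) >= 2:
            # First two dumpable reads go to the paired files
            files["_1.fastq"].append((name, reads[0]))
            files["_2.fastq"].append((name, reads[1]))
        elif len(reads) == 1:
            # A single dumpable read goes to the third file
            files[".fastq"].append((name, reads[0]))
    return files

demo = {"spot1": ["ACGT", "TTGA"], "spot2": ["GGCC"]}
out = split3(demo)
for suffix, entries in out.items():
    print(suffix, entries)
```

Under this model the paired files always hold the same number of reads, and every spot name in the third file is absent from the other two.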



          • #6
            See the edit I just made to the post above.

            • #7
              Originally posted by GenoMax:
              See the edit I just made to the post above.
              Saw it.
              I think there is no trimming involved at/before that stage. The third file is a collection of unloved ones.
