  • Novel question---What's the .qual file for?

    I recently received my sequencing data from a SOLiD system. For each sample I got two .csfasta files, labeled 1 and 2, and two .qual files corresponding to the .csfasta files. Since I am quite new to high-throughput sequencing, I don't know how to use the .qual files in the analysis. I know the .csfasta files contain the sequence data in color space format, and I am able to convert the color space sequences to base space. So can I just use the .csfasta files for the downstream analysis, such as adaptor trimming, alignment, and statistical analysis, without the .qual files?

    And can the two .csfasta files (1 and 2) be pooled together for analysis?

    Hope to get your help. Thank you in advance.

  • #2
    Hi harrike,
    I'll try to give you a short answer to your very long question.
    You should use the qual files to get a more reliable alignment (the base qualities will help you reduce false positive calls).
    You will probably want to start by building FASTQ files from the csfasta and qual files you got.
    I strongly suggest that you do not convert the color space to bases before alignment or assembly, only after alignment.
    Most probably you have paired reads; this is why you have two qual and two csfasta files.
    As a beginner in Next Gen, please consider playing with your data in Galaxy:
    Galaxy is a community-driven web-based analysis platform for life science research.


    Look for "NGS Toolbox Beta" and work through the SOLiD mapping tutorial.

    This should help you understand the basics of how to handle your data properly.

    Best,

    Ilia
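A minimal Python sketch of the "build FASTQ from csfasta and qual" step mentioned above. It assumes one read per header line, the same read order in both files, and Phred+33 encoding of the scores; the function names are made up for illustration. Aligners differ in the exact color-space FASTQ layout they expect (for example, how the leading primer base is handled), so check your aligner's documentation or use its own converter for real data.

```python
def read_records(lines):
    """Yield (name, body) pairs from csfasta/qual text, skipping '#' comments."""
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def to_csfastq(csfasta_lines, qual_lines, offset=33):
    """Pair each color-space read with its qualities and emit FASTQ text.

    Assumption: the read keeps its leading primer base, so it is one
    character longer than the list of quality values.
    Note: zip() stops at the shorter input, so unequal files are not detected.
    """
    out = []
    for (sname, seq), (qname, quals) in zip(read_records(csfasta_lines),
                                            read_records(qual_lines)):
        if sname != qname:
            raise ValueError(f"read order mismatch: {sname} vs {qname}")
        # Negative values (used for missing calls) are clamped to 0 here.
        scores = [max(0, int(q)) for q in quals.split()]
        if len(scores) != len(seq) - 1:
            raise ValueError(f"length mismatch for read {sname}")
        qual_str = "".join(chr(s + offset) for s in scores)
        out.append(f"@{sname}\n{seq}\n+\n{qual_str}\n")
    return "".join(out)


example_csfasta = ["# header comment", ">1_1_1_F3", "T0123"]
example_qual = ["# header comment", ">1_1_1_F3", "20 30 -1 5"]
fastq_text = to_csfastq(example_csfasta, example_qual)
```

The same function would be run once per file pair (1 and 2), keeping the two mates in separate FASTQ files.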


    • #3
      Ilia,

      Thank you for your reply. Your answer is quite helpful to me.

      Actually, my lab has bought the CLC workbench, which is not available to me now since I am out of the lab for a period. I just want to understand the data more before my analysis.

      Yes, there is a lot I need to know. Thank you for your kind help.


      • #4
        Bump - Hello, I just joined this forum, but I'd like to ask about QUAL files. I recently analyzed the data in my FASTA file without a QUAL file to go with it from the pyrosequencing facility. I am going to rerun the analysis now that I have received an accompanying QUAL file, but I wanted to know how much the two analyses may be expected to differ. Is the first analysis pretty much useless without a QUAL file? Is the second analysis going to yield radically different results from the first?


        • #5
          That depends on the type of analysis and what you're looking for. Before you redo the entire analysis, you should try plotting the average quality score at each base position to see if there are any regions that would alter your result, e.g. if the fifth base of every read had a very low quality.
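One way to get the numbers for such a plot, as a rough Python sketch. It assumes a FASTA-style .qual file with space-separated integer scores that may wrap over several lines per read; the function names are hypothetical.

```python
def read_qual(lines):
    """Yield (name, scores) from .qual text; scores may span several lines."""
    name, scores = None, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, scores
            name, scores = line[1:].split()[0], []
        else:
            scores.extend(int(x) for x in line.split())
    if name is not None:
        yield name, scores


def mean_quality_per_position(qual_lines):
    """Return the mean score at each read position (index 0 = first base)."""
    totals, counts = [], []
    for _, scores in read_qual(qual_lines):
        for i, q in enumerate(scores):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


example = [">r1 length=3", "40 30", "20", ">r2 length=2", "20 10"]
means = mean_quality_per_position(example)
```

The resulting list can be passed to any plotting tool; a dip at one position points to the kind of systematic low-quality base described above.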

          But to answer your question: I haven't seen a significant increase in performance when mapping, though enough miscalled or low-quality bases can affect your results and suggest something that is not really there. The assembly process, however, seems to be affected a lot, specifically de novo assembly. One commonly used trick is to filter out the low-quality reads (based on the quality score, of course) and then assemble, which has worked wonders.

          So it really depends on your analysis, but all in all, with 454 data, especially if you're interested in the 5' end of the reads (e.g. amplicons), you shouldn't see too much of a difference. I would be interested to hear your results if you happen to redo the entire analysis.
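The filtering trick can be sketched in a few lines of Python. The mean-quality criterion and the threshold of 20 are assumptions chosen for illustration, not values from this thread; dedicated tools offer other criteria such as windowed trimming.

```python
def filter_by_mean_quality(records, min_mean=20.0):
    """Return the names of reads whose mean quality is at least min_mean.

    records: iterable of (read_name, list_of_integer_scores).
    Reads with no scores are dropped.
    """
    kept = []
    for name, scores in records:
        if scores and sum(scores) / len(scores) >= min_mean:
            kept.append(name)
    return kept


example_reads = [
    ("r1", [40, 30, 20]),  # mean 30, kept
    ("r2", [10, 10, 10]),  # mean 10, dropped
    ("r3", []),            # no scores, dropped
]
kept_names = filter_by_mean_quality(example_reads)
```

The kept names would then be used to subset the sequence file before handing it to the assembler.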

          I hope that helps!


          • #6
            Thanks! Yeah, I'm wondering how much relative abundance of a certain taxon will be affected by QUAL scores. Sounds like it would not be very much, which is good, because I need to share the analysis I have now with a collaborator. I will certainly tell them about what is missing from this analysis, but hopefully the picture that emerges now will still be useful. I will rerun this with the QUAL scores soon, so I will let you know how it turns out.


            • #7
              How divergent is this taxon from the rest of the population? How strict are your settings? With that type of analysis the stringency of the parameters at which you distinguish one taxon from another could change dramatically IF the quality values are poor. I think it would be most advantageous to plot the quality distribution and filter out bad reads if need be.

              Please keep me posted on what you find out, thanks!
              Last edited by twaddlac; 03-30-2012, 09:12 AM. Reason: fix error


              • #8
                Well, the results are in. I performed the analysis both with and without the QUAL data, and the final results were pretty similar. In both analyses, the relative abundances of the various taxa were about the same, giving the same overall picture for community composition.


                • #9
                  Hello,

                  Are the values in the .qual files I get from SOLiD runs simply Phred scores from the standard equation, without any ASCII encoding?

                  Thanks a lot!

                  Carmen


                  • #10
                    Yes, they are.
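For reference, the standard Phred definition is Q = -10 * log10(p), where p is the estimated probability that the call is wrong. The Python sketch below shows the conversion in both directions and contrasts a plain-integer .qual line with an ASCII-encoded FASTQ quality string. The function names are hypothetical, and the Phred+33 offset applies only to the FASTQ side.

```python
import math


def phred_to_error_prob(q):
    """Error probability for a Phred score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def parse_qual_line(line):
    """.qual files store the scores directly as space-separated integers."""
    return [int(x) for x in line.split()]


def decode_fastq_quality(qual_string, offset=33):
    """FASTQ stores each score as one ASCII character (score + offset)."""
    return [ord(c) - offset for c in qual_string]


from_qual = parse_qual_line("20 30 10")
from_fastq = decode_fastq_quality("5?+")
```

Both calls give the same list of scores; only the on-disk representation differs.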
