  • pre-filtering before mapping?

    Hi,
    We have some mated paired-end sequences (50 bp) to map to the reference genome for SNP (and maybe indel) discovery.
    The data I have are csfasta and qual files.
    Could anyone let me know whether I need to pre-filter the sequences before mapping them with any software?
    If I use bowtie, I should remove the orphan reads (and maybe try to map the orphan reads with a different parameter set).
    If I use BFAST, should I do the same?

    If I do need to filter the sequences by quality score, what cut-off threshold do people normally use? An average of Q10?
    How do I translate the quality score into a percent error rate, as with the Phred score?

    Thanks!
    Nan
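On the last question: SOLiD QVs are usually described as Phred-scaled. Assuming that holds for your files, a QV converts to an error probability as P = 10^(-QV/10), so Q10 is a 10% error rate, Q20 is 1% and Q30 is 0.1%. A minimal Python sketch (the function names are made up for illustration):

```python
import math
from typing import Iterable


def qv_to_error_rate(qv: float) -> float:
    """Phred scale: P(error) = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p: float) -> float:
    """Inverse of the above: QV = -10 * log10(P(error))."""
    return -10 * math.log10(p)


def mean_error_rate(qvs: Iterable[int]) -> float:
    """Average per-color error probability of a read (expects QVs >= 0).

    Because the Phred scale is logarithmic, this is NOT the same as
    converting the read's average QV.
    """
    probs = [qv_to_error_rate(q) for q in qvs]
    return sum(probs) / len(probs)


if __name__ == "__main__":
    for qv in (10, 20, 30):
        print(f"QV{qv}: {qv_to_error_rate(qv):.1%} error")
```

This matters for an "average of Q10" cut-off: a read with QVs 10 and 30 has a mean QV of 20, but its mean error probability is (0.1 + 0.001) / 2 = 0.0505, which corresponds to only about Q13.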

  • #2
    The csfasta_quality_filter.pl script can be used for quality control of SOLiD data.
    You can find it by searching on Google.


    • #3
      Hi,

      I am at the indexing step, but apparently there is an error:

      In function "RGIndexLayoutCreate": Fatal Error[OutOfRange]. Message: Layout must begin with a one.

      Could someone help me with this problem?
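The function name RGIndexLayoutCreate appears to come from BFAST's indexer rather than bowtie, and the message suggests that the index layout (the spaced-seed mask given to `bfast index`) does not start with a 1. As a hedged sketch, a pre-flight check on the mask string could look like this. Only the "must begin with a one" rule comes from the error message; the 0/1 alphabet check and the function name `check_layout` are assumptions for illustration, so check the BFAST manual for the full mask rules.

```python
def check_layout(mask: str) -> None:
    """Hypothetical pre-flight check for a BFAST index layout (mask) string.

    The "must begin with a one" rule is taken from the error message;
    the 0/1 alphabet check is an assumption about the mask format.
    """
    if not mask or set(mask) - {"0", "1"}:
        raise ValueError(f"layout should be a non-empty string of 0s and 1s: {mask!r}")
    if mask[0] != "1":
        raise ValueError(f"layout must begin with a one: {mask!r}")
```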


      • #4
        Originally posted by fenciso:
        Hi,

        I am at the indexing step, but apparently there is an error:

        In function "RGIndexLayoutCreate": Fatal Error[OutOfRange]. Message: Layout must begin with a one.

        Could someone help me with this problem?

        Do you mean indexing the reference genome with bowtie?

        Below is the command I use (assuming bowtie-build and all_reference.fa are in the same folder):

        ./bowtie-build -C -f all_reference.fa reference_Color


        • #5
          Originally posted by fanyucai1:
          The csfasta_quality_filter.pl script can be used for quality control of SOLiD data.
          You can find it by searching on Google.

          Thanks!
          I downloaded the program. It has many trimming parameters. Can anyone tell me what values are normally used for filtering (if any)? Thanks!

          1. num_colors_to_hard_trim
          2. min_median_qv
          3. max_bad_colors_in_first_ten
          4. max_number_bad_colors
          5. num_consec_colors_to_trim
          6. trim_terminal_bad_colors
          7. min_read_length
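The script's own documentation is the authority on what each option does. As a rough sketch only, here is one plausible reading of these parameters in Python. The semantics are guessed from the option names, `bad_qv` is a hypothetical threshold for calling a color "bad", the default values are placeholders rather than recommendations, and num_consec_colors_to_trim is left out because its meaning is unclear from the name alone.

```python
from statistics import median
from typing import List, Optional, Tuple


def filter_read(
    seq: str,
    qvs: List[int],
    num_colors_to_hard_trim: int = 0,
    min_median_qv: float = 10,
    bad_qv: int = 10,  # hypothetical: a color with QV below this counts as "bad"
    max_bad_colors_in_first_ten: int = 3,
    max_number_bad_colors: int = 10,
    trim_terminal_bad_colors: bool = True,
    min_read_length: int = 25,
) -> Optional[Tuple[str, List[int]]]:
    """Return (trimmed read, trimmed QVs), or None if the read is rejected.

    `seq` is a csfasta read: a primer base followed by one digit per color,
    so len(qvs) == len(seq) - 1. All defaults are placeholders.
    """
    primer, colors = seq[0], seq[1:]
    # Hard-trim a fixed number of colors (assumed to be taken from the 3' end).
    if num_colors_to_hard_trim > 0:
        colors = colors[:-num_colors_to_hard_trim]
        qvs = qvs[:-num_colors_to_hard_trim]
    # Strip bad colors from the 3' end, one at a time.
    if trim_terminal_bad_colors:
        while qvs and qvs[-1] < bad_qv:
            colors, qvs = colors[:-1], qvs[:-1]
    if not qvs or len(colors) < min_read_length:
        return None
    bad = [q < bad_qv for q in qvs]
    if sum(bad[:10]) > max_bad_colors_in_first_ten:
        return None
    if sum(bad) > max_number_bad_colors:
        return None
    if median(qvs) < min_median_qv:
        return None
    return primer + colors, qvs
```

Writing the filter out like this makes it easier to see which thresholds interact: for example, terminal trimming runs before the length check, so an aggressive `bad_qv` can push reads below min_read_length.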


          • #6
            SOLiD does not supply values for these parameters; you have to choose them yourself. I advise you to compute statistics on the raw data before running the script.


            • #7
              Originally posted by fanyucai1:
              SOLiD does not supply values for these parameters; you have to choose them yourself. I advise you to compute statistics on the raw data before running the script.

              Thanks! I am new to this area and would like some advice on choosing values for these parameters, such as the minimum QV for a single color.

              As for the statistics, besides the median/average quality score, what else should I look at?

              Thanks!


              • #8
                There is a paper called "Analysis of quality raw data of second generation sequencers with Quality Assessment Software"; you can find it on Google. It is very simple and will help you.
                Some statistics you should consider: min, max, Q20, mean, and median.
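A sketch of computing these statistics from a SOLiD .qual file in Python. Assumptions: each record is a '>' header followed by one line of space-separated QVs, lines starting with '#' are comments, negative QVs mark missing calls and are skipped, and "Q20" is read as the fraction of QVs that are at least 20. The function names are made up. A histogram of QV counts is kept instead of a list of every value, so a whole run does not have to fit in memory.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def read_qual(lines: Iterable[str]) -> Iterator[Tuple[str, List[int]]]:
    """Yield (read_id, qvs) from .qual text: '>id' header, then one line of QVs."""
    read_id = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment header
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, [int(x) for x in line.split()]
            read_id = None


def qual_summary(records: Iterable[Tuple[str, List[int]]], q: int = 20) -> Dict[str, float]:
    """min / max / mean / median over all QVs, plus the fraction of QVs >= q."""
    hist: Counter = Counter()
    for _, qvs in records:
        hist.update(v for v in qvs if v >= 0)  # negative QV = missing call (assumed)
    n = sum(hist.values())
    if n == 0:
        raise ValueError("no quality values found")
    values = sorted(hist)

    def kth(k: int) -> int:
        """k-th smallest QV (0-based), read off the cumulative histogram."""
        seen = 0
        for v in values:
            seen += hist[v]
            if seen > k:
                return v
        raise IndexError(k)

    median = kth(n // 2) if n % 2 else (kth(n // 2 - 1) + kth(n // 2)) / 2
    return {
        "min": values[0],
        "max": values[-1],
        "mean": sum(v * c for v, c in hist.items()) / n,
        "median": median,
        f"frac_ge_Q{q}": sum(c for v, c in hist.items() if v >= q) / n,
    }
```

Usage would be along the lines of opening your .qual file and calling `qual_summary(read_qual(fh))`; running it per read position instead of over all QVs is a small change if you also want to see quality decay along the read.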


                • #9
                  I downloaded and executed the csfasta_quality_filter.pl script on my SOLiD 5500 csfasta (and qual) data to trim a fixed number of colors (command below):

                  perl csfasta_quality_filter.pl -f F5.csfasta -q F5.QV.qual -o 5_bases_trimmed_F5.csfasta --num_colors_to_hard_trim 5

                  I have a question about the output file. According to the manual, "Filtered and trimmed reads are output in csfasta format to a user-specified filename." But where is the associated trimmed QV.qual file? Any ideas? I cannot supply TopHat or any other aligner with a trimmed csfasta file and a full-length quality file.


                  • #10
                    Originally posted by paolo.kunder:
                    I downloaded and executed the csfasta_quality_filter.pl script on my SOLiD 5500 csfasta (and qual) data to trim a fixed number of colors (command below):

                    perl csfasta_quality_filter.pl -f F5.csfasta -q F5.QV.qual -o 5_bases_trimmed_F5.csfasta --num_colors_to_hard_trim 5

                    I have a question about the output file. According to the manual, "Filtered and trimmed reads are output in csfasta format to a user-specified filename." But where is the associated trimmed QV.qual file? Any ideas? I cannot supply TopHat or any other aligner with a trimmed csfasta file and a full-length quality file.

                    This script does not output a .qual file after quality control; you have to extract the matching entries from the raw .qual file according to the reads kept in the filtered csfasta file.
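A sketch of that approach in Python, under these assumptions: the filtered csfasta keeps the original read headers, each record is a header line followed by one sequence or QV line, the first character of each csfasta read is the primer base (so a read with length L has L - 1 colors and as many QVs), and hard trimming removed colors from the 3' end. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, body) from csfasta or .qual text, skipping '#' comment lines."""
    read_id = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, line
            read_id = None


def sync_qual(filtered_csfasta: Iterable[str], raw_qual: Iterable[str]) -> Iterator[str]:
    """Yield .qual records for the reads kept in the filtered csfasta,
    truncated to the number of colors each kept read still has."""
    # Colors per kept read = read length minus the leading primer base.
    n_colors: Dict[str, int] = {
        rid: len(seq) - 1 for rid, seq in read_records(filtered_csfasta)
    }
    for rid, body in read_records(raw_qual):
        if rid in n_colors:
            # Keep the first n QVs, assuming trimming removed colors from the 3' end.
            qvs = body.split()[: n_colors[rid]]
            yield f">{rid}\n{' '.join(qvs)}\n"
```

For the files above, you would open 5_bases_trimmed_F5.csfasta and F5.QV.qual for reading, open an output file (a made-up name such as 5_bases_trimmed_F5.QV.qual) for writing, and pass the two inputs to `sync_qual` with `out.writelines(...)`. The dictionary holds one entry per kept read; if both files are guaranteed to be in the same order, walking them in parallel would use less memory.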
