  • NanYu
    Member
    • Apr 2011
    • 21

    pre-filtering before mapping?

    Hi,
    We have some mate-paired sequences (50 bp) to be mapped to the reference genome for SNP (and maybe indel) discovery.
    The data I have are csfasta and qual files.
    Could anyone let me know if I need to pre-filter the sequences before I use any software to map them?
    If I use bowtie, I understand I should remove the orphan reads (and maybe try to map the orphan reads using a different parameter set).
    If I use BFAST, should I do the same?

    If I do need to filter the sequences based on the quality score, what cut-off threshold do people normally use? An average of Q10?
    How do I translate the quality score to a % error rate, as with the Phred score?

    Thanks!
    Nan
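
Editor's note on the last question: a minimal sketch of the conversion, assuming the QVs in the .qual file are on the Phred scale, i.e. Q = -10 * log10(p), where p is the probability that the call is wrong. The function names are illustrative and not part of any SOLiD tool.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability back to a Phred-scaled quality value."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30):
        print(f"Q{q}: {phred_to_error_prob(q) * 100:.1f}% expected error rate")
```

Under that assumption, Q10 corresponds to a 10% chance of a wrong call, Q20 to 1% and Q30 to 0.1%.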
  • fanyucai1
    Member
    • Jan 2011
    • 11

    #2
    The script csfasta_quality_filter.pl is used for quality control of SOLiD data.
    You can find it with a Google search.

    • fenciso
      Junior Member
      • Apr 2011
      • 5

      #3
      Hi,

      I am at the indexing step, but I get an error:

      In function "RGIndexLayoutCreate": Fatal Error[OutOfRange]. Message: Layout must begin with a one.

      Could someone help me with this problem?

      • NanYu
        Member
        • Apr 2011
        • 21

        #4
        Originally posted by fenciso View Post
        Hi,

        I am on index step, but apparently there is an error:

        In function "RGIndexLayoutCreate": Fatal Error[OutOfRange]. Message: Layout must begin with a one.

        Could someone help me with this problem???

        Do you mean indexing the reference genome with bowtie?

        Below is the command I use (assuming bowtie-build and all_reference.fa are in the same folder)

        ./bowtie-build -C -f all_reference.fa reference_Color

        • NanYu
          Member
          • Apr 2011
          • 21

          #5
          Originally posted by fanyucai1 View Post
          The script csfasta_quality_filter.pl is used for quality control of SOLiD data.
          You can find it with a Google search.

          Thanks!
          I downloaded the program. It has many parameters for trimming. Can anyone tell me what values they normally use for filtering (if any)? Thanks!

          1. num_colors_to_hard_trim
          2. min_median_qv
          3. max_bad_colors_in_first_ten
          4. max_number_bad_colors
          5. num_consec_colors_to_trim
          6. trim_terminal_bad_colors
          7. min_read_length

          • fanyucai1
            Member
            • Jan 2011
            • 11

            #6
            SOLiD does not supply recommended values for these parameters; you have to choose them yourself. I advise you to compute statistics on the raw data before running the script.

            • NanYu
              Member
              • Apr 2011
              • 21

              #7
              Originally posted by fanyucai1 View Post
              SOLiD does not supply recommended values for these parameters; you have to choose them yourself. I advise you to compute statistics on the raw data before running the script.

              Thanks! I am new to the area and would like some advice on choosing the values for those parameters, such as the minimum QV for a single color.

              As to the statistics, besides median/average quality score, what else should I look into?

              Thanks!

              • fanyucai1
                Member
                • Jan 2011
                • 11

                #8
                There is a paper called "Analysis of quality raw data of second generation sequencers with Quality Assessment Software". You can find it with a Google search. It is very simple and it will help you.
                Statistics you should consider include: min, max, Q20, mean and median.
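
Editor's note: the statistics listed above can be computed per read with a few lines of Python. This is a minimal sketch, not the method of the cited paper; `qv_summary` and `parse_qual_line` are hypothetical names, and "Q20" is interpreted here as the fraction of colors with QV >= 20, which is an assumption about what the poster meant.

```python
from statistics import mean, median


def parse_qual_line(line):
    """Parse one space-separated line of quality values into integers."""
    return [int(x) for x in line.split()]


def qv_summary(qvs, threshold=20):
    """Summarise the quality values of a single read."""
    return {
        "min": min(qvs),
        "max": max(qvs),
        "mean": mean(qvs),
        "median": median(qvs),
        # Fraction of colors at or above the threshold (Q20 by default).
        "frac_ge_threshold": sum(q >= threshold for q in qvs) / len(qvs),
    }


if __name__ == "__main__":
    example = parse_qual_line("30 28 25 19 8 22 5 31")
    print(qv_summary(example))
```

Looking at the distribution of these values across all reads before choosing cut-offs is one way to follow the advice given earlier in the thread.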

                • paolo.kunder
                  Member
                  • Aug 2011
                  • 93

                  #9
                  I downloaded and ran the csfasta_quality_filter.pl script on my SOLiD 5500 csfasta (and qual) data to trim a fixed number of colors (command below).

                  perl csfasta_quality_filter.pl -f F5.csfasta -q F5.QV.qual -o 5_bases_trimmed_F5.csfasta --num_colors_to_hard_trim 5

                  I was wondering about the output file. The manual says: "Filtered and trimmed reads are output in csfasta format to a user-specified filename". But where is the trimmed QV.qual file that goes with it? Any ideas? I cannot supply TopHat or any other alignment software with a trimmed csfasta and a full-length quality file.

                  • fanyucai1
                    Member
                    • Jan 2011
                    • 11

                    #10
                    Originally posted by paolo.kunder View Post
                    I downloaded and ran the csfasta_quality_filter.pl script on my SOLiD 5500 csfasta (and qual) data to trim a fixed number of colors (command below).

                    perl csfasta_quality_filter.pl -f F5.csfasta -q F5.QV.qual -o 5_bases_trimmed_F5.csfasta --num_colors_to_hard_trim 5

                    I was wondering about the output file. The manual says: "Filtered and trimmed reads are output in csfasta format to a user-specified filename". But where is the trimmed QV.qual file that goes with it? Any ideas? I cannot supply TopHat or any other alignment software with a trimmed csfasta and a full-length quality file.

                    This script cannot output the .qual file after quality control. You can extract the matching records from the raw .qual file according to the reads in the filtered csfasta file.
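
Editor's note: the suggestion above can be sketched in Python as follows. This is not part of csfasta_quality_filter.pl. It assumes (1) each record is a ">" header line followed by a single data line, (2) a csfasta sequence is one primer base followed by one color per position, and (3) the hard trim removed colors from the 3' end, so the first N quality values are the ones to keep. Check assumption (3) against the script's documentation before using anything like this. The function names are hypothetical.

```python
def read_records(lines):
    """Yield (read_id, data_line) pairs; blank and '#' comment lines are skipped."""
    read_id = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, line
            read_id = None


def trim_qual(trimmed_csfasta_lines, raw_qual_lines):
    """Yield qual records matching the reads and lengths in a trimmed csfasta."""
    # Number of colors kept per surviving read (sequence length minus the primer base).
    kept = {rid: len(seq) - 1 for rid, seq in read_records(trimmed_csfasta_lines)}
    for rid, body in read_records(raw_qual_lines):
        if rid in kept:  # reads removed by the filter are dropped here too
            qvs = body.split()[:kept[rid]]  # assumes 3'-end trimming
            yield f">{rid}\n{' '.join(qvs)}"


if __name__ == "__main__":
    csfasta = [">r1", "T0123", ">r3", "T2201"]
    qual = [">r1", "30 29 28 27 26 25", ">r2", "5 5 5 5 5 5", ">r3", "20 21 22 23 24 25"]
    for record in trim_qual(csfasta, qual):
        print(record)
```

Both arguments can be open file handles instead of lists. The dictionary holds one entry per surviving read, so memory use grows with the size of the filtered csfasta file.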
