  • pre-filtering before mapping?

    Hi,
    We have some mated paired-end sequences (50 bp) to map to the reference genome for SNP (and maybe indel) discovery.
    The data I have are csfasta and qual files.
    Could anyone let me know whether I need to pre-filter the sequences before I use any software to map them?
    If I use bowtie, I should remove the orphan reads (and maybe try to map the orphan reads with a different parameter set).
    If I use BFAST, should I do the same?

    If I do need to filter the sequences by quality score, what cut-off threshold do people normally use? An average of Q10?
    How do I translate the quality score into a % error rate, as with the Phred score?

    Thanks!
    Nan
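
On the last question above: SOLiD quality values are Phred-scaled, so the usual relation Q = -10 * log10(P) applies, where P is the probability that a call is wrong. A minimal sketch of the conversion follows; the function names are made up for illustration.

```python
import math

def qv_to_error_prob(q):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-q / 10)

def error_prob_to_qv(p):
    """Convert an error probability back to a Phred-scaled quality value."""
    return -10 * math.log10(p)

# Print the error rate for a few common thresholds.
for q in (10, 20, 30):
    print(f"Q{q}: {qv_to_error_prob(q) * 100:g}% error")
```

By this relation Q10 corresponds to a 10% error rate and Q20 to 1%, which is why an average of Q10 is a fairly permissive cut-off.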

  • #2
    The csfasta_quality_filter.pl script is used for quality control of SOLiD data.
    You can find it by searching on Google.

    Comment


    • #3
      Hi,

      I am at the indexing step, but I get an error:

      In function "RGIndexLayoutCreate": Fatal Error[OutOfRange]. Message: Layout must begin with a one.

      Could someone help me with this problem?

      Comment


      • #4
        Originally posted by fenciso View Post
        Hi,

        I am at the indexing step, but I get an error:

        In function "RGIndexLayoutCreate": Fatal Error[OutOfRange]. Message: Layout must begin with a one.

        Could someone help me with this problem?

        Do you mean indexing the reference genome with bowtie?

        Below is the command I use (assuming bowtie-build and all_reference.fa are in the same folder):

        ./bowtie-build -C -f all_reference.fa reference_Color

        Comment


        • #5
          Originally posted by fanyucai1 View Post
          The csfasta_quality_filter.pl script is used for quality control of SOLiD data.
          You can find it by searching on Google.
          Thanks!
          I downloaded the program. It has many parameters for trimming. Can anyone tell me what values they normally use for filtering (if any)? Thanks!

          1. num_colors_to_hard_trim
          2. min_median_qv
          3. max_bad_colors_in_first_ten
          4. max_number_bad_colors
          5. num_consec_colors_to_trim
          6. trim_terminal_bad_colors
          7. min_read_length

          Comment


          • #6
            SOLiD does not supply recommended parameter values; you have to choose them yourself. I advise computing statistics on the raw data before running the script.

            Comment


            • #7
              Originally posted by fanyucai1 View Post
              SOLiD does not supply recommended parameter values; you have to choose them yourself. I advise computing statistics on the raw data before running the script.
              Thanks! I am new to the area and would like some advice on choosing values for those parameters, such as the minimum QV for a single color.

              As for the statistics, besides the median/average quality score, what else should I look at?

              Thanks!

              Comment


              • #8
                There is a paper called "Analysis of quality raw data of second generation sequencers with Quality Assessment Software". You can find it on Google. It is very simple and it will help you.
                Some statistics you should consider: min, max, Q20, mean, and median.
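
The statistics listed above can be computed per read straight from the .qual file. The sketch below assumes each record is a `>read_id` header followed by one line of space-separated integer QVs, with `#` comment lines allowed. The function names, the example read, and the reading of "Q20" as the fraction of colors with QV >= 20 are illustrative assumptions.

```python
from statistics import mean, median

def parse_qual(lines):
    """Yield (read_id, [qv, ...]) from the lines of a .qual file."""
    read_id = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, [int(v) for v in line.split()]
            read_id = None

def read_stats(qvs, threshold=20):
    """Summary statistics for the quality values of one read."""
    return {
        "min": min(qvs),
        "max": max(qvs),
        "mean": mean(qvs),
        "median": median(qvs),
        # fraction of colors at or above the threshold (Q20 by default)
        "frac_ge_threshold": sum(q >= threshold for q in qvs) / len(qvs),
    }

# Made-up example record; replace with open("F5.QV.qual") for real data.
example = [">1_1_1_F3", "30 28 25 8 31 5 22 20"]
for read_id, qvs in parse_qual(example):
    print(read_id, read_stats(qvs))
```

Looking at how these values are distributed over all reads gives a basis for choosing thresholds such as min_median_qv, rather than picking them blindly.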

                Comment


                • #9
                  I downloaded the csfasta_quality_filter.pl script and ran it on my SOLiD 5500 csfasta (and qual) data to trim a fixed number of colors (command below).

                  perl csfasta_quality_filter.pl -f F5.csfasta -q F5.QV.qual -o 5_bases_trimmed_F5.csfasta --num_colors_to_hard_trim 5

                  I was wondering about the output file. According to the manual, "Filtered and trimmed reads are output in csfasta format to a user-specified filename". But where is the trimmed QV.qual file that goes with it? Any ideas? I cannot supply TopHat or any other alignment software with a trimmed csfasta file and a full-length quality file.

                  Comment


                  • #10
                    Originally posted by paolo.kunder View Post
                    I downloaded the csfasta_quality_filter.pl script and ran it on my SOLiD 5500 csfasta (and qual) data to trim a fixed number of colors (command below).

                    perl csfasta_quality_filter.pl -f F5.csfasta -q F5.QV.qual -o 5_bases_trimmed_F5.csfasta --num_colors_to_hard_trim 5

                    I was wondering about the output file. According to the manual, "Filtered and trimmed reads are output in csfasta format to a user-specified filename". But where is the trimmed QV.qual file that goes with it? Any ideas? I cannot supply TopHat or any other alignment software with a trimmed csfasta file and a full-length quality file.
                    This script does not output a .qual file after quality control. You can extract the matching entries from the raw .qual file according to the filtered csfasta file.
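
The suggestion above, pulling the matching entries out of the raw .qual file, might look like the sketch below. It is not part of csfasta_quality_filter.pl. It assumes two-line records (`>read_id` then one data line), that read IDs are identical in both files, and that colors were trimmed from the 3' end, so the first n quality values are the ones to keep. If the script trims from the other end, the slice needs to change. All function names and example reads are hypothetical.

```python
def parse_records(lines):
    """Yield (read_id, data_line) pairs from csfasta or qual lines."""
    read_id = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        if line.startswith(">"):
            read_id = line[1:]
        elif read_id is not None:
            yield read_id, line
            read_id = None

def num_colors(cs_seq):
    """Count color calls, ignoring a leading primer base such as 'T'."""
    return len(cs_seq) - 1 if cs_seq[:1].isalpha() else len(cs_seq)

def matching_qual(filtered_csfasta, raw_qual):
    """Yield qual records for reads that survived filtering, cut to length."""
    kept = {rid: num_colors(seq) for rid, seq in parse_records(filtered_csfasta)}
    for rid, qline in parse_records(raw_qual):
        if rid in kept:
            # ASSUMPTION: trimming removed colors from the 3' end.
            qvs = qline.split()[:kept[rid]]
            yield f">{rid}\n{' '.join(qvs)}"

# Made-up example: read 1_1_2_F3 was filtered out, the others were trimmed.
filtered = [">1_1_1_F3", "T01230", ">1_1_3_F3", "T33210"]
raw = [
    ">1_1_1_F3", "30 28 25 8 31 5 22 20",
    ">1_1_2_F3", "4 5 3 6 2 4 5 3",
    ">1_1_3_F3", "27 29 30 26 25 12 9 7",
]
for record in matching_qual(filtered, raw):
    print(record)
```

For real data, pass open file handles for the filtered csfasta and the raw .qual file and write the yielded records to a new .qual file. Comparing a few records by eye first is worthwhile, to confirm which end the script actually trimmed.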

                    Comment
