  • Bowtie and Color Space

    Hey, I tried the latest release of Bowtie, which now supports color space.

    I downloaded the dataset from the link below and got the results further below after slightly less than 4 hours.

    (I study sequence patterns in Nucleosomal DNA.)


    ftp://ftp.ncbi.nlm.nih.gov/sra/static/SRX000/SRX000425/
    (both files)

    /Desktop/bowtie-0.12.0-beta1$ ./bowtie -C -q -a -m 1 c_elegans_ws200_c c_elegans_425.fastq c_elegans_425_positions.txt | awk -v OFS='\t' '{if($2 == "-") {$4 += (length($5)-1)} ; print $0}'

    # reads processed: 107422570
    # reads with at least one reported alignment: 35370418 (32.93%)
    # reads that failed to align: 66794981 (62.18%)
    # reads with alignments suppressed due to -m: 5257171 (4.89%)
    Reported 35370418 alignments to 1 output stream(s)
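For readers less familiar with awk, the post-processing step in the command above can be sketched in Python. This is only an illustration, assuming Bowtie's default tab-separated output in which column 2 is the strand, column 4 is the 0-based offset of the leftmost aligned base, and column 5 is the read sequence. The function name `shift_minus_strand` is hypothetical and not part of Bowtie.

```python
def shift_minus_strand(line):
    """Mirror the awk step: for minus-strand hits, move the reported
    offset from the leftmost aligned base to the rightmost one."""
    # Assumption: fields are tab-separated. awk's default splitting uses
    # any whitespace, so read names containing spaces would behave differently.
    fields = line.rstrip("\n").split("\t")
    strand, offset, seq = fields[1], int(fields[3]), fields[4]
    if strand == "-":
        offset += len(seq) - 1
    fields[3] = str(offset)
    return "\t".join(fields)


if __name__ == "__main__":
    # Made-up example rows, not taken from the real dataset.
    print(shift_minus_strand("read1\t-\tchrI\t100\tACGTA\tIIIII\t0"))
    print(shift_minus_strand("read2\t+\tchrI\t100\tACGTA\tIIIII\t0"))
```

Plus-strand rows pass through unchanged, so every reported position refers to the same end of the read regardless of strand.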

    I think the quality of SOLiD reads is not as good as that of Solexa reads (i.e. more mismatches). I used the default threshold of 2 mismatches.

    I'm going to try to compare with BFAST someday once I figure out how to make proper masks for C. elegans. Bowtie is nice for those who have no idea how these programs work and are not computer science majors. The manual is very easy to understand and one can simply download the indices for commonly used genomes from the website.

    -Clayton

  • #2
    Originally posted by cutcopy11
    I think the quality of reads using Solid is not as good as Solexa (i.e. more mismatches). I used the default 2 mismatch threshold.
    I would set the number of mismatches to 10% of the read length, since the SOLiD error rate can be up to 10% and a SNP will eat up two mismatches in color space.
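Why a SNP costs two mismatches in color space can be seen with a small sketch. It assumes the standard SOLiD two-base encoding, where each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3). The helper names and the example sequences are made up for illustration, and the leading primer base of a real read is ignored.

```python
# 2-bit base codes used by the SOLiD di-base (color) encoding.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Encode a nucleotide string as colors, one per adjacent base pair."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def color_mismatches(ref, read):
    """Count positions where the color encodings of two equal-length
    sequences differ."""
    return sum(1 for x, y in zip(to_colors(ref), to_colors(read)) if x != y)


if __name__ == "__main__":
    ref = "ACGTACGT"
    snp = "ACGTTCGT"  # single base change (A -> T) in the middle
    print(to_colors(ref))              # [1, 3, 1, 3, 1, 3, 1]
    print(to_colors(snp))              # [1, 3, 1, 0, 2, 3, 1]
    print(color_mismatches(ref, snp))  # 2
```

One changed base in the interior of the read alters both colors that overlap it, which is why a mismatch budget tuned for base space runs out quickly on SOLiD data.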


    • #3
      Can you tell me how to convert csfasta and qual files to csfastq? Thanks!


      • #4
        Exactly. You need to increase the mismatch tolerance. In color space a single SNP takes 2 mismatches, while a lone mismatch is a sequencing error. Most people allow 6 mismatches for a 50 bp tag if they have the processing power.


        • #5
          Originally posted by xuying
          Can you tell me how to convert csfasta and qual files to csfastq? Thanks!
          Hi Xuying,
          You are much more likely to get an answer to a question if you start a new thread, especially if the current thread is unrelated to your question.

          --
          Phillip


          • #6
            Originally posted by xuying
            Can you tell me how to convert csfasta and qual files to csfastq? Thanks!
            I can't, but google can: http://www.google.com/search?rlz=1C1...les+to+csfastq
            --
            Senthil Palanisami


            • #7
              Originally posted by spenthil
              If it is going to be that way, then try

              For all those people who find it more convenient to bother you with their question rather than to Google it for themselves.


              • #8
                paired reads

                Has anyone had success using Bowtie with color-space mate-pair inputs?


                • #9
                  Originally posted by snetmcom
                  exactly. you need to increase mismatch tolerance levels. it takes 2 mismatches for a single SNP, 1 mismatch is system error. Most people run 6 mismatches for a 50bp tag if you have the processing power.
                  But Bowtie only allows up to 3 mismatches (-v). How do you get around that?


                  • #10
                    Originally posted by June
                    But, Bowtie only can set the mismatch up to 3 (-v). How could you fix it?
                    Bowtie allows up to 3 mismatches in the seed (-n), so you don't need to set the -v parameter. You might need to increase -e to allow high-quality mismatches for SNPs, though.
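To see why -e matters here, the following is a simplified model of the seed-plus-quality policy as I understand it from the Bowtie manual: at most a fixed number of mismatches in the seed, and the sum of (rounded) Phred qualities at all mismatched positions must not exceed -e. It is a sketch under those assumptions, not Bowtie's actual code. The function names, the read length, and the mismatch positions are made up, and the defaults (seed length 28, 2 seed mismatches, -e 70) are assumed from the manual.

```python
def maq_round(q):
    """Round a Phred quality to the nearest 10, capped at 30
    (assumed to match Bowtie's default Maq-like rounding)."""
    return min(30, int(q / 10.0 + 0.5) * 10)


def passes_policy(mismatch_positions, quals,
                  seed_len=28, max_seed_mm=2, max_qual_sum=70):
    """Return True if an alignment with mismatches at the given 0-based
    positions would be allowed under this simplified seed/quality policy."""
    seed_mm = sum(1 for p in mismatch_positions if p < seed_len)
    qual_sum = sum(maq_round(quals[p]) for p in mismatch_positions)
    return seed_mm <= max_seed_mm and qual_sum <= max_qual_sum


if __name__ == "__main__":
    quals = [30] * 50        # hypothetical read, every position at Q30
    snp_plus_error = [35, 36, 40]  # one SNP (2 colors) plus one error
    print(passes_policy(snp_plus_error, quals))                    # False
    print(passes_policy(snp_plus_error, quals, max_qual_sum=100))  # True
```

Under this model, a SNP with two high-quality color mismatches already uses 60 of the assumed default budget of 70, so a single additional high-quality error pushes the read over the limit unless the budget is raised.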

                    It is quite amazing to align 10 M reads in 15 min on a single node... But it seems strange that --best is not the default.
