  • xuying
    Member
    • Mar 2008
    • 16

    Hi Ben:
    I will post the csfastq file (or part of it) somewhere later, because it's huge.
    I am using bowtie 0.12.1 (but the colorspace index was built with 0.12-beta).
    There are millions of lines in the SAM and pileup files, so a fixed "48M" in every SAM record and a fixed "A" in the pileup file seem unreasonable. (Please wait for me to send you the csfastq files.) Thanks a lot! :-)

    • Ben Langmead
      Senior Member
      • Sep 2008
      • 200

      Originally posted by xuying
      There are millions of lines in the SAM and pileup files, so a fixed "48M" in every SAM record and a fixed "A" in the pileup file seem unreasonable. (Please wait for me to send you the csfastq files.) Thanks a lot! :-)
      Why? Given that M = "match or mismatch", when would you expect something other than 48M?

      Ben

      • xuying
        Member
        • Mar 2008
        • 16

        Oh, yes, sorry. I misunderstood the CIGAR notation.
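In SAM, the CIGAR operation M means "alignment match", which covers both matching and mismatching bases, so every ungapped 48-bp alignment is written as 48M no matter how many mismatches it contains. The Python sketch below illustrates this; `parse_cigar` and `count_mismatches` are hypothetical helpers, not part of bowtie or SAMtools, and mismatches have to be counted by comparing read and reference, because the CIGAR string alone does not record them.

```python
import re

# Valid SAM CIGAR operation codes; each is preceded by a run length.
_CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '10M2I36M' into (length, op) pairs."""
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]


def count_mismatches(read, ref):
    """Count mismatches in an ungapped alignment of equal-length strings.

    The CIGAR 'M' op does not record these: an 8-bp ungapped alignment is
    8M whether this returns 0 or 1.
    """
    if len(read) != len(ref):
        raise ValueError("ungapped alignment needs equal-length strings")
    return sum(1 for a, b in zip(read, ref) if a != b)


print(parse_cigar("48M"))                          # [(48, 'M')]
print(count_mismatches("ACGTACGT", "ACGTACGT"))    # 0
print(count_mismatches("ACGTACGA", "ACGTACGT"))    # 1
```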

        • xuying
          Member
          • Mar 2008
          • 16

          Hi Ben:
          I can't seem to find a suitable place to upload my csfastq file.
          Here are a few records from the csfastq file generated by bfast's "solid2fastq" program. Does this look OK to use? Should I remove the leading primer base and the first color to get a true base there?

          @2292_469_84
          T210002310010221002200330303002200201120221.2111.2.
          +
          8<;==:=@?=<<>>>;;??<=<;96:?:5<>;85:=7,,:5/",(/)"*"
          @2292_469_216
          T000111101020011320222113222200220200120202.2222.2.
          +
          /6=>=::>>=;==>;;6=;;9<6:8<(3:-<;/9:852=-7/"2(6)")"
          @2292_469_274
          T300101122322222232222222210222222222022220.2222.2.
          +
          ,=#$$#@%#'#>$,&(;$*$*=)*'&6%,%##*,+#,4),#)",5'#","

          • acnoll
            Member
            • Mar 2008
            • 14

            Option for output of pairs where only one end aligns

            With bowtie's current set of options, is it possible to include pairs with only one end mapping to the genome in the alignment file (e.g. a SAM file)? I am interested in identifying intra-read short indels by anchoring one end of a mate pair.

            • SillyPoint
              Member
              • May 2008
              • 39

              I'd just logged on here to post exactly the question acnoll poses above: "is it possible to have pairs with only one end mapping to the genome be included in the alignment file?"

              The implication, which I believe after reading the manual and running Bowtie 0.12.1, is that only read pairs in which both ends match, and which fall within the -I/-X constraints, will be output. True?

              The alternative for now is to specify the -a option to get all the mapped output, and post-process that to find what you're interested in, be that the best pair (for some definition of "best"), or reads where only one end matches.
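One way to do the "only one end matches" post-processing is sketched below in Python. It assumes each mate file was aligned separately in single-end mode, that the read name is the first tab-separated column of bowtie's default output, and that mate names end in /1 and /2; all three are assumptions about the setup, and the function names are hypothetical.

```python
def mate_key(read_name):
    """Drop a trailing '/1' or '/2' so both mates share one key (assumed naming)."""
    return read_name[:-2] if read_name.endswith(("/1", "/2")) else read_name


def aligned_keys(bowtie_lines):
    """Pair keys of reads with at least one hit in default bowtie output."""
    return {mate_key(line.split("\t", 1)[0]) for line in bowtie_lines if line.strip()}


def one_end_only(mate1_lines, mate2_lines):
    """Map each pair key to the mate (1 or 2) that aligned when the other did not."""
    m1, m2 = aligned_keys(mate1_lines), aligned_keys(mate2_lines)
    result = {key: 1 for key in m1 - m2}
    result.update({key: 2 for key in m2 - m1})
    return result


# Hypothetical single-end outputs for the two mate files.
mate1 = ["r1/1\t+\tchr1\t100\tACGT\tIIII\t0\t", "r2/1\t-\tchr2\t5\tACGT\tIIII\t0\t"]
mate2 = ["r2/2\t+\tchr2\t300\tACGT\tIIII\t0\t", "r3/2\t+\tchr3\t7\tACGT\tIIII\t0\t"]
print(sorted(one_end_only(mate1, mate2).items()))   # [('r1', 1), ('r3', 2)]
```

The aligned mate of each such pair is the anchor; its partner's sequence can then be examined separately for indels.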

              Having the option to do that directly in Bowtie would be nice.

              --TS

              • bekkari
                Member
                • Oct 2009
                • 10

                Hi Ben,
                Can someone please let me know whether bowtie works with longer inserts (~20 kb) between mate pairs?

                Thanks

                • malcook
                  Member
                  • Sep 2009
                  • 24

                  bowtie: should I mask the pseudoautosomal segments of human genome

                  What do you think of my plan to mask the pseudoautosomal segments of the human Y chromosome before running bowtie on an RNA-Seq project?

                  Since the pseudoautosomal portions of human chromosomes X and Y are identical in sequence, any alignment strategy that keeps only unique alignments will discard all alignments to these regions, because each aligning read will have two matches. Thus the 24 known genes there won't be counted.

                  I plan to use EMBOSS `maskseq` to hard-mask (replace with 'N') the following chrY intervals before building the bowtie index:

                  chrY:10001-2649520
                  chrY:59034050-59363566
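The same hard-masking step can be sketched in plain Python, assuming (as the region strings suggest) 1-based, inclusive coordinates. This is an illustration rather than a replacement for `maskseq`; the helper names are hypothetical, and reading and writing the FASTA file is left out.

```python
def parse_region(region):
    """Split 'chrY:10001-2649520' into ('chrY', 10001, 2649520)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.replace(",", "").split("-")
    return chrom, int(start), int(end)


def hard_mask(seq, intervals):
    """Replace 1-based, inclusive [start, end] intervals of seq with 'N'.

    A bytearray keeps memory near one byte per base, which matters for a
    whole chromosome.
    """
    buf = bytearray(seq, "ascii")
    for start, end in intervals:
        if not 1 <= start <= end <= len(buf):
            raise ValueError(f"interval {start}-{end} outside 1..{len(buf)}")
        buf[start - 1:end] = b"N" * (end - start + 1)
    return buf.decode("ascii")


regions = ["chrY:10001-2649520", "chrY:59034050-59363566"]
par = [parse_region(r)[1:] for r in regions]   # [(10001, 2649520), (59034050, 59363566)]
print(hard_mask("ACGTACGTAC", [(3, 5)]))        # ACNNNCGTAC
```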

                  Does anyone see a problem with this approach?

                  I see that the bowtie manual's description of the `--ntoa` option explicitly states: "By default, Ns are simply excluded from the index and bowtie will not report alignments that overlap them." Does anyone know whether the same is true for Xs?

                  Finally, do you agree that the ability to direct bowtie-build to ignore portions of <reference_in> would be a sensible feature to request?

                  Thanks for thinking!

                  Malcolm Cook
                  Stowers Institute for Medical Research

                  • amaer
                    Member
                    • Oct 2009
                    • 15

                    Originally posted by Ben Langmead
                    Hi amaer,

                    Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But end-of-year is a reasonable guess.

                    Thanks,
                    Ben
                    Hi Ben,

                    What's the status of gapped alignment support? Do you have an estimated date?

                    Thanks, and keep up the great work!

                    • Ben Langmead
                      Senior Member
                      • Sep 2008
                      • 200

                      I'm working on this now. I don't have any time estimates.

                      Thanks,
                      Ben

                      • jlmlj
                        Junior Member
                        • Dec 2009
                        • 7

                        Hi Dr. Langmead,

                        I am analyzing ChIP-seq data on transcription factor binding sites. I have 5 million raw reads (76 bp read length) per sample from the Illumina platform. I used bowtie 0.11.3 to align these reads to the human reference genome.

                        The command I used for one high-quality alignment was:
                        ~/120809_ChiPseq/bowtie-0.11.3_linux_x86_64/bowtie --solexa1.3-quals -v 2 -a -m 1 -t -p 30 --un result_chipseq2/index2.hq.un --max result_chipseq2/index2.hq.max indexes_chipseq1/h_sapiens_asm reads/index2.fq > result_chipseq2/index2.hq.bt

                        The results were:
                        Reads aligned uniquely: ~45%
                        Reads aligned to multiple locations: ~6%
                        Reads that failed to align: ~49%

                        Then I increased the number of mismatches to 3 (-v 3) and trimmed the low-quality 3' end (--trim3 22). However, ~45% of reads still failed to align.

                        Two questions bother me:
                        1. My reads are 76 bp long, but bowtie only allows a maximum of 3 mismatches, which I think is too stringent. Do you think bowtie will allow more mismatches (7-8) for 76 bp or even longer 100 bp reads?

                        2. ~45-49% of my reads (repeats not included) failed to align to the human reference genome, which seems too high to accept. Do you have any idea why this happens?

                        Many thanks for your help,
                        jlmlj

                        • Xi Wang
                          Senior Member
                          • Oct 2009
                          • 317

                          Two questions bother me:
                          1. My reads are 76 bp long, but bowtie only allows a maximum of 3 mismatches, which I think is too stringent. Do you think bowtie will allow more mismatches (7-8) for 76 bp or even longer 100 bp reads?
                          Bowtie has another set of parameters for handling mismatches when mapping reads back to the reference genome: -n, -e and -l.
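As I understand the bowtie manual, the -n mode limits mismatches only in the seed (the first -l bases from the read's 5' end, at most -n of them), while -e caps the sum of Phred qualities at all mismatched positions across the whole read (documented defaults: -n 2, -l 28, -e 70). The Python sketch below is a simplified model of that acceptance rule, not bowtie's code: it ignores the Maq-style rounding of qualities that bowtie applies before summing, and the function name is hypothetical. It shows how a 76-bp read can carry more than 3 mismatches when they fall in a low-quality 3' tail.

```python
def passes_n_mode(mismatch_positions, quals, n=2, l=28, e=70):
    """Simplified -n/-l/-e check for one alignment.

    mismatch_positions: 0-based offsets from the read's 5' end.
    quals: per-base Phred qualities of the read.
    Bowtie's Maq-style quality rounding is deliberately omitted here.
    """
    seed_mm = sum(1 for p in mismatch_positions if p < l)   # mismatches in seed
    qual_sum = sum(quals[p] for p in mismatch_positions)    # whole-read quality sum
    return seed_mm <= n and qual_sum <= e


quals = [30] * 60 + [10] * 16          # hypothetical 76-bp read, weak 3' tail
tail_mm = list(range(60, 67))          # 7 mismatches, all in the low-quality tail
print(passes_n_mode(tail_mm, quals))        # True: 0 seed mismatches, sum 70
print(passes_n_mode([5] + tail_mm, quals))  # False: sum 100 exceeds -e 70
```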

                          2. ~45-49% of my reads (repeats not included) failed to align to the human reference genome, which seems too high to accept. Do you have any idea why this happens?
                          Maybe the rate is within the normal range. If the ChIP-seq reads come from repeat regions and you masked those regions when mapping, these reads will fail to map. There are also still quite a few 'N's in the human reference genome.
                          Xi Wang

                          • jlmlj
                            Junior Member
                            • Dec 2009
                            • 7

                            "Maybe the rate is within the normal range. If the ChIP-seq reads come from repeat regions and you masked those regions when mapping, these reads will fail to map. There are also still quite a few 'N's in the human reference genome."

                            Thanks a lot for the reply, Xi. I checked the human genome that I used: it was the unmasked version, and I did not mask repeat regions when mapping, so that may not be the cause...

                            I am thinking of trying a couple of parameters, such as --strata, but it looks a bit tricky and I'm not sure how to handle it yet.

                            • Xi Wang
                              Senior Member
                              • Oct 2009
                              • 317

                              Originally posted by jlmlj
                              Thanks a lot for the reply, Xi. I checked the human genome that I used: it was the unmasked version, and I did not mask repeat regions when mapping, so that may not be the cause...
                              I also meant the 'N's in the human reference genome. Our group has observed many cases where lots of reads pile up next to 'N' regions.
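To check whether reads cluster near 'N' gaps, one could first list the gap spans in each reference sequence and then flag alignments that fall close to one. The Python sketch below assumes 0-based alignment offsets (as in bowtie's default output) and uses a hypothetical 100-bp window; Xi did not specify a distance, so the window is an arbitrary illustration, and the function names are hypothetical.

```python
import re


def n_gaps(seq, min_len=1):
    """Return 0-based, half-open (start, end) spans of runs of N/n."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]


def near_gap(pos, read_len, gaps, window=100):
    """True if the alignment [pos, pos + read_len) lies within `window` bp of a gap."""
    lo, hi = pos - window, pos + read_len + window
    return any(start < hi and end > lo for start, end in gaps)


ref = "ACGTNNNNACGTACGTAAAANN"
gaps = n_gaps(ref)
print(gaps)                              # [(4, 8), (20, 22)]
print(near_gap(9, 3, gaps, window=2))    # True
print(near_gap(10, 4, gaps, window=1))   # False
```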
                              Hope this helps.
                              Xi Wang

                              • Chipper
                                Senior Member
                                • Mar 2008
                                • 323

                                Originally posted by jlmlj

                                The results were:
                                Reads aligned uniquely: ~45%
                                Reads aligned to multiple locations: ~6%
                                Reads that failed to align: ~49%
                                51% aligned is not too bad, but you could also try running without the -v parameter to allow more mismatches toward the 3' end.
