  • rebrendi
    ng
    • May 2008
    • 78

    Paired-end Solexa data mapping with Bowtie

    Hello,

    I am mapping paired-end reads from two files per lane, with the following Bowtie command line:

    ./bowtie -t -v 2 -p 8 -m 1 --solexa-quals mm9 -1 filename1.fastq -2 filename2.fastq outputfilename.map

    The program processed 200 million reads in 9 hours, but only about 1% of them mapped (I expect ~80% of the reads to map in this experiment). Experimenting with the Bowtie parameters is very time-consuming on files this large, so I am asking for your help.

    Any ideas what is going wrong?

    Thank you!
  • Bukowski
    Senior Member
    • Jan 2010
    • 388

    #2
    If only 1% map, then I'm sure taking the first 100,000 reads would give you plenty of sample data with which to tune your parameters without running the entire dataset through.
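
A minimal sketch of that subsampling step, assuming plain 4-line FASTQ records (no wrapped sequence lines). The function name `head_fastq` and the output file names are made up for illustration; they are not part of Bowtie.

```python
from itertools import islice


def head_fastq(src_path, dst_path, n_reads):
    """Copy the first n_reads records from src_path to dst_path.

    Assumes plain 4-line FASTQ records (no wrapped sequence lines).
    Returns the number of complete records written.
    """
    n_lines = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in islice(src, 4 * n_reads):
            dst.write(line)
            n_lines += 1
    return n_lines // 4


# Hypothetical usage; apply the same n_reads to both mate files so that
# record i in one file still pairs with record i in the other:
# head_fastq("filename1.fastq", "test_1.fastq", 100_000)
# head_fastq("filename2.fastq", "test_2.fastq", 100_000)
```

Taking the same number of records from the top of both files keeps the mates in step, which matters for paired-end mode.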

    • rebrendi
      ng
      • May 2008
      • 78

      #3
      Still can't get them mapped. Using truncated files is a good idea: I created test files with just the first 1000 reads from each of the two paired-end files, so now I can experiment with the Bowtie parameters.

      The question is: which parameters should I change? I already tried --fr/--rf/--ff, which did not help. What other options are there?
      Last edited by rebrendi; 11-09-2011, 06:16 AM.

      • ERG
        Junior Member
        • Sep 2011
        • 1

        #4
        I'm not sure of the reason why this has worked for me, but try switching the sequence in which you order the fastq files. So basically you'd put "-1 filename2.fastq -2 filename1.fastq"

        Good luck!

        • rebrendi
          ng
          • May 2008
          • 78

          #5
          Originally posted by ERG View Post
          I'm not sure of the reason why this has worked for me, but try switching the sequence in which you order the fastq files. So basically you'd put "-1 filename2.fastq -2 filename1.fastq"

          Good luck!
          I tried this, no help

          • kmcarr
            Senior Member
            • May 2008
            • 1181

            #6
            Originally posted by rebrendi View Post
            Hello,

            I am mapping paired-end reads from two files per lane, with the following Bowtie command line:

            ./bowtie -t -v 2 -p 8 -m 1 --solexa-quals mm9 -1 filename1.fastq -2 filename2.fastq outputfilename.map

            The program processed 200 million reads in 9 hours, but as a result only about 1% of them mapped. (I expect ~80% reads to be mapped for this experiment). It is very time-consuming to play with the Bowtie parameters for such large files, so I ask for your help.

            Any ideas what goes wrong?

            Thank you!
            "--solexa-quals" indicates the reads have Q-scores encoded from a version of the GA Pipeline prior to 1.3. This version is ancient (in NGS terms), are you sure about this? (Though this may be irrelevant since you are aligning in -v mode which nominally ignores Q-scores.)

            Increase -m to something > 1.
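
One way to check the quality encoding rather than guess is to look at the lowest ASCII code in the quality lines. The sketch below is a heuristic only, and the function names are hypothetical. It relies on the usual ranges: Phred+33 starts at '!' (33), Solexa+64 at ';' (59), and Phred+64 at '@' (64).

```python
def guess_quality_encoding(quality_strings):
    """Guess the FASTQ quality encoding from the lowest ASCII code seen.

    Returns "phred33", "solexa64", "phred64", or None if no input was given.
    This is a heuristic: a sample containing only high-quality bases
    can be misclassified.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        return None
    lowest = min(codes)
    if lowest < 59:
        return "phred33"   # characters below ';' only occur with offset 33
    if lowest < 64:
        return "solexa64"  # Solexa scores can go down to -5 (offset 64)
    return "phred64"


def quality_lines(fastq_path, max_reads=10000):
    """Yield the quality line of the first max_reads 4-line records."""
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i >= 4 * max_reads:
                break
            if i % 4 == 3:
                yield line.rstrip("\r\n")


# Hypothetical usage:
# print(guess_quality_encoding(quality_lines("filename1.fastq")))
```

In Bowtie terms, "phred33" corresponds to the default (--phred33-quals), "solexa64" to --solexa-quals, and "phred64" to --phred64-quals (or --solexa1.3-quals).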

            • cjp
              Member
              • Jun 2011
              • 58

              #7
              Can you map them as single end? Also try setting a larger value for -X depending on the insert size of the library.

              -X/--maxins <int> maximum insert size for paired-end alignment (default: 250)

              Chris

              • fkrueger
                Senior Member
                • Sep 2009
                • 627

                #8
                It is probably indeed a matter of using the wrong quality settings and/or the -X parameter. The alignment summary at the end will tell you whether most reads were removed by the -m 1 parameter, but a reduction to 1% of alignments seems rather unrealistic.

                Another reason for this behavior might be processing the paired-end files with adapter/quality trimmers which remove sequences altogether. Sequence files need to be of the exact same length (same number of lines) and sequences need to correspond perfectly to each other in file 1 and file 2. Otherwise you might just try to align sequences from anywhere in the genome as sequence pairs and only a tiny subset will produce valid alignments.
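
Equal line counts alone do not prove that the mates correspond, so it can help to compare read names record by record. Here is a rough sketch, assuming 4-line FASTQ records and mate suffixes written either as "/1" and "/2" or as a separate field after whitespace. The helper names are hypothetical.

```python
from itertools import zip_longest


def read_names(fastq_path):
    """Yield the read name of each 4-line record, without mate suffixes."""
    with open(fastq_path) as fh:
        for i, line in enumerate(fh):
            if i % 4 != 0:
                continue
            fields = line[1:].split()          # drop '@', split off "1:N:0:" etc.
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):    # older-style mate suffix
                name = name[:-2]
            yield name


def first_mismatch(path1, path2):
    """Return (record_index, name1, name2) for the first pair of records
    whose names differ, or None if both files agree throughout.
    A missing record (unequal file lengths) shows up as None."""
    pairs = zip_longest(read_names(path1), read_names(path2))
    for idx, (a, b) in enumerate(pairs):
        if a != b:
            return idx, a, b
    return None


# Hypothetical usage:
# print(first_mismatch("filename1.fastq", "filename2.fastq"))
```

A result of None means every record in file 1 has a same-named partner at the same position in file 2.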

                • rebrendi
                  ng
                  • May 2008
                  • 78

                  #9
                  Originally posted by kmcarr View Post
                  "--solexa-quals" indicates the reads have Q-scores encoded from a version of the GA Pipeline prior to 1.3. This version is ancient (in NGS terms), are you sure about this? (Though this may be irrelevant since you are aligning in -v mode which nominally ignores Q-scores.)

                  Increase -m to something > 1.
                  I tried mapping without "--solexa-quals": same result.
                  I tried increasing -m to 3 and to 10; this raised the number of mapped reads to 2% and 4%, respectively. Still not much help.

                  • rebrendi
                    ng
                    • May 2008
                    • 78

                    #10
                    Originally posted by cjp View Post
                    Can you map them as single end? Also try setting a larger value for -X depending on the insert size of the library.

                    -X/--maxins <int> maximum insert size for paired-end alignment (default: 250)

                    Chris
                    I tried changing -X/--maxins <int>: same result.
                    I tried mapping the two files independently in single-read mode: 75% and 71% of the reads mapped for the two files, respectively. So the data seems OK, but the paired-end mapping still does not work.
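
Since both files map well on their own, the two single-read outputs can be joined by read name to see how the mates actually sit relative to each other. This is only a diagnostic sketch. It assumes Bowtie's default tab-separated output (read name, strand, reference, 0-based offset, sequence, ...) and one reported alignment per read. The function names are hypothetical, and everything is held in memory, so it is meant for the small test files.

```python
from collections import Counter


def parse_hits(lines):
    """Map read name -> (strand, reference, offset, read_length)."""
    hits = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 5 or not f[0].strip():
            continue
        name = f[0].split()[0]                 # drop "1:N:0:"-style suffix
        if name.endswith(("/1", "/2")):        # drop older-style mate suffix
            name = name[:-2]
        hits[name] = (f[1], f[2], int(f[3]), len(f[4]))
    return hits


def pair_geometry(hits1, hits2):
    """Count mate orientations and collect fragment spans for reads
    that aligned in both single-read runs."""
    kinds = Counter()
    spans = []
    for name, (s1, ref1, pos1, len1) in hits1.items():
        if name not in hits2:
            continue
        s2, ref2, pos2, len2 = hits2[name]
        if ref1 != ref2:
            kinds["different_reference"] += 1
            continue
        # Outer distance covered by the two mates on the reference.
        spans.append(max(pos1 + len1, pos2 + len2) - min(pos1, pos2))
        if s1 == s2:
            kinds["same_strand"] += 1
        else:
            plus_pos, minus_pos = (pos1, pos2) if s1 == "+" else (pos2, pos1)
            kinds["inward_fr" if plus_pos <= minus_pos else "outward_rf"] += 1
    return kinds, spans


# Hypothetical usage with the two single-read output files:
# with open("mate1.map") as a, open("mate2.map") as b:
#     kinds, spans = pair_geometry(parse_hits(a), parse_hits(b))
# print(kinds, sorted(spans)[len(spans) // 2] if spans else None)
```

If most pairs land on different references or show spans far beyond the -X value, that points at the pairing constraints or the library rather than at the reads themselves.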

                    • rebrendi
                      ng
                      • May 2008
                      • 78

                      #11
                      Originally posted by fkrueger View Post
                      Another reason for this behavior might be processing the paired-end files with adapter/quality trimmers which remove sequences altogether. Sequence files need to be of the exact same length (same number of lines) and sequences need to correspond perfectly to each other in file 1 and file 2. Otherwise you might just try to align sequences from anywhere in the genome as sequence pairs and only a tiny subset will produce valid alignments.
                      I have checked: the two files have exactly the same length.

                      • fkrueger
                        Senior Member
                        • Sep 2009
                        • 627

                        #12
                        Could you post the first say 20 lines of each file? Do the reads have similar names or belong to the same cluster?

                        • cjp
                          Member
                          • Jun 2011
                          • 58

                          #13
                          Did you try other aligners such as BWA or Bowtie2? They are much better at pairing reads. Bowtie2 is easy to run and pretty quick too, but you'll need to reindex your genome.

                          example command:

                          bowtie2 -x /path/to/ref/hg19 -X 650 -p4 -1 r1.fq -2 r2.fq -S r12.bowtie2.sam

                          Chris

                          • rebrendi
                            ng
                            • May 2008
                            • 78

                            #14
                            Originally posted by fkrueger View Post
                            Could you post the first say 20 lines of each file? Do the reads have similar names or belong to the same cluster?
                            Here are the first 4 reads of the first file (the fourth is cut off after its sequence line):

                            @HWI-ST841:93099JACXX:8:1101:1134:1866 1:N:0:
                            NGGTAAGTGAGAAAATCCCCCAAAGGAGACCAAGACNCTGTTTCCTGATGC
                            +
                            #1:ABBDDFCBDBEHHHHIGIIGEGEECFFGEC?BH#00B?D?BDFFEHG>
                            @HWI-ST841:93099JACXX:8:1101:1117:1870 1:N:0:
                            NGACGCTGAGAGTTGTCATGCCTCGGTGNNNNNNNNNNNNNNNNNNNTGGC
                            +
                            #4:BBBDD?DDD+A@EIEIIIIIEFI;E#######################
                            @HWI-ST841:93099JACXX:8:1101:1196:1879 1:N:0:
                            NGAAGGTCAACTTGATCCTGATTCAACTTTGGTACCTGGTATCTGTCCAGA
                            +
                            #1=DFFFFHHHHHJIJJJJJJJJJIJJJJJJJIIJJJJJJIIIJJJJJJHI
                            @HWI-ST841:93099JACXX:8:1101:1236:1882 1:N:0:
                            NGGCAGGCAAGCTAACTGCTGCTGTGATGTTCAAGGCATGTGTTACCCATC
                            Here are the first 4 reads of the second file (the fourth is cut off after its sequence line):
                            @HWI-ST841:93099JACXX:8:1101:1134:1866 2:N:0:
                            AGCATCTGCGTCTCTGTTACTATTTTTCAGAATGAGGGAGGAATGGGATGG
                            +
                            @@@FDDADH?D<<CF+<A,A4,:AFHG########################
                            @HWI-ST841:93099JACXX:8:1101:1117:1870 2:N:0:
                            AAGGGAGGAAGGTGTGTCACCAGCCTAAGTGAATGTGGACTGTGCTGTTTA
                            +
                            @?@FFBDDFFFHFHHIJBHIIGIDGH3:C?DGHDGGGIGEHGHGDGGFHG@
                            @HWI-ST841:93099JACXX:8:1101:1196:1879 2:N:0:
                            AGATCCTGAAGAAATCCAAAACACCATCAGATCCTTCTACAAAAGGCTATA
                            +
                            CCCFFFFFHHHHGJJJJJJJJJJJJJJJJJIJJJJJJJJJIJJIIIJJJJI
                            @HWI-ST841:93099JACXX:8:1101:1236:1882 2:N:0:
                            AGGAGGAAGAAAGATTATAAAAGCTTTACAAAAGGTTCCGCCGTTGGAAGC

                            • rebrendi
                              ng
                              • May 2008
                              • 78

                              #15
                              Originally posted by cjp View Post
                              Did you try other aligners such as BWA or Bowtie2.
                              I tried Eland and ran into the same problems. I have not tried BWA or Bowtie2.
