
  • Low overall alignment rate

    Dear All,
    I aligned a FASTQ file with bowtie2 using the command below:

    ./bowtie2 -x ~/tan_analysis/rice1 -U ~/tan_analysis/GBS20130709_S1_L001_R1_001.fastq -S GBS_test.sam
    1604137 reads; of these:
    1604137 (100.00%) were unpaired; of these:
    1583639 (98.72%) aligned 0 times
    16736 (1.04%) aligned exactly 1 time
    3762 (0.23%) aligned >1 times
    1.28% overall alignment rate

    My question is: why is the overall alignment rate so low?

  • #2
    It's rather difficult to say without seeing the data. Perhaps you need to trim your reads. Perhaps the reference is just not that similar. Perhaps your samples were swapped with someone else's. Try doing local alignment and see if that helps. Alternatively, BLAST a few of the reads and see what you get.


    • #3
      Dear dpryan,
      Thank you very much for your suggestions.

      I think I should trim my reads, because I have not trimmed them.

      I am looking into how to do that.

      Do you have any suggestions on how to trim the barcodes and fix the low alignment rate?


      • #4
        Are you aligning to the correct reference/genome? Take some of the sequences that Bowtie isn't aligning and BLAT them. Do they align?


        • #5
          I don’t think a 1% alignment rate will be fixed by trimming. For more help, you should post some example output from FastQC. That will usually tell you whether your reads need trimming.

          If you’re using a reference genome that is even a little bit diverged from your species, you’ll need to loosen the parameters for bowtie (even some outbred populations in certain species have enough sequence diversity that this can be an issue). You should also use the local alignment dpryan suggests; it's more flexible than end-to-end and could negate the need to trim.


          • #6
            A total of 1.2 million reads seems very small considering that the OP is aligning to the rice genome (what kind of experiment is this, BTW?).

            As others have said, you should do some QC (and trimming, if needed) before doing the alignments. It would not hurt to take a set of reads, convert them to FASTA, and BLAST them against GenBank to check that you have the right sequence (i.e. rice).
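For the "take a set of reads and convert them to FASTA" step, here is a minimal sketch using plain awk. The function name is made up for illustration, and it assumes the FASTQ has exactly four lines per read (no wrapped sequences). The demo input is invented so the snippet runs on its own.

```shell
# Print the first N reads of a 4-line-per-record FASTQ stream as FASTA.
fastq_head_to_fasta() {
  # $1 = number of reads to keep; FASTQ is read from stdin
  awk -v n="$1" '
    NR > 4 * n  { exit }                       # stop after N records
    NR % 4 == 1 { print ">" substr($0, 2) }    # @header becomes >header
    NR % 4 == 2 { print }                      # sequence line
  '
}

# Made-up two-read FASTQ, keep only the first read.
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGGCCAA\n+\nIIIIIIII\n' \
  | fastq_head_to_fasta 1
# prints:
# >read1
# ACGTACGT
```

On real data this would be called as something like `fastq_head_to_fasta 100 < reads.fastq > sample.fasta`, and the resulting file pasted into a BLAST search.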


            • #7
              Thank you very much for your suggestions.
              After I removed the barcode and Illumina sequences from the data, I got this result:

              2398641 reads; of these:
              2398641 (100.00%) were unpaired; of these:
              269976 (11.26%) aligned 0 times
              1542588 (64.31%) aligned exactly 1 time
              586077 (24.43%) aligned >1 times
              88.74% overall alignment rate

              It looks better than the previous one.


              • #8
                very low alignment rates with bowtie2 and bwa


                I am getting really low alignment rates too.

                Bowtie2 gives me the following output:

                #map the reads
                -bash-4.1$ ./bowtie2 -p 1 -x AER -1 S25_R1_001.fastq -2 S25_R2_001.fastq > S25_bowtie2.sam

                4240966 reads; of these:
                4240966 (100.00%) were paired; of these:
                4240777 (100.00%) aligned concordantly 0 times
                161 (0.00%) aligned concordantly exactly 1 time
                28 (0.00%) aligned concordantly >1 times
                4240777 pairs aligned concordantly 0 times; of these:
                10902 (0.26%) aligned discordantly 1 time
                4229875 pairs aligned 0 times concordantly or discordantly; of these:
                8459750 mates make up the pairs; of these:
                8168790 (96.56%) aligned 0 times
                49047 (0.58%) aligned exactly 1 time
                241913 (2.86%) aligned >1 times
                3.69% overall alignment rate
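The 3.69% figure can be reproduced from the counts in this summary: each pair that aligned concordantly or discordantly contributes two reads, and each mate that aligned on its own contributes one. A small sketch of that arithmetic (the function name is made up for illustration):

```shell
# Recompute the overall alignment rate from a bowtie2 paired-end summary.
paired_overall_rate() {
  # $1 = total pairs
  # $2 = pairs concordant exactly 1 time, $3 = pairs concordant >1 times
  # $4 = pairs aligned discordantly
  # $5 = mates aligned exactly 1 time,    $6 = mates aligned >1 times
  awk -v pairs="$1" -v c1="$2" -v cm="$3" -v d="$4" -v m1="$5" -v mm="$6" \
    'BEGIN {
       aligned_reads = 2 * (c1 + cm + d) + m1 + mm
       printf "%.2f%%\n", 100 * aligned_reads / (2 * pairs)
     }'
}

paired_overall_rate 4240966 161 28 10902 49047 241913
# prints 3.69%
```

Only 189 of the 4240966 pairs aligned concordantly, so nearly all of the 3.69% comes from mates that aligned alone or discordantly.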

                BWA gives me the following output (I ran samtools flagstat to get the overall alignment rate, which is 47% if I understand the output correctly):

                #map the reads
                -bash-4.1$ bwa mem -t 4 AER.fasta S25_R1_001.fastq S25_R2_001.fastq > S25_bwa.sam
                #convert to bam
                -bash-4.1$ ./samtools view -bS S25_bwa.sam > S25_bwa.bam
                #get flagstats
                -bash-4.1$ ./samtools flagstat S25_bwa.bam

                8816220 + 0 in total (QC-passed reads + QC-failed reads)
                0 + 0 secondary
                334288 + 0 supplementary
                0 + 0 duplicates
                4153225 + 0 mapped (47.11%:-nan%)
                8481932 + 0 paired in sequencing
                4240966 + 0 read1
                4240966 + 0 read2
                2605074 + 0 properly paired (30.71%:-nan%)
                3513674 + 0 with itself and mate mapped
                305263 + 0 singletons (3.60%:-nan%)
                901042 + 0 with mate mapped to a different chr
                41706 + 0 with mate mapped to a different chr (mapQ>=5)
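One caveat when comparing this with the bowtie2 number: the flagstat total counts alignment records, and here it includes 334288 supplementary records on top of the 8481932 reads. A rough per-read rate can be had by removing the supplementary records from both the mapped count and the total. This is a sketch, assuming every supplementary record is counted as mapped and there are no secondary records (as in the output above); the function name is made up.

```shell
# Per-read mapping rate from flagstat counts, ignoring supplementary records.
primary_mapped_rate() {
  # $1 = total records, $2 = mapped records, $3 = supplementary records
  awk -v total="$1" -v mapped="$2" -v supp="$3" \
    'BEGIN { printf "%.2f%%\n", 100 * (mapped - supp) / (total - supp) }'
}

primary_mapped_rate 8816220 4153225 334288
# prints 45.02%
```

The same 3818937 mapped reads also fall out of "with itself and mate mapped" plus "singletons" (3513674 + 305263), which suggests the adjustment is consistent with the rest of the report.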

                I have read that bwa mem is generally a very aggressive aligner and that probably explains the 47% rate.

                I am looking into how to extract unmapped reads so that I can blast them to see any contamination issues.
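One way to pull the unmapped reads straight out of the SAM file is to test bit 0x4 of the FLAG column (column 2) and write the read name and sequence as FASTA. The sketch below uses only awk, with a made-up function name and made-up demo records. With samtools, the `-f 4` filter of `samtools view` should select the same records from a BAM file.

```shell
# Write unmapped reads (FLAG bit 0x4 set) from SAM on stdin as FASTA.
sam_unmapped_to_fasta() {
  awk -F '\t' '
    /^@/ { next }                  # skip SAM header lines
    int($2 / 4) % 2 == 1 {         # FLAG bit 0x4 set: read is unmapped
      print ">" $1                 # column 1: read name
      print $10                    # column 10: read sequence
    }
  '
}

# Made-up SAM: read1 is mapped (FLAG 0), read2 is unmapped (FLAG 4).
{
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n'
  printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\tIIIIIIII\n'
} | sam_unmapped_to_fasta
# prints:
# >read2
# TTGGCCAA
```

On the file from this post that would be something like `sam_unmapped_to_fasta < S25_bwa.sam | head -n 200 > unmapped_sample.fasta`, where the output file name is only an example.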
                The fastqc reports look fine (no red crosses), especially after I trim the first ~10 and last ~3 bases. Is there any other quality control I should be doing before mapping? What are all of the reasons for low alignment rates?
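For reference, a fixed-length trim like "first ~10 and last ~3 bases" can be sketched in awk as below. It assumes four lines per read, and the function name and demo read are made up. bowtie2 also has `--trim5` and `--trim3` options that trim at alignment time, which may be simpler than rewriting the FASTQ.

```shell
# Drop a fixed number of bases from each end of every read in a FASTQ stream.
trim_fastq() {
  # $1 = bases to drop from the 5' end, $2 = bases to drop from the 3' end
  awk -v five="$1" -v three="$2" '
    NR % 2 == 0 {                  # even lines: sequence and quality strings
      keep = length($0) - five - three
      print (keep > 0 ? substr($0, five + 1, keep) : "")
      next
    }
    { print }                      # header and "+" lines pass through
  '
}

# Made-up 19-base read: 10 leading bases, 6 to keep, 3 trailing bases.
printf '@read1\nGGGGGGGGGGACGTACTTT\n+\nIIIIIIIIIIFFFFFF###\n' | trim_fastq 10 3
# prints:
# @read1
# ACGTAC
# +
# FFFFFF
```

Reads shorter than the total trim length come out empty here, so they would need to be filtered before alignment.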

