  • Jackken
    Member
    • Dec 2012
    • 10

    Exome sequencing alignment

    Hi,

    I used bowtie to align exome sequencing (Illumina GA), and this is what I got:


    # reads processed: 37205349
    # reads with at least one reported alignment: 141065 (0.38%)
    # reads that failed to align: 37064284 (99.62%)
    Reported 141065 alignments to 1 output stream(s)

    I am wondering what went wrong.

    By the way, here is what the .fastq file looks like:

    @SRR350953.3 MENDEL_0047_FC62MN8AAXX:1:1:1488:946 length=152TTTTTTTT
    NTCCCATTATCTCAAGCAGCCATATGTTTCTCATTCACTTGATACACTGTTTCTTTTCAACCCCCACATCCTCACCGTGCTCAA
    ACAAAGAAACAGGTGGTGAGGATGTGGGGGTTGAAAAGAAACAGTGTATCAAGTGAATGAGAAACATA########B@<7:EFE
    +SRR350953.3 MENDEL_0047_FC62MN8AAXX:1:1:1488:946 length=152EEB@@>>D
    ############################################################################FAD=FCCC
    CCFDD;E??@@FD?BD>F?FB=BFAEFDGGDGGDG@BB=5=/;?8B=DDDGDGBDGDACAE@B@G88;CTCCTTCCAGGACCCA


    Thanks!
  • Jackken
    Member
    • Dec 2012
    • 10

    #2
    Exome sequencing alignment

    Can anybody help?

    Thanks!

    • TonyBrooks
      Senior Member
      • Jun 2009
      • 303

      #3
      "#" is a Q-score of 2 (>60% error). There's your problem.

      Actually, after that stretch the error rate improves, and there is still plenty of sequence left in the read. You can try trimming your reads with fastx_trimmer and re-aligning.
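
To make the Q-score arithmetic concrete, here is a minimal Python sketch, assuming Phred+33 encoding (which is what the file in this thread turns out to use). The function names and the `min_q` cutoff are made up for illustration, and the quality string is only modeled on the one posted above.

```python
def phred33_to_error_prob(ch: str) -> float:
    """Convert one Phred+33 quality character to an error probability."""
    q = ord(ch) - 33
    return 10 ** (-q / 10)


def leading_low_quality_length(qual: str, min_q: int = 3) -> int:
    """Count leading bases whose Phred+33 score is below min_q (hypothetical cutoff)."""
    n = 0
    for ch in qual:
        if ord(ch) - 33 >= min_q:
            break
        n += 1
    return n


# Illustrative quality string: 76 '#' characters followed by better scores.
qual = "#" * 76 + "B@<7"

print(ord("#") - 33)                         # 2
print(round(phred33_to_error_prob("#"), 3))  # 0.631
print(leading_low_quality_length(qual))      # 76
```

The last number is the count of leading bases a trimmer would need to drop before the usable part of such a read begins.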
      Last edited by TonyBrooks; 12-13-2012, 08:22 AM.

      • Jackken
        Member
        • Dec 2012
        • 10

        #4
        Thanks! I will give it a try.

        • Jackken
          Member
          • Dec 2012
          • 10

          #5
          fastx_trimmer gave me an error (see below). Is there any way to make it work?

          Thanks!

          fastx_trimmer: Invalid quality score value (char '#' ord 35 quality value -29) on line 4.

          @SRR350953.1 MENDEL_0047_FC62MN8AAXX:1:1:1206:930 length=152
          NTGATTTAGCTGCATAGTTTTCTTCTTTTTAATCCATAATGTATACATTTTAGACTTTGTATTTTAACTGCTGACATTCC
          AGTCTAAGTCGGAAGCCACATCTTCTAAACCAAATGTCTCTTCATCCCTTATGTCAGGAACCTATTTTTTTT
          +SRR350953.1 MENDEL_0047_FC62MN8AAXX:1:1:1206:930 length=152
          ############################################################################B@<7
          :EFEEBF?8B?3=;@9GGGG?;:C7CBABA=DG><GGB>DGE>3<EADGEC=DDB8GGD3<CE-EEB@@>>D

          • quique_vzquez
            Junior Member
            • Dec 2012
            • 4

            #6
            Are you sure your data isn't paired-end? Whenever I've gotten reads that long from Illumina, they have always been paired-end.
            If it is paired, you must split your file into two files before aligning.

            • Jackken
              Member
              • Dec 2012
              • 10

              #7
              Many thanks!

              I am now trying to use "grep" to separate the original file into two.

              grep -A 1 "\.1 " originalfile.fastq > newfile_1.fastq
              grep -A 1 "\.2 " originalfile.fastq > newfile_2.fastq

              • quique_vzquez
                Junior Member
                • Dec 2012
                • 4

                #8
                I separate them from the original .sra file with:
                fastq-dump --split-3 originalFile.sra
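
If only the concatenated FASTQ is at hand, a rough Python sketch of the same idea is below. It assumes strictly four-line records and that each spot is simply the two mates joined end to end with equal lengths (for example 152 = 76 + 76), which fastq-dump determines from the SRA metadata rather than guessing. `read_fastq` and `split_mates` are hypothetical helper names, and the demo record is made up.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, qual) from strictly four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def split_mates(header, seq, qual):
    """Cut one concatenated record in half; ASSUMES both mates have equal length."""
    half = len(seq) // 2
    name = header.split()[0]
    mate1 = f"{name}/1\n{seq[:half]}\n+\n{qual[:half]}\n"
    mate2 = f"{name}/2\n{seq[half:]}\n+\n{qual[half:]}\n"
    return mate1, mate2


# Tiny made-up record standing in for a real 152-base spot.
demo = "@r1 length=8\nACGTTGCA\n+r1 length=8\nIIII####\n"
records = list(read_fastq(io.StringIO(demo)))
m1, m2 = split_mates(*records[0])
print(m1, end="")
print(m2, end="")
```

When the .sra file is available, fastq-dump is the safer route, since it does not depend on the equal-length assumption.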

                • Jackken
                  Member
                  • Dec 2012
                  • 10

                  #9
                  Thanks a lot!

                  I will try it.

                  • kmcarr
                    Senior Member
                    • May 2008
                    • 1181

                    #10
                    Originally posted by Jackken
                    fastx_trimmer gave me an error (see below). Is there any way to make it work?

                    Thanks!

                    fastx_trimmer: Invalid quality score value (char '#' ord 35 quality value -29) on line 4.

                    @SRR350953.1 MENDEL_0047_FC62MN8AAXX:1:1:1206:930 length=152
                    NTGATTTAGCTGCATAGTTTTCTTCTTTTTAATCCATAATGTATACATTTTAGACTTTGTATTTTAACTGCTGACATTCC
                    AGTCTAAGTCGGAAGCCACATCTTCTAAACCAAATGTCTCTTCATCCCTTATGTCAGGAACCTATTTTTTTT
                    +SRR350953.1 MENDEL_0047_FC62MN8AAXX:1:1:1206:930 length=152
                    ############################################################################B@<7
                    :EFEEBF?8B?3=;@9GGGG?;:C7CBABA=DG><GGB>DGE>3<EADGEC=DDB8GGD3<CE-EEB@@>>D

                    The FASTX toolkit still assumes by default that all FASTQ files use the original Solexa Phred+64 encoding for their quality scores. Your file uses the (now standard) Phred+33 encoding. You have to explicitly tell fastx_trimmer that your file is Phred+33 by adding the parameter "-Q33" to your command line.
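
The "-29" in the error message is just ord('#') minus 64. As a sanity check before choosing an offset, a small Python heuristic along these lines can help. It is only a sketch: the function name and the thresholds (59 and 74) are assumptions based on the commonly cited ASCII ranges for the +33 and +64 encodings, and it returns None when the characters seen are ambiguous.

```python
def guess_phred_offset(quality_lines):
    """Heuristically guess the quality offset (33 or 64) from quality strings.

    Characters below ';' (ASCII 59) should not occur in +64 files, and
    characters above 'J' (ASCII 74) are unusual in typical Illumina +33 files.
    Returns None if the observed characters do not settle the question.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None


print(ord("#") - 64)                       # -29, as in the fastx_trimmer error
print(guess_phred_offset(["####B@<7"]))    # 33
print(guess_phred_offset(["hhhhgggf"]))    # 64
```

For the quality lines in this thread, the '#' characters alone are enough to point to Phred+33.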

                    • Jackken
                      Member
                      • Dec 2012
                      • 10

                      #11
                      Thanks, kmcarr.

                      I hadn't realized that the data is paired-end, so quique_vzquez is right. I am now splitting the original .fastq file into two, and I think it's working.

                      quique_vzquez, thanks a lot!
