SEQanswers

  • Why can't I get any alignments with BWA?

    What are the general steps to align a set of reads to a reference?
    Is the following right?
    1. Build the index -> bwa index -c -a is ecoli.fa
    2. Convert the csfasta and corresponding qual file into FASTQ
    3. Find the SA coordinates of the input reads -> bwa aln -n 3 -t 4 -M 3 -c ecoli.fa t_ecoli.fastq >map.sai
    4. Generate alignments in SAM format for single-end reads -> bwa samse -n 1 ecoli.fa map.sai t_ecoli.fastq >map.sam
    Following the steps above, I get a SAM file with no reads mapped: all 25000 records have flag 4 (unmapped).
    Also, the input FASTQ file contains 50000 reads, so why does BWA process only 25000?
    If I delete one read from the FASTQ file, BWA processes 24999 reads. Why?
    And what is bwasw for?
    Thank you!
    Last edited by gigigou; 09-06-2012, 05:54 AM.
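The halved read count (50000 in, 25000 processed; one read removed, 24999 processed) hints at a malformed FASTQ rather than an alignment problem, although the thread never confirms this. One plausible culprit, stated here only as an assumption: a naive csfasta-to-FASTQ conversion keeps the primer base, so each sequence is one character longer than its quality string, and a FASTQ reader that keeps reading quality lines until the lengths match could then consume every second record. A minimal Python check for that situation (`check_fastq` is a hypothetical helper, not part of BWA):

```python
from typing import Iterable, List, Tuple


def check_fastq(lines: Iterable[str]) -> Tuple[int, List[str]]:
    """Hypothetical sanity check (not part of BWA).

    Assumes plain 4-line FASTQ records (no wrapped sequence lines) and
    returns (number_of_records, list_of_problems).
    """
    rows = [line.rstrip("\r\n") for line in lines]
    problems: List[str] = []
    if len(rows) % 4 != 0:
        problems.append(f"{len(rows)} lines is not a multiple of 4")
    n_records = len(rows) // 4
    for i in range(n_records):
        header, seq, plus, qual = rows[4 * i : 4 * i + 4]
        if not header.startswith("@"):
            problems.append(f"record {i + 1}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {i + 1}: separator line does not start with '+'")
        if len(seq) != len(qual):
            # e.g. primer base kept in the sequence but absent from the qualities
            problems.append(
                f"record {i + 1}: sequence has {len(seq)} characters "
                f"but quality has {len(qual)}"
            )
    return n_records, problems


# Demo: two in-memory color-space records whose quality lines are one character short.
demo = ["@r1", "T0123", "+", "%%%%", "@r2", "T3210", "+", "%%%%"]
n, problems = check_fastq(demo)
print(n, "records")
for p in problems:
    print(p)
```

On the real file, `check_fastq(open("t_ecoli.fastq"))` would report the record count and any length mismatches before BWA ever sees the data.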

  • #2
    I think in step 3 you are missing the output redirection character (>). It should be:

    bwa aln -n 3 -t 4 -M 3 -c ecoli.fa t_ecoli.fastq > map.sai

    See if that helps.

    • #3
      Originally posted by jimmybee
      I think in step 3 you are missing the output redirection character (>). It should be:

      bwa aln -n 3 -t 4 -M 3 -c ecoli.fa t_ecoli.fastq > map.sai

      See if that helps.
      Oh, sorry, I left the ">" out of the post, but I did include it on the command line, so that can't be the reason.
      I do get the alignment file, but it contains no mapped records: all 25000 reads are reported as unmapped.
      Thank you all the same.

      • #4
        Originally posted by gigigou
        1. Build the index -> bwa index -c -a is ecoli.fa
        2. Convert the csfasta and corresponding qual file into FASTQ
        You are building a color-space index but converting your reads to FASTQ? Are you sure that's right?

        • #5
          I agree with swbarnes: after creating the color-space index, give BWA the color-space input file, not a nucleotide file.

          • #6
            Originally posted by swbarnes2
            You are building a color-space index but converting your reads to FASTQ? Are you sure that's right?
            The reads are converted into FASTQ format, but they are still in color space:
            @SRR001354.lite.sra.1 461_28_1048
            T23331323333132332323133333332333232
            +
            %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

            I also tried giving BWA the original csfasta file: it processed all 50000 reads, but still none of them mapped.

            50000 + 0 in total (QC-passed reads + QC-failed reads)
            0 + 0 duplicates
            0 + 0 mapped (0.00%:-nan%)
            0 + 0 paired in sequencing
            0 + 0 read1
            0 + 0 read2
            0 + 0 properly paired (-nan%:-nan%)
            0 + 0 with itself and mate mapped
            0 + 0 singletons (-nan%:-nan%)
            0 + 0 with mate mapped to a different chr
            0 + 0 with mate mapped to a different chr (mapQ>=5)
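The flagstat summary and the "flag 4" in the first post say the same thing: in SAM, FLAG bit 0x4 marks an unmapped segment. As a minimal cross-check that needs only the Python standard library, that bit can be counted straight from map.sam (`count_mapped` is a hypothetical helper name):

```python
from typing import Iterable, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def count_mapped(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped, unmapped) record counts, skipping '@' header lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd tab-separated column
        if flag & UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


demo = [
    "@SQ\tSN:chr\tLN:100",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",      # unmapped
    "r2\t0\tchr\t5\t37\t4M\t*\t0\t0\tACGT\tIIII",   # mapped, forward strand
    "r3\t16\tchr\t9\t37\t4M\t*\t0\t0\tACGT\tIIII",  # mapped, reverse strand
]
print(count_mapped(demo))  # (2, 1)
```

The sketch counts alignment lines, not reads; with `bwa samse`, which writes one line per read, the two should coincide.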

            • #7
              Originally posted by westerman
              I agree with swbarnes: after creating the color-space index, give BWA the color-space input file, not a nucleotide file.
              In the FASTQ file converted from the csfasta, the reads are still in color space, as shown above.

              • #8
                Since we do not have access to your files, it is hard to troubleshoot. But here are a couple of pieces of general troubleshooting advice:

                1) Do not use any non-standard options; e.g., try 'aln' and 'samse' without any parameters.

                2) Try a smaller input file, especially one containing a known-good sequence. In other words, take a bit of your E. coli reference and make it into a color-space read.

                Good luck in solving this.
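Suggestion 2) is easy to script. The sketch below assumes the standard SOLiD two-base encoding (A=0, C=1, G=2, T=3, with each color equal to the XOR of two adjacent base codes) and a leading primer base 'T', matching the record shown earlier in the thread; the function names and the test fragment are hypothetical.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def to_colorspace(seq: str, primer: str = "T") -> str:
    """Encode seq as the primer base followed by one color digit per base."""
    bases = (primer + seq).upper()
    # Each color is the XOR of the codes of two neighbouring bases.
    colors = (str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(bases, bases[1:]))
    return primer + "".join(colors)


def from_colorspace(read: str) -> str:
    """Decode a primer-prefixed color-space read back into bases."""
    current = BASE_CODE[read[0]]
    out = []
    for digit in read[1:]:
        current ^= int(digit)  # next base = previous base XOR color
        out.append(CODE_BASE[current])
    return "".join(out)


# Hypothetical stand-in for a slice copied out of ecoli.fa.
fragment = "ACGTTGCAAGGCTTAACCGGTTAA"
read = to_colorspace(fragment)
print(">test_read_1")
print(read)
assert from_colorspace(read) == fragment  # round trip recovers the fragment
```

Writing a few such reads as csfasta records (with made-up `>name` lines and a matching constant-value quality file) gives a small, known-good input to push through the same conversion and `bwa aln -c` steps.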

                • #9
                  Provided you have already done what others suggested...

                  (and this may be a stupid suggestion) are you sure the data you have is from your samples (i.e., there was no mix-up at the place where you got it sequenced)?

                  • #10
                    Originally posted by westerman
                    1) Do not use any non-standard options; e.g., try 'aln' and 'samse' without any parameters.

                    2) Try a smaller input file, especially one containing a known-good sequence. In other words, take a bit of your E. coli reference and make it into a color-space read.

                    Thank you.
                    All the parameters are set as the manual describes.
                    I think the input file is already small enough: only 50000 reads.
                    Thanks for your help; I'll look into it.

                    • #11
                      Originally posted by GenoMax
                      Provided you have already done what others suggested...

                      (and this may be a stupid suggestion) are you sure the data you have is from your samples (i.e., there was no mix-up at the place where you got it sequenced)?
                      I don't think the data is the problem, because I have aligned it with other tools and they all give perfect results. So I think the problem is on the BWA side: maybe I didn't set the parameters properly. I'll try to solve it.
                      Thank you!

                      • #12
                        I have solved the problem.
                        In step 2, the input files must be named the way the .pl conversion script expects.
                        I found that a little inconvenient, so I modified the .pl script to make it easier to use.
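The thread does not name the .pl, but BWA's source distribution at the time included a `solid2fastq.pl` helper for this conversion step, and its usage text reportedly expects inputs named `<title>F3.csfasta` and `<title>F3_QV.qual`, where `<title>` comes from the csfasta `# Title:` line; check the copy shipped with your BWA version. The Python sketch below is not that script. It only illustrates the kind of conversion involved, under assumptions spelled out in its comments (in particular, that BWA's color-space mode wants the primer base and first color removed and the remaining colors written as A/C/G/T). All names are hypothetical.

```python
from typing import Iterator, List, Tuple

# Assumed "double encoding": colors 0-3 become A/C/G/T, a missing color '.' becomes N.
DOUBLE_ENCODE = str.maketrans("0123.", "ACGTN")


def records(lines: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, body) pairs from csfasta/qual text, skipping '#' comments."""
    name = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def csfasta_to_fastq(csfasta: List[str], qual: List[str]) -> List[str]:
    """Hypothetical converter: returns FASTQ lines, four per read."""
    reads, quals = list(records(csfasta)), list(records(qual))
    if len(reads) != len(quals):
        raise ValueError(f"{len(reads)} reads but {len(quals)} quality records")
    out: List[str] = []
    for (name, read), (qname, qvals) in zip(reads, quals):
        if name != qname:
            raise ValueError(f"read {name!r} paired with qualities {qname!r}")
        # Assumption: drop the primer base and the first color.
        seq = read[2:].translate(DOUBLE_ENCODE)
        # One quality value per color; drop the first to stay in step with seq.
        scores = [int(v) for v in qvals.split()][1:]
        # Assumed: missing colors carry -1; clamp to 0 before Phred+33 encoding.
        qstr = "".join(chr(max(s, 0) + 33) for s in scores)
        if len(seq) != len(qstr):
            raise ValueError(f"{name}: {len(seq)} colors but {len(qstr)} qualities")
        out += [f"@{name}", seq, "+", qstr]
    return out


cs = ["# Title: demo", ">1_1_1_F3", "T0123."]
qv = ["# Title: demo", ">1_1_1_F3", "10 20 30 40 -1"]
print("\n".join(csfasta_to_fastq(cs, qv)))
```

Failing loudly on a name or length mismatch is deliberate: a silent off-by-one between sequence and quality is exactly the kind of error that is hard to spot once the reads reach the aligner.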
