
bowtie and maq questions


  • bowtie and maq questions

    Hi all,

    OK, after spending quite a lot of time reading about aligners, I am still almost as much of a novice as I was before, but I can now run bowtie and maq on my computer (well, at least to some extent).

    To learn more about how these aligners work, I ran two experiments:

    1. Run bowtie as instructed in the tutorial with e_coli_1000.fq, which is included in the bowtie package.
    Code:
    ~/Desktop/genome/HTdata/e_coli_536$ ../../Bowtie/bowtie-0.9.9.2/bowtie e_coli ../../Bowtie/bowtie-0.9.9.2/reads/e_coli_1000.fq > e_coli.bowtie.txt
    The output of this command is
    Code:
    Reported 699 alignments to 1 output stream(s)
    together with a nice output file, e_coli.bowtie.txt, in which I can check which sequences aligned, where they aligned, and what the errors (mismatches) in each sequence are.

    2. Run maq easyrun with the same e_coli_1000.fq file. Since the built-in e_coli index in bowtie is E. coli 536, I went to the NCBI FTP site and downloaded NC_008253.fna. The MAQ command is:
    Code:
    ~/Desktop/genome/HTdata/e_coli_536$ maq.pl easyrun -d e_coli NC_008253 ../../Bowtie/bowtie-0.9.9.2/reads/e_coli_1000.fq > e_coli.maq.txt
    The e_coli.maq.txt file shows
    Code:
    -- == statmap report ==
    
    -- # single end (SE) reads: 1000
    -- # mapped SE reads: 745 (/ 1000 = 74.5%)
    -- # paired end (PE) reads: 0
    -- # mapped PE reads: 0 (/ 0 = NA%)
    -- # reads that are mapped in pairs: 0 (/ 0 = NA%)
    -- # Q>=30 reads that are moved to meet mate-pair requirement: 0 (/ 0 = NA%)
    -- # Q<30 reads that are moved to meet mate-pair requirement: 0 (NA%)
    So I have some questions:

    a. Why did bowtie and MAQ give different results on the same data set (MAQ reported 745 mapped reads and bowtie reported 699 alignments)? How can I set the parameters of both bowtie and MAQ to get the same results?

    b. e_coli.bowtie.txt is a nice text file listing the mapped reads and their errors. How can I get the same kind of summary from the MAQ output files, say a file listing the mapped reads and their errors?

    c. What software can I use for post-alignment analysis? I tried maq mapview, but I can only see one mapped read at a time. Is there software that shows a nice alignment view like BLAT, with the errors as well as the coordinates of each read on the reference?

    Sorry for such a long post, and thank you all in advance. Any input will be greatly appreciated.

    D.
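On question b, one possible route is to dump the MAQ alignments as text with maq mapview and tally them with a short script. The following is only a sketch under stated assumptions: it assumes the mapview text is tab-delimited with 16 columns per read and that the 10th column holds the number of mismatches of the best hit, as described in the MAQ manual. The function name `summarize_mapview` and the two demo records are made up for illustration and are not real alignments.

```python
import io
from collections import Counter


def summarize_mapview(handle):
    """Count mapped reads by number of mismatches in `maq mapview` text output.

    Assumption: each line has 16 tab-separated fields and field 10
    (index 9) is the number of mismatches of the best hit.
    """
    mismatches = Counter()
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 16:
            continue  # skip blank or malformed lines
        mismatches[int(fields[9])] += 1
    return mismatches


# Two invented records, only to show the expected column layout.
demo_lines = [
    "\t".join(["r1", "ref", "100", "+", "0", "0", "60", "60", "0",
               "0", "0", "1", "0", "4", "ACGT", "IIII"]),
    "\t".join(["r2", "ref", "250", "-", "0", "0", "30", "30", "0",
               "2", "40", "0", "0", "4", "ACGA", "IIII"]),
]
summary = summarize_mapview(io.StringIO("\n".join(demo_lines) + "\n"))
for n_mismatch in sorted(summary):
    print(f"{n_mismatch} mismatches: {summary[n_mismatch]} reads")
```

To use it on real data, the text would first have to be produced with something like `maq mapview <alignment.map> > mapview.txt`, where the .map file is whatever the easyrun output directory contains, and then passed in via `open("mapview.txt")`.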

  • #2
    For question A:
    http://www.nature.com/nbt/journal/v2...t0509-455.html



    • #3
      Question C:
      Have you tried consed viewer ( http://www.phrap.org/consed/consed.html#howToGet ) ?



      • #4
        Further elaboration on A:

        My fault. The reads/e_coli_1000.fq file I include with Bowtie has instances where an N in the read lines up with a non-zero quality value. The Illumina pipeline (AFAIK) doesn't do this, and Maq automatically rounds quality values corresponding to Ns down to 0. Bowtie doesn't, hence the difference. You can fix the .fq file with this script:

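        Code:
        #!/usr/bin/perl -w
        # Set the quality of every N base to "!" (the lowest value, quality 0
        # in Phred+33 encoding), mimicking what Maq does automatically.
        # Reads 4-line FASTQ records from stdin or from files given as arguments.

        while(<>) {
            my $name = $_;                    # @name line
            my $seq = <>; chomp($seq);        # sequence line
            my @seqa = split(//, $seq);
            my $name2 = <>;                   # +name line
            my $quals = <>; chomp($quals);    # quality line
            my @qualsa = split(//, $quals);
            for(my $i = 0; $i <= $#seqa; $i++) {
                # Wherever the base is N, force its quality down to "!"
                $qualsa[$i] = "!" if($seqa[$i] eq 'N');
            }
            print "$name$seq\n$name2";
            print join("", @qualsa) . "\n";
        }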
        Then, if you run bowtie with its default parameters on the fastq output by the script, you should see it report 753 alignments.

        Sorry for the confusion,
        Ben
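To check whether a given FASTQ file has the problem described above before (or after) running the fix, a small counter can help. This is a minimal sketch, not part of Bowtie or Maq: it assumes plain 4-line FASTQ records and that "!" is the quality-0 character (Phred+33), and the function name `count_n_with_quality` and the demo reads are invented for illustration.

```python
import io


def count_n_with_quality(handle, zero_char="!"):
    """Return (records, records_with_flagged_n, flagged_bases).

    A base is "flagged" when it is N but its quality character is not
    `zero_char`. Assumes 4 lines per FASTQ record.
    """
    records = flagged_records = flagged_bases = 0
    while True:
        name = handle.readline()
        if not name:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quals = handle.readline().rstrip("\n")
        records += 1
        hits = sum(1 for base, q in zip(seq, quals)
                   if base == "N" and q != zero_char)
        if hits:
            flagged_records += 1
            flagged_bases += hits
    return records, flagged_records, flagged_bases


# Invented demo: r1 has an N with non-zero quality, r2 has Ns already at "!".
demo = io.StringIO("@r1\nACGN\n+\nIII5\n@r2\nNNAC\n+\n!!II\n")
print(count_n_with_quality(demo))  # (2, 1, 1)
```

On a real file, pass `open("reads.fq")` instead of the demo handle; after the file has been fixed, the last two numbers should both be zero.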



        • #5
          Originally posted by Ben Langmead
          Further elaboration on A:

          My fault. The reads/e_coli_1000.fq file I include with Bowtie has instances where an N in the read lines up with a non-zero quality value. The Illumina pipeline (AFAIK) doesn't do this, and Maq automatically rounds quality values corresponding to Ns down to 0. Bowtie doesn't, hence the difference. You can fix the .fq file with this script:


          Then, if you run bowtie with its default parameters on the fastq output by the script, you should see it report 753 alignments.

          Sorry for the confusion,
          Ben
          I got it! Now 753 vs. 745 is close enough. Thanks, Ben, for your script. Thanks also to Pepe for the paper; it is really helpful to a novice like me.

          Any other input about MAQ? I think there must be many MAQ users here in the forum.

          Thanks,

          D.



          • #6
            Originally posted by Coffeebean
            Question C:
            Have you tried consed viewer ( http://www.phrap.org/consed/consed.html#howToGet ) ?
            No, I haven't. But Consed seems... picky to me, since it requires other software such as phred in order to run. Anyway, I will give it a try. Thanks, Coffeebean.
