  • Maq SNP filtering script bug?

    Hi,

    I ran into a problem when I tried to filter the out.snp file with Maq's SNPfilter.
    I then tested with the Maq demo data, and the same error occurred. Here is what I did.
    I ran the command "perl maq.pl SNPfilter out.snp >out.filtered.snp", and the following errors were reported:

    "Use of uninitialized value in string ne at maq.pl line 286, <> line 59.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 59.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 60.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 60.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 62.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 62.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 101.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 101.
    Use of uninitialized value in string ne at maq.pl line 286, <> line 102.
    Use of uninitialized value in addition (+) at maq.pl line 286, <> line 102.
    ..............................
    ................................................."

    So I looked at the source of maq.pl and found that line 286 reads:
    "$is_good = 0 unless ($t[4] >= $opts{q} || ($t[2] ne $t[9] && $t[4]+$t[10] >= $opts{q})); # consensus quality filter"
    But my input file, out.snp, has only 9 columns, so $t[9] and $t[10] do not exist. That is why the error occurs.

    Could anyone tell me how to solve this problem, or what I did wrong? Thanks.
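
A quick way to confirm the column count is to tally the fields per line. The following is only a minimal sketch: `column_histogram` is a hypothetical helper (not part of maq.pl), it assumes the file is whitespace-separated, and the two sample lines are illustrative 9-column lines standing in for the real out.snp.

```perl
use strict;
use warnings;

# Count how many whitespace-separated columns each line has.
# Returns a hash ref: column count => number of lines with that count.
sub column_histogram {
    my (@lines) = @_;
    my %hist;
    for my $line (@lines) {
        next unless $line =~ /\S/;    # skip blank lines
        my @t = split ' ', $line;     # split on runs of whitespace
        $hist{ scalar @t }++;
    }
    return \%hist;
}

# Illustrative 9-column lines (stand-ins for the contents of out.snp).
my @sample = (
    "gi|115315570 219 A R 31 255 1.00 63 62",
    "gi|115315570 1005 T Y 14 255 1.00 63 62",
);
my $hist = column_histogram(@sample);
printf "%d columns: %d line(s)\n", $_, $hist->{$_}
    for sort { $a <=> $b } keys %$hist;
```

If every line reports 9 columns, then indices 9 and 10 of the split array are undefined, which matches the two "uninitialized value" messages (one for `ne`, one for `+`).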

  • #2
    hey! I'm not in a position to check now, but I think that SNPfilter needs the consensus as well.



    • #3
      Hi ECO,

      Below is a quote from the Maq manual. I do not see any argument indicating that it needs the consensus. Could you help me here? Thanks.


      maq.pl SNPfilter [-d minDep] [-D maxDep] [-Q maxMapQ] [-q minCnsQ] [-w indelWinSize] [-n minNeiQ] [-F in.indelpe] [-f in.indelsoa] [-s minScore] [-m maxAcross] [-a] [-N maxWinSNP] [-W densWinSize] in.cns2snp.snp > out.filtered.snp

      Rule out SNPs that are covered by few reads (specified by -d), by too many reads (specified by -D), near (specified by -w) to a potential indel, falling in a possible repetitive region (characterized by -Q), or having low-quality neighbouring bases (specified by -n). If maxWinSNP or more SNPs appear in any densWinSize window, they will also be filtered out together.

      OPTIONS:
      -d INT Minimum read depth required to call a SNP [3]
      -D INT Maximum read depth required to call a SNP (<255, otherwise ignored) [256]
      -Q INT Required maximum mapping quality of reads covering the SNP [40]
      -q INT Minimum consensus quality [20]
      -n INT Minimum adjacent consensus quality [20]
      -w INT Size of the window around the potential indels. SNPs that are close to indels will be suppressed [3]
      -F FILE The indelpe output [null]
      -f FILE The indelsoa output [null]
      -s INT Minimum score for a soa-indel to be considered [3]
      -m INT Maximum number of reads that can be mapped across a soa-indel [1]
      -a Alternative filter for single end alignment



      • #4
        I don't know what's with the code... but the command sure works

        $ maq.pl SNPfilter Seq.fasta.snp
        gi|115315570 219 A R 31 255 1.00 63 62
        gi|115315570 1005 T Y 14 255 1.00 63 62
        gi|115315570 1576 G R 255 255 1.00 63 62
        ..

        Where
        $ head Seq.fasta.snp
        gi|115315570 219 A R 31 255 1.00 63 62
        gi|115315570 1005 T Y 14 255 1.00 63 62
        gi|115315570 1576 G R 255 255 1.00 63 62
        gi|115315570 1591 C Y 255 255 1.00 63 62
        gi|115315570 1595 G R 56 255 1.00 63 62
        gi|115315570 1689 C Y 255 255 1.00 63 62
        ..
        --
        bioinfosm



        • #5
          I'll check when I get to my other computer. That Perl error looks like a problem with your input file, starting at line 59.



          • #6
            row 58: gi|162446888|ref|NC_010163.1| 73416 C S 159 45 1.31 63 62
            row 59: gi|162446888|ref|NC_010163.1| 73802 G T 3 55 2 0 2
            row 60: gi|162446888|ref|NC_010163.1| 74866 A C 3 42 2 0 2
            row 61: gi|162446888|ref|NC_010163.1| 75245 G S 40 49 2 63 62
            row 62: gi|162446888|ref|NC_010163.1| 78151 C M 14 37 2 63 62


            The rows above (59, 60, 62) are examples of the input lines that are reported as errors. Apparently the value in the fifth column of these rows is much lower than in rows 58 and 61, which are not reported.

            Could you or someone let me know why maq.pl SNPfilter reports errors for such lines? Thanks.
            Last edited by qiudao; 10-03-2008, 09:40 AM.



            • #7
              maq SNP filter

              Hi,

              I looked a bit closer into the script, and it seems to me that this really is a bug.

              To me the line

              "$is_good = 0 unless ($t[4] >= $opts{q} || ($t[2] ne $t[9] && $t[4]+$t[10] >= $opts{q}));"

              means roughly translated:

              A SNP is not good unless:
              - the consensus quality of the SNP is greater than or equal to the minimum consensus quality (-q, default: 20), or

              - there is a second, different SNP and the sum of both SNP qualities is greater than or equal to the minimum consensus quality (-q, default: 20)

              I don't know what the assumptions behind this second condition are, but in your case (in rows 59, 60, 62) the first condition failed and maq.pl could not evaluate the second condition because there is no second SNP.
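
The short-circuit behaviour described here can be checked against the five rows posted in #6. This is only a sketch: `reaches_second_clause` is a hypothetical helper that mirrors the first half of the line-286 condition, and the threshold 20 is the -q default from the manual excerpt in #3.

```perl
use strict;
use warnings;

# Returns 1 when the first clause ($t[4] >= q) fails, so Perl must go on
# to evaluate the second clause, the one that reads $t[9] and $t[10].
# Returns 0 when the first clause already succeeds (|| short-circuits).
sub reaches_second_clause {
    my ($line, $q) = @_;
    my @t = split ' ', $line;
    return $t[4] >= $q ? 0 : 1;
}

my $q = 20;    # default of -q according to the quoted manual
my %rows = (
    58 => "gi|162446888|ref|NC_010163.1| 73416 C S 159 45 1.31 63 62",
    59 => "gi|162446888|ref|NC_010163.1| 73802 G T 3 55 2 0 2",
    60 => "gi|162446888|ref|NC_010163.1| 74866 A C 3 42 2 0 2",
    61 => "gi|162446888|ref|NC_010163.1| 75245 G S 40 49 2 63 62",
    62 => "gi|162446888|ref|NC_010163.1| 78151 C M 14 37 2 63 62",
);
for my $n (sort { $a <=> $b } keys %rows) {
    printf "row %d: cnsQ=%s -> %s\n", $n, (split ' ', $rows{$n})[4],
        reaches_second_clause($rows{$n}, $q)
            ? "second clause evaluated"
            : "passes on first clause";
}
```

Rows 59, 60 and 62 have a fifth-column value below 20, so only they reach the second clause. Those are the same line numbers that appear in the warnings.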

              I would suggest

              "$is_good = 0 unless ($t[4] >= $opts{q} || ($t[9] && $t[2] ne $t[9] && $t[4]+$t[10] >= $opts{q}));"

              should fix your problem. Now the second condition already fails if there is no second SNP.
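
For anyone who wants to try the guarded logic outside maq.pl, here is a minimal standalone sketch. `passes_cns_filter` is a hypothetical function, not something in Maq. It uses `defined` instead of a plain truth test as a slightly stricter variation on the idea, and it relies on the reading given in this post that columns 10 and 11 hold a second call and its quality. The 11-column row is made up for illustration.

```perl
use strict;
use warnings;

# Consensus-quality test with a guard for missing columns.
# $t is an array ref holding the split SNP line, $q the -q threshold.
# Returns 1 if the SNP passes, 0 otherwise.
sub passes_cns_filter {
    my ($t, $q) = @_;
    return 1 if $t->[4] >= $q;    # first clause: consensus quality alone
    # Only consult the second call if both extra columns are present.
    return 0 unless defined $t->[9] && defined $t->[10];
    return ($t->[2] ne $t->[9] && $t->[4] + $t->[10] >= $q) ? 1 : 0;
}

# Row 59 from this thread: 9 columns, consensus quality 3.
my @row59 = split ' ', "gi|162446888|ref|NC_010163.1| 73802 G T 3 55 2 0 2";
print "row 59: ", passes_cns_filter(\@row59, 20), "\n";

# Made-up 11-column row: second call A with quality 30 (3 + 30 >= 20).
my @wide = qw(chrX 100 G T 3 55 2 0 2 A 30);
print "11-column example: ", passes_cns_filter(\@wide, 20), "\n";
```

With `use warnings` switched on, the 9-column row is rejected without any "uninitialized value" messages, because the extra columns are never read.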

              Any other opinions?


              Cheers from Germany,

              Andy



              • #8
                Maq does come with its own SNP caller...is the general feeling that it's not a very good one, and that's why people are writing their own?



                • #9
                  Andpet,
                  Thanks for your reply. I agree with you. Checking first that the second SNP is defined will solve this problem.
                  Thank you.



                  • #10
                    Originally posted by swbarnes2
                    Maq does come with its own SNP caller...is the general feeling that it's not a very good one, and that's why people are writing their own?
                    I would also like to see more discussion of SNP filtering. MAQ SNPs are an inclusive list with false positives, but SNPfilter gets rid of a few good-looking ones.

                    Any thoughts?
                    --
                    bioinfosm

