  • SOAP2 -v doesn't work

    SOAP2 -v maximum number of mismatches allowed on a read

    I have tried -v 3, -v 2, and -v 4 for 50 bp reads, but the maximum number of mismatches in the output is always 2. There are no reads with 3 or more mismatches.



    My command:
    soap -a reads/PT0012.3_1.fastq -b reads/PT0012.3_2.fastq -D genome.fa.index -m 0 -x 500000 -u 3.unmap -o 3.paired -2 3.unpaired -v 3 -r 2 -p 2




    Usage: soap [options]
    -a <str> query a file, *.fq, *.fa
    -b <str> query b file
    -D <str> reference sequences indexing table, *.index format
    -o <str> output alignment file(txt)
    -M <int> match mode for each read or the seed part of read, which shouldn't contain more than 2 mismaches, [4]
    0: exact match only
    1: 1 mismatch match only
    2: 2 mismatch match only
    4: find the best hits
    -u <str> output unmapped reads file
    -t output reads id instead reads name, [none]
    -l <int> align the initial n bps as a seed [256] means whole length of read
    -n <int> filter low-quality reads containing >n Ns before alignment, [5]
    -r [0,1,2] how to report repeat hits, 0=none; 1=random one; 2=all, [1]
    -m <int> minimal insert size allowed, [400]
    -x <int> maximal insert size allowed, [600]
    -2 <str> output file of unpaired alignment hits
    -v <int> maximum number of mismatches allowed on a read. [5] bp
    -s <int> minimal alignment length (for soft clip) [255] bp
    -g <int> one continuous gap size allowed on a read. [0] bp
    -R for long insert size of pair end reads RF. [none](means FR pair)
    -e <int> will not allow gap exist inside n-bp edge of a read, default=5
    -p <int> number of processors to use, [1]

    -h this help
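
One way to check the observation above (no reported alignment with more than 2 mismatches) is to tally the mismatch counts in the alignment file. The sketch below is a minimal example, not part of SOAP2. It assumes the output is tab-separated and that the 10th column holds the hit type, where values below 100 are the number of mismatches and values of 100 or more mark gapped hits. The function name `tally_mismatches` and the `type_col` parameter are hypothetical; adjust the column index if your output differs.

```python
from collections import Counter
from typing import Iterable


def tally_mismatches(lines: Iterable[str], type_col: int = 9) -> Counter:
    """Count alignments per number of mismatches.

    ASSUMPTION: each line is tab-separated and column `type_col`
    (0-based, so 9 is the 10th column) is an integer hit type where
    0..99 is the mismatch count and >= 100 marks a gapped alignment.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= type_col:
            continue  # skip truncated or unexpected lines
        try:
            hit_type = int(fields[type_col])
        except ValueError:
            continue  # skip lines whose hit-type field is not an integer
        if 0 <= hit_type < 100:
            counts[hit_type] += 1
    return counts


# Example use on the paired output from the command above:
#     with open("3.paired") as handle:
#         for n_mismatch, n_reads in sorted(tally_mismatches(handle).items()):
#             print(n_mismatch, n_reads)
```

If the largest key printed is 2 for every value of -v, that confirms the cap described in this post.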

  • #2
    You have to use it in combination with -M and some value. I'm not sure why, but that's what I've been told.

    - Steve



    • #3
      -M <int> match mode for each read or the seed part of read, which shouldn't contain more than 2 mismaches, [4]
      0: exact match only
      1: 1 mismatch match only
      2: 2 mismatch match only
      4: find the best hits

      What does that mean? If I want to find alignments with 3 mismatches, how should I set the -M parameter?
      Last edited by baohua100; 08-26-2009, 08:32 PM.
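
One possible reading of the usage text quoted in this thread, not confirmed by anyone here, is that the limit of 2 mismatches applies to the seed, and that with the default -l 256 the seed is the whole read. Under that reading, -v above 2 could only take effect when -l is set shorter than the read length. The sketch below only encodes this hypothesis so it can be compared against real runs. The function `expected_mismatch_cap` is hypothetical and does not model -M.

```python
def expected_mismatch_cap(read_len: int, v: int,
                          seed_len: int = 256, seed_cap: int = 2) -> int:
    """Return the mismatch cap predicted by the hypothesis above.

    HYPOTHESIS (taken from the wording of the -M and -l help lines, not
    from SOAP2's source): the seed may contain at most `seed_cap`
    mismatches, and a seed length >= read length means the whole read
    is treated as the seed.
    """
    if seed_len >= read_len:
        # The seed covers the full read, so the seed limit bounds the read.
        return min(v, seed_cap)
    # The seed is shorter than the read; -v bounds the whole read.
    return v


# 50 bp reads, -v 3, default -l 256: predicted cap of 2, as reported above.
print(expected_mismatch_cap(read_len=50, v=3))
# Same reads with a hypothetical -l 32: predicted cap of 3.
print(expected_mismatch_cap(read_len=50, v=3, seed_len=32))
```

If the hypothesis holds, rerunning the original command with a seed length below 50 (for example, adding -l 32) should produce alignments with 3 mismatches; if it does not, the explanation lies elsewhere.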



      • #4
        Sorry I can't help more. A co-worker told me about that difference between SOAP and SOAP2, so that's about all I know. I'll ask him later today to see if he's figured out more of the rules.

        It would be nice if there was a better standard for documentation of Bioinformatics apps. Seems like the only way to use some of these programs correctly is to contact the author.



        • #5
          I would like to know
          Last edited by karthikprabu; 11-25-2009, 07:45 AM.



          • #6
            Hello everybody,

            Does anybody know how to deal with this problem?
            I would like to allow 4 mismatches per read and I don't know how to do it...



            • #7
              I have the same problem. Has anyone figured it out? Please let me know. Thanks!
