  • Novoalign Mismatches

    Hey, I am trying to use novoalign, but I need to limit mismatches to 3 per read. Is there a parameter that will do that for reads without quality scores? If not, is there a way to do it with base quality scores?

    Thanks
    Leanne

  • #2
    I think if you give the format as fasta (-F FA) and set the align threshold to be 90 (-t 90) then it will only allow 3 mismatches. Or give mock quality scores and use the same threshold.
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
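    One way to read the suggestion in #2 is as simple arithmetic: if every mismatch costs the same fixed penalty, the threshold divided by that penalty gives the number of mismatches allowed. The sketch below assumes a flat penalty of 30 per mismatch. That value is an assumption inferred from pairing -t 90 with 3 mismatches; it is not confirmed anywhere in this thread, and `max_mismatches` is a hypothetical helper, not part of novoalign.

```python
def max_mismatches(threshold: int, mismatch_penalty: int = 30) -> int:
    """Largest number of mismatches whose summed penalty stays within threshold.

    mismatch_penalty=30 is an assumed flat cost per mismatch, not a value
    taken from the thread or verified against any novoalign version.
    """
    if mismatch_penalty <= 0:
        raise ValueError("mismatch_penalty must be positive")
    return threshold // mismatch_penalty


if __name__ == "__main__":
    # Under the assumed penalty, a threshold of 90 leaves room for 3 mismatches.
    print(max_mismatches(90))
```

    If the real scoring differs (for example, because of ambiguity codes in the reference), the same arithmetic applies with a different penalty value.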



    • #3
      Hey,

      I have tried both and neither worked; it's still outputting perfect alignments.

      Any other ideas?

      Thanks for your help,
      Leanne



      • #4
        Could you give your command line and a snippet of the input file and what the output looks like? I was looking at some of my academic lab's scripts and this:
        novoalign -d $repeat_index_file -r All -R 5 -F FA -f $fasta_file > $repeat_align_file
        for example will give lots of mismatch alignments. Now, if I didn't use the -r All option it would return only the perfect alignments because I am aligning reads against an index of themselves, so the alignment finds itself and since that is the best it doesn't show any others. Don't know if the same situation might apply to you since this case is a little atypical (instead of aligning reads to a reference genome).

        If you know reads with mismatches are in the input, what does it show for them in the output? Shouldn't it give either a U, NM or R and display some info for each read? Or is it filtering them away for some reason?



        • #5
          Here is the command I am working with now
          novoalign -r All -o SAM -d 1L1MdA_I_NovoInd_LW -f SRR519779test_NoQaul_LWv2.fa -F FA 1>pracv2.sam 2>pracv2.metrics


          I am working with 2 reads
          >SRR519779.1063095ILLUMINA-8787AF:52:FC:1:29:10379:5418length=36
          TACCTTGACAGCAGAGTCTTGCCCAAC
          >SRR519779.2989698ILLUMINA-8787AF:52:FC:1:81:1791:16837length=36
          ACAGCAGAGTCTTGCCCAACACCCG

          With both the default -t setting and -t 90, I get the following SAM output:
          SRR519779.1063095ILLUMINA-8787AF:52:FC:1:29:10379:5418length=36 0 L1MdA_I 441 24 27M * 0 0 TACCTTGACAGCAGAGTCTTGCCCAAC * PG:Z:novoalign AS:i:3 UQ:i:3 NM:i:1 MD:Z:7M19
          SRR519779.2989698ILLUMINA-8787AF:52:FC:1:81:1791:16837length=36 0 L1MdA_I 448 24 25M * 0 0 ACAGCAGAGTCTTGCCCAACACCCG * PG:Z:novoalign AS:i:3 UQ:i:3 NM:i:1 MD:Z:0M24
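          A quick way to check what these records report is to pull out the optional tags. The sketch below is a minimal parser, not a replacement for a SAM library. It splits on whitespace because the tabs were lost when the output was pasted here; real SAM is tab-delimited. `parse_sam_line` is a name made up for this example.

```python
def parse_sam_line(line: str) -> dict:
    """Parse one SAM alignment line into a small dict.

    Splits on any whitespace, which only works because none of the
    fields in the pasted records contain spaces.
    """
    fields = line.split()
    record = {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),   # 1-based leftmost position
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "seq": fields[9],
        "tags": {},
    }
    # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        record["tags"][tag] = int(value) if typ == "i" else value
    return record


# First record from the output above, copied as pasted.
sam_line = (
    "SRR519779.1063095ILLUMINA-8787AF:52:FC:1:29:10379:5418length=36 0 "
    "L1MdA_I 441 24 27M * 0 0 TACCTTGACAGCAGAGTCTTGCCCAAC * "
    "PG:Z:novoalign AS:i:3 UQ:i:3 NM:i:1 MD:Z:7M19"
)
rec = parse_sam_line(sam_line)

if __name__ == "__main__":
    print(rec["pos"], rec["tags"]["NM"], rec["tags"]["MD"])
```

          Both records carry NM:i:1, so each reported alignment already has an edit distance of 1 rather than being an exact match.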

          Here is the sequence I am trying to align to

          >Ref
          CGACTCGAGACTCGAGCCCCGGGCTACCTTGCCAGCAGAGTCTTGCCCAACACCCGCAAGGGTCCACACGGGACTCCCCACGGGACCCTAAGACCTCTGGTGAGTGGATCACAGTGCCTGCCCCAATCCAATCGCGCGGAACTTGAGACTGCGGTACATAGGGAAGCAGGCTACCCGGGCCTGATCTGGGGCACAAGTCCCTTCCGCTCGACTCGWGACTCGAGCCCCGGGCTACCTTGCCAGCAGAGTCTTGCCCAACACCCGCAAGGGTCCACACGGGACTCCCCACGGGACCCTAAGACCTCTGGTGAGTGGATCACAGTGCCTGCCCCAATCCAATCGCGCGGAACTYGAGACTGCGGTACATAGGGAAGCAGGCTACCCGGGCCTGATCTGGGGCACAAGTCCCTTCCGCTCGACTCGAGACTCGAGCCCCGGGCTACCTTGMCAGCAGAGTCTTGCCCAACACCCGCAAGGGCCCACACGGGACTCCCCACGGGACCCTAAGACCTCTGGTGAGTGGAACACAGCGCCTACCCCAATCCAATCGCGTGGAACTTGAGACTGCGGTACATAGGGAAGCAGGCTACCCGGGCTTGATCTGGGGCACAAACCCCTTCCACTCCACTCGAGCCCCGGCTACCTTGCCAGCTGAGTCGCCTGACACCCGCAAGGGCCCACACAGGATTCCACACGTGATCCTAAGACCTCTAGTGAGTGGAACACAACTTCTGCCAGGAGTCTGGTTCGAACACCAGATATCTGGGTACCTGCCTTGCAAGAAGAGAGCTTGCCTGCAGAGAATACTCTGCCCACTGAAACTAAGGAGAGTGCTACCCTCCAGGTCTGCTCATAGAGGCTAACAGAGTCACCTGAAGAACAAGCTCTTAACAGTGACAACTAAAACAGCTAGCTTCAGAGATTACCAGATGGCGAAAGGCAAACGTAAGAATCCTACTAACAGAAATCAAGACCACTCACCATCATCAGAACGCAGCACTCCCACCCCACCTAGTCCTGGGCACCCCAACACAACCGAAAATCTAGACCCAGATTTAAAAACATTTCTCATGATGATGATAGAGGACATCAAGAAGGACTTTCATAAGTCACTTAAAGATTTACAGGAGAGCACTGCTAAAGAGTTACAGGCTCTTAAAGAAAAGCAGGAAAACACAGCCAAACAGGTGATGGAAATGAACAAAACCATACTAGAACTAAAAGGGGAAGTAGACACAATAAAGAAAACCCAAAGCGAGGCAACGCTGGAGATAGAAACCCTAGGAAAGAGATCTGGAACCATAGATGCGAGCATCAGCAACAGAATACAAGAAATGGAAGAGAGAATCTCAGGTGCAGAAGATTCCATAGAGAACATCGACACAACAGTCAAAGAAAATACAAAATGCAAAAGGATCCTAACTCAAAACATCCAGGTAATCCAGGACACAATGAGAAGACCAAACCTACGGATAATAGGAATTGATGAGAATGAAGATTTTCAACTTAAAGGGCCAGCTAATATCTTCAACAAAATAATAGAAGAAAACTTCCCAAACATAAAAAAAGAGATGCCCATGATCATACAAGAAGCATACAGAACTCCAAATAGACTGGACCAGAAAAGAAATTCCTCCCGACACATAATAATCAGAACAACAAATGCACTAAATAAAGATAGAATATTAAAAGCAGTAAGGGAGAAAGGTCAAGTAACATATAAAGGAAGGCCTATCAGAATTACACCAGACTTTTCACCAGAGACTATGAAAGCCAGAAGAGCCTGGACAGATGTTATACAGACACTAAGAGAACACAAATGCCAGCCCAGGCTACTATACCCGGCCAAACTCTCAATTACCATAGATGGAGAAACCAAAGTATTCCACGACAAAACCAAGTTCACACAATATCTTTCCACGAATCCAGCCCTTCAAAGGATAATAACAGAAAAGAAGCAATACAAGGACGGAAATCACGCCCTAGAACA
ACCAAGAAAGTAATCATTCAACAAACCAAAAAGAAGACAGCCACAAGAACAGAATGCCAACTCTAACAACAAAAATAAAAGGGAGCAACAATTACTTTTCCTTAATATCTCTTAATATCAATGGACTCAATTCCCCAATAAAAAGACATAGACTAACAGACTGGCTACACAAACAGGACCCAACATTCTGCTGCTTACAGGAAACCCATCTCAGGGAAAAAGACAGACACTACCTCAGAGTGAAAGGCTGGAAAACAATTTTCCAAGCAAATGGACTGAAGAAACAAGCTGGAGTAGCCATTTTAATATCGGATAAAATCGACTTCCAACCCAAAGTTATCAAAAAAGACAAGGAGGGACACTTCATACTCATCAAAGGTAAAATCCTCCAAGAGGAACTCTCAATTCTGAATATCTACGCACCAAATGCAAGGGCAGCCACATTCATTAGAGACACTTTAGTAAAGCTCAAAGCATACATTGCACCTCACACAATAATAGTGGGAGACTTCAACACACCACTTTCTTCAAAGGACAGATCGTGGAAACAGAAACTAAACAGGGACACAGTGAAACTAACAGAAGTTATGAAACAAATGGACCTGACAGATATCTACAGAACATTTTATCCTAAAACAAAAGGATATACCTTCTTCTCAGCACCTCACGGGACCTTCTCCAAAATTGACCATATAATTGGTCACAAAACAGGCCTCAATAGATACAAAAATATTGAAATTGTCCCATGTATCCTATCAGACCACCATGGCCTAAGACTGATCTTCAATAACAACATAAATAATGGAAAGCCAACATTCACGTGGAAACTGAATAACACTCTTCTCAATGATACCTTGGTCAAGGAAGGAATAAAGAAAGAAATTAAAGACTTTTTAGAGTTTAATGAAAATGAAGCCACAACGTACCCAAACCTATGGGACACAATGAAAGCATTTCTAAGAGGGAAACTCATAGCGCTGAGTGCCTCCAAGAAGAAACGGGAGACAGCACATACTAGCAGCTTGACAACACATCTAAAAGCCCTAGAAAAAAAGGAAGCAAATTCACCCAAGAGGAGTAGACGGCAGGAAATAATCAAACTCAGGGGTGAAATCAACCAAGTGGAAACAAGAAGAACTATTCAAAGAATTAACCAAACGAGGAGTTGGTTCTTTGAGAAAATCAACAAGATAGATAAACCCTTAGCTAGACTCACTAAAGGGCACAGGGACAAAATCCTAATTAACAAAATCAGAAATGAAAAGGGAGACATAACAACAGATCCTGAAGAAATCCAAAACACCATCAGATCCTTCTACAAAAGGCTATACTCAACAAAACTGGAAAACCTGGACGAAATGGACAAATTTCTGGACAGATACCAGGTACCAAAGTTGAATCAGGATCAAGTTGACCATCTAAACAGTCCCATATCACCTAAAGAAATAGAAGCAGTTATTAATAGTCTCCCAACCAAAAAAAGCCCAGGACCAGATGGGTTTAGTGCAGAGTTCTATCAGACCTTCAAAGAAGATCTAATTCCAATTCTGCACAAACTATTTCACAAAATAGAAGTAGAAGGTACTCTACCCAACTCATTTTATGAAGCCACTATTACTCTGATACCTAAACCACAGAAAGATCCAACAAAGATAGAGAACTTCAGACCAATTTCTCTTATGAATATCGATGCAAAAATCCTCAATAAAATTCTCGCTAACCGAATCCAAGAACACATTAAAGCAATCATCCATCCTGACCAAGTAGGTTTTATTCCAGGGATGCAGGGATGGTTTAATATACGAAAATCCATCAATGTAATCCATTATATAAACAAACTCAAAGACAAAAACCACATGATCATCTCGTTAGATGCAGAAAAAGCATTTGACAAGATCCAACACCCATTCATGATAAAAGTTTTGGAAAGATCAGGAATTCAAGGCCCATACCTAAACATGATAAAAGCAATCTACAGCAAACCAGTAGCCAACATCAAA
GTAAATGGAGAGAAGCTGGAAGCAATCCCACTAAAATCAGGGACTAGACAAGGCTGCCCACTTTCTCCCTACCTTTTCAACATAGTACTTGAAGTATTAGCCAGAGCAATTCGACAACAAAAGGAGATCAAGGGGATACAAATTGGAAAAGAGGAAGTCAAAATATCACTTTTTGCAGATGATATGATAGTATATATAAGTGACCCTAAAAATTCTACCAGAGAACTCCTAAACCTGATAAACAGCTTCGGTGAAGTAGCTGGATATAAAATAAACTCAAACAAGTCAATGGCCTTTCTCTATACAAAGAATAAACAGGCTGAGAAAGAAATTAGGGAAACAACACCCTTCTCAATAGTCACAAATAATATAAAATATCTTGGCGTGACTCTAACTAAGGAGGTGAAAGATCTGTATGATAAAAACTTCAAATCTCTGAAGAAAGAAATTAAAGAAGATCTCAGAAGATGGAAAGATCTCCCATGCTCATGGATTGGCAGGATCAACATTGTAAAAATGGCTATCTTGCCAAAAGCAATCTACAGATTCAATGCAATCCCCATCAAAATTCCAACTCAATTCTTCAACGAATTGGAAGGAGCAATTTGCAAATTTGTCTGGAATAACAAAAAACCTAGGATAGCAAAAAGTCTTCTCAAGGATAAAAGAACTTCTGGCGGAATCACCATGCCAGACCTAAAGCTTTACTACAGAGCAATTGTGATAAAAACTGCATGGTACTGGTATAGAGACAGACAAGTAGACCAATGGAATAGAATTGAAGATCCAGAAATGAACCCACACACCTATGGTCACTTGATCTTCGACAAGGGAGCTAAAACCATCCAGTGGAAGAAAGACAGCATTTTCAACAATTGGTGCTGGCACAACTGGTTGTTATCGTGTAGAAGAATGCGAATCGATCCATACTTATCTCCTTGTACTAAGGTCAAATCTAAGTGGATCAAGGAACTTCACATAAAACCAGAGACACTGAAACTTATAGAGGAGAAAGTGGGGAAAAGCCTTGAAGATATGGGCACAGGGGAAAAATTCCTGAACAGAACAGCAATGGCTTGTGCTGTAAGATCGAGAATCGACAAATGGGACCTAATGAAACTCCAAAGTTTCTGCAAGGCAAAAGACACCGTCAATAAGACAAAAAGACCACCAACAGATTGGGAAAGGATCTTTACCTATCCTAAATCAGATAGGGGACTAATATCCAACATATATAAAGAACTCAAGAAGGTGGACTTCAGAAAATCAAATAACCCCATTAAAAAATGGGGCTCAGAACTGAACAAAGAATTCTCACCTGAGGAATACCGAATGGCAGAGAAGCACTTGAAAAAATGTTCAACATCCTTAATCATCAGGGAAATGCAAATCAAAACAACCCTGAGATTCCACCTCACACCAGTCAGAATGGCTAAGATCAAAAATTCAGGTGACAGCAGATGCTGGCGAGGATGTGGAGAAAGAGGAACACTCCTCCATTGTTGGTGGGAGTGCAGGCTTGTACAACCACTCTGGAAATCAGTCTGGCGGTTCCTCAGAAAACTGGACATAGTACTACCGGAGGATCCAGCAATACCTCTCCTGGGCATATATCCAGAAGATGCCCCAACAGGTAAGAAGGACACATGCTCCACTATGTTCATAGCAGCCTTATTTATAATAGCCAGAAGCTGGAAAGAACCTAGATGCCCCTCAACAGAGGAATGGATACAGAAAATGTGGTACATCTACACAATGGAGTACTACTCAGCTATTAAAAAGAATGAATTTATGAAATTCCTAGCCAAATGGATGGACCTGGAGGGCATCATCCTGAGTGAGGTAACACATTCACAAAGAAACTCACACAATATGTATTCACTGATAAGTGGATATTAGCCCCAAACCTAGGATACCCAAGATATAAGATATAATTTGCTAAACACATGAAACTCAAGGAGAATGAAGACTGAAGTGTGGACACTATGCCCCTCCTTAGA
TTTGGGAACAAAACACCCATGGAAGGAGTTACAGAGACGGAGTTTGGAGCTGAGATGAAAGGATGGACCATGTAGAGACTGCCATAGCCAGGGATCCACCCCATAATCAGCATCCAAACGCTGACACCATTGCATACACTAGCAAGATTTTATTGAAAGGACGCAGATGTAGCTGTCTCTTGTGAGACTATGCCGGGGCCCAGCAAACACAGAAGTGGATGCTCACAGTCAGCTAATGGATGGATCATAGGGCTCCCAATGGAGGAGCTAGAGAAAGTAGCCAAGGAGCTAAAGGGATCTGCAACCCTATAGGTGGAACAACATTATGAGCTAACCAGTACCCCGGAGCTCTTGACTCTAGCTGCATATATATCAAAAGATGGCCTAGTCGGCCATCACTGGAAAGAGAGGCCCATTGGACTTGCAAACTTTATATGCCCCAGTACAGGGGAATACCAGGGCCAAAAAGGGGGAGTGGGTGGGCAGGGGAGTGGGGGTGGGTGGATATGGGGGACTTTTGGTATAGCATTGGAAATGTAAATGAGTTAAATACCTAATAAAAAATGGAAAAAAA



          • #6
            Just a note: if 3 mismatches were allowed, I should be getting two more alignments for the first read, starting at bases 233 and 25.

            Thanks again
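            The expectation in #6 can be checked independently of the aligner with a brute-force scan. The sketch below slides the first read along the reference and reports every 1-based start position with at most a given number of mismatches. It compares characters exactly, so IUPAC codes such as M, W, or Y in the reference count as mismatches, and it ignores indels and the reverse strand. To keep it short, only the opening bases of the reference from #5 are included, so it can speak to position 25 but not 233. `hamming` and `find_hits` are names made up for this example.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_hits(read: str, reference: str, max_mm: int) -> list:
    """Return (1-based start, mismatches) for every window within max_mm."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mm = hamming(read, window)
        if mm <= max_mm:
            hits.append((start + 1, mm))
    return hits


# Opening bases of the reference posted in #5 (truncated for brevity).
REF_PREFIX = (
    "CGACTCGAGACTCGAGCCCCGGGCTACCTTGCCAGCAGAGTCTTGCCCAAC"
    "ACCCGCAAGGGTCCACACGGG"
)
READ_1 = "TACCTTGACAGCAGAGTCTTGCCCAAC"
READ_2 = "ACAGCAGAGTCTTGCCCAACACCCG"

if __name__ == "__main__":
    print(find_hits(READ_1, REF_PREFIX, 3))
    print(find_hits(READ_2, REF_PREFIX, 3))
```

            On this prefix the first read lines up at position 25 with a single mismatch. Running the same scan over the full reference would show whether 233 behaves the same way.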



            • #7
              I'm a little hazy about this part: when does -r All stop reporting alignments? It must use some sort of cut-off. I would imagine that for reads this short there is a large jump in score between a perfect match and three mismatches, so -t 90 might not be high enough. Try -E 10 and set the -t option to the maximum score possible (which depends on the novoalign version, I think). You could also try adding some extra reads that match the mismatch regions exactly or with 1 or 2 mismatches, and see what they do. Sorry to be sort of guessing about this... it has been a while since I've played around in depth with novoalign.
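              The test reads suggested in #7 could be generated rather than typed by hand. The sketch below takes a window of the reference and writes FASTA records carrying 0, 1, 2, ... substitutions at evenly spread positions. The window is the 27 bases starting at position 25 of the reference from #5. The substitution table, the record names, and both function names are arbitrary choices for this example.

```python
# Fixed substitution table so the output is reproducible.
# Only A/C/G/T are handled; an ambiguity code in the window raises KeyError.
SUBSTITUTE = {"A": "C", "C": "G", "G": "T", "T": "A"}


def mutate(seq: str, positions: list) -> str:
    """Return seq with the base at each 0-based position substituted."""
    bases = list(seq)
    for p in positions:
        bases[p] = SUBSTITUTE[bases[p]]
    return "".join(bases)


def make_test_fasta(seq: str, max_mm: int, name_prefix: str = "test_read") -> str:
    """Build FASTA text with one read per mismatch count from 0 to max_mm."""
    lines = []
    for k in range(max_mm + 1):
        # Spread the k substitutions evenly along the read.
        positions = [(i + 1) * len(seq) // (k + 1) for i in range(k)]
        lines.append(f">{name_prefix}_{k}mm")
        lines.append(mutate(seq, positions))
    return "\n".join(lines) + "\n"


# Reference bases 25-51 from the sequence posted in #5.
WINDOW = "TACCTTGCCAGCAGAGTCTTGCCCAAC"

if __name__ == "__main__":
    print(make_test_fasta(WINDOW, 3), end="")
```

              Aligning the resulting file with the same options would show at which mismatch count the reads stop being reported.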
