  • bowtie --very-sensitive option

    Hi all,
    I have paired-end data; each read is 100 nt long.

    I am using ./bowtie2 -q -D 20 -R 3 -N 0 -L 20 -i S,1,0.50 -x mygenome -1 a.fastq -2 b.fastq

    With this command, do I also have to specify the -M 1 option, since I want only the single best valid alignment?

    What is the benefit of getting the output in SAM format?

  • #2
    -M reporting mode

    Hi,
    -M is a reporting mode: with -M n, bowtie2 searches for up to n+1 distinct valid alignments per read and reports only the best one it found. To be more confident that the reported alignment really is the best, pass a larger number to -M. For example, -M 5 means: search for up to 6 valid alignments and report the best one. A higher value means bowtie2 searches the alignment space more extensively before deciding on the best alignment, but it also slows bowtie2 down significantly.
    I used -M 5. For complex organisms with high repeat content, such a value makes sense because it does justice to reads that originate from repeat regions and may have more than one valid alignment. For microbes, a smaller -M value is fine.

    As for SAM: it is the de facto standard format for alignment results. Output in this format can be passed directly to downstream programs such as transcript assemblers or variant callers.
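To illustrate why SAM is convenient for downstream tools, here is a minimal sketch that filters a SAM stream with plain awk. The sample records are hand-written placeholders (not real bowtie2 output), and the function names `make_demo_sam` and `keep_proper_pairs` are made up for this example. It relies only on two facts from the SAM specification: fields are tab-separated, and column 2 is a bitwise FLAG in which bit 0x2 marks a properly paired read.

```shell
#!/bin/sh
# Tiny hand-written SAM sample (hypothetical reads, not real bowtie2 output).
# Fields are tab-separated; column 2 is the bitwise FLAG.
make_demo_sam() {
  printf '@HD\tVN:1.0\tSO:unsorted\n'
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'r1\t99\tchr1\t100\t42\t4M\t=\t300\t204\tACGT\tIIII\n'
  printf 'r1\t147\tchr1\t300\t42\t4M\t=\t100\t-204\tACGT\tIIII\n'
  printf 'r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
  printf 'r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
}

# Keep header lines plus records whose FLAG has bit 0x2 set
# (read aligned as part of a proper, i.e. concordant, pair).
keep_proper_pairs() {
  awk -F'\t' '/^@/ { print; next } int($2 / 2) % 2 == 1 { print }'
}

make_demo_sam | keep_proper_pairs
```

In practice the same filter could sit between the aligner and the next program in a pipe, which is the main practical benefit of a common text format.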

    • #3
      Hi, actually I find this a little peculiar. When I submit the command like this:
      ./bowtie2 -q -D 20 -R 3 -N 0 -L 20 -i S,1,0.50 -x mygenome -1 a.fastq -2 b.fastq

      it gives me 17 GB of output, but when I add just -M 1 it gives me 2.2 GB.
      My reference genome is mammalian, so it has a large number of repeats.
      I am wondering whether such a large difference is plausible. As mentioned, I am aligning paired ends in end-to-end mode, and I don't want any discordant alignments (--no-discordant).
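File size alone says little about what changed between two runs, so one way to compare them is to count records by FLAG. The sketch below is only an illustration: `sam_counts` and `demo_sam` are hypothetical helper names, the sample records are hand-written, and the file names in the final comment are placeholders. It uses the SAM FLAG bits 0x4 (read unmapped) and 0x2 (properly paired).

```shell
#!/bin/sh
# Summarize a SAM stream: total records, aligned records (FLAG bit 0x4 unset),
# and records that belong to a concordant pair (FLAG bit 0x2 set).
sam_counts() {
  awk -F'\t' '
    /^@/ { next }                            # skip header lines
    { total++ }
    int($2 / 4) % 2 == 0 { aligned++ }       # 0x4 = read unmapped
    int($2 / 2) % 2 == 1 { concordant++ }    # 0x2 = properly paired
    END { printf "records=%d aligned=%d concordant=%d\n", total, aligned, concordant }
  '
}

# Hypothetical sample: one concordant pair (r1) and one unaligned pair (r2).
demo_sam() {
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'r1\t99\tchr1\t100\t42\t4M\t=\t300\t204\tACGT\tIIII\n'
  printf 'r1\t147\tchr1\t300\t42\t4M\t=\t100\t-204\tACGT\tIIII\n'
  printf 'r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
  printf 'r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'
}

demo_sam | sam_counts
# With real output (placeholder file names):
#   sam_counts < run_default.sam
#   sam_counts < run_M1.sam
```

Running it on both outputs would show whether the smaller file has fewer records overall or simply fewer aligned ones.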

      • #4
        Is your data RNA-Seq? If so, you should be using the TopHat aligner (or another splice-junction-aware aligner).

        My guess is that the "-D 20 -R 3 ..." part of your first command line
        corresponds to bowtie2's --very-sensitive mode. With it you are asking bowtie2 to search extensively for the best alignment of each read. This basically means that bowtie2 can find more than one valid alignment for a read and then reports the best one, since the default reporting mode is -M.

        But when you explicitly specify -M 1, you limit bowtie2 to searching for only n+1, i.e. 2, valid alignments before it reports the best one. So even with the --very-sensitive settings, passing -M 1 essentially keeps bowtie2 from being sensitive, let alone very sensitive.

        Since this is paired-end data, possibly -M 1 yields non-concordant (discordant) alignments for many mates, and your --no-discordant option then drops them, leaving only 2.2 GB of output.
        Moreover, you haven't passed the mate orientation option (--fr etc.) or the fragment length limits (-I and -X). bowtie2 sets default values for these parameters, but I am not sure they are optimal for your dataset.
        Pass these parameters and check. And if you use the sensitive or very-sensitive mode, leave out the -M option; bowtie2 will return the best alignment anyway.
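A minimal sketch of such a command line, with the preset named instead of spelled out and the pairing options made explicit. The index and FASTQ names come from this thread; `out.sam` and the `RUN_BOWTIE2` guard variable are assumptions added for the example, and the -I/-X values shown are simply bowtie2's defaults, not values tuned for any dataset.

```shell
#!/bin/sh
# Options used below:
#   --very-sensitive   preset equal to -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
#   --fr               mate orientation (bowtie2's default)
#   -I 0 -X 500        minimum / maximum fragment length (bowtie2's defaults)
#   --no-discordant    do not report discordant pair alignments
#   -S out.sam         write SAM to a file (placeholder name)
bowtie2_args='-q --very-sensitive --fr -I 0 -X 500 --no-discordant -x mygenome -1 a.fastq -2 b.fastq -S out.sam'

# Print the command for inspection.
echo "bowtie2 $bowtie2_args"

# Run it only when explicitly requested and bowtie2 is on the PATH.
# RUN_BOWTIE2 is a hypothetical guard for this sketch, not a bowtie2 setting.
if [ "${RUN_BOWTIE2:-0}" = "1" ] && command -v bowtie2 >/dev/null 2>&1; then
  # Unquoted on purpose: the argument string contains no embedded spaces.
  bowtie2 $bowtie2_args
fi
```

Note that no -M option appears here, in line with the advice above to leave it at its default when using a sensitivity preset.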

        • #5
          It's not RNA-Seq data. I am using --no-discordant to make sure that bowtie2 respects paired-end mode. Your description of -M has made me realize the problem. As for the -I and -X options, I am fine with the defaults (min 0 and max 500), so I am not specifying them. Thank you very much for your replies, really helpful.
