  • mparida85
    Member
    • Jan 2014
    • 17

    tophat mapping

    Hi all,
    Here is a problem I ran into when mapping with bowtie2/tophat2.
    The unmapped file contains reads that failed to map, either because of the parameters I passed to tophat or because they were discarded for bad vendor quality.
    But when I pick some of those sequences and BLAST them against the reference genome I used for mapping, I see alignments with fewer mismatches than I allowed with the -N/--read-mismatches and --read-edit-dist options of tophat2.

    The scores of the BLAST alignments are within a range of 34-83. Does this mean that bowtie2 also applies its own alignment score threshold?
    FYI, the unmapped reads are singletons (meaning their mates mapped but they didn't).
    Please comment.

    Hope this makes sense.
    Rocky
    Last edited by mparida85; 01-29-2014, 12:21 PM.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    There are a couple of things likely contributing to this. Firstly, bowtie2 does apply its own scoring threshold, which tophat2 lets you modify with the --b2-score-min option. Secondly, remember that blast does local alignments, whereas bowtie2/tophat2 do global (end-to-end) alignments. So if the ends of the reads don't map, you're unlikely to map them with tophat2.

    One interesting thing to do would be to look in the run.log and see whether modifying --read-edit-dist actually changes the --score-min passed to bowtie2, which defaults to allowing only about 2 mismatches. If it doesn't, then increasing --read-edit-dist probably won't do much (it'll only have an effect when mismatches occur at positions with low phred scores).
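
For intuition on how a --score-min setting translates into tolerated mismatches, here is a minimal Python sketch. It assumes bowtie2's documented function-string format (types C, L, S and G), end-to-end mode where a perfect alignment scores 0, and the default maximum mismatch penalty of 6 (--mp 6,2). The function names are hypothetical, and the rough count ignores gaps, Ns and low-quality bases (which are penalized less).

```python
import math


def min_score(func_spec, read_len):
    """Evaluate a bowtie2-style function string such as 'L,-0.6,-0.6'.

    C = constant, L = linear, S = square root, G = natural log,
    each of the form A + B * f(read_len).
    """
    kind, a, b = func_spec.split(",")
    a, b = float(a), float(b)
    if kind == "C":
        return a
    if kind == "L":
        return a + b * read_len
    if kind == "S":
        return a + b * math.sqrt(read_len)
    if kind == "G":
        return a + b * math.log(read_len)
    raise ValueError("unknown function type: " + kind)


def max_high_quality_mismatches(func_spec, read_len, mismatch_penalty=6):
    """Rough upper bound on mismatches at high-quality bases (end-to-end mode).

    Assumes every mismatch costs the full penalty and there are no gaps.
    """
    return int(-min_score(func_spec, read_len) // mismatch_penalty)


if __name__ == "__main__":
    # bowtie2's own end-to-end default for a 100 bp read
    print(min_score("L,-0.6,-0.6", 100))
    print(max_high_quality_mismatches("L,-0.6,-0.6", 100))
    # Illustrative constant threshold only; check run.log for the real value
    print(max_high_quality_mismatches("C,-14,0", 100))
```

With a constant threshold like the illustrative C,-14,0, only about two full-penalty mismatches fit regardless of read length, which is the kind of limit worth checking for in run.log.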


    • sphil
      Senior Member
      • Apr 2010
      • 192

      #3
      You can also try generating local alignments with bowtie2 and check whether more sequences align.
      Here are the options for local alignments:
      Preset options in --local mode

      --very-fast-local        Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
      --fast-local             Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
      --sensitive-local        Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)
      --very-sensitive-local   Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
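
As a rough sketch of how one might set up such a local-mode test run, the Python below stores the presets listed above and builds (but does not execute) a bowtie2 command line. The index prefix and file names are hypothetical placeholders, and it assumes the unmapped reads have already been converted to FASTQ and are treated as unpaired (-U).

```python
import shlex

# Preset expansions as listed in the post above
LOCAL_PRESETS = {
    "--very-fast-local": "-D 5 -R 1 -N 0 -L 25 -i S,1,2.00",
    "--fast-local": "-D 10 -R 2 -N 0 -L 22 -i S,1,1.75",
    "--sensitive-local": "-D 15 -R 2 -N 0 -L 20 -i S,1,0.75",
    "--very-sensitive-local": "-D 20 -R 3 -N 0 -L 20 -i S,1,0.50",
}


def expand_preset(preset):
    """Return the individual options a local preset stands for."""
    return shlex.split(LOCAL_PRESETS[preset])


def bowtie2_local_command(index_prefix, reads_fastq, out_sam,
                          preset="--sensitive-local"):
    """Build an argument list for an unpaired bowtie2 --local run."""
    if preset not in LOCAL_PRESETS:
        raise ValueError("unknown local preset: " + preset)
    return ["bowtie2", "--local", preset,
            "-x", index_prefix, "-U", reads_fastq, "-S", out_sam]


if __name__ == "__main__":
    # Hypothetical paths; pass the list to subprocess.run() to execute it
    cmd = bowtie2_local_command("genome_index", "unmapped.fastq",
                                "unmapped_local.sam", "--very-sensitive-local")
    print(" ".join(cmd))
    print(expand_preset("--very-sensitive-local"))
```

Comparing the alignment rate of this run against the end-to-end result gives a quick indication of whether unaligned read ends are what keeps these reads from mapping.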


      • mparida85
        Member
        • Jan 2014
        • 17

        #4
        tophat/bowtie mapping

        Hi guys,
        Thanks for your replies.
        I never completely understood what --read-edit-dist actually does.
        I am using -N 5 --read-edit-dist 5 and --b2-sensitive. All I know is that --read-edit-dist has to be >= the -N option.

        dpryan, please elaborate a little on that.
        I would highly appreciate your help.
        Also, thanks for your suggestion too, sphil. I will definitely try that experiment.

        I am learning a lot from this blog. I also just did some more digging into my unmapped.bam file, and it turns out there are paired-end reads of which one read BLATs well and the other maps to nothing. Maybe the other read is a contaminant of some sort. I read in the manual of trim_galore (a fastq trimmer, Simon Andrews) that bowtie rejects pairs "whenever a start/end coordinate is contained within the other read".

        Does anyone have ideas on whether that might fit my issue?
        Last edited by mparida85; 02-01-2014, 10:06 PM.


        • dpryan
          Devon Ryan
          • Jul 2011
          • 3478

          #5
          An edit distance is a generalization of the concept of number of mismatches (in point of fact, it's a common distance metric for string comparisons). The general idea is that the edit distance is the number of changes to string A required to produce string B. If the only difference between the two is mismatches (e.g. you have an A in one and a T at the same place in another), then the edit distance and number of mismatches are the same. If you have an insertion or deletion between the two strings, then the number of mismatches will be less than the edit distance, as the former lacks any conception of what an insertion or deletion is. Since having insertions/deletions is relatively common when dealing with sequencing data, the concept of an edit distance is rather more useful than the number of mismatches.
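
To make the distinction concrete, here is a minimal Python sketch under stated assumptions: `levenshtein` is the textbook dynamic-programming edit distance, and `alignment_differences` is a hypothetical helper that counts mismatches and indel positions in an already gapped alignment (gaps written as "-"). The sequences are made up for illustration.

```python
def levenshtein(a, b):
    """Classic edit distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def alignment_differences(aligned_read, aligned_ref, gap="-"):
    """Count (mismatches, indel positions, their sum) in a gapped alignment."""
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned strings must have equal length")
    mismatches = indels = 0
    for r, g in zip(aligned_read, aligned_ref):
        if r == gap or g == gap:
            indels += 1
        elif r != g:
            mismatches += 1
    return mismatches, indels, mismatches + indels


if __name__ == "__main__":
    # One mismatch only: mismatches and edit distance agree
    print(levenshtein("ACGT", "ACCT"))
    # One deleted base plus one mismatch: 1 mismatch, but edit distance 2
    print(alignment_differences("ACGT-ACGT", "ACGTTACCT"))
    print(levenshtein("ACGTACGT", "ACGTTACCT"))
```

In the second case a read with a single mismatch would pass a limit of one mismatch yet still exceed an edit-distance limit of one, because the indel also counts toward the edit distance.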

          Which part of my previous reply would you like me to expound upon?

          Regarding what you read in the trim_galore manual, keep in mind that this is dependent on the version of bowtie that you're using. Bowtie1 doesn't deal with overlapping reads well at all. Bowtie2, however, can deal properly with these, provided you allow it to. Bowtie2 defaults to allowing alignment where one mate is contained either partially or entirely within the other. It doesn't allow "dovetail" alignments unless you pass the "--dovetail" flag, which I don't think tophat2 allows.

          Relatedly, you might consider allowing "mixed" and "discordant" alignments, if you've told tophat2 to disallow them.


          • mparida85
            Member
            • Jan 2014
            • 17

            #6
            reply to dpryan

            Hi dpryan,
            You already explained read-edit-distance in your first paragraph. That's what I was asking you to explain. Thanks a lot.

            FYI, I am using bowtie2/tophat2 for mapping. I must have allowed discordant and mixed alignments, because I can see them in my alignment summary reports.
            I don't think I allowed dovetail alignments. I can try running some of the unmapped reads to experiment with them.


            Question:
            a) If there are some overrepresented sequences in my fastq file and most of them are non-coding RNA (rRNA, mitochondrial), is it good practice to let them map in the bowtie2/tophat2 pipeline? They have good phred quality scores, but their per-base sequence content (a.k.a. ATGC plot) shows bias.

            I think that as long as a sequence is of good quality we should allow it to map, no matter where it came from, with the exception of adapter contamination and poor-quality reads.

            Again I cannot thank you enough for your time and knowledge that you are sharing with me.
            Rocky


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              I would normally only trim off adapter sequence and low-quality bases. Reads will often show bias in the first 10-13 bases, but that's nothing to worry too much about. Similarly, having rRNA show up as an over-represented sequence is pretty normal and nothing to worry about. I should add that having a high duplication rate is also normal for RNAseq datasets.


              • mparida85
                Member
                • Jan 2014
                • 17

                #8
                rRNA reads

                Hi dpryan
                Question:
                1) Do we remove the rRNA reads before calculating FPKM using cufflinks?
                The reason I ask is that I am seeing some rRNA genes in my list of significantly differentially expressed genes.
                Please comment.
                Rocky


                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #9
                  It's usually recommended to do so, at least unless you're actually interested in looking at rRNA. I think cufflinks has an option where you can mask some regions from analysis. My understanding, at least, is that that's geared toward avoiding rRNAs or other super highly expressed transcripts that are likely to suppress FPKM/RPKM scores and increase variance.
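
The suppression effect can be illustrated with the textbook FPKM formula. This is a minimal Python sketch with made-up counts, not a description of how cufflinks normalizes internally; the function name and all numbers are hypothetical.

```python
def fpkm(gene_fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return gene_fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical library: 10 million mapped fragments, 4 million from rRNA
    total = 10_000_000
    rrna = 4_000_000
    # Hypothetical gene: 500 fragments over a 2000 bp transcript
    with_rrna = fpkm(500, 2000, total)
    without_rrna = fpkm(500, 2000, total - rrna)
    print(with_rrna, without_rrna)
```

Because the rRNA fragments sit in the denominator, the same gene gets a lower FPKM when they are counted, and if the rRNA fraction differs between samples the gene's FPKM shifts even when its true expression is unchanged.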

