  • michfish86
    replied
    While my de novo assembly runs, I wanted to get clarification on what could be causing the large difference in alignment percentage between Bowtie, Bowtie2, and BBMap. As a reminder, the alignment rates were:

    Bowtie: 43%
    Bowtie2: 71%
    BBMap: 78%

    I keep seeing that Bowtie2 and BBMap are "more sensitive" or "more flexible" but I can't figure out what that means and I'm skeptical of the higher alignment scores because I'm afraid of including a bunch of false positives.

    Two potential reasons that I'm aware of for Bowtie being substantially lower are that Bowtie doesn't allow ambiguous characters (e.g., N) in alignments and that it disqualifies discordant reads, whereas Bowtie2 still seeks alignments despite these imperfections. I'd appreciate any additional insight into what could be causing these disparities, since I don't want to blindly accept the higher alignment scores just because they're higher!

    Edit: Update: Running Bowtie2 with the --no-mixed option turned on reduces the alignment percentage to 54%. It appears that Bowtie is effectively "--no-mixed" by default, so this could be a major contributor to the difference between my Bowtie and Bowtie2 percentages. In "--no-mixed" mode, alignments must involve both mates of a pair, so in Bowtie I may be losing some reads just because only one of the two reads constituting a "pair" aligns. Again, I'd appreciate anyone else's insights.
    Last edited by michfish86; 08-25-2016, 12:12 PM.


  • GenoMax
    replied
    The BBMap suite has seal.sh and tadpole.sh. You may want to look at them as substitute assemblers (though Trinity has been the assembler of choice for eukaryotes of late).


  • michfish86
    replied
    Originally posted by GenoMax View Post
    While that is tempting, do you have reasons to suspect that may be the case? Is the genome odd (large with multiple chromosomes, ploidy differences)?
    Looking back at the paper that accompanies the reference transcriptome, they used several different tissue types, but did not include the exact tissue that I sequenced. The genome isn't particularly huge (C-value 4.75), but the chromosomal arrangements have been tough to nail down. One study that looked into it found a diploid number of 110 with a range of 94-185.

    Even though you may not have enough data, you could try a de novo assembly and see what you get. Be sure to map the reads back to your own assembly to see what percentage of them map.
    I will give it a shot.

    Thanks again!!


  • michfish86
    replied
    Originally posted by GenoMax View Post
    Where do you see the 20% map?
    I think Macspider's phrasing is just a little confusing. I think s/he means that more than 20% of my reads are not mapping.


  • GenoMax
    replied
    Originally posted by michfish86 View Post
    Edit: Could the reason for the lack of matching to the reference transcriptome just be the presence of novel transcripts in my samples that were not present in the individuals in the other study that generated the reference? For instance, because the reference may not have included the specific type of tissue that I'm evaluating?
    While that is tempting, do you have reasons to suspect that may be the case? Is the genome odd (large with multiple chromosomes, ploidy differences)?

    Even though you may not have enough data, you could try a de novo assembly and see what you get. Be sure to map the reads back to your own assembly to see what percentage of them map.


  • GenoMax
    replied
    Originally posted by Macspider View Post
    It looks like you still don't map more than 20% of your reads though... Maybe you didn't trim them correctly? How did you do the trimming process? Maybe they still contain some adapter sequences and this penalizes many alignments.
    Where do you see the 20% map?


  • michfish86
    replied
    For trimming, I used Trimmomatic. I used two different TruSeq3 Illumina adapter sequence sets that come with the program to remove the correct adapters. To verify that it worked, I ran FastQC before and after trimming. Before trimming, the "over-represented sequences" output was dominated by the adapter sequences and after trimming there were no adapter sequences listed.
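
As a quick cross-check beyond FastQC, leftover adapter sequence can also be counted directly in the trimmed reads. The following is only a minimal Python sketch: the function name and the example reads are made up for illustration, and it assumes that the 13-base prefix AGATCGGAAGAGC, which is shared by the common Illumina TruSeq adapters, is a reasonable thing to search for.

```python
from itertools import islice

# Assumption: this prefix is shared by the common Illumina TruSeq adapters.
ADAPTER_PREFIX = "AGATCGGAAGAGC"

def count_adapter_reads(fastq_lines, adapter=ADAPTER_PREFIX):
    """Return (reads_with_adapter, total_reads) for an iterable of FASTQ lines."""
    lines = iter(fastq_lines)
    with_adapter = 0
    total = 0
    while True:
        # A FASTQ record is four lines: header, sequence, '+', qualities.
        record = list(islice(lines, 4))
        if len(record) < 4:
            break
        total += 1
        if adapter in record[1].strip().upper():
            with_adapter += 1
    return with_adapter, total

# Two made-up reads: the first still carries the adapter prefix, the second does not.
example = [
    "@read1", "ACGTACGTAGATCGGAAGAGCACACG", "+", "I" * 26,
    "@read2", "ACGTACGTACGTACGTACGTACGTAC", "+", "I" * 26,
]
print(count_adapter_reads(example))
```

An open file handle for an uncompressed FASTQ file can be passed in place of the example list. This only catches exact matches to the prefix, so a low count here is weaker evidence than a clean FastQC report.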

    Edit: Could the reason for the lack of matching to the reference transcriptome just be the presence of novel transcripts in my samples that were not present in the individuals in the other study that generated the reference? For instance, because the reference may not have included the specific type of tissue that I'm evaluating?
    Last edited by michfish86; 08-24-2016, 04:04 AM.


  • Macspider
    replied
    It looks like you still don't map more than 20% of your reads though... Maybe you didn't trim them correctly? How did you do the trimming process? Maybe they still contain some adapter sequences and this penalizes many alignments.


  • michfish86
    replied
    GenoMax: Thanks for suggesting BBMap. It seems like a very nice program/suite and quite intuitive to get running. Instead of retyping it all, I've attached a screenshot of my BBMap output. It appears to have mapped about 78% of the reads. I'm still trying to learn about the BBMap output so I'd appreciate any feedback you have.

    Thanks again for helping me!

    Edit: Just noticed I left off the read 2 output stats. They are very similar to read 1's output.
    Last edited by michfish86; 08-23-2016, 01:11 PM.


  • GenoMax
    replied
    @michfish86: I am going to suggest that you try BBMap for these alignments. There is a separate thread for the program (suite) here. @Brian (author of BBMap) and I can help if you run into problems.


  • Macspider
    replied
    Maybe variability in supernumerary chromosomes between my reference and my samples is reducing my number of alignments?
    I'm not sure about this, because the supernumerarity itself doesn't affect the alignment if the sequence is the same for each supernumerary chromosome. If the sequence diverges, then you lose a lot of reads, because you allow only 3 mismatches and many of your reads were obtained from supernumerary chromosomes.

    Do you know if the supernumerarity comes with a divergence in the sequence?
    May I suggest that you try aligning them with another alignment program that lets you set any number of mismatches? I don't know which one could do the trick, though!

    EDIT: If I were you I would give BLAT a try. It might be slow but you can specify the minimum sequence identity between target and query and if you do some parallel tryouts like at 80%, 85%, 90%, 95%, 99% then you will see at which of these steps you gain mapped reads.
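
A rough Python sketch of that identity sweep, which only builds the command lines and does not run them. The file names and the helper function are hypothetical. It assumes BLAT's usual `blat database query [-options] output.psl` form with its `-minIdentity=N` option, and that the reads have already been converted to FASTA, since BLAT does not read FASTQ.

```python
import shlex

def blat_commands(database, query, identities=(80, 85, 90, 95, 99)):
    """Build one BLAT command line per minimum-identity threshold."""
    commands = []
    for identity in identities:
        # Hypothetical output name, one PSL file per threshold.
        output = f"reads_minid{identity}.psl"
        args = ["blat", database, query, f"-minIdentity={identity}", output]
        commands.append(shlex.join(args))
    return commands

# Placeholder file names; substitute the real reference and read files.
for command in blat_commands("reference.fa", "reads.fa"):
    print(command)
```

Counting the distinct query names in each PSL file would then show at which identity threshold additional reads start to map.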
    Last edited by Macspider; 08-23-2016, 08:12 AM.


  • michfish86
    replied
    What I can say is that you could try to increase the number of mismatches allowed to 3, 4 and 5 and see how many reads you map in these cases.
    Unfortunately, Bowtie only allows up to 3 mismatches. I tried increasing the number of mismatches from 2 to 3 and it resulted in about a 1% alignment increase. Bowtie2 only allows 0 or 1 mismatches in the seed (its -N option).


    Is your species allotetraploid? (it might help)
    No, it is not, but your question led me to some new information. The species has a large number of small chromosomes (>100) and evidence of supernumerary chromosomes. Maybe variability in supernumerary chromosomes between my reference and my samples is reducing my number of alignments?

    I think you should really make a list of all the parameters that bowtie2 applies by default and see which ones are NOT used in bowtie, and then report the list here.
    Please correct me if I'm wrong, but I'm not sure this would be helpful because the programs work in such different ways. In particular, I think the big increase in alignments with Bowtie2 is due to it allowing gapped alignments (http://bowtie-bio.sourceforge.net/bo...-from-bowtie-1). This is why I think Bowtie2 inflates the number of alignments: since I'm aligning transcripts to a transcriptome, there should be no gaps (right?).

    However, looking through default settings for both programs, it appears the only major difference is in insert sizes, which we've already eliminated as an issue.


  • Macspider
    replied
    I am surprised by the fact that bowtie2 maps ~ 70% of the data while bowtie maps only 45%. I think you should really make a list of all the parameters that bowtie2 applies by default and see which ones are NOT used in bowtie, and then report the list here.

    What I can say is that you could try to increase the number of mismatches allowed to 3, 4 and 5 and see how many reads you map in these cases. Is your species allotetraploid? (it might help)


  • michfish86
    replied
    - How many mismatches did you set? (default is 2 if I'm not mistaken)
    I kept the default, which is indeed 2 mismatches.

    - How did you write the read files in the command?
    I'm sorry but I'm not certain what you mean besides what I indicated in the Trinity command in my initial post.

    - Did you try to do the same in Bowtie2, which has some default parameters and maybe that helps?
    Because Trinity seems limited on available options for Bowtie, I ran the data through Bowtie and Bowtie2 outside of Trinity for the sake of addressing your question. I just ran default values in both. The code and the results for the same individual using each program are below.

    I also want to note that my current understanding is that when aligning to a reference transcriptome, an ungapped aligner (i.e., Bowtie) is preferable to a gapped aligner (i.e., Bowtie2). However, I see people using Bowtie2 for reference transcriptome alignments in the literature so perhaps I am wrong.

    Bowtie

    ./bowtie REF -1 R1_reads -2 R2_reads --un unaligned --time

    # reads processed 659098
    # reads with at least one reported alignment: 281199 (42.66%)
    # reads that failed to align: 377899 (57.34%)
    Reported 281199 paired-end alignments to 1 output stream(s)


    Bowtie2

    ./bowtie2 -x REF_Bowtie2 -1 R1_reads -2 R2_reads -S Results_Bowtie2 --un unaligned_Bowtie2

    659098 (100.00%) were paired; of these:
    299825 (45.49%) aligned concordantly 0 times
    248293 (37.67%) aligned concordantly exactly 1 time
    110980 (16.84%) aligned concordantly >1 times
    ----
    299825 pairs aligned concordantly 0 times; of these:
    39197 (13.07%) aligned discordantly 1 time
    ----
    260628 pairs aligned 0 times concordantly or discordantly; of these:
    521256 mates make up the pairs; of these:
    385635 (73.98%) aligned 0 times
    71968 (13.81%) aligned exactly 1 time
    63653 (12.21%) aligned >1 times

    70.75% overall alignment rate
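
A note for anyone comparing the two summaries: the Bowtie2 "overall alignment rate" appears to be counted per mate rather than per pair, whereas the Bowtie figure above is per pair, so the two percentages are not directly comparable. The following is a minimal Python sketch that reproduces the arithmetic. The variable names are made up for illustration, and the counts are copied from the Bowtie2 summary above.

```python
# Counts copied from the Bowtie2 summary above.
total_pairs = 659098
concordant_once = 248293
concordant_multi = 110980
discordant_once = 39197
mates_once = 71968
mates_multi = 63653

def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)

# Pairs that aligned as pairs, either concordantly or discordantly.
pairs_aligned = concordant_once + concordant_multi + discordant_once

# Per-mate accounting: each aligned pair contributes two mates,
# plus the mates that aligned on their own.
mates_aligned = 2 * pairs_aligned + mates_once + mates_multi

overall_rate = percent(mates_aligned, 2 * total_pairs)
concordant_rate = percent(concordant_once + concordant_multi, total_pairs)
paired_rate = percent(pairs_aligned, total_pairs)

print("overall (per mate):", overall_rate)          # 70.75
print("concordant pairs only:", concordant_rate)    # 54.51
print("concordant + discordant pairs:", paired_rate)  # 60.46
```

On a per-pair basis, the Bowtie2 numbers (54.51% concordant, 60.46% with discordant pairs included) are much closer to Bowtie's 42.66% than the headline 70.75% suggests.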


    - If you indeed used Bowtie2, the default max insert size is 500 nt and the min is (I think) 0 nt. I see that you declare 800 nt as max insert size; maybe half of your libraries have an insert size bigger than that?
    To check into insert sizes, I ran Bowtie with 3 different insert size values.

    250 (default): 42.66% alignment (same run as above)
    800: 45.88% alignment
    1500: 45.88% alignment

    Looks like insert size of 800 isn't an issue! I appreciate any additional insight that anyone can provide!!


  • Macspider
    replied
    Before reaching out to the authors, I'd suggest that you post here the Bowtie command that was run (there must be something in the Trinity log), to add to the Trinity command you posted.
    - How many mismatches did you set? (default is 2 if I'm not mistaken)
    - How did you write the read files in the command?
    - Did you try to do the same in Bowtie2, which has some default parameters and maybe that helps?
    - If you indeed used Bowtie2, the default max insert size is 500 nt and the min is (I think) 0 nt. I see that you declare 800 nt as max insert size; maybe half of your libraries have an insert size bigger than that?
    Last edited by Macspider; 08-23-2016, 12:15 AM.
