While my de novo assembly runs, I wanted to get clarification on what could be causing the large difference in alignment percentage between Bowtie, Bowtie2, and BBMap. As a reminder, the alignment rates are:
Bowtie: 43%
Bowtie2: 71%
BBMap: 78%
I keep seeing that Bowtie2 and BBMap are "more sensitive" or "more flexible" but I can't figure out what that means and I'm skeptical of the higher alignment scores because I'm afraid of including a bunch of false positives.
Two potential reasons I'm aware of for Bowtie being substantially lower are that Bowtie doesn't allow ambiguous characters (e.g., N) in alignments and that it disqualifies discordant reads, whereas Bowtie2 still seeks alignments despite these imperfections. I'd appreciate any additional insight into what could be causing these disparities, since I don't want to blindly accept the higher alignment scores just because they're higher!
Edit: Running Bowtie2 with the --no-mixed option turned on reduces the alignment percentage to 54%. It appears that Bowtie is "--no-mixed" by default, so this could be a major contributor to the difference between my Bowtie and Bowtie2 percentages. In "--no-mixed" mode, an alignment must involve both mates of a pair, so in Bowtie I may be losing some reads just because only one of the two mates in a pair aligns. Again, I'd appreciate anyone else's insights.
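As a rough cross-check of the --no-mixed observation, the default Bowtie2 summary posted in this thread can be split by alignment category. The sketch below is plain arithmetic on those reported counts. The helper name `rate_breakdown` is made up for illustration, and nothing here reruns the aligner, so it need not match an actual --no-mixed run exactly.

```python
# Counts copied from the default Bowtie2 summary posted in this thread.
TOTAL_PAIRS = 659098
CONCORDANT_PAIRS = 248293 + 110980  # aligned concordantly exactly 1 time + >1 times
DISCORDANT_PAIRS = 39197            # aligned discordantly 1 time
SINGLE_MATES = 71968 + 63653        # mates aligned without their partner


def rate_breakdown():
    """Return each category's share of all mates, in percent.

    rate_breakdown is a hypothetical helper, not part of Bowtie2.
    """
    total_mates = 2 * TOTAL_PAIRS
    return {
        "concordant": 100 * 2 * CONCORDANT_PAIRS / total_mates,
        "discordant": 100 * 2 * DISCORDANT_PAIRS / total_mates,
        "single_mates": 100 * SINGLE_MATES / total_mates,
    }


if __name__ == "__main__":
    parts = rate_breakdown()
    for name, pct in parts.items():
        print(f"{name}: {pct:.2f}%")
    print(f"overall: {sum(parts.values()):.2f}%")
```

On these counts, mates that aligned without their partner account for roughly 10 percentage points of the overall rate, and concordant pairs alone for about 54.5%. That is close to the 54% seen with --no-mixed, but the agreement may be coincidental, since --no-mixed by itself still allows discordant pairs to be reported.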
-
The BBMap suite has seal.sh and tadpole.sh. You may want to look at them as substitute assemblers (though Trinity has been the assembler of choice for eukaryotes of late).
-
Originally posted by GenoMax: While that is tempting, do you have reasons to suspect that may be the case? Is the genome odd (large with multiple chromosomes, ploidy difference)?
Even though you may not have enough data, you could try a de novo assembly and see what you get. Be sure to map the reads back to your own assembly to see what percentage of them map.
Thanks again!!
-
Originally posted by GenoMax: Where do you see the 20% map?
-
Originally posted by michfish86: Edit: Could the reason for the lack of matching to the reference transcriptome just be the presence of novel transcripts in my samples that were not present in the individuals in the other study that generated the reference? For instance, because the reference may not have included the specific type of tissue that I'm evaluating?
Even though you may not have enough data, you could try a de novo assembly and see what you get. Be sure to map the reads back to your own assembly to see what percentage of them map.
-
Originally posted by Macspider: It looks like you still don't map more than 20% of your reads though... Maybe you didn't trim them correctly? How did you do the trimming? Maybe they still contain some adapter sequences, and this penalizes many alignments.
-
For trimming, I used Trimmomatic. I used two different TruSeq3 Illumina adapter sequence sets that come with the program to remove the correct adapters. To verify that it worked, I ran FastQC before and after trimming. Before trimming, the "over-represented sequences" output was dominated by the adapter sequences and after trimming there were no adapter sequences listed.
Edit: Could the reason for the lack of matching to the reference transcriptome just be the presence of novel transcripts in my samples that were not present in the individuals in the other study that generated the reference? For instance, because the reference may not have included the specific type of tissue that I'm evaluating?
Last edited by michfish86; 08-24-2016, 04:04 AM.
-
It looks like you still don't map more than 20% of your reads though... Maybe you didn't trim them correctly? How did you do the trimming? Maybe they still contain some adapter sequences, and this penalizes many alignments.
-
GenoMax: Thanks for suggesting BBMap. It seems like a very nice program/suite and quite intuitive to get running. Instead of retyping it all, I've attached a screenshot of my BBMap output. It appears to have mapped about 78% of the reads. I'm still trying to learn about the BBMap output so I'd appreciate any feedback you have.
Thanks again for helping me!
Edit: Just noticed I left off the read 2 output stats. They are very similar to read 1's output.
Last edited by michfish86; 08-23-2016, 01:11 PM.
-
@michfish86: I am going to suggest that you try BBMap for these alignments. There is a separate thread for the program (suite) here. @Brian (author of BBMap) and I can help if you run into problems.
-
Maybe variability in supernumerary chromosomes between my reference and my samples is reducing my number of alignments?
Do you know if the supernumerarity comes with a divergence in the sequence?
May I suggest you try aligning them with another alignment program that lets you set an arbitrary number of mismatches? I don't know which one could do the trick, though!
EDIT: If I were you I would give BLAT a try. It might be slow, but you can specify the minimum sequence identity between target and query, and if you do some parallel runs at, say, 80%, 85%, 90%, 95%, and 99%, you will see at which of these steps you gain mapped reads.
Last edited by Macspider; 08-23-2016, 08:12 AM.
-
What I can say is that you could try to increase the number of mismatches allowed to 3, 4 and 5 and see how many reads you map in these cases.
Is your species allotetraploid? (it might help)
I think you should really make a list of all the parameters that bowtie2 applies by default and see which ones are NOT used in bowtie, and then report the list here.
However, looking through default settings for both programs, it appears the only major difference is in insert sizes, which we've already eliminated as an issue.
-
I am surprised by the fact that bowtie2 maps ~ 70% of the data while bowtie maps only 45%. I think you should really make a list of all the parameters that bowtie2 applies by default and see which ones are NOT used in bowtie, and then report the list here.
What I can say is that you could try to increase the number of mismatches allowed to 3, 4 and 5 and see how many reads you map in these cases. Is your species allotetraploid? (it might help)
-
- How many mismatches did you set? (default is 2 if I'm not mistaken)
- How did you write the read files in the command?
- Did you try to do the same in Bowtie2, which has some default parameters and maybe that helps?
I also want to note that my current understanding is that when aligning to a reference transcriptome, an ungapped aligner (i.e., Bowtie) is preferable to a gapped aligner (i.e., Bowtie2). However, I see people using Bowtie2 for reference transcriptome alignments in the literature so perhaps I am wrong.
Bowtie
./bowtie REF -1 R1_reads -2 R2_reads --un unaligned --time
# reads processed 659098
# reads with at least one reported alignment: 281199 (42.66%)
# reads that failed to align: 377899 (57.34%)
Reported 281199 paired-end alignments to 1 output stream(s)
Bowtie2
./bowtie2 -x REF_Bowtie2 -1 R1_reads -2 R2_reads -S Results_Bowtie2 --un unaligned_Bowtie2
659098 (100.00%) were paired; of these:
299825 (45.49%) aligned concordantly 0 times
248293 (37.67%) aligned concordantly exactly 1 time
110980 (16.84%) aligned concordantly >1 times
----
299825 pairs aligned concordantly 0 times; of these:
39197 (13.07%) aligned discordantly 1 time
----
260628 pairs aligned 0 times concordantly or discordantly; of these:
521256 mates make up the pairs; of these:
385635 (73.98%) aligned 0 times
71968 (13.81%) aligned exactly 1 time
63653 (12.21%) aligned >1 times
70.75% overall alignment rate
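Note that the two summaries above do not appear to be counted in the same units: Bowtie's 42.66% is a fraction of the 659098 read pairs, while Bowtie2's 70.75% overall rate is a fraction of individual mates and includes mates that aligned alone. A like-for-like comparison can be sketched as below. This is only arithmetic on the posted counts, and reading Bowtie's "reads processed" as pairs is an inference from the matching totals, not something confirmed in this thread.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


PAIRS = 659098  # pairs processed, identical in both runs

# Bowtie: pairs with at least one reported paired-end alignment.
bowtie1_pairs = percent(281199, PAIRS)

# Bowtie2: pairs aligned concordantly (exactly 1 time + >1 times).
bowtie2_concordant_pairs = percent(248293 + 110980, PAIRS)

# Bowtie2: pairs aligned concordantly or discordantly.
bowtie2_any_pairs = percent(248293 + 110980 + 39197, PAIRS)

# Bowtie2 overall rate, recomputed per mate as in its summary line.
bowtie2_overall = percent(
    2 * (248293 + 110980 + 39197) + 71968 + 63653, 2 * PAIRS
)

if __name__ == "__main__":
    print(f"Bowtie, per pair:              {bowtie1_pairs:.2f}%")
    print(f"Bowtie2, concordant pairs:     {bowtie2_concordant_pairs:.2f}%")
    print(f"Bowtie2, concordant+discordant: {bowtie2_any_pairs:.2f}%")
    print(f"Bowtie2, overall (per mate):   {bowtie2_overall:.2f}%")
```

Counted per pair, the gap between the two programs is smaller than the headline 43% versus 71% suggests, because part of Bowtie2's figure comes from mates that aligned without their partner.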
- If you did indeed use Bowtie2, the default max insert size is 500 nt and the min is (I think) 0 nt. I see that you declare 800 nt as the max insert size. Maybe half of your libraries have an insert size bigger than that?
250 (default): 42.66% alignment (same run as above)
800: 45.88% alignment
1500: 45.88% alignment
Looks like insert size of 800 isn't an issue! I appreciate any additional insight that anyone can provide!!
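For anyone wanting to repeat the insert-size check above, here is a minimal sketch that builds one Bowtie command per maximum insert size using Bowtie's -X option. The file names mirror the command posted earlier in this thread; the helper name `bowtie_cmd` and the per-run `--un` output names are assumptions added for illustration. The script only prints the commands and does not run them.

```python
import shlex


def bowtie_cmd(max_insert, ref="REF", r1="R1_reads", r2="R2_reads"):
    """Build a paired-end Bowtie command with a given -X (max insert size).

    bowtie_cmd is a hypothetical helper; the unaligned-reads file name
    is also made up so that each run writes to its own file.
    """
    return [
        "./bowtie", ref,
        "-1", r1,
        "-2", r2,
        "-X", str(max_insert),
        "--un", f"unaligned_X{max_insert}",
        "--time",
    ]


if __name__ == "__main__":
    # The three values tested in the post: the default (250), 800, and 1500.
    for max_insert in (250, 800, 1500):
        print(shlex.join(bowtie_cmd(max_insert)))
```

Each printed line can be pasted into a shell, and the "reads with at least one reported alignment" line from each run compared directly.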
-
Before reaching out to the authors, I'd suggest that you post here the Bowtie command you ran (there must be something in the Trinity log), to add to the Trinity command you already posted.
- How many mismatches did you set? (default is 2 if I'm not mistaken)
- How did you write the read files in the command?
- Did you try to do the same in Bowtie2, which has some default parameters and maybe that helps?
- If you did indeed use Bowtie2, the default max insert size is 500 nt and the min is (I think) 0 nt. I see that you declare 800 nt as the max insert size. Maybe half of your libraries have an insert size bigger than that?
Last edited by Macspider; 08-23-2016, 12:15 AM.