Ouch. Cloudflare ate my response
Anyway -
You can split into mouse and non-mouse reads with BBMap like this:
bbmap.sh ref=mm9.fa in=reads.fq outm=mouse.fq outu=nonmouse.fq
For more elaborate splitting into one set of reads per organism (specifically, per reference file), you can use BBSplit:
bbsplit.sh ref=mm9.fa,virus1.fa,virus2.fa in=reads.fq basename=out_%.fq outu=unmapped.fq
Each organism needs to be represented by a single file, so a genome that is spread across several FASTA files has to be concatenated first (using cat, as Genomax mentioned).
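A minimal sketch of that concatenation step. The file names chr1.fa, chr2.fa and mouse_all.fa are hypothetical, and the toy sequences are created here only so the example runs on its own:

```shell
# Toy per-chromosome FASTA files (hypothetical names and sequences);
# real ones would come from the genome download.
printf '>chr1\nACGTACGT\n' > chr1.fa
printf '>chr2\nTTGGCCAA\n' > chr2.fa

# Concatenate them into the single file that represents the organism.
cat chr1.fa chr2.fa > mouse_all.fa

# Sanity check: the merged file should have one header line per input sequence.
grep -c '^>' mouse_all.fa
```

The merged file would then take the place of mm9.fa in the ref= list of the BBSplit command above.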
Aligners limit how different a read can be from the reference and still align successfully. The higher the identity of an alignment, the more likely it is to be correct, so aligners generally focus on alignments with 90% identity or higher. You can adjust this in BBMap using the "idfilter" flag.
There is no real concept of 0% similarity; with only four bases, even a random sequence will align to the mouse genome with at least 25% identity or so.
"Mapped" just means "the aligner thinks the read came from this location", so the definition varies by aligner. Bowtie (the original version) rejects any alignment with indels or with more than 3 mismatches.
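To make "identity" concrete, here is a rough sketch that computes it from a SAM record. It assumes identity is defined as (alignment columns - NM) / alignment columns, where alignment columns are the M, I, D, = and X operations in the CIGAR string; aligners and filters such as BBMap's idfilter may define it differently. The file names are hypothetical, and the toy record is modeled on the mapped read from this thread:

```shell
# One-record toy SAM file (tab-separated), modeled on the mapped read in this thread.
printf 'Mouse-exact_seq\t0\tchr12\t56691388\t44\t42M\t*\t0\t0\tGAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA\tIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\tAS:i:84\tNM:i:0\n' > toy_identity.sam

# For each mapped record, print the read name and an approximate percent identity.
awk -F'\t' '
!/^@/ && $6 != "*" {
    cigar = $6; cols = 0
    # Walk the CIGAR string one operation at a time.
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op  = substr(cigar, RLENGTH, 1)
        if (op ~ /[MID=X]/) cols += len      # count alignment columns only
        cigar = substr(cigar, RLENGTH + 1)
    }
    # NM is the edit distance (mismatches plus inserted and deleted bases).
    nm = -1
    for (i = 12; i <= NF; i++) if ($i ~ /^NM:i:/) nm = substr($i, 6) + 0
    if (nm >= 0 && cols > 0) printf "%s\t%.1f\n", $1, 100 * (cols - nm) / cols
}' toy_identity.sam > identity.tsv

cat identity.tsv
```

For the exact-match read (42M, NM:i:0) this gives 100.0. Unmapped records have "*" as their CIGAR and are skipped, which is why no score appears for them.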
-
BBMap - Getting started
Hi,
Thank you!
I'm quite new to these programs, so I have many basic questions.
If I understood correctly, the reads that do not align to the mouse genome should end up in the clean.fq file.
* What counts as a "mapped" read? Only one that matches the reference exactly?
* Is there an example with output files that I can test to be sure I am using it correctly?
* I tried to run the following command: bbmap.sh ref=lambda_virus.fa
and didn't see that an index was created.
* How do I index a reference that is built from several .fa files (for example, the entire mouse genome)?
Thank you!
-
If you need a tool to separate sequences that are NOT aligning to the mouse genome then look at BBSplit.sh as an option: http://seqanswers.com/forums/showthread.php?t=41288
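If an alignment has already been produced as SAM, the same separation can be sketched from the FLAG column, since unmapped records carry flag bit 4. This is only an illustration on a toy file with hypothetical read and file names; on real data, samtools view -f 4 selects the same records.

```shell
# Toy SAM file (tab-separated): readA is mapped (flag 0), readB is unmapped (flag 4).
printf 'readA\t0\tchr12\t56691388\t44\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' > toy_flags.sam
printf 'readB\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGG\tIIIIIIII\n' >> toy_flags.sam

# Split read names by whether bit 4 (read unmapped) is set in the FLAG field.
awk -F'\t' '
/^@/ { next }                                   # skip header lines
{
    if (int($2 / 4) % 2 == 1) print $1 > "unmapped.txt"
    else                      print $1 > "mapped.txt"
}' toy_flags.sam

cat unmapped.txt
```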
-
Bowtie - searching for less aligned reads
Hello,
I'm trying to use Bowtie to find sequences that align poorly to the mouse genome, or even better, sequences that have 0% matching.
Is this possible with Bowtie?
Moreover, I have the following read 'GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA' which matches 100% to chr12 in the mm10 (mouse) genome. For this match (see the results below) I got the maximum score.
I don't understand why, if I change 3 bases randomly in the middle or add 4 random bases at the 3'/5' end, the read no longer aligns and no score is displayed, even though there is still high similarity. I also don't understand why it misses short alignments (10-30 bases) in a 100-base read, for example.
The command I used was:
CL:"C:\bowtie2\bowtie2-align-s.exe --wrapper basic-0 --local -N 1 -L 2 --gbar 100 --ma 2 --mp 0,0 --score-min L,0,0 -x D:/Augmanity/index/mm10 -f C:/bowtie2/reads/100LengthReads.fa --passthrough"
Attached the results:
Mouse-exact_seq 0 chr12 56691388 44 42M * 0 0 GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:84 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:42 YT:Z:UU
Mouse-3_add_bases 4 * 0 0 * * 0 0 GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCATTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
Mouse-4-mismatches 4 * 0 0 * * 0 0 GAGCAAACAGAAAAACCAAAAAAGGCTGATCGGAAACAGGCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
I will appreciate any help. Thank you.
Anastasia
Last edited by Nastya; 08-30-2015, 11:47 PM.