Should I use the --no-1mm-upfront parameter with bowtie2 to allow exactly 1 vs. 2 mismatches? If so, how do I use it?
Does anyone know whether I can use 1 as the MAPQ cutoff to discriminate reads aligned exactly 1 time from reads aligned >1 times?
Look forward to your reply,
-
The MAPQ values don't directly correspond to what the summary describes as ambiguously mapped (the MAPQ value is more reliable). I don't recall exactly how the summary information is determined; one would have to go through the code to check, since it's not documented anywhere.
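One field that may help here is bowtie2's optional `XS:i` tag, which the manual describes as present only when more than one alignment was found for a read. Below is a rough sketch, assuming (not verified against the bowtie2 source) that the ">1 times" line of the summary roughly tracks that tag, and that the file holds unpaired reads with one record per read. `count_by_xs` is just an illustrative name.

```shell
# Count aligned reads with and without the XS:i tag in a bowtie2 SAM file.
# Assumption: XS:i present ~ "aligned >1 times", absent ~ "aligned exactly 1 time".
count_by_xs() {
    awk -F'\t' '
        /^@/ { next }                                  # skip SAM header lines
        (int($2 / 4) % 2 == 1) { unaligned++; next }   # FLAG bit 0x4: read unmapped
        {
            multi = 0
            for (i = 12; i <= NF; i++)                 # optional fields start at column 12
                if ($i ~ /^XS:i:/) { multi = 1; break }
            if (multi) with_xs++; else without_xs++
        }
        END {
            printf "unaligned\t%d\n", unaligned
            printf "no_XS\t%d\n", without_xs
            printf "with_XS\t%d\n", with_xs
        }
    ' "$1"
}
```

Calling `count_by_xs myfile.sam` prints three tab-separated counts that can be compared with the three lines of the bowtie2 summary.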
-
I use bowtie 2.
Should the MAPQ value be strictly > 1?
I used awk to count the reads on either side of a cutoff of 1:
awk '{print $5}' myfile.sam | awk '{if ($1 <1) print $1}' | wc -l
and got:
MAPQ > 1: 1441692 reads
MAPQ >= 1: 6878007 reads
MAPQ < 1: 3302748 reads
but these values don't match the stats generated by bowtie2 (see below). I would expect some MAPQ cutoff to reproduce the numbers below for reads aligned unambiguously (exactly 1 time) and ambiguously (>1 times).
1095004 (10.76%) aligned exactly 1 time
5920439 (58.15%) aligned >1 times
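Part of the mismatch may come from what the one-liner counts: without a filter, `@` header lines and unaligned reads (FLAG 4, typically written with MAPQ 0) also land in the `< 1` group. Here is a sketch that tallies only aligned reads, assuming unpaired reads with one record each. `mapq_bins` is a made-up name for illustration.

```shell
# Tally aligned reads by MAPQ, ignoring header lines and unmapped reads.
mapq_bins() {
    awk -F'\t' '
        /^@/ { next }                       # skip SAM header lines
        (int($2 / 4) % 2 == 1) { next }     # skip unmapped reads (FLAG bit 0x4)
        $5 == 0 { zero++; next }
        $5 == 1 { one++; next }
        { more++ }
        END { printf "MAPQ=0\t%d\nMAPQ=1\t%d\nMAPQ>1\t%d\n", zero, one, more }
    ' "$1"
}
```

Run as `mapq_bins myfile.sam`. The three bins then sum to the number of aligned reads, which makes them easier to compare with the summary percentages.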
-
Have a look at the MAPQ scores. If this is bowtie1, then I think it used 255 for unique (though I haven't used it in long enough that I don't remember anymore). If this is bowtie2, then just filter by some meaningful MAPQ threshold (10 is likely reasonable, but anything >1 should work).
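The threshold filter could be sketched in awk like this, writing aligned reads at or above the cutoff to one file and the remaining aligned reads to another. The function name `split_by_mapq` and the argument order are assumptions made for the example, not anything bowtie2 provides.

```shell
# Split a SAM file into two by MAPQ threshold.
# Usage (hypothetical): split_by_mapq in.sam 10 high.sam low.sam
split_by_mapq() {
    awk -F'\t' -v min="$2" -v high="$3" -v low="$4" '
        /^@/ { print > high; print > low; next }   # copy header lines to both outputs
        (int($2 / 4) % 2 == 1) { next }            # drop unmapped reads (FLAG bit 0x4)
        { if ($5 + 0 >= min + 0) print > high; else print > low }
    ' "$1"
}
```

If samtools is available, `samtools view -h -q 10 in.sam` applies the same kind of lower bound on MAPQ.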
-
If they have to be ambiguous, that's fine. I just need to separate the ambiguous from the unambiguous reads to generate different outputs. As I have many reads, I need some information in the SAM file on which to base the separation. Is there any such information in the SAM file generated by bowtie? Should I have used a specific bowtie parameter to include it in the SAM file?
-
There's generally nothing that you can do to reduce the rate of ambiguously mapping reads; they're likely just ambiguous. You might just take a look in IGV or some other browser and see where some of these align. That'll be more informative than speculating.
-
Now I require a minimum length of 20 bases and no longer get warnings, but more than 50% of reads align >1 times. Do these correspond to ambiguous or repeated regions or genes, and can I do anything to reduce this rate?
How can I tell ambiguously mapped reads from unambiguously mapped ones in the SAM file generated by bowtie? By ambiguous I mean repeated regions or genes.
3165309 (31.09%) aligned 0 times
1095004 (10.76%) aligned exactly 1 time
5920439 (58.15%) aligned >1 times
-
I don't think bowtie will handle anything <12, though I wouldn't normally bother with anything <20, since it's unlikely to map uniquely.
BTW, are you the same Carol that I just replied to on the samtools email list?
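For a raw-reads file (one sequence per line, as used with -r), a length filter along those lines could look like the sketch below. The cutoff of 20 is the rule of thumb mentioned above rather than a bowtie requirement, and `filter_min_length` is a hypothetical helper name.

```shell
# Keep only reads at least MIN characters long (raw format, one read per line).
# Usage (hypothetical): filter_min_length file_with_reads 20 > reads_min20.txt
filter_min_length() {
    awk -v min="$2" 'length($0) >= min + 0' "$1"
}
```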
-
You were right!
What minimum read length should I require, discarding the rest?
-
The examples you posted appear correct, but perhaps it's complaining about a line that you didn't post. To list every line that is 2 characters or shorter, along with its line number, you could just use
Code: awk '{if(length($0) <= 2) print NR, $0}' file_with_reads
-
Originally posted by dpryan: It looks like the reads are just incorrectly formatted. It's seeing some of them as being only 1 or 2 bases, which would seem unlikely.
How do you see that some of them are only 1 or 2 bases?
Are they not in raw sequence format? If not, how should they be formatted so that bowtie accepts them with the -r parameter?
Cheers,
-
It looks like the reads are just incorrectly formatted. It's seeing some of them as being only 1 or 2 bases, which would seem unlikely.
-
aligned seq > 1 times and skipping reads because of seed mismatches
Hi
1-
More than 50% of my reads align >1 times with bowtie2. Is this fine, or should I do something to avoid it? I get this high rate whether I allow 1 or 2 mismatches, with all other parameters left at their defaults. As the reads are in raw format, I used -r with bowtie.
781 reads; of these:
781 (100.00%) were unpaired; of these:
221 (28.30%) aligned 0 times
80 (10.24%) aligned exactly 1 time
480 (61.46%) aligned >1 times
71.70% overall alignment rate
reads format
TTAAGTTATTAAGGGCGCACG
AGATCGGAAGAGCGGTTCAG
TTAAGTTATTAAGGGCGCAC
TTAAGTTATTAAGGGCGCAC
GATTGTAGATGCCACGCAAA
2- The above data set is a sample (780 reads) of my complete data set. When I use bowtie with the whole data set, I get several warnings like the ones below. Where does the problem come from, and what is the solution? I have checked some of the reads that trigger the warnings, and they have the same length as the others (see reads format above). I get the same warnings whether I use -N1 or -N2.
Warning: skipping read '10149870' because length (1) <= # seed mismatches (1)
Warning: skipping read '10149870' because it was < 2 characters long
Look forward to your reply,
Carol