  • carolW
    replied
    Should I use the --no-1mm-upfront parameter with bowtie2 to allow exactly 1 vs. 2 mismatches? If so, how should I use it?

    Does anyone know whether I can use 1 as the MAPQ cutoff to separate reads aligned exactly 1 time from reads aligned >1 times?

    Look forward to your reply,


  • dpryan
    replied
    The MAPQ values don't directly correspond to what the summary describes as ambiguously mapped (the MAPQ value is more reliable). I don't recall exactly how the summary information is determined, one would have to go through the code to check (it's not documented anywhere).
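
    One per-read field that bowtie2 does write to its SAM output is the optional XS:i tag, which the bowtie2 manual describes as present only when more than one alignment was found for the read. The sketch below counts aligned reads with and without that tag. Whether these counts reproduce the "aligned exactly 1 time" and "aligned >1 times" summary lines is an assumption that would have to be checked on real data, and the function name count_by_xs is made up for illustration.

```shell
# count_by_xs: hypothetical helper. Reads SAM on stdin and reports how many
# aligned reads carry an XS:i tag (a second alignment was found) and how many do not.
count_by_xs() {
  awk -F '\t' '
    /^@/ { next }                          # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }          # skip reads with FLAG bit 0x4 (unaligned)
    {
      has_xs = 0
      for (i = 12; i <= NF; i++)           # optional tags start at column 12
        if ($i ~ /^XS:i:/) has_xs = 1
      if (has_xs) multi++; else single++
    }
    END {
      print "no_XS\t" (single + 0)
      print "with_XS\t" (multi + 0)
    }
  '
}

# Usage: count_by_xs < myfile.sam
```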


  • carolW
    replied
    I use bowtie 2.

    Should the MAPQ value be strictly > 1?

    I counted the number of reads with awk, based on a cutoff of 1:
    awk '{print $5}' myfile.sam | awk '{if ($1 <1) print $1}' | wc -l

    and got

    > 1
    number of reads 1441692
    >= 1
    number of reads 6878007
    < 1
    number of reads 3302748

    but these values don't match the stats generated by bowtie2 (see below). I would expect some MAPQ cutoff to reproduce the numbers below for unambiguous reads (aligned exactly 1 time) and ambiguous reads (aligned >1 times).

    1095004 (10.76%) aligned exactly 1 time
    5920439 (58.15%) aligned >1 times
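
    A note on the counting above: the pipeline as posted also passes SAM header lines (those starting with @) and unaligned reads through the MAPQ test, so its totals are not directly comparable to the bowtie2 summary. A minimal sketch of a tally restricted to aligned reads follows. It assumes a standard tab-separated SAM file with FLAG in column 2 and MAPQ in column 5; the function name mapq_tally is made up for illustration.

```shell
# mapq_tally: hypothetical helper. Reads SAM on stdin and prints
# "MAPQ<TAB>count" for aligned reads only, sorted by MAPQ.
mapq_tally() {
  awk -F '\t' '
    /^@/ { next }                       # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }       # skip reads with FLAG bit 0x4 (unaligned)
    { count[$5]++ }                     # tally the MAPQ value (column 5)
    END { for (q in count) print q "\t" count[q] }
  ' | sort -n
}

# Usage: mapq_tally < myfile.sam
```

    Looking at the whole distribution, rather than a single cutoff, shows where the reads actually fall before a threshold is picked.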


  • dpryan
    replied
    Have a look at the MAPQ scores. If this is bowtie1, then I think it used 255 for unique (though I haven't used it in long enough that I don't remember anymore). If this is bowtie2, then just filter by some meaningful MAPQ threshold (10 is likely reasonable, but anything >1 should work).
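
    The thresholding suggested above can be sketched as follows. The function name split_by_mapq and the output file names are hypothetical, and the default cutoff of 10 is simply the value mentioned in this post. Header lines are copied into both outputs so that each remains a valid SAM file, and unaligned reads (FLAG bit 0x4) are dropped.

```shell
# split_by_mapq IN.sam HIGH.sam LOW.sam [MIN_MAPQ]
# Hypothetical helper: writes aligned reads with MAPQ >= MIN_MAPQ to HIGH.sam
# and the remaining aligned reads to LOW.sam. Runs in a subshell so its
# variables do not leak into the caller.
split_by_mapq() (
  in=$1; high=$2; low=$3; min=${4:-10}
  awk -F '\t' -v min="$min" -v high="$high" -v low="$low" '
    /^@/ { print > high; print > low; next }   # headers go to both files
    int($2 / 4) % 2 == 1 { next }              # drop unaligned reads
    $5 >= min { print > high; next }           # MAPQ at or above the cutoff
    { print > low }                            # MAPQ below the cutoff
  ' "$in"
)

# Usage: split_by_mapq myfile.sam high_mapq.sam low_mapq.sam 10
```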


  • carolW
    replied
    If they have to be ambiguous, that's fine. I just need to separate the ambiguous from the unambiguous reads to generate different outputs. As I have many reads, I need some information in the SAM file by which I can separate the two types of reads. Is there any such information in the SAM file generated by bowtie? Should I have used a specific bowtie parameter to include it in the SAM file?


  • dpryan
    replied
    There's generally nothing that you can do to reduce the rate of ambiguously mapping reads; they're likely just ambiguous. You might just take a look in IGV or some other browser and see where some of these align. That'll be more informative than speculating.


  • carolW
    replied
    Now I use a minimum read length of 20 bases and don't get the warnings any more, but more than 50% of the reads are aligned >1 times. Do these correspond to ambiguous or repeated regions or genes, and can I do anything to reduce this rate?

    How can I tell ambiguously mapped reads from unambiguously mapped ones in the SAM file generated by bowtie? By ambiguous I mean repeated regions or genes.

    3165309 (31.09%) aligned 0 times
    1095004 (10.76%) aligned exactly 1 time
    5920439 (58.15%) aligned >1 times


  • dpryan
    replied
    I don't think bowtie will handle anything <12, though I wouldn't normally bother with anything <20, since it's unlikely to map uniquely.

    BTW, are you the same Carol that I just replied to on the samtools email list?
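
    A length filter along the lines discussed in this post could look like the sketch below, assuming one raw read per line as with bowtie's -r input. The function name filter_min_length is made up, and the default cutoff of 20 is just the figure mentioned above.

```shell
# filter_min_length [MIN]: hypothetical helper. Reads one raw sequence per
# line on stdin and prints only the lines that are at least MIN characters long.
filter_min_length() {
  awk -v min="${1:-20}" '
    { sub(/\r$/, "") }        # strip a trailing carriage return, if any
    length($0) >= min         # print lines that meet the minimum length
  '
}

# Usage: filter_min_length 20 < file_with_reads > reads_min20.txt
```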


  • carolW
    replied
    you were right!

    What minimum read length should I use, discarding the rest?


  • dpryan
    replied
    The examples you posted appear correct, but perhaps it's complaining about a line that you didn't post. You could just use

    Code:
     awk '{if(length($0) <= 2) print NR, $0}' file_with_reads
    to see if there actually are lines with 1 or 2 bases. I'm guessing that the "aligned > 1 times" issue is due to the reads being so short (21 bases is really short). Perhaps blast a couple to confirm this.
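
    For an overview of the whole file rather than only the offending lines, a length histogram is a small extension of the same idea. This is a sketch assuming one raw read per line; the function name read_length_hist is made up for illustration.

```shell
# read_length_hist: hypothetical helper. Reads one raw sequence per line on
# stdin and prints "length<TAB>count", sorted by length.
read_length_hist() {
  awk '
    { sub(/\r$/, ""); count[length($0)]++ }   # ignore a trailing carriage return
    END { for (len in count) print len "\t" count[len] }
  ' | sort -n
}

# Usage: read_length_hist < file_with_reads
```

    Any rows for lengths 0, 1 or 2 would point to blank or truncated lines of the kind the warnings describe.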


  • carolW
    replied
    Originally posted by dpryan:
    "It looks like the reads are just incorrectly formatted. It's seeing some of them as being only 1 or 2 bases, which would seem unlikely."

    Thanks for your reply.

    How do you see that some of them are only 1 or 2 bases?

    Are they not in raw sequence format? If not, what should the raw sequence format accepted by bowtie with the -r parameter look like?

    Cheers,


  • dpryan
    replied
    It looks like the reads are just incorrectly formatted. It's seeing some of them as being only 1 or 2 bases, which would seem unlikely.


  • aligned seq > 1 times and skipping reads because of seed mismatches

    Hi
    1-
    More than 50% of my reads are reported as aligned >1 times by bowtie2 on my data set. Is this fine, or should I do anything to avoid it? I get this high rate whether I allow 1 or 2 mismatches; the rest of the parameters are left at their defaults. As the reads are in raw format, I used -r with bowtie.

    781 reads; of these:
    781 (100.00%) were unpaired; of these:
    221 (28.30%) aligned 0 times
    80 (10.24%) aligned exactly 1 time
    480 (61.46%) aligned >1 times
    71.70% overall alignment rate

    reads format
    TTAAGTTATTAAGGGCGCACG
    AGATCGGAAGAGCGGTTCAG
    TTAAGTTATTAAGGGCGCAC
    TTAAGTTATTAAGGGCGCAC
    GATTGTAGATGCCACGCAAA

    2- The above data set is a sample (780 reads) of my complete data set. When I run bowtie on the whole data set, I get several warnings like the ones below. Where does the problem come from, and what is the solution? I have checked some of the reads that trigger warnings, and they have the same length as the others (see reads format above). I get the same warnings whether I use -N1 or -N2.

    Warning: skipping read '10149870' because length (1) <= # seed mismatches (1)
    Warning: skipping read '10149870' because it was < 2 characters long

    Look forward to your reply,

    Carol
