  • aligned seq > 1 times and skipping reads because of seed mismatches

    Hi,

    1- More than 50% of my reads align >1 times with bowtie2. Is this fine, or should I do something to avoid it? I get this high rate whether I allow 1 or 2 mismatches, with all other parameters left at their defaults. As the reads are in raw format, I used -r with bowtie.

    781 reads; of these:
    781 (100.00%) were unpaired; of these:
    221 (28.30%) aligned 0 times
    80 (10.24%) aligned exactly 1 time
    480 (61.46%) aligned >1 times
    71.70% overall alignment rate

    reads format
    TTAAGTTATTAAGGGCGCACG
    AGATCGGAAGAGCGGTTCAG
    TTAAGTTATTAAGGGCGCAC
    TTAAGTTATTAAGGGCGCAC
    GATTGTAGATGCCACGCAAA

    2- The data set above is a sample (781 reads) of my complete data set. When I run bowtie on the whole data set, I get several warnings like the ones below. Where does the problem come from, and what is the solution? I checked some of the reads that trigger warnings, and they have the same length as the others (see reads format above). I get the same warnings whether I use -N1 or -N2.

    Warning: skipping read '10149870' because length (1) <= # seed mismatches (1)
    Warning: skipping read '10149870' because it was < 2 characters long

    Look forward to your reply,

    Carol

  • #2
    It looks like the reads are just incorrectly formatted. It's seeing some of them as being only 1 or 2 bases, which would seem unlikely.



    • #3
      Originally posted by dpryan:
      It looks like the reads are just incorrectly formatted. It's seeing some of them as being only 1 or 2 bases, which would seem unlikely.
      Thanks for your reply.

      How do you see that some of them are only 1 or 2 bases?

      Aren't they in raw sequence format? If not, how should they be formatted for bowtie to accept them as raw sequences with the -r parameter?

      Cheers,



      • #4
        The examples you posted appear correct, but perhaps it's complaining about a line that you didn't post. You could just use

        Code:
         awk '{if(length($0) <= 2) print NR, $0}' file_with_reads
        to see if there actually are lines with 1 or 2 bases. I'm guessing that the "aligned > 1 times" issue is due to the reads being so short (21 bases is really short). Perhaps blast a couple to confirm this.



        • #5
          you were right!

          What minimum read length should I use, discarding anything shorter?



          • #6
            I don't think bowtie will handle anything <12, though I wouldn't normally bother with anything <20, since it's unlikely to map uniquely.
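A minimal sketch of that length cutoff, assuming the reads file holds one raw sequence per line (the format used with -r). The helper name filter_min_len and the output file name are made up for illustration; 20 is just the cutoff suggested above.

```shell
# filter_min_len: hypothetical helper, not part of bowtie.
# Reads raw sequences (one per line) on stdin and prints only
# those with at least $1 bases.
filter_min_len() {
    awk -v min="$1" 'length($0) >= min'
}

# Example (file names are placeholders):
#   filter_min_len 20 < file_with_reads > reads_min20.txt
```

Note that this counts every character on the line, so stray whitespace or Windows line endings would inflate the length.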

            BTW, are you the same Carol that I just replied to on the samtools email list?



            • #7
              Now I use a minimum length of 20 bases and don't get the warnings any more, but more than 50% of the reads still align >1 times. Do these correspond to ambiguous or repeated regions or genes, or can I do anything to reduce this rate?

              How can I tell ambiguously mapped reads from unambiguously mapped ones in the SAM file generated by bowtie? By ambiguous I mean repeated regions or genes.

              3165309 (31.09%) aligned 0 times
              1095004 (10.76%) aligned exactly 1 time
              5920439 (58.15%) aligned >1 times



              • #8
                There's generally nothing that you can do to reduce the rate of ambiguously mapping reads; they're likely just ambiguous. You might just take a look in IGV or some other browser and see where some of these align. That'll be more informative than speculating.



                • #9
                  If they have to be ambiguous, that's fine. I just need to separate the ambiguous reads from the unambiguous ones to generate different outputs. As I have many reads, I need some information in the SAM file that I can use to separate the 2 types of reads. Is there any such information in the SAM file generated by bowtie? Should I have used a specific bowtie parameter to include it in the SAM file?



                  • #10
                    Have a look at the MAPQ scores. If this is bowtie1, then I think it used 255 for unique alignments (though I haven't used it in so long that I no longer remember). If this is bowtie2, then just filter by some meaningful MAPQ threshold (10 is likely reasonable, but anything >1 should work).
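A rough sketch of such a MAPQ split, in case it helps. It assumes a standard tab-separated SAM file, where column 2 is FLAG (bit 0x4 marks an unaligned read) and column 5 is MAPQ. The helper name split_by_mapq and the file names are hypothetical, and the cutoff is whatever threshold you settle on. Header lines (starting with @) are copied to both outputs and not counted, since they have no MAPQ column.

```shell
# split_by_mapq: hypothetical helper, not part of bowtie2 or samtools.
#   $1 = MAPQ cutoff
#   $2 = input SAM file
#   $3 = output SAM for aligned reads with MAPQ >= cutoff
#   $4 = output SAM for aligned reads with MAPQ <  cutoff
# Unaligned reads (FLAG bit 0x4 set) are counted but written to neither file.
split_by_mapq() {
    awk -v q="$1" -v hi="$3" -v lo="$4" -F'\t' '
        /^@/ { print > hi; print > lo; next }        # header goes to both outputs
        int($2 / 4) % 2 == 1 { unaligned++; next }   # test FLAG bit 0x4 without bitwise ops
        $5 >= q { print > hi; high++; next }
        { print > lo; low++ }
        END { printf "unaligned=%d low=%d high=%d\n", unaligned, low, high }
    ' "$2"
}

# Example (file names are placeholders):
#   split_by_mapq 10 myfile.sam mapq_high.sam mapq_low.sam
```

The FLAG test uses integer division rather than a bitwise function so that it works in any POSIX awk.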



                    • #11
                      I use bowtie 2.

                      Should the MAPQ value be strictly > 1?

                      I counted the reads on each side of a MAPQ cutoff of 1 with awk (the command below is the < 1 case):
                      awk '{print $5}' myfile.sam | awk '{if ($1 < 1) print $1}' | wc -l

                      and got

                      MAPQ > 1: 1441692 reads
                      MAPQ >= 1: 6878007 reads
                      MAPQ < 1: 3302748 reads

                      but these values don't match the stats generated by bowtie2 (see below). Shouldn't some MAPQ cutoff give the numbers below for unambiguous reads (aligned exactly 1 time) and ambiguous reads (aligned >1 times)?

                      1095004 (10.76%) aligned exactly 1 time
                      5920439 (58.15%) aligned >1 times



                      • #12
                        The MAPQ values don't directly correspond to what the summary describes as ambiguously mapped (the MAPQ value is more reliable). I don't recall exactly how the summary information is determined; one would have to go through the code to check (it's not documented anywhere).
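One thing that might be worth trying, as an unverified sketch: the Bowtie 2 manual describes the optional XS:i field as present only when more than one alignment was found for an aligned read. Counting aligned reads with and without XS:i may therefore land close to the "exactly 1 time" and ">1 times" lines of the summary, but that is an assumption to check against your own numbers. The helper name count_by_xs is made up, and the sketch assumes default reporting (one SAM record per unpaired read, no -k or -a).

```shell
# count_by_xs: hypothetical helper, not part of bowtie2.
#   $1 = SAM file produced by bowtie2 in its default reporting mode
# Counts unaligned reads (FLAG bit 0x4), aligned reads without an XS:i
# field ("once"), and aligned reads with an XS:i field ("many").
count_by_xs() {
    awk -F'\t' '
        /^@/ { next }                                # skip header lines
        int($2 / 4) % 2 == 1 { unaligned++; next }   # FLAG bit 0x4: unaligned
        {
            multi = 0
            # optional fields start at column 12
            for (i = 12; i <= NF; i++) if ($i ~ /^XS:i:/) { multi = 1; break }
            if (multi) many++; else once++
        }
        END { printf "unaligned=%d once=%d many=%d\n", unaligned, once, many }
    ' "$1"
}

# Example (file name is a placeholder):
#   count_by_xs myfile.sam
```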



                        • #13
                          Should I use the --no-1mm-upfront parameter with bowtie2 to allow exactly 1 vs 2 mismatches? If so, how do I use it?

                          Does anyone know whether I can use 1 as the MAPQ cutoff to discriminate reads aligned exactly 1 time from reads aligned >1 times?

                          Look forward to your reply,
