  • Reads aligned >1 times, and reads skipped because of seed mismatches

    Hi,

    1- More than 50% of the reads in my data set align >1 times with bowtie2. Is this fine, or should I do anything to avoid it? I get the same high rate whether I allow 1 or 2 mismatches; the rest of the parameters are at their defaults. As the reads are in raw format, I used -r with bowtie.

    781 reads; of these:
    781 (100.00%) were unpaired; of these:
    221 (28.30%) aligned 0 times
    80 (10.24%) aligned exactly 1 time
    480 (61.46%) aligned >1 times
    71.70% overall alignment rate

    Example reads:
    TTAAGTTATTAAGGGCGCACG
    AGATCGGAAGAGCGGTTCAG
    TTAAGTTATTAAGGGCGCAC
    TTAAGTTATTAAGGGCGCAC
    GATTGTAGATGCCACGCAAA

    2- The data set above is a sample (781 reads) of my complete data set. When I run bowtie on the whole data set, I get several warnings like the ones below. Where does the problem come from, and what is the solution? I have checked some of the reads that trigger these warnings, and they have the same length as the others (see the example reads above). I get the same warnings with -N1 or -N2.

    Warning: skipping read '10149870' because length (1) <= # seed mismatches (1)
    Warning: skipping read '10149870' because it was < 2 characters long

    Looking forward to your reply,

    Carol

  • #2
    It looks like the reads are just incorrectly formatted. It's seeing some of them as being only 1 or 2 bases, which would seem unlikely.



    • #3
      Originally posted by dpryan
      It looks like the reads are just incorrectly formatted. It's seeing some of them as being only 1 or 2 bases, which would seem unlikely.
      Thanks for your reply.

      How can you tell that some of them are only 1 or 2 bases?

      Aren't they in raw sequence format? If not, how should they be formatted so that bowtie accepts them as raw sequences with the -r parameter?

      Cheers,



      • #4
        The examples you posted appear correct, but perhaps it's complaining about a line that you didn't post. You could just use

        Code:
         awk '{if(length($0) <= 2) print NR, $0}' file_with_reads
        to see if there actually are lines with only 1 or 2 bases. I'm guessing that the "aligned >1 times" issue is due to the reads being so short (21 bases is really short). Perhaps BLAST a couple of them to confirm this.



        • #5
          You were right!

          What minimum read length should I use, discarding anything shorter?



          • #6
            I don't think bowtie will handle anything <12, though I wouldn't normally bother with anything <20, since it's unlikely to map uniquely.

            BTW, are you the same Carol that I just replied to on the samtools email list?
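            A minimal Python sketch of the length filter suggested above, assuming the input is a plain raw-read file with one sequence per line (the format bowtie2 reads with -r). The 20-base threshold comes from the reply; the function name `filter_raw_reads` and the file names in the usage comment are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def filter_raw_reads(lines: Iterable[str], min_len: int = 20) -> Iterator[str]:
    """Yield reads of at least min_len bases from a one-sequence-per-line file.

    Blank lines and reads shorter than min_len (e.g. the 1-2 base lines
    that triggered the bowtie2 warnings) are dropped.
    """
    for line in lines:
        seq = line.strip()  # also removes trailing "\n" or "\r\n"
        if len(seq) >= min_len:
            yield seq


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python filter_reads.py reads.txt reads.min20.txt
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for seq in filter_raw_reads(src):
            dst.write(seq + "\n")
```

            The filtered file can then be passed to bowtie2 with -r in place of the original.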



            • #7
              Now I use a minimum length of 20 bases and no longer get the warnings, but more than 50% of the reads still align >1 times. Do these correspond to ambiguous regions, such as repeated regions or genes, or can I do anything to reduce this rate?

              How can I distinguish ambiguously from unambiguously mapped reads in the SAM file generated by bowtie? By ambiguous I mean reads from repeated regions or genes.

              3165309 (31.09%) aligned 0 times
              1095004 (10.76%) aligned exactly 1 time
              5920439 (58.15%) aligned >1 times



              • #8
                There's generally nothing you can do to reduce the rate of ambiguously mapping reads; they're likely just ambiguous. You might take a look in IGV or another genome browser to see where some of them align. That will be more informative than speculating.



                • #9
                  If they have to be ambiguous, that's fine. I just need to separate the ambiguous reads from the unambiguous ones to generate different outputs. Since I have many reads, I need some information in the SAM file that I can use to separate the two types. Is there such information in the SAM file generated by bowtie? Should I have used a specific bowtie parameter to include it?



                  • #10
                    Have a look at the MAPQ scores. If this is bowtie1, then I think it used 255 for unique (though I haven't used it in long enough that I don't remember anymore). If this is bowtie2, then just filter by some meaningful MAPQ threshold (10 is likely reasonable, but anything >1 should work).
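                    For bowtie2, a minimal Python sketch of that filter: it splits SAM records into "unique" and "multi" at a MAPQ threshold (10, as suggested above), skips unmapped reads (flag 0x4) and secondary records (flag 0x100), and keeps the header for both outputs. The threshold is a heuristic, not an exact reproduction of the bowtie2 summary categories; `split_by_mapq` and the output file names are hypothetical.

```python
import sys
from typing import Iterable, List, Tuple

FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100


def split_by_mapq(
    sam_lines: Iterable[str], mapq_min: int = 10
) -> Tuple[List[str], List[str], List[str]]:
    """Split SAM lines into (header, unique, multi) using MAPQ (column 5)."""
    header: List[str] = []
    unique: List[str] = []
    multi: List[str] = []
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (FLAG_UNMAPPED | FLAG_SECONDARY):
            continue  # skip unaligned reads and non-primary records
        if mapq >= mapq_min:
            unique.append(line)
        else:
            multi.append(line)
    return header, unique, multi


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python split_sam.py myfile.sam  -> unique.sam, multi.sam
    with open(sys.argv[1]) as sam:
        header, unique, multi = split_by_mapq(sam)
    for name, records in (("unique.sam", unique), ("multi.sam", multi)):
        with open(name, "w") as out:
            out.writelines(header + records)
```

                    If samtools is available, `samtools view -h -q 10 -F 0x104 myfile.sam > unique.sam` selects the same mapped, primary, MAPQ >= 10 records without a script.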



                    • #11
                      I use bowtie 2.

                      Should the MAPQ value be strictly > 1?

                      I counted the number of reads at a cutoff of 1 with awk:
                      awk '{print $5}' myfile.sam | awk '{if ($1 <1) print $1}' | wc -l

                      and got:

                      MAPQ > 1:  1441692 reads
                      MAPQ >= 1: 6878007 reads
                      MAPQ < 1:  3302748 reads

                      but these counts don't match the statistics reported by bowtie2 (see below). I expected some MAPQ cutoff to reproduce these numbers, with the unambiguous reads matching "aligned exactly 1 time" and the ambiguous ones matching "aligned >1 times".

                      1095004 (10.76%) aligned exactly 1 time
                      5920439 (58.15%) aligned >1 times
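                      One likely reason the awk counts don't line up is that the pipeline counts every line of the SAM file. bowtie2 reports unaligned reads with MAPQ 0, so the 3,165,309 unaligned reads fall into the "< 1" bucket alongside the ambiguous ones, and header lines (starting with `@`) are passed through too; awk compares their non-numeric 5th field as a string, so they can also end up in a bucket. The ">= 1" and "< 1" counts sum to 10,180,755, three more than the 10,180,752 reads in the bowtie2 summary, which is consistent with (though not proof of) a few header lines being counted. A minimal Python sketch that keeps these cases apart; the function name and bucket labels are hypothetical:

```python
from collections import Counter
from typing import Iterable


def mapq_summary(sam_lines: Iterable[str], cutoff: int = 1) -> Counter:
    """Count SAM records by category, keeping headers and unmapped reads separate."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            counts["header"] += 1
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x100:
            counts["secondary"] += 1  # extra alignments (e.g. with -k) are flagged secondary
        elif flag & 0x4:
            counts["unmapped"] += 1  # unaligned reads carry MAPQ 0
        elif mapq >= cutoff:
            counts["mapped_mapq_at_or_above_cutoff"] += 1
        else:
            counts["mapped_mapq_below_cutoff"] += 1
    return counts
```

                      Comparing `mapped_mapq_below_cutoff` with `unmapped` shows how many of the "< 1" records were actually aligned.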



                      • #12
                        The MAPQ values don't directly correspond to what the summary describes as ambiguously mapped (the MAPQ value is more reliable). I don't recall exactly how the summary information is determined, one would have to go through the code to check (it's not documented anywhere).
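                        One documented handle that may track the summary more closely is bowtie2's optional `XS:i` tag, which the bowtie2 manual describes as present only when the record is for an aligned read and more than one alignment was found for it. Assuming that is roughly what the "aligned >1 times" line counts (an assumption, not checked against the bowtie2 source), a minimal Python sketch for unpaired reads that classifies primary records by the presence of `XS:i`; the function name is hypothetical:

```python
from typing import Dict, Iterable


def count_by_xs(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Classify primary, unpaired SAM records by whether an XS:i tag is present."""
    counts = {"aligned 0 times": 0, "aligned exactly 1 time": 0, "aligned >1 times": 0}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) record
        if flag & 0x4:
            counts["aligned 0 times"] += 1
        elif any(tag.startswith("XS:i:") for tag in fields[11:]):
            counts["aligned >1 times"] += 1  # a second alignment was found
        else:
            counts["aligned exactly 1 time"] += 1
    return counts
```

                        The same test (`XS:i` present or absent) can be used to write the two groups to separate files, as with the MAPQ split.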



                        • #13
                          Should I use the --no-1mm-upfront parameter with bowtie2 to allow exactly 1 vs. 2 mismatches? If so, how do I use it?

                          Does anyone know whether a MAPQ cutoff of 1 separates the reads aligned exactly 1 time from those aligned >1 times?

                          Looking forward to your reply,
