Hi everyone,
I am using Maq for alignment and have found its 3' adapter trimming very informative about the overall quality of my runs and sample prep. However, I am not clear on how it actually works, and I have a couple of questions.
For instance, one lane has 13,145,392 quality-filtered reads. With the adapter trimming option, Maq reports 10,270,661 reads with possible adapter contamination, and 2,949,120 reads map as pairs (3,404,692 mapped in total). Both mapping counts are larger than the number of reads NOT flagged for adapter: 13,145,392 - 10,270,661 = 2,874,731.
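The arithmetic can be written out as a small Perl check (the variable names are just labels for the counts quoted above). If flagged reads were never aligned, neither mapping count could exceed the 2,874,731 unflagged reads, so any excess is a lower bound on flagged reads that still mapped:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Counts quoted for one lane
my $filtered      = 13_145_392;   # quality-filtered reads
my $flagged       = 10_270_661;   # reads flagged as possible adapter contamination
my $paired_mapped =  2_949_120;   # reads mapped as pairs
my $total_mapped  =  3_404_692;   # all mapped reads

# Reads NOT flagged for adapter
my $unflagged = $filtered - $flagged;

# If flagged reads were never aligned, neither mapping count could exceed
# $unflagged; any excess is a lower bound on flagged reads that mapped.
my $excess_paired = $paired_mapped - $unflagged;
my $excess_total  = $total_mapped  - $unflagged;

print "unflagged reads:                     $unflagged\n";
print "paired mapped beyond unflagged:      $excess_paired\n";
print "flagged reads that mapped, at least: $excess_total\n";
```

With these numbers it prints 2874731 unflagged reads, 74389 paired reads beyond that, and a lower bound of 529961 flagged reads that must have mapped.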
So does Maq simply trim off any adapter it finds and continue with the alignment if the remaining read is long enough? Am I reading this correctly?
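To make the question concrete, here is a minimal sketch of the behaviour being asked about. It is purely hypothetical and is not Maq's code: it uses exact matching only, and the cut-offs $MIN_OVERLAP and $MIN_LENGTH, as well as the helper names trim_adapter and classify, are illustrative assumptions. It removes either a complete adapter or a truncated adapter prefix at the very end of the read, then keeps the read for alignment only if enough bases remain:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL illustration only, not Maq's algorithm: exact matching,
# and both cut-offs below are arbitrary assumptions.
my $MIN_OVERLAP = 3;     # shortest adapter prefix recognised at the 3' end
my $MIN_LENGTH  = 20;    # shortest trimmed read that would still be aligned

# Remove a 3' adapter from $read. The adapter may be complete (the insert is
# shorter than the read) or truncated by the end of the read, in which case
# only its first few bases are present.
sub trim_adapter {
    my ($read, $adapter) = @_;
    for my $i (0 .. length($read) - $MIN_OVERLAP) {
        my $tail = substr($read, $i);
        my $n = length($tail) < length($adapter) ? length($tail) : length($adapter);
        # Leftmost position where the read, from $i onward, matches the
        # adapter (or the adapter's beginning, if the read ends first)
        return substr($read, 0, $i)
            if substr($tail, 0, $n) eq substr($adapter, 0, $n);
    }
    return $read;    # no adapter found
}

# What would happen to the read under this hypothetical scheme
sub classify {
    my ($read, $adapter) = @_;
    my $trimmed = trim_adapter($read, $adapter);
    return 'untrimmed' if length($trimmed) == length($read);
    return length($trimmed) >= $MIN_LENGTH ? 'trimmed, kept' : 'trimmed, too short';
}

my $adapter = 'GATCGGAA';    # sequence from the counting script
for my $read ('A' x 25 . $adapter, 'A' x 10 . 'GATC', 'A' x 30) {
    printf "%-35s %s\n", $read, classify($read, $adapter);
}
```

The three example reads come out as kept, too short, and untrimmed respectively. A real trimmer would normally tolerate mismatches, and a 3-base minimum overlap will also clip reads that happen to end in GAT; whether Maq does anything like this is exactly the open question.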
Next, I aligned exactly the same reads to 3 different reference regions, with all other settings equal, and got a different count of possible adapter contamination for each. For the lane above, the counts are (a) 10,270,661, (b) 10,473,317, and (c) 10,299,171. These are exactly the same reads, just aligned against a different region each time. The differences are not huge (the largest spread is 202,656 reads, about 1.5% of the lane), but they are differences nonetheless.
I whipped together a super simple Perl script to count 3' adapters in my FASTQ files, and I get nowhere near the same number.
Code:
#!/usr/bin/perl
use strict;
use warnings;

my $adapt = 'GATCGGAA';    # adapter sequence being searched for
my $count = 0;

# Every line of the input is tested, including FASTQ header, '+' and quality lines.
while (my $line = <>) {
    chomp $line;
    # Anchored with ^, so only lines that BEGIN with the adapter are counted.
    $count++ if $line =~ m/^$adapt/;
}

print "\nThere are $count sequences with adapter!\n\n";
Again, super simple, but for the lane mentioned above it reports 6,615,038 reads containing the adapter.
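For comparison, here is a slightly stricter counter, written as a sketch under these assumptions: plain 4-line FASTQ records with unwrapped sequence lines, the same GATCGGAA sequence, and an arbitrary 3-base minimum for a truncated adapter at the 3' end. It tests only the sequence line of each record (so quality lines can never match) and reports separately the reads that start with the adapter (what the ^-anchored regex counts), reads that contain it anywhere, and reads that end in a partial adapter. The helper names count_adapters and ends_with_adapter_prefix are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical adapter counter. Assumptions: plain 4-line FASTQ records
# (header, sequence, '+', quality) with no wrapped sequence lines, and the
# same 8-base adapter sequence as the one-line script.
my $ADAPTER     = 'GATCGGAA';
my $MIN_OVERLAP = 3;    # shortest adapter prefix counted at the 3' end (arbitrary)

# True if the read ends with the first $MIN_OVERLAP or more bases of the
# adapter (the complete adapter is handled separately with index()).
sub ends_with_adapter_prefix {
    my ($seq) = @_;
    for my $k ($MIN_OVERLAP .. length($ADAPTER) - 1) {
        next if length($seq) < $k;
        return 1 if substr($seq, -$k) eq substr($ADAPTER, 0, $k);
    }
    return 0;
}

# Count reads by where the adapter occurs, looking only at sequence lines.
sub count_adapters {
    my ($fh) = @_;
    my %n = (reads => 0, at_start => 0, anywhere => 0, partial_3p => 0);
    my $line_no = 0;
    while (my $line = <$fh>) {
        $line_no++;
        next unless $line_no % 4 == 2;    # 2nd line of each record = sequence
        chomp $line;
        $n{reads}++;
        my $pos = index($line, $ADAPTER);
        if ($pos >= 0) {
            $n{anywhere}++;
            $n{at_start}++ if $pos == 0;    # what m/^$adapt/ counts
        }
        elsif (ends_with_adapter_prefix($line)) {
            $n{partial_3p}++;    # adapter cut off by the end of the read
        }
    }
    return \%n;
}

# Report counts for each FASTQ file named on the command line.
for my $path (@ARGV) {
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $n = count_adapters($fh);
    close $fh;
    print "$path\n";
    printf "  %-12s %d\n", $_, $n->{$_} for qw(reads at_start anywhere partial_3p);
}
```

Run it as perl count_adapters.pl lane1.fastq (both names are placeholders). Exact matching will still miss adapters that contain sequencing errors, so these counts need not agree with Maq's figure either.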
Does anybody have any insight into these issues? Thanks, everyone!