
  • nucacidhunter
    replied
    The capture is Nextera whole exome, sequenced on an Illumina HiSeq, paired-end 100 bp.
    I wonder whether the reads with over-represented k-mers map to the genome in general or to the target exons.



  • dnusol
    replied
    Hi,

    we are seeing a similar issue using the Agilent QXT kit, on captured and whole genome experiments. This kit also uses transposases.

    HTH

    Dave



  • Kmok
    replied
    K-mers in the middle part of the sequence

    Is there an explanation for k-mers in the middle part of the sequence?
    The capture is Nextera whole exome, sequenced on an Illumina HiSeq, paired-end 100 bp.
    The k-mers persist after Trimmomatic, although the data quality reported by FastQC is better after trimming. The same pattern appears in multiple samples. I asked Illumina 2 weeks ago but am still waiting for an answer.

    Thanks


  • roliwilhelm
    replied
    @kmcarr: That paper was very useful; thanks for sharing it. It is also the same paper the Illumina representative referenced. It enabled me to match some of the recurring sequences in the first 14bp of my reads to the Tn5 recognition site they cite.

    I also realized that the proportion of reads with this bias is quite small (0.3%), though I initially thought the effect was far greater. This misconception came from a miscalculation on my part: I summed the "counts" column for the top 7 overrepresented k-mers in the FastQC report, divided by the total number of sequences in my library, and came up with > 95% of reads containing "over-represented" sequences. In reality, the "counts" column is the total observed frequency, not the number of occurrences at the start of the read, so this was a vast overestimate.
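
The difference between the two numbers can be illustrated with a minimal sketch. It assumes reads are already available as plain strings; the function name `kmer_start_stats`, the 14-base `window` default, and the toy reads are all hypothetical and not taken from FastQC or the original data.

```python
from typing import Dict, Iterable


def kmer_start_stats(reads: Iterable[str], kmer: str, window: int = 14) -> Dict[str, float]:
    """Contrast total k-mer occurrences with reads carrying the k-mer near the 5' end.

    `window` (assumed default: 14 bases) is the 5' region of interest.
    """
    total_reads = 0
    total_occurrences = 0      # every overlapping hit, at any position in any read
    reads_with_start_hit = 0   # reads with a hit lying entirely inside the window
    k = len(kmer)
    for read in reads:
        total_reads += 1
        hits = [i for i in range(len(read) - k + 1) if read.startswith(kmer, i)]
        total_occurrences += len(hits)
        if any(i + k <= window for i in hits):
            reads_with_start_hit += 1
    return {
        "total_reads": total_reads,
        "total_occurrences": total_occurrences,
        # Summed counts divided by read count: the overestimate described above.
        "naive_ratio": total_occurrences / total_reads if total_reads else 0.0,
        # Fraction of reads that really carry the k-mer near their start.
        "start_fraction": reads_with_start_hit / total_reads if total_reads else 0.0,
    }


toy_reads = [
    "ACGTACGTACGTACGTACGT",  # k-mer at the start and repeated downstream
    "TTTTTTTTTTTTTTTTACGT",  # k-mer only near the 3' end
    "GGGGGGGGGGGGGGGGGGGG",  # no k-mer at all
]
stats = kmer_start_stats(toy_reads, "ACGT")
print(stats)
```

On these toy reads the naive ratio exceeds 1 even though only one read in three carries the k-mer near its start, which is the same kind of inflation as the > 95% figure.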

    Thank you all for your thoughtful responses.



  • pmiguel
    replied
    Originally posted by roliwilhelm
    Hello All,

    I summarized all of the information in a blog post.

    Thanks!
    By the way, the image from your blog:


    shows an increase in A composition towards the end of your reads. I think this usually means that there is a high frequency of very short amplicons in your data set. That is, many of the reads have run through the insert and the right adapter and into the polyA (or polyT, depending on your strand of reference) that attaches the flow cell oligos to the surface of the flowcell.

    Did you run FastQC on the clipped reads? If so, my guess is that your clipper is missing lots of adapters.

    By the way, one factor that makes the default settings for FastQC a poor choice for this sort of analysis is the unequal bin widths it uses. Yeah, I know it isn't convenient to scroll right really far in your browser to see the whole image, but given the distortion it causes I prefer to have to do that.
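
For anyone who wants to check the per-position composition without any binning, here is a minimal sketch. It assumes the reads are plain strings; `per_position_composition` is a hypothetical helper, not part of FastQC. A rising A fraction at the last positions would be consistent with the read-through described above.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_position_composition(reads: Iterable[str], alphabet: str = "ACGTN") -> List[Dict[str, float]]:
    """Return the base fractions at every read position, one entry per position.

    Each position is its own bin, so nothing is averaged across neighbours.
    Reads may have different lengths; a position only counts reads that reach it.
    """
    counts: List[Counter] = []
    for read in reads:
        # Grow the per-position list to cover the longest read seen so far.
        while len(counts) < len(read):
            counts.append(Counter())
        for i, base in enumerate(read.upper()):
            counts[i][base] += 1
    composition = []
    for position_counts in counts:
        total = sum(position_counts.values())
        composition.append({b: position_counts[b] / total for b in alphabet})
    return composition


for pos, fractions in enumerate(per_position_composition(["ACGA", "ACAA", "AT"]), start=1):
    print(pos, {b: round(f, 2) for b, f in fractions.items() if f > 0})
```

Characters outside `alphabet` are counted in the denominator but not reported, so the fractions at a position can sum to less than 1 if such characters occur.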

    --
    Phillip



  • pmiguel
    replied
    A couple of points:
    (1) Transposases commonly have target site preferences. Already said, but apparently needs to be repeated. There is nothing surprising about a transposase retaining those site preferences as it inserts into the DNA of a variety of different species. DNA is DNA, right?
    (2) I think this preference makes it non-ideal for the construction of genomic shotgun libraries. But, let's not exaggerate the situation. The deflections from perfect randomness look to be in the 10-20% range. Most assemblers probably work better with less biased end points. But there are lots of fluctuations from the non-ideal in our data sets. You assess the pros and cons and move on.

    --
    Phillip
    Last edited by pmiguel; 05-05-2014, 04:33 AM.



  • nucacidhunter
    replied
    I would like to make a distinction between the 5’ bias observed in TruSeq RNA libraries and the one in transposon-based Nextera libraries. During first strand synthesis, random hexamers with higher GC content are more likely to stay paired with their complementary bases long enough to prime cDNA synthesis, and therefore there is a tendency toward higher GC in the first six 5’ nucleotides. I have seen this trend with the EpiGnome kit, used for library prep from bisulfite-converted DNA, which uses random hexamers to prime complementary strand synthesis. Mapping reads from a non-converted library prepared with that kit also reveals more mismatches at the initial 1-4 nucleotides, indicating that full complementarity along the template is not required for synthesis to proceed and that the two 3’ end nucleotides of the hexamer provide enough contact for polymerase activity.

    Tn5 transposase, and by extension Nextera transposase, uses a cut and paste mechanism to integrate its recognition sequence into DNA. During transposition a 9-base single-stranded gap is left in the fragments, which results in duplication of the termini. This gap is filled during the initial 3 min incubation at 72°C before PCR cycling. If all the fragments in a library are sequenced to saturation (deeper sequencing or limited template use), the duplicated region could be recognised, and I think that Moleculo uses this to stitch short read fragments back together to form longer synthetic reads. The unbalanced 5’ region observed in FastQC graphs extends 9 bases in Nextera library reads, and end duplication, in combination with insertion site bias, might explain this observation.
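
The terminal duplication can be pictured with a deliberately simplified toy model. It ignores strands, the inserted adapter sequence, and PCR, and it only tracks which genomic bases end up in which fragment after gap fill. `tagment`, `stitch`, the toy genome, and the insertion sites are hypothetical names and values chosen for illustration; this is not a description of how any commercial product works.

```python
from typing import List

GAP = 9  # length of the single-stranded gap described above


def tagment(genome: str, insertion_sites: List[int]) -> List[str]:
    """Toy model: split `genome` at each insertion site, duplicating GAP bases.

    After gap fill, the fragment to the left of a site ends at site + GAP and
    the fragment to the right starts at site, so neighbours share GAP bases.
    """
    sites = sorted(insertion_sites)
    for site in sites:
        if site < 0 or site + GAP > len(genome):
            raise ValueError(f"site {site} leaves no room for a {GAP}-base duplication")
    starts = [0] + sites
    ends = [site + GAP for site in sites] + [len(genome)]
    return [genome[s:e] for s, e in zip(starts, ends)]


def stitch(fragments: List[str]) -> str:
    """Rejoin ordered fragments by matching the duplicated GAP-base termini."""
    assembled = fragments[0]
    for frag in fragments[1:]:
        if assembled[-GAP:] != frag[:GAP]:
            raise ValueError("adjacent fragments do not share the expected duplication")
        assembled += frag[GAP:]
    return assembled


toy_genome = "ACGTTGCAAGGCTTAACCGGTTAACGATCGATCGGATCCA"
fragments = tagment(toy_genome, [10, 22])
print(fragments)
print(stitch(fragments) == toy_genome)
```

In this model every fragment starts with the same 9 bases that end its left-hand neighbour, which is the property the stitching idea relies on.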



  • kmcarr
    replied
    Have you had a look at this paper "Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition", Adey et al. Genome Biology 2010, 11:R119? I would draw your attention to Supplementary Figure 1. The authors show a consistent base composition bias in the region surrounding the transposition site. This composition is found in both E. coli and H. sapiens gDNA. Despite the bias in locations of transposase activity the authors did not detect any bias in genome coverage in E. coli, H. sapiens or D. melanogaster compared to physical fragmentation (sonication) or endonuclease cleavage.

    I don't really follow your argument that consistency of the base composition suggests that the effect is not due to the transposase. Such may be true in the case of the other fragmentation methods (and the authors of the above paper suggest this), as they include post-fragmentation steps such as end repair and A-tailing which may introduce their own biases. After fragmentation, the Nextera protocol includes only a PCR amplification, which primes off the inserted transposon. An argument could be made that the PCR amplification of the fragmented DNA could contribute to a composition bias downstream of the fragmentation site, but it cannot explain the composition bias upstream of the site, as that chunk of DNA is long gone by the time PCR happens.



  • roliwilhelm
    replied
    Thanks for your comment GenoMax, I would give you a penny if we had any left up here in Canada.

    Perhaps I wasn't completely clear, but I'm not using multiple displacement amplification of my DNA, nor do I believe that there are any random hexamer priming steps in the Nextera library prep that I used. The information you linked to is related to those forms of sequencing prep.

    But, I am in doubt about my understanding of the Nextera process, especially since the repeats appear to be random hexamers!

    (Also: I couldn't find any examples of this on the FastQC help page, even though there was some suggestion there would be)



  • GenoMax
    replied
    Originally posted by roliwilhelm
    Obviously these answers aren't completely relevant to the technical concerns of processing the data for assembly, but I would like to know more.
    See posts #261 and 263: http://seqanswers.com/forums/showthr...t=4846&page=14



  • roliwilhelm
    replied
    I didn't think that the Nextera kits used random hexamers for amplification? I assumed that the tagmentation step inserted the sequence needed for annealing. Am I incorrect? Here's the best description of the process I could find.

    You do make a good point, since all of the recurring sequences are hexamers.

    Still, how would the hexamers which are initiating strand amplification end up included in the read during extension? Why would that occur more frequently and predictably at the start of the read?

    Obviously these answers aren't completely relevant to the technical concerns of processing the data for assembly, but I would like to know more.
    Last edited by roliwilhelm; 05-02-2014, 11:36 PM.



  • dpryan
    replied
    Yeah, the random hexamer priming effect is almost always identical, regardless of who makes the library. This is unsurprising since the library prep. components are identical.



  • roliwilhelm
    replied
    New Evidence of Strangeness re: a consistent k-mer bias for various Nextera preps

    Hello All,

    Well, I've actively pursued a question similar to the one in the initial post and have found a variety of perspectives on the matter, but none really do the problem justice. It appears to be a far-reaching phenomenon that shows up across a variety of samples from a variety of users. I was able to find four different postings on the subject, and EVERY single FastQC graph they show has an identical, or nearly identical, pattern. I summarized all of the information in a blog post. I will be forwarding it to Illumina for their response. BUT, please comment if you think I'm missing something obvious. In short, I find the pattern too consistent for just transposon bias. I would expect more variability in such an effect, so that it would not be equally prominent in four out of four publicly reported cases.

    Thanks!
    Last edited by roliwilhelm; 05-02-2014, 07:10 PM.



  • pbluescript
    replied
    Originally posted by mxr1895
    Hi, what were you using your reads for?
    I have the same issue with 80 multiplexed Nextera libraries run on a HiSeq. Their QC graphs all look the same for the first 13bp.
    I'm wondering if I should just trim them?
    I wouldn't bother trimming them. You could always take a sample of your reads and map them trimmed and untrimmed to see which works better. Whenever I did this, I never saw big differences.
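
A minimal sketch of producing the trimmed copy for such a comparison is below. It assumes a well-formed four-line-per-record FASTQ; `parse_fastq` and `head_crop` are hypothetical helper names, and the 13-base crop simply mirrors the 13 bp mentioned in the quoted post. Mapping the trimmed and untrimmed reads is left to whichever aligner you already use.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def head_crop(records: Iterable[Record], n: int = 13) -> Iterator[Record]:
    """Drop the first `n` bases and their quality values from every read."""
    for header, seq, qual in records:
        yield header, seq[n:], qual[n:]


example = ["@read1\n", "ACGTACGTACGTACGTAC\n", "+\n", "IIIIIIIIIIIIIIIIII\n"]
for header, seq, qual in head_crop(parse_fastq(example)):
    print(f"{header}\n{seq}\n+\n{qual}")
```

Reads shorter than `n` come out empty, so a real pipeline would probably want to filter those before mapping.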



  • mxr1895
    replied
    Originally posted by pbluescript
    I have seen Nextera libraries show a very similar bias. My guess is that this is just an artifact of the library prep. In the past, I would trim off these regions before mapping, but then I found that it didn't make a big difference, so I just left them there.
    Hi, what were you using your reads for?
    I have the same issue with 80 multiplexed Nextera libraries run on a HiSeq. Their QC graphs all look the same for the first 13bp.
    I'm wondering if I should just trim them?
