Seqanswers Leaderboard Ad

**pmiguel** · 05-05-2014, 04:09 AM

A couple of points:
(1) Transposases commonly have target site preferences. Already said, but apparently needs to be repeated. There is nothing surprising about a transposase retaining those site preferences as it inserts into the DNA of a variety of different species. DNA is DNA, right?
(2) I think this preference makes it non-ideal for the construction of genomic shotgun libraries. But, let's not exaggerate the situation. The deflections from perfect randomness look to be in the 10-20% range. Most assemblers probably work better with less biased end points. But there are lots of fluctuations from the non-ideal in our data sets. You assess the pros and cons and move on.

--
Phillip

**pmiguel** · 05-05-2014, 04:32 AM

Originally posted by roliwilhelm View Post

Hello All,

I summarized all of the information in a blog post.

Thanks!

By the way, the image from your blog:

shows an increase in A composition towards the end of your reads. I think this usually means that there are a high frequency of very short amplicons reads in your data set. That is, many of them have read through the insert, the right adapter and into the polyA (or polyT, depending on your strand of reference) attachment of the flow cell oligos to the surface of the flowcell.

Did you run FastQC on the clipped reads? If so, my guess is that your clipper is missing lots of adapters.

By the way, one factor that makes the default settings for FastQC a poor choice for this sort of analysis are the unequal bin widths it uses. Yeah, I know it isn't convenient to scroll right really far in your browser to see the whole image, but given the distortion it causes I prefer to have to do that.

--
Phillip

**roliwilhelm** · 05-09-2014, 10:33 AM

@kmcarr: That paper was very useful; thanks for sharing it. It is also the same paper the Illumina representative referenced. It enabled me to match some of the recurring sequences in the first 14bp of my reads to the Tn5 recognition site they cite.

I also realized that the proportion of reads with this bias is quite small (0.3%), though initially I thought it was far greater of an effect. This misconception was due to a miscalculation on my part. I summed the "counts" column for the top 7 overrepresented k-mer in the FastQC report and divided by the totoal number of sequences in my library and came up with > 95% of reads containing "over-represented" sequences. In reality, the "counts" column is the total observed frequency, not the number of occurrences at the start of the read, so this was a vast overestimate.

Thank you all for your thoughtful responses.

**Kmok** · 07-22-2014, 10:50 AM

Kmers in mid part of sequence

Is there an explanation for Kmers in the mid part of sequence?
The capture is Nextera whole exome, sequenced in Illumina Hiseq pairend 100bp.
The Kmers persist after Trimmomatic. The quality of the data from fastqc after the trimming is better. Such appearance occurs in multiple samples. I have asked Illumina 2 weeks ago but still pending answers.

Thanks

Attached Files

**dnusol** · 07-23-2014, 12:43 AM

Hi,

we are seeing a similar issue using the Agilent QXT kit, on captured and whole genome experiments. This kit also uses transposases.

HTH

Dave

**nucacidhunter** · 07-23-2014, 01:23 AM

The capture is Nextera whole exome, sequenced in Illumina Hiseq pairend 100bp.

I wonder if reads with over-represented Kmers map to genome or target exons.

Topics	Statistics	Last Post
SIX2 Protein Identified as a Key Player in Prostate Cancer Treatment Resistance by seqadmin Started by seqadmin, Yesterday, 06:55 AM	0 responses 12 views 0 likes	Last Post by seqadmin Yesterday, 06:55 AM
Genetic Mosaicism More Prevalent Than Previously Thought by seqadmin Started by seqadmin, 05-30-2024, 03:16 PM	0 responses 24 views 0 likes	Last Post by seqadmin 05-30-2024, 03:16 PM
Comprehensive Sequencing of Great Ape Sex Chromosomes Yields Insights into Evolution and Genetic Variability by seqadmin Started by seqadmin, 05-29-2024, 01:32 PM	0 responses 28 views 0 likes	Last Post by seqadmin 05-29-2024, 01:32 PM
New Toolkit Enhances Plant Mitochondrial Genome Research by seqadmin Started by seqadmin, 05-24-2024, 07:15 AM	0 responses 215 views 0 likes	Last Post by seqadmin 05-24-2024, 07:15 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News