  • #16
    A couple of points:
    (1) Transposases commonly have target site preferences. Already said, but apparently needs to be repeated. There is nothing surprising about a transposase retaining those site preferences as it inserts into the DNA of a variety of different species. DNA is DNA, right?
    (2) I think this preference makes it non-ideal for the construction of genomic shotgun libraries. But let's not exaggerate the situation. The deviations from perfect randomness look to be in the 10-20% range. Most assemblers probably work better with less biased end points, but our data sets are full of departures from the ideal. You assess the pros and cons and move on.
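One rough way to put a number on "deviations from perfect randomness" is to compare the base composition at each of the first few cycles against the composition pooled over those same cycles. This is a minimal sketch, assuming reads are already loaded as plain strings (FASTQ parsing omitted); `positional_deviation` is a hypothetical helper, and the metric (largest relative deviation of any base at a cycle) is one illustrative choice, not necessarily how the 10-20% figure above was estimated.

```python
from collections import Counter


def positional_deviation(reads, positions):
    """For each of the first `positions` cycles, return the largest relative
    deviation of any base's frequency from the composition pooled over
    those cycles (0.0 means the cycle matches the pooled composition)."""
    per_cycle = [Counter() for _ in range(positions)]
    pooled = Counter()
    for seq in reads:
        for i, base in enumerate(seq[:positions]):
            per_cycle[i][base] += 1
            pooled[base] += 1

    # Only A/C/G/T contribute; N and other symbols are ignored.
    pooled_total = sum(pooled[b] for b in "ACGT")
    if pooled_total == 0:
        return [0.0] * positions
    expected = {b: pooled[b] / pooled_total for b in "ACGT"}

    deviations = []
    for counts in per_cycle:
        n = sum(counts[b] for b in "ACGT")
        worst = 0.0
        if n:
            for b in "ACGT":
                if expected[b] > 0:
                    observed = counts[b] / n
                    worst = max(worst, abs(observed - expected[b]) / expected[b])
        deviations.append(worst)
    return deviations


# Toy usage: every cycle in this set has one of each base.
print(positional_deviation(["ACGT", "CGTA", "GTAC", "TACG"], 4))
```

A value of 0.15 at a cycle would mean some base is 15% above or below its pooled frequency there; on real data you would look at the first 10-20 cycles, where the transposase insertion preference shows up.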

    --
    Phillip
    Last edited by pmiguel; 05-05-2014, 04:33 AM.



    • #17
      Originally posted by roliwilhelm
      Hello All,

      I summarized all of the information in a blog post.

      Thanks!
      By the way, the image from your blog:


      shows an increase in A composition toward the ends of your reads. I think this usually means that your data set contains a high frequency of very short inserts. That is, many of the reads have read through the insert and the right adapter into the polyA (or polyT, depending on your strand of reference) sequence that attaches the flow cell oligos to the surface of the flow cell.

      Did you run FastQC on the clipped reads? If so, my guess is that your clipper is missing lots of adapters.
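A quick way to test the "clipper is missing adapters" guess is to count how many clipped reads still contain the start of the adapter. This is a minimal sketch, assuming reads are plain strings; `residual_adapter_fraction` is a hypothetical helper, and the default adapter is the Nextera read-through sequence CTGTCTCTTATACACATCT, which is an assumption about the library type (substitute your own adapter).

```python
# Assumed adapter: the Nextera read-through sequence; replace with your own.
NEXTERA_READTHROUGH = "CTGTCTCTTATACACATCT"


def residual_adapter_fraction(reads, adapter=NEXTERA_READTHROUGH, probe_len=12):
    """Fraction of reads that still contain the first `probe_len` bases of
    `adapter`. Exact matching only: reads whose adapter tail is shorter than
    the probe, or carries sequencing errors, are not counted."""
    if not reads:
        return 0.0
    probe = adapter[:probe_len]
    hits = sum(1 for seq in reads if probe in seq)
    return hits / len(reads)
```

Because it ignores mismatches and short partial adapters, this undercounts; a clearly non-zero result on supposedly clipped reads is still a strong hint that the clipper settings need another look.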

      By the way, one thing that makes FastQC's default settings a poor choice for this sort of analysis is the unequal bin widths it uses. Yeah, I know it isn't convenient to scroll far to the right in your browser to see the whole image, but given the distortion the binning causes, I prefer to do that.
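The unbinned view can also be computed directly, one value per cycle with no grouping of positions. (FastQC's `--nogroup` option disables its grouping of positions for longer reads.) This is a sketch under the assumption that reads are plain strings; `per_cycle_fraction` is a hypothetical name. A rising A fraction over the last cycles would be the read-through signature described above.

```python
def per_cycle_fraction(reads, base="A"):
    """Fraction of reads carrying `base` at each cycle, one value per cycle
    (no binning). Reads may differ in length, e.g. after clipping, so each
    cycle is normalised by the number of reads long enough to reach it."""
    max_len = max((len(seq) for seq in reads), default=0)
    hits = [0] * max_len
    totals = [0] * max_len
    for seq in reads:
        for i, b in enumerate(seq):
            totals[i] += 1
            if b == base:
                hits[i] += 1
    # Every cycle below max_len is reached by at least one read, so t > 0.
    return [h / t for h, t in zip(hits, totals)]
```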

      --
      Phillip



      • #18
        @kmcarr: That paper was very useful; thanks for sharing it. It is also the same paper the Illumina representative referenced. It enabled me to match some of the recurring sequences in the first 14bp of my reads to the Tn5 recognition site they cite.

        I also realized that the proportion of reads with this bias is quite small (0.3%), although I initially thought the effect was far greater. That misconception came from a miscalculation on my part: I summed the "counts" column for the top 7 overrepresented k-mers in the FastQC report, divided by the total number of sequences in my library, and came up with > 95% of reads containing "over-represented" sequences. In reality, the "counts" column is the total observed frequency, not the number of occurrences at the start of the read, so this was a vast overestimate.

        Thank you all for your thoughtful responses.
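The difference between "total occurrences" and "reads affected at the start" is easy to see on a toy set. This is a minimal sketch, assuming reads are plain strings; `count_overlapping` and `kmer_summary` are hypothetical helpers, the 14 bp window comes from the post above, and this is not a reimplementation of FastQC's own Kmer Content calculation.

```python
def count_overlapping(seq, kmer):
    """Occurrences of `kmer` in `seq`, counting overlapping matches."""
    return sum(1 for i in range(len(seq) - len(kmer) + 1) if seq.startswith(kmer, i))


def kmer_summary(reads, kmers, start_window=14):
    """Return (total k-mer occurrences anywhere in the reads,
    fraction of reads with at least one k-mer inside the first
    `start_window` bases)."""
    total_occurrences = 0
    reads_with_start_hit = 0
    for seq in reads:
        total_occurrences += sum(count_overlapping(seq, k) for k in kmers)
        head = seq[:start_window]
        if any(k in head for k in kmers):
            reads_with_start_hit += 1
    fraction = reads_with_start_hit / len(reads) if reads else 0.0
    return total_occurrences, fraction


reads = ["AAAAAAAA", "CCCCAAAA", "GGGGGGGG"]
print(kmer_summary(reads, ["AAAA"], start_window=4))
```

Here the k-mer occurs 6 times in 3 reads, so dividing summed counts by the number of reads suggests "200% of reads", while only one read in three actually has the k-mer in its first 4 bases. Summing counts over several overlapping k-mers inflates the figure further, because one read can be counted once per k-mer.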



        • #19
          K-mers in the middle of reads

          Is there an explanation for over-represented k-mers in the middle of reads?
          The capture is Nextera whole exome, sequenced on an Illumina HiSeq, 100 bp paired-end.
          The k-mers persist after Trimmomatic, although FastQC shows better quality after trimming. This occurs in multiple samples. I asked Illumina two weeks ago but am still waiting for an answer.

          Thanks
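One way to characterize mid-read k-mers is to tabulate where in the read each occurrence starts, on both untrimmed and trimmed reads. This is a sketch, assuming reads are plain strings; `kmer_start_positions` is a hypothetical helper. A sharp peak at a single offset might point to a library or sequencing artifact, while a broad spread looks more like genuine genomic sequence, but that reading is a heuristic, not a diagnosis.

```python
from collections import Counter


def kmer_start_positions(reads, kmer):
    """Histogram of 0-based start offsets of `kmer` within reads,
    counting every occurrence, including overlapping ones."""
    hist = Counter()
    for seq in reads:
        pos = seq.find(kmer)
        while pos != -1:
            hist[pos] += 1
            pos = seq.find(kmer, pos + 1)
    return hist


for offset, n in sorted(kmer_start_positions(["GGGACGTGG", "TTACGTTTT", "ACGTACGT"], "ACGT").items()):
    print(offset, n)
```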


          • #20
            Hi,

            We are seeing a similar issue with the Agilent QXT kit, in both capture and whole-genome experiments. This kit also uses a transposase.

            HTH

            Dave



            • #21
              The capture is Nextera whole exome, sequenced on an Illumina HiSeq, 100 bp paired-end.
              I wonder whether the reads with over-represented k-mers map to the genome at large or to the targeted exons.
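Once the reads carrying the k-mers have been aligned, the question reduces to an interval-overlap count against the capture targets. This is a minimal sketch, assuming alignments and targets are already available as plain 0-based, half-open `(chrom, start, end)` coordinates (as in BED), with target intervals sorted and non-overlapping per chromosome; `on_target_fraction` is a hypothetical helper, and reading BAM/BED files is left out.

```python
import bisect


def on_target_fraction(alignments, targets):
    """alignments: iterable of (chrom, start, end), 0-based half-open.
    targets: dict chrom -> sorted, non-overlapping list of (start, end).
    Returns the fraction of alignments overlapping at least one target."""
    target_starts = {c: [s for s, _ in ivs] for c, ivs in targets.items()}
    total = on_target = 0
    for chrom, start, end in alignments:
        total += 1
        ivs = targets.get(chrom, [])
        # Index of the last target that starts before this read ends.
        i = bisect.bisect_left(target_starts.get(chrom, []), end) - 1
        # With sorted, non-overlapping targets, only that one can overlap.
        if i >= 0 and ivs[i][1] > start:
            on_target += 1
    return on_target / total if total else 0.0
```

Comparing this fraction for k-mer-carrying reads against the library as a whole would show whether those reads are enriched off target.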

