  • #16
    A couple of points:
    (1) Transposases commonly have target site preferences. Already said, but apparently needs to be repeated. There is nothing surprising about a transposase retaining those site preferences as it inserts into the DNA of a variety of different species. DNA is DNA, right?
    (2) I think this preference makes it non-ideal for constructing genomic shotgun libraries. But let's not exaggerate the situation. The deviations from perfect randomness look to be in the 10-20% range. Most assemblers probably work better with less biased end points, but our data sets depart from the ideal in lots of ways. You assess the pros and cons and move on.
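A minimal sketch of how one might check for the kind of insertion-site preference described above: tabulate base frequencies at each of the first few positions of the reads and compare them with a background. The function names and the flat 25% background are assumptions for illustration only; a real comparison should use the base composition of the genome in question.

```python
from collections import Counter


def start_composition(reads, length=14):
    """Per-position A/C/G/T frequencies over the first `length` bases of each read.

    Under perfectly random insertion, every position should sit near the
    background composition; a site preference shows up as positions that deviate.
    """
    counts = [Counter() for _ in range(length)]
    for read in reads:
        for i, base in enumerate(read[:length].upper()):
            counts[i][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())  # includes N calls, which are not reported below
        freqs.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return freqs


def max_deviation(freqs, background=0.25):
    """Largest relative deviation from an assumed flat `background` frequency."""
    return max(abs(f[b] - background) / background for f in freqs for b in "ACGT")
```

For example, `max_deviation(start_composition(reads, length=14))` returns the largest relative departure from the assumed background across all start positions and bases.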

    --
    Phillip
    Last edited by pmiguel; 05-05-2014, 04:33 AM.



    • #17
      Originally posted by roliwilhelm
      Hello All,

      I summarized all of the information in a blog post.

      Thanks!
      By the way, the image from your blog shows an increase in A composition towards the end of your reads. I think this usually means that there is a high frequency of very short amplicons in your data set. That is, many reads have read through the insert and the right adapter into the polyA (or polyT, depending on your strand of reference) sequence that attaches the flow cell oligos to the surface of the flow cell.
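One rough way to quantify the read-through described above is to count reads that end in a long run of A (or T, depending on strand). This is a minimal sketch; the function names and the 10-base threshold are arbitrary assumptions, not a standard cutoff.

```python
def trailing_run(read, base="A"):
    """Length of the run of `base` at the 3' end of `read`."""
    n = 0
    for b in reversed(read.upper()):
        if b != base:
            break
        n += 1
    return n


def fraction_polya_tailed(reads, min_run=10, base="A"):
    """Fraction of reads ending in at least `min_run` copies of `base` (hypothetical threshold)."""
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if trailing_run(r, base) >= min_run)
    return hits / len(reads)
```

Running it with `base="T"` as well covers the other strand orientation.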

      Did you run FastQC on the clipped reads? If so, my guess is that your clipper is missing lots of adapters.
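To illustrate why a clipper can miss adapters, here is a minimal exact-match 3' clipper. It removes a full adapter occurrence anywhere in the read, or a partial adapter prefix hanging off the 3' end. The sequence used is the one commonly given for trimming Nextera read-through (check it against your kit's documentation); the function name and the `min_overlap` default are hypothetical. Because it demands exact matches, a single sequencing error inside the adapter is enough for it to leave the adapter in place, and any overlap shorter than `min_overlap` is ignored; real trimmers such as Trimmomatic tolerate mismatches for this reason.

```python
# Commonly cited trimming sequence for Nextera read-through; verify for your kit.
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"


def clip_adapter(read, adapter=NEXTERA_ADAPTER, min_overlap=3):
    """Return `read` with a 3' adapter removed, using exact matching only."""
    # Full adapter anywhere in the read: short insert, read ran through it.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Otherwise, the longest adapter prefix that matches a suffix of the read.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

With `insert + NEXTERA_ADAPTER + "AAAAAA"` the clipper returns just the insert, but change one base of the adapter and the whole read, poly-A tail included, passes through untouched.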

      By the way, one factor that makes FastQC's default settings a poor choice for this sort of analysis is the unequal bin widths it uses. Yeah, I know it isn't convenient to scroll far to the right in your browser to see the whole image, but given the distortion the binning causes, I prefer to do that.
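The distortion from unequal bins is easy to see with a toy example: a composition spike at a single position gets averaged with its neighbours once positions are grouped. The bin scheme below (nine single-base bins, then 5-base bins) is a hypothetical stand-in, not FastQC's exact grouping; FastQC's `--nogroup` option turns grouping off so that every base is reported.

```python
def hypothetical_bins(read_len, single=9, width=5):
    """Half-open (start, end) bins: `single` one-base bins, then `width`-base bins.

    Hypothetical grouping for illustration; not FastQC's exact scheme.
    """
    bins = [(i, i + 1) for i in range(min(single, read_len))]
    start = single
    while start < read_len:
        bins.append((start, min(start + width, read_len)))
        start += width
    return bins


def bin_means(values, bins):
    """Mean of the per-position `values` within each bin."""
    return [sum(values[s:e]) / (e - s) for s, e in bins]


# Toy per-position A fraction: flat 0.25 with a single spike at position 50.
per_position = [0.25] * 100
per_position[50] = 0.75
binned = bin_means(per_position, hypothetical_bins(len(per_position)))
print(max(per_position), max(binned))
```

The spike of 0.75 shows up as roughly 0.35 once it shares a 5-base bin with four background positions.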

      --
      Phillip



      • #18
        @kmcarr: That paper was very useful; thanks for sharing it. It is also the same paper the Illumina representative referenced. It enabled me to match some of the recurring sequences in the first 14bp of my reads to the Tn5 recognition site they cite.

        I also realized that the proportion of reads with this bias is quite small (0.3%); initially I thought the effect was far greater. That misconception came from a miscalculation on my part: I summed the "counts" column for the top 7 overrepresented k-mers in the FastQC report, divided by the total number of sequences in my library, and came up with > 95% of reads containing "over-represented" sequences. In reality, the "counts" column is the total observed frequency, not the number of occurrences at the start of the read, so this was a vast overestimate.
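The difference between the two counts is easy to reproduce. The sketch below (hypothetical function names) counts every occurrence of a k-mer anywhere in the reads, and separately counts reads that start with any of a set of k-mers, counting each read only once. Summing per-k-mer totals across several k-mers inflates the estimate twice over: occurrences away from the read start are included, and overlapping k-mers (shifted versions of the same motif, for instance) hit the same reads repeatedly.

```python
def count_occurrences(reads, kmer):
    """Total occurrences of `kmer` anywhere in the reads, overlaps included."""
    total = 0
    for r in reads:
        start = r.find(kmer)
        while start != -1:
            total += 1
            start = r.find(kmer, start + 1)
    return total


def fraction_reads_starting_with(reads, kmers):
    """Fraction of reads that begin with any of `kmers`; each read counted once."""
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if any(r.startswith(k) for k in kmers))
    return hits / len(reads)
```

With toy reads such as `["AAAAAC", "CAAAAG", "GGGGGG", "TTTTTT"]`, `count_occurrences` finds "AAAA" three times, while only one read in four actually starts with it.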

        Thank you all for your thoughtful responses.



        • #19
          Kmers in the middle of reads

          Is there an explanation for Kmers in the middle of reads?
          The library is a Nextera whole-exome capture, sequenced on an Illumina HiSeq as 100 bp paired-end reads.
          The Kmers persist after Trimmomatic, although the FastQC quality results are better after trimming. This occurs in multiple samples. I asked Illumina 2 weeks ago but am still waiting for an answer.
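One way to start investigating mid-read k-mers is to tabulate where in the reads a given k-mer occurs. As a hedged heuristic, a sharp peak at a fixed offset would point toward something tied to library structure (adapter or transposase-end sequence in short inserts, for example), while occurrences spread across all positions look more like genomic sequence. A minimal sketch with a hypothetical function name:

```python
from collections import Counter


def kmer_position_histogram(reads, kmer):
    """Map each 0-based start offset to the number of times `kmer` occurs there."""
    hist = Counter()
    for r in reads:
        start = r.find(kmer)
        while start != -1:
            hist[start] += 1
            start = r.find(kmer, start + 1)
    return hist
```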

          Thanks


          • #20
            Hi,

            we are seeing a similar issue using the Agilent QXT kit, on both capture and whole-genome experiments. This kit also uses transposases.

            HTH

            Dave



            • #21
              The library is a Nextera whole-exome capture, sequenced on an Illumina HiSeq as 100 bp paired-end reads.
              I wonder whether the reads with over-represented Kmers map to the genome in general or to the target exons.
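A minimal sketch of that check, assuming the alignments have already been extracted from the BAM as `(chrom, start, end)` tuples with 0-based, half-open coordinates (as in BED), and that the target intervals have been merged so that none overlap. All function names here are hypothetical. Comparing the on-target fraction for reads carrying the over-represented k-mers with that for all reads would show whether those reads behave differently.

```python
import bisect


def build_target_index(targets):
    """Index (chrom, start, end) target intervals by chromosome.

    Assumes intervals on the same chromosome do not overlap (merge them first).
    """
    by_chrom = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def on_target(index, chrom, start, end):
    """True if the half-open alignment [start, end) overlaps any target."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last target starting before `end`; with non-overlapping targets it also
    # has the largest end among all candidates, so it is the only one to test.
    i = bisect.bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


def on_target_fraction(index, alignments):
    """Fraction of (chrom, start, end) alignments that overlap a target."""
    if not alignments:
        return 0.0
    return sum(on_target(index, *a) for a in alignments) / len(alignments)
```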
