  • The_Roads
    replied
    Hi PPARG,

    Thanks for the reply. By internal priming I meant internal priming during sequencing; it was suggested by an Illumina tech when we were discussing the problem reads. These errors turn up in our DNA sequencing data at very low frequency (<1000 per 1.5M aligned reads) and are evenly distributed across all samples and all ref seqs we've looked at, hence I think they are sequencing errors.

    The rearranged reads occur at similar frequencies, but they are not overlapping pairs. What I meant was that for the sequence ABCDEFG we see similar frequencies of pairs that align, for instance, C-E as E-C. In our real biological rearrangements we only ever see one form, i.e. only C-E. These suspected ligation errors seem to have happened in only a few DNA sequencing library preps and are always present at low frequency (~3/10,000 reads). I hope this makes things clearer.

    see edit above
    Last edited by The_Roads; 06-08-2010, 11:40 AM.

  • thinkRNA
    replied
    Originally posted by pparg:
    Hello, The_Roads,
    Your idea of internal priming is very interesting. It seems to explain read pairs with insert length <36 very well. If this is the case, such internal priming should occur at the PCR step of library preparation rather than at the sequencing stage. One thing I am not sure about is whether such internal priming is feasible or common in reality.
    Ligation errors, if they occur, should lead to false ‘re-arrangement’ calls. But if I understand correctly, the B-F and F-B pairs you mentioned were not likely due to ligation error. I don’t think a ligation error can generate a combined sequence where the two ends are B-F or F-B complementary to each other.
    Anyway, thanks a lot for your informative input!!
    EST sequences, which are created by sequencing the ends of cDNA that is in turn generated using oligo-dT primers, show evidence of as much as 15-20% internal priming (http://bioinformatics.oxfordjournals...ull/21/18/3691). If your sequencing was done using cDNA generated with oligo-dT primers, I don't see why it couldn't be as prevalent.

    One way to check for it is to look for the stretches of A's in the genome where the primer attached itself (instead of to the polyA tail). I wonder if you can use this to check for errors in your paired-end data.
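
A rough sketch of that check in Python, assuming you have already extracted the genomic sequence immediately downstream of each alignment end. The function name, the 20 bp window and the 15-adenine threshold are illustrative assumptions, not values taken from the paper linked above.

```python
def a_rich_windows(sequence, window=20, min_a=15):
    """Return the start offsets of every window holding at least min_a adenines.

    window and min_a are arbitrary illustrative thresholds; tune them for
    your own data.
    """
    seq = sequence.upper()
    hits = []
    if len(seq) < window:
        return hits
    # Count adenines in the first window, then slide one base at a time.
    count = seq[:window].count("A")
    if count >= min_a:
        hits.append(0)
    for start in range(1, len(seq) - window + 1):
        count -= int(seq[start - 1] == "A")           # base leaving the window
        count += int(seq[start + window - 1] == "A")  # base entering the window
        if count >= min_a:
            hits.append(start)
    return hits
```

A read end whose downstream sequence returns a non-empty list would be a candidate for internal priming. For transcripts on the minus strand you would look for T-rich windows in the genomic sequence instead.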
    Last edited by thinkRNA; 03-29-2010, 10:44 AM.

  • pparg
    replied
    Hello, The_Roads,
    Your idea of internal priming is very interesting. It seems to explain read pairs with insert length <36 very well. If this is the case, such internal priming should occur at the PCR step of library preparation rather than at the sequencing stage. One thing I am not sure about is whether such internal priming is feasible or common in reality.
    Ligation errors, if they occur, should lead to false ‘re-arrangement’ calls. But if I understand correctly, the B-F and F-B pairs you mentioned were not likely due to ligation error. I don’t think a ligation error can generate a combined sequence where the two ends are B-F or F-B complementary to each other.
    Anyway, thanks a lot for your informative input!!

  • The_Roads
    replied
    Hi,

    If I understand your description, we have seen the same thing in paired-end DNA sequencing.

    We see a small percentage of pairs where the two reads are either on top of each other or have very short inserts/paired-end distances. Whether you can see or extract these pairs depends on the assembler you use. In our case we suspect they are simply an artifact of the gel extraction step, where a small percentage of short fragments end up in the extraction. I don't know if the same step is done in RNA-seq, so I may be totally wrong. For SNP/indel detection we now remove all pairs like this, as the overlapping pairs double the counts for any variants they carry.
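
The filtering step described above could look something like the following. It is only a sketch: the tuple layout `(ref, start1, end1, start2, end2)` with 0-based, half-open coordinates and both function names are assumptions for illustration, not the output format of any particular aligner.

```python
def mates_overlap(start1, end1, start2, end2):
    """True if the two mates share at least one reference base.

    Coordinates are assumed to be 0-based and half-open: [start, end).
    """
    return start1 < end2 and start2 < end1


def drop_overlapping_pairs(pairs):
    """Keep only pairs whose mates do not overlap on the reference.

    Each pair is a tuple (ref, start1, end1, start2, end2).
    """
    return [p for p in pairs if not mates_overlap(p[1], p[2], p[3], p[4])]
```

Dropping the whole pair is the simplest option. It avoids counting a variant twice when both mates cover the same base, at the cost of losing a little coverage.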

    We also see low-frequency reads where segments of the read align inverted to each other. As with the above, these show a random and even (coverage-dependent) distribution across our ref seqs, and we put them down to sequencing errors (internal priming?). I have seen these described in an RNA-seq paper as evidence for unique transcripts, but as they are inverted and seen all over the place I am personally doubtful about this. Again, I may be way off.

    I guess while we are on the subject of PE errors: when we map rearrangements by looking for extended inserts/read distances or backward-forward aligned pairs, we have found some libraries where we see more than usual, but again they are distributed evenly. Because in these libraries we see equal proportions of B-F and F-B pairs for each rearrangement locus, we put these down to ligation errors during library prep, i.e. some fragments are ligating to each other, creating false pairs. Does anyone have any idea whether this is a possibility? I'm pretty certain they are non-biological, whatever the cause.

    EDIT: We did some BLAST alignments of single reads and found the same B-F, F-B "mirrored" breakpoints, so I think we can rule out ligation errors. I have no idea what could be causing the problem, whether biological or experimental. Any ideas welcome.

    cheers,
    The_Roads
    Last edited by The_Roads; 06-08-2010, 11:38 AM. Reason: new evidence

  • pparg
    replied
    Yes, any thoughts?

  • Xi Wang
    replied
    Is the insert length you mean measured from the rightmost position of mate 1 to the leftmost position of mate 2?

  • pparg
    replied
    Thanks a lot, dcjamison, for all the thoughts and suggestions.
    I think it is not likely due to the cut-off setting. Even though over 99% of the abnormal read pairs have insert length <=36, there is still a very small number of read pairs with insert length >36.
    It is also not likely due to incomplete RefSeq data, because the insert length here covers the whole segment between the outer boundaries of the read pair. With a read length of 36, an insert length <=36 suggests that only the first mate has been re-sequenced from the other direction in the second round. I know my hypothesis does not seem likely to be the true cause, but I can’t think of any other reason that explains the data.
    Your last suggestion seems promising. I will try it out later.
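
To make the definition used in this post concrete, here is a small sketch of the insert length as the span between the outer boundaries of the two mates. Coordinates are assumed to be 0-based and half-open on the same reference sequence, and the function name is hypothetical.

```python
def outer_insert_length(start1, end1, start2, end2):
    """Span from the leftmost aligned base to the rightmost aligned base.

    Both mates are given as half-open intervals [start, end) on one reference.
    """
    return max(end1, end2) - min(start1, start2)


# Two 36 bp mates lying exactly on top of each other.
stacked = outer_insert_length(100, 136, 100, 136)

# Two 36 bp mates at the ends of a 250 bp fragment.
expected = outer_insert_length(100, 136, 314, 350)
```

Under this definition `stacked` is 36 and `expected` is 250, which are the read length and the expected insert length quoted in this thread.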

  • dcjamison
    replied
    I've done PE vs RefSeq also, and my impression (not backed by numbers) is that the majority of anomalies are due to positional issues. I did not pursue it, since I was looking for something completely different in the data set, and because it was more or less what I was expecting.

    There are two parts to your question: why are the anomalous pairs restricted to <=36 bp inserts, and what is causing the small inserts.

    The first is caused by the algorithm. The pairing estimates the average size of the insert, then places cut-offs a couple of standard deviations out. In your case, the low-end cut-off seems to be 36, which by happenstance coincides with the read length.
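
As a rough illustration of that kind of cut-off, the sketch below takes the mean and sample standard deviation of the observed insert sizes and flags anything outside mean ± n_sd × SD. This is not the actual code of any aligner; the function names and the default of two standard deviations are assumptions.

```python
from statistics import mean, stdev


def insert_size_cutoffs(insert_sizes, n_sd=2.0):
    """Return (low, high) cut-offs at n_sd sample standard deviations from the mean."""
    mu = mean(insert_sizes)
    sd = stdev(insert_sizes)
    return mu - n_sd * sd, mu + n_sd * sd


def is_anomalous(insert_size, low, high):
    """A pair is flagged when its insert size falls outside [low, high]."""
    return insert_size < low or insert_size > high
```

In practice the estimate would normally be made from pairs that already look well behaved, otherwise a large population of short inserts drags the mean down and inflates the standard deviation.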

    The "short" inserts are probably caused by a difference between RefSeq and the biological realities. For example, RefSeq does not contain all the possible isoforms generated by alternative splicing. I have also noticed that the amount of UTR reported for various isoforms varies, which can lead to odd insert sizes.

    I am pretty sure the sequencing hypothesis you're describing isn't very likely, since it A) would require the fragments to hang around through the cluster regen steps, which involve several washes; B) would also be present in genomic PE sequencing; and C) would result in read 2 being the read 1 adaptor.

    My suggestion is to put the reads into two BED files (translating the coordinates from RefSeq to genomic) and load them as separate tracks into the UCSC browser: one colored blue with only consistent pairs, and one colored red with only the anomalous pairs. I predict the anomalous pairs will cluster in a limited number of places, and that by examining them in relation to the gene model you will be able to figure out what is going on.
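
A minimal sketch of building the two tracks, assuming the reads have already been translated to genomic coordinates (that step is not shown). The helper name, the track names and the example intervals are made up for illustration; the `color=R,G,B` attribute on the `track` line sets the display color of a UCSC custom track.

```python
def bed_track(name, rgb, intervals):
    """Build the text of one UCSC custom track in BED6 format.

    intervals: iterable of (chrom, start, end, read_name, strand) in genomic,
    0-based half-open coordinates.
    """
    r, g, b = rgb
    lines = [f"track name={name} color={r},{g},{b}"]
    for chrom, start, end, read_name, strand in intervals:
        # BED6 columns: chrom, start, end, name, score, strand
        lines.append(f"{chrom}\t{start}\t{end}\t{read_name}\t0\t{strand}")
    return "\n".join(lines) + "\n"


# Hypothetical example reads, one consistent pair and one anomalous pair.
consistent = [("chr1", 1000, 1036, "pair1/1", "+"), ("chr1", 1214, 1250, "pair1/2", "-")]
anomalous = [("chr1", 5000, 5036, "pair2/1", "+"), ("chr1", 5000, 5036, "pair2/2", "-")]

blue_track = bed_track("consistent_pairs", (0, 0, 255), consistent)
red_track = bed_track("anomalous_pairs", (255, 0, 0), anomalous)
```

Each string can be written to its own file and uploaded as a separate custom track.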

  • pparg
    replied
    Hello,
    Has anyone seen this before? Are there any thoughts? Thanks!

  • pair-end sequencing produces single-end read artifact

    Dear all,
    I mapped paired-end RNA-seq reads to RefSeq transcripts. Looking into the reads that mapped but were not properly/sensibly paired, I found that about 75,000 of the 195,000 read pairs in this group have an insert length of 30-36 bp. There is still a decent number of pairs with insert lengths down to 20 bp, but essentially no read pairs have an insert length >36 bp. Considering that the read length is 36 bp and my expected insert length is 250 bp, these statistics suggest that the read pairs with ~36 bp inserts are generated by some artifact of paired-end sequencing. The first round of sequencing is fine (mate 1), but the second round seems to take the 36 bp sequence synthesized in the first round as its template and sequence it by synthesizing the complementary ~36 bp. I am not sure why or how this happens. Does anyone have any ideas? Note that my short-read data are of high quality, so this may just be a common artifact of Illumina paired-end sequencing.
    Pparg
