  • Illumina PE 100bp and allele content

    Hi all, I'm new on here. I'm after advice concerning 100bp PE reads:

    Q1) I have heard that overlapping reads are a problem bioinformatically: with 100bp PE reads you ideally don't want the reads from either end to overlap (i.e. the fragment should be more than 100bp + 100bp long), because if they do you can't align them easily.

    Q2) Following on from Q1, the reason I ask is that I would like to sequence directly from 200bp PCR fragments, as I am hoping to index 96 samples in a single lane. Is it possible to sequence many (~150) amplicons that are all 200bp long? I guess adapters and barcodes would be added to these after PCR.

    So that is ~150 x 200bp amplicons for 96 patients using 100bp PE reads, giving 200bp of sequence per read pair that would hold allele-specific information for that one pair. I really have no option other than PCR products, but I want to keep the read information along the full length of the amplicon.

    Can anyone help?

  • #2
    Will all of your amplicons be exactly 200 bp long?
    And are you aligning to a reference genome after sequencing?
    I'm just thinking that if you're not aligning, and your reads don't overlap, then how will you know the actual insert size -- i.e. how many nt are between your PE reads.

    In terms of size, though, having some be less than 200 nt and having the PE reads overlap is no problem. I've seen a number of our samples have low insert sizes close to a mean of 180 nt, with a distribution around that (i.e. some < 150 nt), and aligning to the reference human genome has been no problem.
    You just don't want your inserts to be so short that you start reading through them into the adaptor sequences. But it sounds like that wouldn't be the case here.
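The cases described above (a gap between mates, overlapping mates, or read-through into the adapter) come down to simple arithmetic on insert length and read length. A minimal sketch, assuming both reads have the same length and the insert length excludes adapters; the function name `pair_geometry` is made up for illustration:

```python
def pair_geometry(insert_len: int, read_len: int) -> dict:
    """Describe how a read pair covers an insert (fragment without adapters)."""
    # Bases of the insert that each read can actually cover.
    covered = min(insert_len, read_len)
    return {
        # Unsequenced bases between the two mates.
        "gap": max(0, insert_len - 2 * read_len),
        # Insert bases sequenced by both mates.
        "overlap": max(0, 2 * covered - insert_len),
        # Bases per read that run past the insert into the adapter.
        "adapter_bases_per_read": max(0, read_len - insert_len),
    }


if __name__ == "__main__":
    for insert in (300, 200, 180, 150, 80):
        print(insert, pair_geometry(insert, 100))
```

For a 200 bp amplicon with 2x100 bp reads this reports no gap, no overlap and no adapter bases; a 180 bp insert gives a 20 bp overlap, and only inserts shorter than 100 bp start reading into the adapter.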

    • #3
      Originally posted by Jeremy37
      Will all of your amplicons be exactly 200 bp long?
      And are you aligning to a reference genome after sequencing?
      I'm just thinking that if you're not aligning, and your reads don't overlap, then how will you know the actual insert size -- i.e. how many nt are between your PE reads.

      In terms of size, though, having some be less than 200 nt and having the PE reads overlap is no problem. I've seen a number of our samples have low insert sizes close to a mean of 180 nt, with a distribution around that (i.e. some < 150 nt), and aligning to the reference human genome has been no problem.
      You just don't want your inserts to be so short that you start reading through them into the adaptor sequences. But it sounds like that wouldn't be the case here.
      ====
      Thanks for the quick response

      I was hoping to generate similar sized amplicons (~200bp) so there would be no need for size selection.

      I plan to do bisulphite sequencing and align to a small region that all my amplicons will come from (a 300kb region of bisulphite-converted sequence). Some of these amplicons will overlap with one another.

      In essence I want as much read length (100bp x 2) from the 200bp PCR amplicons as possible, so I was thinking there would be no insert (i.e. no unsequenced gap between the two reads). Is this possible?

      • #4
        Have you thought about combining 150bp paired-end reads into one pseudo read of 200bp using the 50bp overlap?
        These guys used that approach for their metagenomic work, but it should also work for other applications.

        Background: Different high-throughput nucleic acid sequencing platforms are currently available but a trade-off currently exists between the cost and number of reads that can be generated versus the read length that can be achieved.

        Methodology/Principal Findings: We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read.

        Conclusions/Significance: This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing, but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing.
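The idea of joining overlapping mates into one longer pseudo read can be sketched in a few lines. This is a naive illustration only, not the SHERA algorithm: it ignores base qualities, keeps read 1's bases in the overlap, and assumes the insert is at least as long as each read (no adapter read-through). The names `merge_pair`, `min_overlap` and `max_mismatch_frac` are made up for the example.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1):
    """Join a read pair into one pseudo read, or return None if no overlap fits.

    read2 is given as sequenced (reverse strand); it is reverse-complemented
    so that both reads are on the same strand before comparing.
    """
    r2 = reverse_complement(read2)
    # Try the longest candidate overlap first: suffix of read1 vs prefix of r2.
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-olen:], r2[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            # Keep read1 in full and append the non-overlapping part of r2.
            return read1 + r2[olen:]
    return None


if __name__ == "__main__":
    import random
    random.seed(1)
    fragment = "".join(random.choice("ACGT") for _ in range(200))
    r1 = fragment[:125]
    r2 = reverse_complement(fragment[-125:])
    print(merge_pair(r1, r2) == fragment)
```

A real merger would weigh base qualities in the overlap and handle inserts shorter than the read length; this sketch only shows the overlap search itself.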

        • #5
          Sorry, I'm new on here and not sure if you got my response directly, Jeremy37.

          I was hoping to generate similar-sized bisulphite PCR amplicons (~200bp) so there would be no need for size selection; this is a reliably obtainable size for bisulphite PCR.

          I plan to do bisulphite sequencing and align to a small genomic region (a large gene) that all my amplicons will come from (a 300kb region of bisulphite-converted sequence, which will be used specifically for the alignment). Some of these ~200bp amplicons will overlap with one another (say, where I am interested in a stretch of 3kb or so).

          In essence I want as much read length (100bp x 2) from the 200bp PCR amplicons as possible, to look at methylated CpGs and SNPs in the same amplicon. So I was thinking there would be no insert, as I want data on the whole 200bp PCR amplicon. Is this possible to achieve?

          • #6
            Originally posted by TonyBrooks
            Have you thought about combining 150bp paired-end reads into one pseudo read of 200bp using the 50bp overlap?
            These guys used that approach for their metagenomic work, but it should also work for other applications.

            http://www.plosone.org/article/info%...l.pone.0011840
            Thanks, will look into that.

            • #7
              I don't see any problem with what you're trying to do.
              I'm not sure why you are concerned about the read length though. It seems to me that you could do this even with 50 bp reads if you wanted. You would be demultiplexing the samples yourself using your adapter sequences, I guess.
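If you did end up demultiplexing yourself, the core step is just matching the start of each read against a list of known barcodes. A minimal sketch, assuming an inline barcode at the 5' end of the read (an assumption, since the thread does not say where the barcodes would sit); the barcode sequences, sample names and the `assign_sample` helper are invented for illustration:

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_seq: str, barcodes: dict, max_mismatches: int = 1):
    """Return the sample whose barcode best matches the read start, else None.

    barcodes maps sample name -> barcode sequence. A read is left unassigned
    if no barcode is within max_mismatches or if the best match is a tie.
    """
    hits = []
    for sample, barcode in barcodes.items():
        prefix = read_seq[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read too short to contain this barcode
        distance = hamming(prefix, barcode)
        if distance <= max_mismatches:
            hits.append((distance, sample))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # ambiguous: two barcodes match equally well
    return hits[0][1]


if __name__ == "__main__":
    barcodes = {"patient01": "ACGTAC", "patient02": "TGCATG"}
    print(assign_sample("ACGTAAGGTTGGTT", barcodes))
```

With 96 samples the barcodes need to differ from each other by more than the allowed number of mismatches, otherwise many reads will fall into the ambiguous case.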

              I'm not sure how the SNP calling would work, since with the bisulphite treatment (which I just had to look up) you're going to have a lot of differences from the reference. I think you need someone who knows about methylation analysis to comment...

              • #8
                Overlapping ends are not a problem for any of the major read mappers. They could pose minor issues for SNP calling, but only minor ones.

                • #9
                  Technically, overlapping reads should not be a problem (unless one read is completely contained within the other). However, if reads overlap you will potentially call the methylation state of the overlapping part twice, and you need to think about a strategy for dealing with this (e.g. use methylation calls from only a single read, from both reads...).
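One possible way to avoid counting the overlapping part twice is to reduce the two mates to a single call per position. A minimal sketch, assuming the calls for each read are already available as a mapping from reference position to methylated/unmethylated; the function name and the two `on_conflict` policies are illustrative choices, not a recommendation from any particular tool:

```python
def merge_methylation_calls(calls_r1: dict, calls_r2: dict,
                            on_conflict: str = "read1") -> dict:
    """Combine per-position methylation calls from the two mates of one pair.

    calls_r1 / calls_r2 map reference position -> True (methylated) or
    False (unmethylated). Each position ends up with at most one call.
    on_conflict: "read1" keeps read 1's call when the mates disagree,
                 "drop" discards positions where they disagree.
    """
    if on_conflict not in ("read1", "drop"):
        raise ValueError("on_conflict must be 'read1' or 'drop'")
    merged = {}
    for pos in sorted(set(calls_r1) | set(calls_r2)):
        in_r1, in_r2 = pos in calls_r1, pos in calls_r2
        if in_r1 and in_r2:
            if calls_r1[pos] != calls_r2[pos] and on_conflict == "drop":
                continue
            merged[pos] = calls_r1[pos]
        elif in_r1:
            merged[pos] = calls_r1[pos]
        else:
            merged[pos] = calls_r2[pos]
    return merged


if __name__ == "__main__":
    r1 = {100: True, 150: False}
    r2 = {150: True, 190: True}
    print(merge_methylation_calls(r1, r2))
    print(merge_methylation_calls(r1, r2, on_conflict="drop"))
```

Preferring read 1 is a common simplification because read 1 usually has the better qualities, but choosing the call with the higher base quality would be another reasonable policy.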

                  I am also quite concerned about a read length of 100bp. In our experience the basecall qualities drop steadily towards the end of reads, and this usually starts at around bp 50-70. BS-Seq is very dependent on good quality reads, especially if you also want to look at SNPs later on. We have seen numerous examples where long reads (75-108bp) had to be trimmed uniformly to ~50bp (or with adaptive quality trimmers) in order to obtain a good mapping efficiency. This essentially means wasting half of the data and thus money. If I understood correctly, you should have many different products of your amplified gene, and I think more but shorter reads will be more useful than one 2x100bp run with low qualities.
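To make the two trimming options concrete, here is a small sketch of 3' quality trimming with an optional uniform length cap. It uses a running-sum approach of the kind found in several trimmers, assumes Phred+33 encoded qualities, and the threshold of 20 is an arbitrary example value; `quality_cut` and `trim_read` are made-up names.

```python
def quality_cut(quals: list, threshold: int = 20) -> int:
    """Return the length to keep after trimming low-quality bases from the 3' end.

    Walks in from the 3' end summing (threshold - quality) and cuts where
    that running sum is largest, stopping once the sum drops below zero.
    """
    running, best, cut = 0, 0, len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return cut


def trim_read(seq: str, qual: str, threshold: int = 20, max_len=None):
    """Quality-trim a read, then optionally cap it at max_len bases."""
    quals = [ord(c) - 33 for c in qual]  # Phred+33 decoding (assumed)
    keep = quality_cut(quals, threshold)
    if max_len is not None:
        keep = min(keep, max_len)
    return seq[:keep], qual[:keep]


if __name__ == "__main__":
    seq = "A" * 70
    qual = "I" * 50 + "+" * 20   # 50 bases at Q40, then 20 bases at Q10
    print(len(trim_read(seq, qual)[0]))              # adaptive trimming
    print(len(trim_read(seq, qual, max_len=40)[0]))  # plus uniform cap
```

Adaptive trimming only removes what is actually bad, so reads from a good run keep their full length, whereas a uniform cut to ~50bp throws away good and bad tails alike.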

                  If you have good coverage, SNP calling is possible, but it is a bit trickier than normal because SNPs involving Cs or Ts can only be called by looking at reads from the opposing strand (before BS conversion).
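The reason the opposing strand helps: on the strand carrying the C, an unmethylated C reads as T after conversion, so a real C>T SNP looks the same as a converted C. On the other strand the partner base is a G (or an A if there really is a T), and G is not affected by bisulphite. A rough sketch of that logic, assuming you already have the bases observed at the site in reads from the opposing strand; the depth and fraction cut-offs are arbitrary illustration values and `genotype_c_site` is a made-up name:

```python
def genotype_c_site(opposite_strand_bases: str, min_depth: int = 5,
                    min_fraction: float = 0.2):
    """Guess the genotype at a reference C from opposing-strand reads only.

    opposite_strand_bases: bases seen at the site in reads from the strand
    opposite the C, written as read on that strand ("G" pairs with C,
    "A" pairs with a T variant). Returns "C/C", "C/T", "T/T", or None
    if coverage is too low to say anything.
    """
    informative = [b for b in opposite_strand_bases.upper() if b in "GA"]
    if len(informative) < min_depth:
        return None
    frac_a = informative.count("A") / len(informative)
    if frac_a < min_fraction:
        return "C/C"   # mostly G: no evidence for a T allele
    if frac_a > 1 - min_fraction:
        return "T/T"   # mostly A: T on both alleles
    return "C/T"       # mixture of G and A


if __name__ == "__main__":
    print(genotype_c_site("GGGGGGGGGG"))
    print(genotype_c_site("GGGGGAAAAA"))
    print(genotype_c_site("AAAAAAAAAA"))
```

A real caller would also use base and mapping qualities; the point here is only that the same-strand C/T counts measure methylation, while the opposing-strand G/A counts carry the genotype.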

                  • #10
                    Thank you all for your comments. They have been really helpful, as I don't have any hands-on experience with Illumina sequencing just yet - it's been "Illuminating".

                    Jeremy37 - My reason for hoping for longer reads is to associate methylation on specific reads (originating from a single cluster - a bit like a single molecule) with its associated SNPs, so the longer the read, the more potential SNPs to try to associate the methylation status with.

                    fkrueger - OK, I get that overlapping won't be an issue, but I now appreciate that the quality is going to drop off from 50-75bp onwards, so wasting half the money! Also, I am aware that the C or T SNPs will need to be confirmed by genomic re-seq.

                    As you have experience with BS-seq, I just wondered: wouldn't mapping to a defined region (such as my 300kb gene region) be less complex than mapping to a whole genome, so that we might be more successful in mapping the poorer-quality ends of reads? Or would you still recommend shorter reads, say 75bp PE, in the hope that the last 25 bases are OK-ish quality, or in your experience would this still be poor quality for BS-seq?

                    • #11
                      I would assume that you wouldn't lose many reads due to ambiguous mapping if you aligned 2x50 or even 2x75bp reads to the whole genome instead of just your region of interest. Mapping to the region alone might be a bit quicker, but it shouldn't make such a big difference. If in doubt, you could just compare the number of reads mapped against the whole genome with the number mapped against your region of interest. If they don't differ very much, I would possibly use the whole-genome approach, as this can show whether your experiment worked the way you intended, and it is probably easier to justify for a publication at some point...

                      If I had a choice I would opt for 2x50 or 2x75bp reads, the latter might need to be run through a quality and/or adapter trimmer just to be sure. Low quality sequence can lead to wrong methylation calls, in rare cases even to mis-mappings (which generally produce random methylation calls). And of course many mismatches can bring down your mapping efficiency quite quickly if you use reasonably strict mapping parameters. So I suggest short to medium reads and possibly quality trimming, then you should be fine. Let me know if I can be of any further help with your project.

                      • #12
                        How long a read you can sequence depends on many factors, such as machine, chemistry and optimization. All the HiSeq users I know can confidently get 2*100bp reads without much quality drop at the end. I have seen that an optimized GAIIx can also reach this level of accuracy. With 2*100bp reads, we see far fewer alignment artifacts than with 2*50bp reads. If your machine (e.g. HiSeq) can do that and you are not very constrained by funding, you should try to get 2*100bp reads. Roche used to advertise "longer is better". That is true.

                        Also, regarding my previous post, I just want to say that overlapping ends do not cause mapping problems. How to deal with them is largely the task of downstream tools.
                        Last edited by lh3; 07-01-2011, 04:51 AM.

                        • #13
                          I agree that longer = better IF quality stays up until the end. The latest iPS BS-Seq datasets from Lister et al. have excellent qualities for reads >100bp for instance. However we have received loads of emails from people where the quality of their data deteriorated quite early on (as mentioned above).

                          • #14
                            Moving to ILMN forum.

                            • #15
                              All the HiSeq data I have seen so far have good quality at the end. Another potential concern is that not all BS mappers are optimized for 100bp reads. They may have better performance for 50bp reads.
