  • Use of Illumina mate pair libraries for scaffolding

    Hi colleagues,

    I'm working with Illumina mate pair libraries and trying to use them for scaffolding. Illumina mate pair libraries have many problems that I believe are not fully taken into account by assembly and scaffolding software (please correct me if I'm wrong). These problems are:
    • a. Chimeric fragments (long linear biotinylated fragments could form chimeras before circularizing)
    • b. Junction reads (one of the fragment ends spans the biotinylated junction)
    • c. Inward paired-end contamination (fragments that don't span the biotinylated junction)
    • d. Low diversity (multiple sequenced copies of the same fragment)


    It's difficult to identify the good mates, i.e. those with the expected distance and orientation. My current bioinformatics strategy for using these reads for scaffolding is:
    • 1. de novo assembly (without using mate pair libraries)
    • 2. Identify and mark contig regions that could represent collapsed repeats. This is done with a 22-mer frequency analysis: 22-mers are counted in the read dataset and compared with their occurrence in the contigs
    • 3. Map the mate pairs to the de novo contigs, discarding multi-mapping reads (bowtie parameter -m 1)
    • 4. Select mate pairs that satisfy all of the criteria below:
      - the ends are mapped to different contigs
      - the ends don't map to regions marked as collapsed repeats in step (2)
      - the position and orientation of both ends in their contigs reduce the possibility that the fragment is "inward paired-end contamination". For example, let contig A be a 5 kbp contig and let myfrag be a sequenced fragment from a 3 kbp Illumina mate pair library. If read myfrag/1 maps to position 1000 of contig A in +/+ orientation, then, if myfrag were an inward contaminant, I would expect myfrag/2 to map to position ~1500 in +/- orientation. If myfrag/2 does not map to contig A, myfrag is probably not an inward contaminant. However, if read myfrag/1 had mapped to position 4800 of contig A (+/+), I could not exclude the possibility that myfrag is inward, so I would not use it for scaffolding.
    • 5. Remove redundancy. Fragments whose ends are mapped to the exact same positions are counted only once.
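Step 2 could be sketched roughly as follows. This is not the poster's code but a minimal illustration under stated assumptions: reads and contigs are uppercase strings held in memory, k-mers are counted canonically (a k-mer and its reverse complement together), the single-copy level is approximated by the median read count of all contig k-mers, and any contig base covered by a k-mer seen more than `fold` times that level (a hypothetical default of 2.0) is flagged as a possible collapsed repeat. The function names are illustrative; a real genome would call for a dedicated k-mer counter such as Jellyfish rather than a Python `Counter`.

```python
from collections import Counter
from statistics import median

# Translation table for complementing uppercase DNA (N is left unchanged).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the smaller of a k-mer and its reverse complement, so both strands count together."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads, k=22):
    """Count canonical k-mers in the reads, skipping any k-mer that contains an N."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def estimate_single_copy(contigs, counts, k=22):
    """Median read count over all contig k-mers, used as a proxy for the single-copy level (assumption)."""
    values = [counts.get(canonical(c[i:i + k]), 0)
              for c in contigs for i in range(len(c) - k + 1)]
    return median(values) if values else 0


def collapsed_repeat_mask(contig, counts, single_copy, k=22, fold=2.0):
    """Flag every contig base covered by a k-mer seen more than `fold` times the single-copy level."""
    mask = [False] * len(contig)
    for i in range(len(contig) - k + 1):
        if counts.get(canonical(contig[i:i + k]), 0) > fold * single_copy:
            for j in range(i, i + k):
                mask[j] = True
    return mask
```

Using an assembly-wide median rather than a per-contig one means that a contig which is entirely a collapsed repeat can still be flagged.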

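For step 3, a linking fragment by definition has its two ends on different contigs, so each end would normally be aligned on its own rather than in bowtie's paired-end mode (which, as far as I know, only reports pairs that align together within its insert-size limits). A minimal sketch that only builds the two bowtie 1 command lines, with hypothetical file names; `-m 1` is the option from the post, while `-S` (SAM output) and `-p` (threads) are added for convenience and are worth checking against the manual for your bowtie version:

```python
import shlex


def bowtie_single_end_commands(index, mate1_fastq, mate2_fastq, threads=4):
    """Build one bowtie (v1) command per mate file, aligning each end independently.

    -m 1 : suppress reads with more than one reportable alignment (as in the post)
    -S   : write SAM output
    -p   : number of alignment threads
    All paths are hypothetical placeholders.
    """
    commands = []
    for fastq in (mate1_fastq, mate2_fastq):
        sam = fastq.rsplit(".", 1)[0] + ".sam"
        commands.append(["bowtie", "-m", "1", "-S", "-p", str(threads), index, fastq, sam])
    return commands


if __name__ == "__main__":
    for cmd in bowtie_single_end_commands("contigs_idx", "mp_1.fq", "mp_2.fq"):
        # Print rather than run; use subprocess.run(cmd, check=True) once the paths are real.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to inspect them before running anything.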

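The step-4 criteria can be written as a small filter. This is a sketch of one interpretation of the rules, not the poster's implementation: alignment positions are assumed to be 0-based leftmost coordinates, inward contaminants are assumed to be FR pairs whose read starts are at most `inward_span` bp apart (a hypothetical 600 bp, in line with the ~500 bp of the example), `read_len` is assumed fixed at 100 bp, and `masks` holds the per-contig flags from step 2. Checking both ends is an extension of the post's example, which only looks at myfrag/1: a pair is kept if either end's would-be inward mate would have had to land inside that end's own contig, since the unique mapping of the other end elsewhere then argues against an inward fragment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    contig: str
    pos: int      # 0-based leftmost aligned position (assumption)
    strand: str   # "+" or "-"


def in_masked_region(hit, masks, read_len):
    """True if any base of the alignment overlaps a position flagged as a collapsed repeat."""
    mask = masks.get(hit.contig)
    if mask is None:
        return False
    return any(mask[hit.pos:hit.pos + read_len])


def rules_out_inward(hit, contig_len, inward_span, read_len):
    """True if an inward (FR) mate of this read would have to fall inside the same contig.

    '+' reads would have their inward mate downstream, '-' reads upstream.
    `inward_span` is an assumed upper bound on the distance between read starts.
    """
    if hit.strand == "+":
        return hit.pos + inward_span + read_len <= contig_len
    return hit.pos - inward_span >= 0


def keep_pair(end1, end2, contig_lengths, masks, inward_span=600, read_len=100):
    """Apply the step-4 criteria to one mate pair; True means it may be used as a link."""
    if end1.contig == end2.contig:
        return False
    if in_masked_region(end1, masks, read_len) or in_masked_region(end2, masks, read_len):
        return False
    # One end whose inward mate would have landed inside its own contig is enough.
    return (rules_out_inward(end1, contig_lengths[end1.contig], inward_span, read_len)
            or rules_out_inward(end2, contig_lengths[end2.contig], inward_span, read_len))
```

With contig A of 5,000 bp, `Hit("A", 1000, "+")` paired with an end on another contig passes, while `Hit("A", 4800, "+")` fails unless the other end rules out the inward case, mirroring the example in the post.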
    Typically, after all those steps, the link graph (contig ends are nodes, linking mates are edges) still has many incompatible links. This indicates that I have a mix of good mates and chimeric mates (case a). To scaffold, I have to make unsafe decisions, expecting the number of good links to always be much greater than the number of bad links.
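The redundancy removal of step 5 and the link graph described in the previous paragraph can be sketched together. Everything here is an assumption for illustration: ends are (contig, position, strand) tuples, the library is taken to have the usual RF (outward-facing) mate pair orientation, so a `+` read links through its contig's left end and a `-` read through its right end, and incompatible links are resolved with a hypothetical mutual-best rule: an edge is kept only if it is the best-supported edge at both of its contig ends and outnumbers each end's runner-up by `min_ratio` (3.0 by default). This makes the "good links greatly outnumber bad links" assumption explicit rather than removing it.

```python
from collections import Counter, defaultdict


def contig_end(contig, strand):
    """Contig end a read links through, assuming RF (outward-facing) mate pairs:
    a '+' read's mate lies to its left, a '-' read's mate lies to its right."""
    return (contig, "L" if strand == "+" else "R")


def deduplicate(pairs):
    """Step 5: fragments whose two ends map to exactly the same positions count once.
    Each pair is ((contig, pos, strand), (contig, pos, strand)); end order is normalised."""
    return {tuple(sorted(pair)) for pair in pairs}


def build_link_graph(pairs):
    """Edges join two contig ends; weights count non-redundant linking fragments."""
    edges = Counter()
    for (c1, _p1, s1), (c2, _p2, s2) in deduplicate(pairs):
        edge = tuple(sorted((contig_end(c1, s1), contig_end(c2, s2))))
        edges[edge] += 1
    return edges


def mutual_best_links(edges, min_ratio=3.0):
    """Keep an edge only if it is the best-supported edge at both of its contig ends
    and beats each end's runner-up by at least `min_ratio` (hypothetical threshold)."""
    by_node = defaultdict(list)
    for edge, n in edges.items():
        for node in edge:
            by_node[node].append((n, edge))
    best = {}
    for node, options in by_node.items():
        options.sort(reverse=True)
        top_n, top_edge = options[0]
        runner_up = options[1][0] if len(options) > 1 else 0
        if top_n >= min_ratio * runner_up:
            best[node] = top_edge
    return {edge: n for edge, n in edges.items()
            if all(best.get(node) == edge for node in edge)}
```

Ties and links without a clear majority are dropped, which leaves those contig ends unscaffolded instead of forcing a guess.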

    I didn't find any discussion of these issues with using mate pair libraries for scaffolding. In the forums and mailing lists I see that many people either use mate pairs directly in the assembly or use them only for scaffolding, without worrying about inward contamination and the other problems. Am I being unnecessarily meticulous? What tools/strategy do you use with Illumina mate pair libraries for scaffolding?

  • #2
    Applied Biosystems and Roche use circularization adapters to generate mate end libraries. (Roche calls theirs "paired end", but they are what everyone else would call "mate end" or "mate pair".) That takes care of issue (b). I am mystified as to why Illumina's protocol does not use this methodology.

    --
    Phillip

    • #3
      I think you have hit upon a fundamental issue with Illumina MP libraries right now. The 454 paired-end protocol is still much nicer, as you sequence through the linker and therefore have much more confidence that you have a true pair. I don't have the answers you are looking for, but I think you make an important point, and it is not yet well understood how this should be tackled in silico.
