  • How to close rRNA gaps in a genome

    Hello everyone,

    I have sequenced and assembled my bacterial genome from PE 100bp Illumina reads into a series of contigs using Velvet. However, when I attempted to align the contigs against a very closely related strain using Mauve, I noticed that all the contig breaks correspond to annotated rRNA operons on my reference genome.

    There are five other fully sequenced genomes of my species, and they all have ~8 rRNA operons; however, my assembly has no annotated 5S, 16S or 23S features.

    I have only used ~10 million reads from a total of ~255 million reads produced, so my question is this: How can I pull out the rRNA sequences from the total Illumina dataset and use these to join contigs/close gaps? Looking forward to some potential feedback, thanks!

  • #2
    Why did you only use a small part of your reads?

    If you have no rRNAs at all, that first of all means they couldn't be assembled (none at all is a bit weird, but whatever).
    Maybe it's an issue with Velvet, no clue.
    It might also be a coverage issue, although it shouldn't be.

    Basic first recommendation:
    Map your reads to your assembly, and pull out the reads that don't map (there's a bowtie2 option for that).
    Try to assemble them separately with another assembler.
    Then see if CAP3 can stitch the rRNAs and your assembly together.
    Otherwise, try to scaffold to the reference with e.g. Contiguator.
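
A minimal sketch of the mapping step above, assuming paired-end FASTQ files. All file names, the index prefix, and the function name are hypothetical; the bowtie2 option meant here is presumably `--un-conc`, which writes read pairs that fail to align concordantly. The function only builds the command lines so you can inspect them before running anything.

```python
import shlex


def bowtie2_unmapped_commands(assembly_fasta, reads_1, reads_2,
                              index_prefix="assembly_idx",
                              unmapped_path="unmapped_R%.fastq",
                              threads=4):
    """Build (but do not run) the commands to index an assembly and
    collect read pairs that do not align concordantly to it.

    All default names are placeholders chosen for illustration.
    """
    # Step 1: index the assembled contigs.
    build = ["bowtie2-build", assembly_fasta, index_prefix]
    # Step 2: align the pairs; bowtie2 replaces '%' in the --un-conc
    # path with 1 or 2 to name the per-mate output files.
    align = [
        "bowtie2",
        "-p", str(threads),
        "-x", index_prefix,
        "-1", reads_1,
        "-2", reads_2,
        "--un-conc", unmapped_path,
        "-S", "mapped.sam",
    ]
    return build, align


if __name__ == "__main__":
    for cmd in bowtie2_unmapped_commands("contigs.fa",
                                         "reads_1.fastq",
                                         "reads_2.fastq"):
        print(shlex.join(cmd))
```

The resulting unmapped FASTQ files are what you would hand to the second assembler.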

    But another question: if these are your only gaps, why bother? Apparently your genome is nearly complete, and making it fully circular will probably cost you quite a bit of time. Might not be worth it.


    • #3
      I can only run a small portion of my reads, as I only have my personal laptop to perform assemblies on. However, it seems that including more reads does not improve the assembly. That is, roughly the same contigs are returned whether I assemble with more or fewer reads.

      I do have several small contigs which contain rRNA sequence, and each has eight-fold greater read coverage than the rest of the genome, indicating that the eight rRNA operons are being assembled as one. However, each contig is smaller than a 16S or 23S gene, which points to Velvet not being able to extend the contigs.
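
The coverage reasoning above can be sketched as a quick calculation: divide a contig's coverage by the median coverage of contigs believed to be single-copy. The function name and the coverage values are made up for illustration; Velvet reports k-mer coverage per contig, which is assumed to serve as the input here.

```python
from statistics import median


def estimate_copy_number(contig_coverage, single_copy_coverages):
    """Estimate how many genomic copies were collapsed into one contig.

    contig_coverage: coverage of the suspected repeat contig.
    single_copy_coverages: coverages of contigs assumed to be single-copy.
    """
    baseline = median(single_copy_coverages)
    if baseline <= 0:
        raise ValueError("baseline coverage must be positive")
    return contig_coverage / baseline


# Hypothetical numbers: a repeat contig at 400x against a ~50x genome.
ratio = estimate_copy_number(400.0, [48.0, 50.0, 52.0, 51.0, 49.0])
print(f"estimated copies collapsed: {ratio:.1f}")
```

A ratio close to the operon count seen in related genomes supports the collapsed-repeat explanation, though it does not prove it.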

      I have gone ahead and mapped my reads to the closely related reference genome using bwa. I was able to map reads to each of the rRNA operons and the surrounding sequence, and extract a consensus sequence which ought to bridge these gaps. Are CAP3/Contiguator programs that can create larger scaffolds using additional contigs?

      As it stands, the genome is likely assembled well enough for our downstream applications of ChIP-seq and RNA-seq. However, I would like to close the gaps because 1) we seem to have the read data, it's just a matter of assembling/mapping it, and 2) I don't know if we can submit the genome as a draft when it lacks any rRNA operons and therefore translational machinery.


      • #4
        Originally posted by Tom_C View Post
        I can only run a small portion of my reads, as I only have my personal laptop to perform assemblies on. However, it seems that including more reads does not improve the assembly. That is, roughly the same contigs are returned whether I assemble with more or fewer reads.
        What percentage of your data maps back to the assembly?
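
One way to get that number, sketched under the assumption that the reads were mapped to the assembly and written as a plain-text SAM file. The function name is hypothetical. It counts primary records only, using the standard SAM flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary).

```python
def percent_mapped(sam_lines):
    """Return the percentage of primary SAM records that are mapped."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Ignore secondary (0x100) and supplementary (0x800) records
        # so that each read is counted once.
        if flag & 0x900:
            continue
        total += 1
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


# Usage with a hypothetical file name:
# with open("mapped.sam") as handle:
#     print(f"{percent_mapped(handle):.1f}% of reads mapped")
```

`samtools flagstat` reports comparable counts directly if samtools is available.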

        Originally posted by Tom_C View Post
        I have gone ahead and mapped my reads to the closely related reference genome using bwa. I was able to map reads to each of the rRNA operons and the surrounding sequence, and extract a consensus sequence which ought to bridge these gaps. Are CAP3/Contiguator programs that can create larger scaffolds using additional contigs?
        CAP3 is one of those really, really old assemblers which doesn't use a graph approach, but directly tries to overlap sequences.
        If you think you might have overlapping sequences, you can try to throw them in; maybe they can be merged.
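
To illustrate the idea of merging by direct overlap, here is a toy sketch, not CAP3's actual algorithm. It only joins two sequences when the end of one matches the start of the other exactly, whereas a real overlap assembler also tolerates mismatches and checks reverse complements. The function name and the sequences are invented.

```python
def merge_by_overlap(left, right, min_overlap=20):
    """Merge two sequences if a suffix of `left` exactly matches a
    prefix of `right`; return None when no overlap is long enough."""
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first, then shrink.
    for k in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


# Toy example: the last 5 bases of the first sequence match the
# first 5 bases of the second.
merged = merge_by_overlap("ACGTACGTTT", "CGTTTGGA", min_overlap=4)
print(merged)
```

For a reference-derived rRNA consensus and a Velvet contig, the shared flanking sequence would play the role of the overlap.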

        Contiguator is a program that scaffolds contigs against a reference, so that you get one big .fasta sequence with a few gaps, which is easier to handle.

        Trying to make a consensus from the extracted rRNA sequences and your assembled contigs is definitely worth a thought IMHO. If the mapping quality is good, and in all cases you have uniquely mapping reads, then I don't see a problem, but it's not 100% clean science.
        Maybe some of your lab people can PCR and sequence through the dubious regions; that would be the cleanest thing.

        Originally posted by Tom_C View Post
        2) I don't know if we can submit the genome as a draft when it lacks any rRNA operons and therefore translational machinery.
        Submit to...
        - databases: Nobody cares
        - a journal: People might care, but since you can map your reads nicely to the related rRNA operons, I'd write that down and hope that people follow your reasoning.
