Mira: Contigs failing to collapse despite similarity

  • Mira: Contigs failing to collapse despite similarity

    I have been using MIRA to assemble PacBio data for a very small circular genome, and I have been observing a strange result in the output. For several datasets, when the contigs are compared to the closest available reference, a large number of contigs in certain regions represent the same region of the genome.

    Even though these contigs have a high degree of overlap, they are not joined into single contigs.

    The problem is especially obvious in one dataset, where the whole genome can be represented as two contigs that overlap extensively at both ends but are not collapsed into a single contig (shown in the attached MUMmer mapview output).

    I've been running MIRA with just the most basic settings: whole genome, de novo, accurate.

    The closest theory I can come up with for why this is happening is that errors are prevalent enough in the PacBio data that it is possible to end up with two distinct versions of the same sequence as separate contigs.

    I would love to hear any suggestions on how to properly collapse these contigs, as I am worried that I am losing valuable read and quality information by having identical regions represented by different contigs.
    Attached Files

  • #2
    I have never used MIRA, so I cannot comment specifically on the why, but when assembling PacBio data with HGAP (Celera Assembler) I have seen this on occasion. It is generally due to Celera Assembler conservatively breaking contigs based on some heuristic. To force the overlap I generally use a simpler overlapper, such as minimus2, then resequence and call a consensus with Quiver to check for the introduction of any misassemblies.
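
To make the "force the overlap" idea concrete, here is a minimal Python sketch of a suffix-prefix overlap check followed by a naive join. It is only an illustration under stated assumptions and is not how minimus2, MIRA, or Celera Assembler work internally: the function names, the `min_len` and `max_mismatch_rate` parameters, and their defaults are all hypothetical, and the comparison tolerates substitutions only. Real overlappers use gapped alignment, which matters because PacBio errors are dominated by insertions and deletions.

```python
from typing import Optional


def suffix_prefix_overlap(a: str, b: str, min_len: int = 20,
                          max_mismatch_rate: float = 0.05) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Hypothetical helper: only substitutions are tolerated (no indels).
    Returns 0 if no overlap of at least `min_len` bases is found.
    """
    # Try the longest candidate overlap first, then shrink.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix = a[len(a) - length:]
        prefix = b[:length]
        mismatches = sum(1 for x, y in zip(suffix, prefix) if x != y)
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0


def merge_if_overlapping(a: str, b: str, min_len: int = 20,
                         max_mismatch_rate: float = 0.05) -> Optional[str]:
    """Join `a` and `b` across their overlap, or return None if none exists.

    The overlapping region is taken from `a` as-is; no consensus is called.
    """
    n = suffix_prefix_overlap(a, b, min_len, max_mismatch_rate)
    if n == 0:
        return None
    return a + b[n:]


if __name__ == "__main__":
    shared = "CGCGGCCGGCGCCGCGGCGC"      # toy 20-base overlap
    contig_1 = "A" * 10 + shared
    contig_2 = shared + "T" * 10
    print(merge_if_overlapping(contig_1, contig_2, max_mismatch_rate=0.0))
```

Because the joined sequence simply keeps one contig's version of the overlap, a step like the resequencing and Quiver consensus call mentioned above is still needed to confirm that the join did not introduce a misassembly. For a circular genome, the same check could in principle be applied to the end of the merged contig against its own start.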

    • #3
      Use the MIRA mailing list to get a quick reply and solution from the authors.
