  • krobison
    replied
    Originally posted by gnatnog
    I can't find much on the plasmid backbone; that is the point of the sequencing. I'm trying to find the backbone of a plasmid I am developing. I do know about 3 kb of the sequence, since I inserted it myself. I'm estimating the plasmid to be about 20 kb, as that is what makes sense from some restriction digests. I can take portions of the 3 kb I inserted and find them in different contigs that Velvet spits out, so at least I know that my plasmid is in there.
    Try mapping the reads back to your backbone with Bowtie2. That would give you an estimate of what coverage you actually achieved for your plasmid. It's also worth taking your longest contigs and BLASTing them against all known sequences; sometimes that can be an eye-opener.

    I'm not a big fan of trimming, but the truth is I haven't played with it much. One challenge is that many assemblers have logic to deal with these issues, but it's not always clear how much -- so there is a complex interaction between the various pre-processing tools and assemblers, and only empirically can you really work out the best strategy.
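    A rough sketch of that mapping step in Python (assuming single-end reads in reads.fastq and the known ~3 kb fragment in backbone.fasta; the file names are placeholders, and the depth tally ignores CIGAR operations, so treat the numbers as approximate):
    Code:
    # Map reads to the known ~3 kb backbone fragment with Bowtie2, then
    # estimate coverage from the SAM output. File names are placeholders.
    import subprocess
    from collections import defaultdict

    subprocess.run(["bowtie2-build", "backbone.fasta", "backbone_idx"], check=True)
    subprocess.run(["bowtie2", "-x", "backbone_idx", "-U", "reads.fastq",
                    "-S", "mapped.sam"], check=True)

    depth = defaultdict(int)  # 1-based reference position -> read count
    with open("mapped.sam") as sam:
        for line in sam:
            if line.startswith("@"):
                continue  # skip SAM header lines
            fields = line.rstrip("\n").split("\t")
            flag, pos, seq = int(fields[1]), int(fields[3]), fields[9]
            if flag & 4:
                continue  # unmapped read
            for offset in range(len(seq)):  # crude: ignores indels/soft-clipping
                depth[pos + offset] += 1

    covered = len(depth)
    mean_depth = sum(depth.values()) / covered if covered else 0.0
    print(f"bases covered: {covered}, mean depth over covered bases: {mean_depth:.1f}")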

  • krobison
    replied
    Originally posted by mchaisso
    If you have an additional few hundred dollars to commit to the project, why not just run PacBio sequencing? Since the reads are O(length of the plasmid), it becomes a multiple sequence alignment (MSA) problem rather than an assembly problem...

    That's a bit of a stretch; you won't have many reads quite that long, and they probably won't survive error correction.

    On the other hand, even one SMRT cell on such a small genome -- barring rampant host contamination -- should give a number of high-quality long reads with which to assemble the genome easily.

  • mchaisso
    replied
    If you have an additional few hundred dollars to commit to the project, why not just run PacBio sequencing? Since the reads are O(length of the plasmid), it becomes a multiple sequence alignment (MSA) problem rather than an assembly problem...


    Originally posted by gnatnog
    Hello all,

    I'm very new to this type of analysis and was hoping I could get some help. I am trying to assemble the sequence of a 20 kb plasmid that I had sequenced. They are short single-end reads that, when all added up, give over 4000x coverage. What I have done so far is trim off the first 4 and last 5 bases of the reads, since those positions had the lowest quality values when I checked them. I am trying to reduce the number of overall reads by quality-filtering them down to somewhere around 50x coverage.

    My plan is to use Velvet to assemble, but I am a bit confused about what to do with the output. I have tested it a couple of times and can get the output files, but I have no idea what the next step should be. How do I decide which contigs are good and which are bad? I know that my plasmid is about 20 kb, so should I just dismiss anything larger? The contigs file has a lot of different sequences, and I am not sure how to narrow it down from there. Any help would be greatly appreciated!

  • gnatnog
    replied
    I'll look into MUSKET for sure. What I'm starting to wonder is whether, instead of trimming to reduce my coverage, I should just take a percentage of the reads. The problem is I don't have a clue how to do that.

    My reads are 100bp. I trimmed them to 90bp.

    I can't find much on the plasmid backbone; that is the point of the sequencing. I'm trying to find the backbone of a plasmid I am developing. I do know about 3 kb of the sequence, since I inserted it myself. I'm estimating the plasmid to be about 20 kb, as that is what makes sense from some restriction digests. I can take portions of the 3 kb I inserted and find them in different contigs that Velvet spits out, so at least I know that my plasmid is in there.
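    A minimal sketch of the "take a percentage of the reads" idea in Python (assuming single-end reads in reads.fastq; the file names and the 50x/4000x target fraction are placeholders, and a dedicated subsampling tool would do the same job):
    Code:
    # Randomly subsample a single-end FASTQ, keeping roughly 50x / 4000x of
    # the reads. File names and the fraction are placeholder assumptions.
    import random

    keep_fraction = 50 / 4000  # target coverage / current coverage
    random.seed(42)            # fixed seed so the subsample is reproducible

    with open("reads.fastq") as fin, open("reads_subsampled.fastq", "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]  # FASTQ records are 4 lines
            if not record[0]:
                break  # end of file
            if random.random() < keep_fraction:
                fout.writelines(record)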

  • krobison
    replied
    You might want to try using MUSKET or one of the other k-mer-based tools out there to correct errors. This may be more effective than your trimming, and you might want to try assembling both trimmed and untrimmed data.

    How long are your reads? Longer is better -- which is why paired-end is far better than single-end.

    Do you know anything about your plasmid backbone? What host was the plasmid prepared in? Screening out contigs that match the host (e.g. E. coli DH10B) would be a valuable next step. Screening out contigs corresponding to the center of the backbone may also be useful -- as would identifying the vector-insert junctions.
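    A sketch of the host-screening idea (assuming the Velvet contigs are in contigs.fa and you already have a BLAST tabular report, -outfmt 6, of those contigs against the host genome in contigs_vs_host.tsv; the file names and the identity/coverage cutoffs are placeholder assumptions):
    Code:
    # Split contigs into "host-like" and "candidate plasmid" sets using a
    # BLAST -outfmt 6 report of contigs vs. the host genome.

    # Simple FASTA parser keyed by the first word of each header; the BLAST
    # query IDs are assumed to match these names.
    contigs = {}
    with open("contigs.fa") as fa:
        name, seq = None, []
        for line in fa:
            if line.startswith(">"):
                if name:
                    contigs[name] = "".join(seq)
                name, seq = line[1:].split()[0], []
            else:
                seq.append(line.strip())
        if name:
            contigs[name] = "".join(seq)

    # Flag contigs with a strong host hit: >=95% identity over >=80% of the contig.
    host_like = set()
    with open("contigs_vs_host.tsv") as blast:
        for line in blast:
            f = line.rstrip("\n").split("\t")
            qseqid, pident, length = f[0], float(f[2]), int(f[3])
            if qseqid not in contigs:
                continue
            if pident >= 95 and length >= 0.8 * len(contigs[qseqid]):
                host_like.add(qseqid)

    # Keep everything that did not look like host as plasmid candidates.
    with open("plasmid_candidates.fa", "w") as out:
        for name, seq in contigs.items():
            if name not in host_like:
                out.write(f">{name}\n{seq}\n")

    print(f"screened out {len(host_like)} host-like contigs, "
          f"kept {len(contigs) - len(host_like)} candidates")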

  • gnatnog
    started a topic 20kb Plasmid Assembly

    20kb Plasmid Assembly

    Hello all,

    I'm very new to this type of analysis and was hoping I could get some help. I am trying to assemble the sequence of a 20 kb plasmid that I had sequenced. They are short single-end reads that, when all added up, give over 4000x coverage. What I have done so far is trim off the first 4 and last 5 bases of the reads, since those positions had the lowest quality values when I checked them. I am trying to reduce the number of overall reads by quality-filtering them down to somewhere around 50x coverage.

    My plan is to use Velvet to assemble, but I am a bit confused about what to do with the output. I have tested it a couple of times and can get the output files, but I have no idea what the next step should be. How do I decide which contigs are good and which are bad? I know that my plasmid is about 20 kb, so should I just dismiss anything larger? The contigs file has a lot of different sequences, and I am not sure how to narrow it down from there. Any help would be greatly appreciated!
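    A rough sketch of a first triage pass over the Velvet output (assuming the standard contigs.fa with headers like ">NODE_7_length_18234_cov_51.2"; note that Velvet reports length and cov in k-mer units, and the cutoffs below are arbitrary placeholders, not recommendations):
    Code:
    # Summarize Velvet contigs.fa and flag contigs worth a closer look.
    # A ~20 kb plasmid should show up as one or a few long contigs with
    # broadly similar coverage; host carry-over tends to look different.

    summary = []
    with open("contigs.fa") as fa:
        for line in fa:
            if not line.startswith(">"):
                continue
            # Header format: NODE_<id>_length_<k-mer length>_cov_<k-mer coverage>
            parts = line[1:].strip().split("_")
            summary.append((parts[1], int(parts[3]), float(parts[5])))

    for node_id, length, cov in sorted(summary, key=lambda x: -x[1]):
        flag = "  <-- candidate" if 1000 <= length <= 25000 else ""
        print(f"NODE_{node_id}\tlength={length}\tcov={cov:.1f}{flag}")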
