  • Noa
    replied
    I've been working with Galaxy (http://main.g2.bx.psu.edu/) for a lot of my analyses -


  • vtosha
    replied
    What software do you use for trimming and sliding window?


  • Noa
    replied
    Thanks! I will hopefully download it next week and give it a try.
    I will let you know how it goes.


  • koadman
    replied
    Hi Noa, some colleagues and I cobbled together a pipeline to automate quality trimming, error correction, contig assembly, scaffolding, and some QC for bacterial and archaeal genomes. We've seen great results on our data, but could really use feedback from others about it. Available here. Documentation here. At least one other person on the SeqAnswers forum has tried it and ran it successfully, but had apparently sequenced a mixed culture, so the result was hard to interpret.

    If you're interested and willing to give it a go I'll try to field any questions that may arise.


  • Noa
    replied
    Yup, I trimmed anything without a mean score of 20.

    Not sure I will get anything better with MIRA, but hey, worth a shot. Basically I wanted to try one de Bruijn graph algorithm and one OLC (overlap-layout-consensus) algorithm, and MIRA and Velvet got the best reviews for bacterial genomes.

    I will update (and feel free to remind me if I haven't updated in a week or so!)


  • twaddlac
    replied
    Just to clarify: I'm assuming the sliding window trimmed whatever fell below a quality score of 20?

    As for the MIRA assembler, I don't know if it would make much of a difference, since you're dealing with a bacterial genome, which is generally much less complex than a eukaryotic one, so it may be a waste of your time. However, I have no experience doing that and would be interested to see what you come up with.

    The more knowledge about assembly the better off we'll be!


  • Noa
    replied
    They were originally 144 bp (each mate). I trimmed the first 5 bp and the last 44 bp since they looked bad. Then I ran a sliding window requiring a quality score of 20. Then I discarded any sequence left with fewer than 40 bp. I checked each step by eye in FastQC to decide on the next trimming step.

    I used VelvetOptimiser, and so far k = 25 seems to be the best, but I want to check some k-mers above 31 manually tomorrow or next week. (I can only run a few k-mer values at a time due to memory constraints on my machine.)

    Let me know if I can give you more details, and I will also get back to you once I have fully completed the runs. I also want to fool around some more with MIRA, or maybe take MIRA data to Velvet? Does that even make sense?

    n
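
The three trimming steps described above (fixed clipping, sliding-window quality trimming, then a minimum-length filter) can be sketched in plain Python. This is only an illustration, not the tool Noa used: the function names, the window size of 4, and the Phred+33 quality encoding are all assumptions, and the read is cut at the first window whose mean quality falls below 20.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_read(seq, qual, head=5, tail=44, window=4, min_q=20, min_len=40):
    """Return the trimmed (seq, qual), or None if the read ends up too short.

    head/tail/min_q/min_len follow the numbers quoted in the post;
    window=4 is an assumed value (the post does not state one).
    """
    # 1. Fixed clipping: drop `head` bases from the 5' end, `tail` from the 3' end.
    end = max(head, len(seq) - tail)
    seq, qual = seq[head:end], qual[head:end]
    scores = phred33(qual)

    # 2. Sliding window: scan 5' -> 3' and cut where the window mean first drops below min_q.
    cut = len(seq)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_q:
            cut = start
            break
    seq, qual = seq[:cut], qual[:cut]

    # 3. Length filter: discard reads shorter than min_len.
    if len(seq) < min_len:
        return None
    return seq, qual
```

With these defaults, a 144 bp read of uniformly high quality comes out at 95 bp (144 - 5 - 44), and a read whose quality collapses early is dropped altogether.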


  • twaddlac
    replied
    That's fantastic, Noa!

    For my personal interest, what lengths did you trim them to? Did you just trim the 3' mate? Details details details!


  • Noa
    replied
    Thanks a million! The trimming made a HUGE difference.
    I guess I had actually trimmed too much before: the reads were worse near the end, but I think I cut off too much.
    This time I ran a sliding window on quality score and then discarded any sequences that were too short, and now I have everything in 300 contigs, and the largest is 2.5 Mb, which is about half the genome! Yahoo!


  • twaddlac
    replied
    Trim the Reads?

    I've noticed that trimming the reads has a significant impact on assembly results. I've tried this with both ABySS and Velvet, and it seems to work fairly well in terms of generating larger contigs. I haven't done this for Illumina reads (SOLiD only), but it could be worthwhile. For my SOLiD data I trimmed the 5' end of the 3' mate, and it improved the assembly under the same assembler settings. If the quality does drop off on the 3' mate of your reads, try trimming to, say, 50 bp? I'd be interested to see the results if you do.

    Hope this helps!


  • kentk
    replied
    Originally posted by Noa:
    I also tried MIRA yesterday for the first time and, with 2 million trimmed reads (55 bp), got 22,000 contigs covering a consensus of 4.8 Mb; the largest contig is 3,200 bp and N50 is 287, which is also not so great for a bacterial genome. Happy for advice there too!
    Since you've tried a de Bruijn assembler and an OLC one, have you tried combining the contigs using Minimus? Sometimes it works wonders, especially if you're getting a lot of overlaps, and sometimes it doesn't help at all, but it's still worth a shot.
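
For readers comparing the N50 figures quoted in this thread, here is a minimal sketch of how N50 is usually computed from a list of contig lengths: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative only, and assemblers may apply their own minimum-contig cutoffs before reporting.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum (longest first)
    first reaches at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Toy example: many short contigs drag N50 down even when one contig is long.
print(n50([5000, 300, 300, 250, 250, 200, 200, 200]))
print(n50([200] * 50 + [5000]))
```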


  • Noa
    replied
    This boxplot (a tiny bit bigger) is the biggest file they will let me upload on SeqAnswers. Alternatively another version...

    I ran FastQC and there are no overrepresented sequences or anything like that.
    I do get overrepresented k-mers, but these are in the first 5 bp (which I trimmed) and at the very end, after 100 bp (which I wasn't using either); picture attached.

    I will try trimming by quality rather than to a fixed length and see if that helps.


  • arvid
    replied
    You may wish to keep the (possibly few) long reads which have high quality, instead of clipping them at a fixed length. It is a bit difficult to read the quality boxplot, could you upload it in full resolution?
    Did you check for adapter contamination? If you ran FastQC on it, you should get an idea if you have some overrepresented sequences or k-mers in your reads.
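
As a rough idea of what an overrepresented k-mer check looks at, the sketch below counts every k-mer across a set of reads and lists the most frequent ones. It is a crude stand-in, not FastQC's method (FastQC tests positional enrichment); the function name, the value of k, and the toy reads are assumptions made for illustration.

```python
from collections import Counter


def top_kmers(reads, k=5, n=3):
    """Count all k-mers in the reads and return the n most frequent as (kmer, count)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts.most_common(n)


# Toy reads: a k-mer shared by every read (for example, adapter remnant) rises to the top.
reads = ["ACGTACGT", "ACGTTTTT", "GGGGACGT"]
print(top_kmers(reads, k=4, n=1))
```

A k-mer that dominates the counts and always sits at the same read position is a hint to look for adapter or primer sequence.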


  • vtosha
    replied
    One bacterial genome, about 5.5 Mb, with 72 nt paired-end reads: N50 from 100,000 to 170,000. Not the best possible, but the best we got. For one chloroplast genome, however, we couldn't get any long contigs even with excellent sequencing results, so the problem may lie in library preparation or in the DNA.
    Yes, there may be an optimal k-mer for your own data, and longer k-mers are not necessarily better.
    Last edited by vtosha; 02-29-2012, 04:19 AM.


  • Noa
    replied
    I just tried running with k = 41 on the paired-end reads (2 million x 2) and I am getting an N50 of 243, a max contig of 5,556, and a total assembly of 3.6 Mb. So that doesn't look too promising. Maybe not surprising, since the entire (trimmed) read I am using is 56 bp, so maybe that is too large a k-mer.
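
The suspicion that k = 41 is too large for 56 bp reads can be checked with a little arithmetic. A read of length L contains L - k + 1 k-mers, and the Velvet manual relates k-mer coverage to base coverage as Ck = C * (L - k + 1) / L. The sketch below uses hypothetical function names, and the 40x base coverage is an assumed figure for illustration, not a value from this thread.

```python
def kmers_per_read(read_length, k):
    """Number of k-mers contained in a single read of the given length."""
    return max(0, read_length - k + 1)


def kmer_coverage(base_coverage, read_length, k):
    """k-mer coverage Ck = C * (L - k + 1) / L, as given in the Velvet manual."""
    return base_coverage * kmers_per_read(read_length, k) / read_length


# 56 bp trimmed reads; 40x base coverage is an assumption for illustration.
for k in (25, 31, 41):
    print(k, kmers_per_read(56, k), round(kmer_coverage(40, 56, k), 1))
```

A 56 bp read yields 32 k-mers at k = 25 but only 16 at k = 41, so the effective k-mer coverage is halved at the larger k.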
