
Irsan_Kooi · 06-07-2011, 04:26 AM

Does anyone have an idea how long it takes to perform a single-end assembly with CLC Assembly Cell 3.2.2 on 24 Gbases of data using a quad-core machine with 16 GB of RAM?

P.S. I know what they claim on the company website; I would just like to hear about the experiences of an unbiased user...

syambmed · 03-30-2011, 12:19 AM

Confirmation

Originally posted by amhein

Hi syambmed,

Here are a few comments to your questions:

1: I think it would probably be preferable to include both control and cell line reads in your de novo assembly.

2: If the box plots look OK, there is probably no need to normalize your data. Also, there are two types of statistical tests available. The Gaussian-based tests assume a continuous expression measure (such as the RPKM) and require replicates for each condition. The proportion-based tests compare counts (such as a read count, e.g. 'Total gene reads') and can work with or without replicates. As the proportion-based tests compare proportions, they implicitly normalize samples, so you should not use them on normalized data.

3: The RNA-seq analysis in the Genomics Workbench allows you to work either (a) with an annotated genome or (b) with a list of reference sequences (e.g. ESTs). Option (b) is the one you would use when you assemble against reference sequences that you found in your de novo analysis. Option (a) requires a genome sequence that has 'gene', and possibly 'mRNA', annotations. If only gene annotations are available, the reads will be assembled against the gene regions. If 'mRNA' annotations are also available, you can choose between the 'Eukaryote' and 'Prokaryote' modes. If you choose 'Eukaryote', reads will also be assembled against transcripts. If you do not have mRNA annotations on your reference sequence for the cat, the 'Prokaryote' option is the only available option. But it will still make sense to use this option; it is our name for the option that is a bit misleading, sorry.

Hope this helps. If you need further assistance please contact our Support people.

Anne-Mette (CLC developer)
On behalf of Roald

==============================================================

Dear Anne-Mette,

Thank you for your reply. Your reply has shed some light in my tunnel..haha..

Can I confirm these with you?

1. So, it is preferable to run RNA-seq as 'de novo-ed control reads' vs 'de novo-ed treated reads' rather than control (raw reads) vs treated (raw reads)..? I tried 'raw control' vs 'raw treated' reads once, but my computer froze after 10 hours with only 1 percent progress (16 cores, 47 GB RAM, control = 48 million reads vs treated = 50 million reads).

2. I don't have any replicates, just 1 control and 1 treated sample sequenced. Thus, based on your suggestion, proportion-based tests are the most appropriate.

3. The 2x cat genome has mRNA annotations. So, I don't have a problem with this.

I found that CLC GWB is very user-friendly, especially for a newbie like me. Keep up the good work.

Roald · 03-28-2011, 11:02 AM

Thanks!!

Thanks for your help Anne-Mette!!

amhein · 03-28-2011, 03:42 AM

Hi syambmed,

Here are a few comments to your questions:

1: I think it would probably be preferable to include both control and cell line reads in your de novo assembly.

2: If the box plots look OK, there is probably no need to normalize your data. Also, there are two types of statistical tests available. The Gaussian-based tests assume a continuous expression measure (such as the RPKM) and require replicates for each condition. The proportion-based tests compare counts (such as a read count, e.g. 'Total gene reads') and can work with or without replicates. As the proportion-based tests compare proportions, they implicitly normalize samples, so you should not use them on normalized data.

3: The RNA-seq analysis in the Genomics Workbench allows you to work either (a) with an annotated genome or (b) with a list of reference sequences (e.g. ESTs). Option (b) is the one you would use when you assemble against reference sequences that you found in your de novo analysis. Option (a) requires a genome sequence that has 'gene', and possibly 'mRNA', annotations. If only gene annotations are available, the reads will be assembled against the gene regions. If 'mRNA' annotations are also available, you can choose between the 'Eukaryote' and 'Prokaryote' modes. If you choose 'Eukaryote', reads will also be assembled against transcripts. If you do not have mRNA annotations on your reference sequence for the cat, the 'Prokaryote' option is the only available option. But it will still make sense to use this option; it is our name for the option that is a bit misleading, sorry.

Hope this helps. If you need further assistance please contact our Support people.

Anne-Mette (CLC developer)
On behalf of Roald
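
To illustrate point 2 above (why a proportion-based test normalizes implicitly and works without replicates), here is a minimal sketch of a generic textbook two-proportion Z-test for a single gene. It is not claimed to be the exact test the Workbench implements; the function name and the example counts are hypothetical.

```python
import math


def proportion_z_test(count_a, total_a, count_b, total_b):
    """Two-proportion Z-test for one gene in two samples (no replicates).

    count_*: reads assigned to the gene; total_*: all mapped reads in the sample.
    Returns (z, two_sided_p_value).
    """
    p_a = count_a / total_a  # dividing by the sample total is the implicit normalization
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


# Hypothetical counts: the same gene in a control and a treated library.
z, p = proportion_z_test(200, 1000, 100, 1000)
print(f"z = {z:.2f}, p = {p:.2e}")
```

Because each count is divided by its own library total inside the test, feeding in values that were already normalized would apply the correction twice, which is the reason given above for using raw counts.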

syambmed · 03-24-2011, 11:54 PM

Hi Roald,

I am new to bioinformatics and have little bioinfo knowledge.

I got treated and control transcriptome data from a cat cell line. I want to find differentially expressed genes between the two. I am using CLC Genomics Workbench 4.5.1 for the analysis. I have several questions I hope you can help with. To find differential gene expression, I did a de novo assembly of the control reads and then mapped my treated reads back to the assembled control reads with RNA-seq. This way the de novo control assembly acts as the reference.

Is this the right way to do it..? Or should I de novo assemble the treated reads as well before mapping back to the de novo control assembly..?

After I did that, I created a box plot for quality control. The mean lines are on the same level.

So, should I conduct normalization..? If yes, the software provides 3 ways to normalize data: scaling, quantile and reads per million.

Which one should I choose..? I read that reads per million is the suitable one for high-throughput RNA sequencing data.

Or should I use the expression value of a reference gene like GAPDH or beta-actin for normalization..? If yes, how do I do it using this software..?

FYI, the cat has a 2x annotated genome and a 3x genome without annotation. I already did RNA-seq analysis for control and treated reads with these genomes to find the genes.

The problem is I don't know how to compare control and treated, because when I want to compare them by RNA-seq the software always ticks prokaryote instead of eukaryote. I read your CLC bio tutorial on RNA-seq but still have some confusion about this.

Help me..huhu

Thank you.
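
For readers unsure what "reads per million" means in practice, here is a small sketch of the standard definitions of reads per million and RPKM. The function names and the example numbers are hypothetical, and this is only the arithmetic, not the Workbench's own normalization code.

```python
def reads_per_million(gene_reads, total_mapped_reads):
    """Scale a gene's read count to a library of one million mapped reads."""
    return gene_reads * 1e6 / total_mapped_reads


def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads, 2,000 bp long, in a library of 50 million mapped reads.
print(reads_per_million(500, 50_000_000))
print(rpkm(500, 2000, 50_000_000))
```

Reads per million corrects only for library size, while RPKM also corrects for gene length, so RPKM is the one to use when comparing different genes within a sample.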

usad · 07-05-2010, 03:47 AM

Hi didymos,

I reckon you want to do de novo assembly? I don't have the values for a full human genome, but 500 Mbases to 1 Gbase should run with <100 GBytes of RAM with CLC. The recommended RAM size is currently somewhere around 70 GBytes, and we never went over 100 GBytes of RAM usage when we assembled up to 1 Gigabase (non-human genomes though).

Best Wishes

didymos · 06-30-2010, 03:22 AM

Hi,

I am wondering how much RAM you need to assemble long contigs from human genome sequencing with short Illumina reads. The CLC web page says that you need much less than with SOAP, which was used in the assembly of the panda genome. To get good coverage you need about 150-200 Gb of reads (>50x coverage). In the panda genome project they used a supercomputer with 512 GB RAM....
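
As a rough check on the coverage figure in this post, average depth is simply total read bases divided by genome size. The sketch below assumes a round 3.0 Gb for the human genome (an assumption for illustration, not a number from the post); the function name is hypothetical.

```python
def coverage(total_read_bases, genome_size_bases):
    """Average sequencing depth: total bases sequenced divided by genome length."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bases


HUMAN_GENOME_BASES = 3.0e9  # assumption: rounded haploid human genome size

for read_gb in (150, 200):
    depth = coverage(read_gb * 1e9, HUMAN_GENOME_BASES)
    print(f"{read_gb} Gb of reads -> about {depth:.0f}x coverage")
```

Under that assumption, 150-200 Gb of reads corresponds to roughly 50x to 67x, in line with the ">50x" quoted in the post.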

johnny · 12-11-2009, 01:16 AM

Originally posted by smprince18

If however you would still like to use the conflict table you can go to the right hand side panel, go to the heading Nucleotide info and choose to show your translation, you can pick any and all frames or tell it to translate ORF and CDS regions. If you have more questions feel free to contact me (shawn prince) at [email protected]

This is what I was actually looking for in the beginning. But I think more questions are coming up as I dig deeper into the subject. So I'll probably take you up on your offer and write an email instead of bothering the community.

Thank you both for the quick reply!

smprince18 · 12-10-2009, 09:18 AM

************************disclaimer I work for CLC bio ***********************

Johnny,
The_Roads gave you a good answer: run the SNP detection, because there you will have set significance values for the SNP. The conflict table will reflect any position where there is a difference; you could have 100x coverage with 99 A and 1 C and that will be a conflict, but it may not be included in the SNP table (depends on your significance values). The SNP table will also reflect any amino acid change within annotated regions.
If however you would still like to use the conflict table you can go to the right hand side panel, go to the heading Nucleotide info and choose to show your translation, you can pick any and all frames or tell it to translate ORF and CDS regions. If you have more questions feel free to contact me (shawn prince) at [email protected]

The_Roads · 12-10-2009, 09:16 AM

Conflicts and SNPs are not really the same thing. In CLC a conflict is any variation from the ref seq or consensus. A SNP is a conflict that has passed quality and position requirements.
The minimum coverage for a SNP is debated a lot and obviously depends on your coverage and the type of project you are doing. For absolute read counts, 8 reads that pass SNP calling is often used, but I don't think there is really a gold standard; it all depends on your coverage. I would be very careful about altering the SNP quality or position criteria too much, as you could easily end up with junk. CLC has excellent help on how SNPs are called. Vector and quality trimming and removal of broken reads should also improve the quality of your assembly before SNP detection. Gapped and ungapped assemblies can also give different SNP calls, particularly when there are conflicts between ref seqs and your consensus seqs. What type of project are you doing?
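
To make the conflict vs. SNP distinction concrete, here is a toy sketch that classifies a single alignment column. It is not CLC's SNP caller (which also applies quality and position criteria); the function name is hypothetical, the 8-read minimum is the rule of thumb mentioned above, and the 35% variant fraction is an arbitrary assumption for illustration.

```python
from collections import Counter


def classify_position(ref_base, read_bases, min_variant_reads=8, min_variant_fraction=0.35):
    """Return 'match', 'conflict' or 'snp' for one column of a read mapping.

    Any base differing from the reference makes the column a conflict; it is
    only promoted to 'snp' if the most common variant passes both thresholds.
    Both thresholds are illustrative assumptions, not CLC defaults.
    """
    counts = Counter(base.upper() for base in read_bases)
    ref = ref_base.upper()
    variants = {base: n for base, n in counts.items() if base != ref}
    if not variants:
        return "match"
    depth = sum(counts.values())
    top_count = max(variants.values())
    if top_count >= min_variant_reads and top_count / depth >= min_variant_fraction:
        return "snp"
    return "conflict"


# 100x coverage with 99 A and 1 C: a conflict, but not a SNP under these thresholds.
print(classify_position("A", "A" * 99 + "C"))
# 10 A and 10 C: enough variant reads and a high enough fraction.
print(classify_position("A", "A" * 10 + "C" * 10))
```

Loosening either threshold moves more conflicts into the SNP list, which is why the sensitivity settings change what the SNP table reports.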

johnny · 12-10-2009, 08:54 AM

Yes, this is what I did before, but I also had numerous "conflicts" in the consensus sequence with no hit in the SNP detection. After changing the sensitivity I now found these variations as well. So thanks for bringing me onto the right track again.
What do you think is the minimum coverage to call a SNP?

The_Roads · 12-10-2009, 08:12 AM

Hi Johnny, I assume you are using CLC GWB. If so, the conflict table is not the place to look. You should run a SNP detection. If you have an annotated ref seq, then the table will present you with all the amino acid changes.

johnny · 12-10-2009, 08:10 AM

Hey there,

one more newbie here

I have a probably easy-to-answer question but can't find the solution on my own....
How can I see the protein sequence corresponding to a nucleotide sequence? In more detail, after a reference assembly, I would like to click through the variations with the "Find Conflict" button and instantly see whether the protein sequence is affected as well.

Thanks for your help!
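
Outside the Workbench, the same question (does a nucleotide change alter the protein?) can be answered with a few lines of code. The sketch below uses the standard genetic code and assumes an in-frame coding sequence on the forward strand; the function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame DNA sequence; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def variant_effect(ref_cds, pos, alt_base):
    """Return (ref_aa, alt_aa) for a substitution at 0-based position pos."""
    start = pos - pos % 3
    ref_codon = ref_cds[start:start + 3].upper()
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    return CODON_TABLE.get(ref_codon, "X"), CODON_TABLE.get(alt_codon, "X")


cds = "ATGGCC"  # hypothetical coding sequence
print(translate(cds))
print(variant_effect(cds, 5, "T"))  # third codon position
print(variant_effect(cds, 3, "A"))  # first codon position
```

If the two amino acids returned by `variant_effect` are identical, the change is synonymous; otherwise the protein sequence is affected.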

smprince18 · 09-10-2009, 04:18 AM

The Roads,

Glad to hear that you are enjoying your experience with CLC Genomics WB. Let me know if you need anything.

Shawn M Prince

Disclaimer I work at CLC bio

smprince18 · 09-10-2009, 04:14 AM

Disclaimer I work for CLC bio

Polsum,

Once you BLAST your sequences within CLC WB, you will be shown a graphic view of the results. If you look in the lower left-hand corner of the working area, you will see a table view. Once this is open you will see a default group of columns. Please note that in the right-hand side panel you can toggle on and off the columns you want to see. You will be able to look at the direction of the results. Also, if you are using this to reference-map your reads, we show orientation graphically (red = reverse read, green = forward read). The count of forward and reverse reads can also be found in the contig report. Please let me know if this clears anything up for you. If you would like, I can be contacted at the CLC Boston office, 617-444-8765.

Shawn

Disclaimer I work for CLC bio
