Does anyone have an idea how long it takes to perform a single-end assembly with CLC Assembly Cell 3.2.2 on 24 Gbases of data using a quad-core machine with 16 GB of RAM?
P.S. I know what they claim on the company website; I would just like to hear about the experiences of an unbiased user...
-
Originally posted by amhein:
Hi syambmed,
Here are a few comments to your questions:
1: I think it would probably be preferable to include both control and cell line reads in your de novo assembly.
2: If the box plots look OK there is probably no need to normalize your data. Also, there are two types of statistical tests available. The Gaussian-based tests assume a continuous expression measure (such as the RPKM) and require replicates for each condition. The proportion-based tests compare counts (such as a read count, e.g. 'Total gene reads'), and can work with or without replicates. As the proportion-based tests compare proportions they implicitly normalize samples, so you should not use them on normalized data.
3: The RNA-seq analysis in the Genomics Workbench allows you to work either (a) with an annotated genome or (b) with a list of reference sequences (e.g. ESTs). Option (b) would be the one you use when you assemble against some reference sequences that you found in your de novo analysis. Option (a) requires that you have a genome sequence which has 'gene', and possibly 'mRNA', annotations. If only gene annotations are available the reads will be assembled against the gene regions. If 'mRNA' annotations are also available, you can choose between the 'Eukaryote' and 'Prokaryote' modes. If you choose 'Eukaryote', reads will also be assembled against transcripts. If you do not have mRNA annotations on your reference sequence for the cat, the 'Prokaryote' option is the only available option. It will still make sense to use this option; it is just our name for the option that is a bit misleading, sorry.
Hope this helps. If you need further assistance please contact our Support people.
Anne-Mette (CLC developer)
On behalf of Roald
Dear Anne-Mette,
Thank you for your reply. Your reply has shed some light in my tunnel..haha..
Can I confirm these with you?
1. So, it is preferable to do 'de novo-ed control reads' vs 'de novo-ed treated reads' RNA-seq rather than control (raw reads) vs treated (raw reads) RNA-seq..? I tried doing 'raw control' vs 'raw treated' reads once, but my computer froze after 10 hours with only 1 percent progress. (16 cores, 47 GB RAM, control = 48 million reads vs treated = 50 million reads)
2. I don't have any replicates..just 1 control and 1 treated sample sequenced. Thus, based on your suggestion, proportion-based tests are the most appropriate.
3. The 2x cat genome has mRNA annotations. So, I don't have a problem with this.
I found that CLC GWB is very user friendly especially for a newbie like me. Keep up the good work.
-
Hi syambmed,
Here are a few comments to your questions:
1: I think it would probably be preferable to include both control and cell line reads in your de novo assembly.
2: If the box plots look OK there is probably no need to normalize your data. Also, there are two types of statistical tests available. The Gaussian-based tests assume a continuous expression measure (such as the RPKM) and require replicates for each condition. The proportion-based tests compare counts (such as a read count, e.g. 'Total gene reads'), and can work with or without replicates. As the proportion-based tests compare proportions they implicitly normalize samples, so you should not use them on normalized data.
3: The RNA-seq analysis in the Genomics Workbench allows you to work either (a) with an annotated genome or (b) with a list of reference sequences (e.g. ESTs). Option (b) would be the one you use when you assemble against some reference sequences that you found in your de novo analysis. Option (a) requires that you have a genome sequence which has 'gene', and possibly 'mRNA', annotations. If only gene annotations are available the reads will be assembled against the gene regions. If 'mRNA' annotations are also available, you can choose between the 'Eukaryote' and 'Prokaryote' modes. If you choose 'Eukaryote', reads will also be assembled against transcripts. If you do not have mRNA annotations on your reference sequence for the cat, the 'Prokaryote' option is the only available option. It will still make sense to use this option; it is just our name for the option that is a bit misleading, sorry.
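To make point 2 a bit more concrete, here is a minimal sketch of a proportion-based comparison for one control and one treated sample without replicates. It is a generic pooled two-proportion Z-test in plain Python, not the Workbench's own implementation; the function name and the example counts are made up for illustration.

```python
import math


def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Compare the share of reads a gene gets in sample A vs sample B.

    count_*: reads assigned to the gene (raw counts, NOT normalized values)
    total_*: total mapped reads in that sample
    Returns (z, two_sided_p_value).
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # No reads for this gene in either sample: nothing to test.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical gene: 150 reads out of 48 million (control)
# vs 300 reads out of 50 million (treated).
z, p = two_proportion_z_test(150, 48_000_000, 300, 50_000_000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

Because the test works on each sample's own proportion (count divided by total), differences in sequencing depth are already accounted for, which is why it should be fed raw counts rather than normalized values.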
Hope this helps. If you need further assistance please contact our Support people.
Anne-Mette (CLC developer)
On behalf of Roald
-
Hi Roald,
I am new to bioinformatics and have little bioinfo knowledge.
I have treated and control transcriptome data from a cat cell line, and I want to find differentially expressed genes between the two. I am using CLC Genomics Workbench 4.5.1 for the analysis. I have several questions I hope you can help with. To find differential gene expression, I did a de novo assembly of the control reads and then mapped my treated reads back to the assembled control reads with RNA-seq. This way the de novo control assembly acts as the reference.
Is this the right way to do it..? Or should I also de novo assemble the treated reads before mapping back to the de novo control reads..?
After that I created a box plot for quality control. The mean lines are on the same level.
So, should I normalize..? If yes, the software provides 3 ways to normalize data: scaling, quantile and reads per million.
Which one should I choose..? I read that reads per million is the suitable one for high-throughput RNA sequencing data.
Or should I use the expression value of a reference gene like GAPDH or beta-actin for normalization..? If yes, how do I do it using this software..?
FYI, the cat has a 2x annotated genome and a 3x genome without annotation. I already did RNA-seq analysis for the control and treated reads against these genomes to find the genes.
The problem is I don't know how to compare control and treated, because when I want to compare them by RNA-seq the software always ticks 'Prokaryote' instead of 'Eukaryote'. I read your CLC bio tutorial on RNA-seq but still have some confusion about this.
Help me..huhu
Thank you.
-
Hi didymos,
I reckon you want to do de novo? I don't have the values for a full human genome, but 500 Mbases to 1 Gbase should run with <100 GB of RAM with CLC. The recommended RAM size is currently somewhere around 70 GB, and we never went over 100 GB of RAM usage when we assembled up to 1 Gbase (non-human genomes though).
Best Wishes
-
Hi,
I am wondering how much RAM you need to assemble long contigs from human genome sequencing with short Illumina reads. The CLC web page states that you need much less than with SOAP, which was used for the assembly of the panda genome. To get good coverage you need about 150-200 Gb of reads (>50x coverage). In the panda genome project they used a supercomputer with 512 GB of RAM....
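As a quick sanity check on those numbers, fold coverage is just total sequenced bases divided by genome size. The sketch below assumes a haploid human genome of roughly 3.1 Gb (my assumption, not a figure from the CLC page), and the function names are only for illustration.

```python
# Assumption: approximate haploid human genome size in bases.
HUMAN_GENOME_BASES = 3.1e9


def fold_coverage(total_read_bases, genome_size_bases):
    """Average depth: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size_bases


def read_bases_needed(target_coverage, genome_size_bases):
    """Total sequenced bases required to reach a target average depth."""
    return target_coverage * genome_size_bases


for gigabases in (150, 200):
    depth = fold_coverage(gigabases * 1e9, HUMAN_GENOME_BASES)
    print(f"{gigabases} Gb of reads -> about {depth:.0f}x coverage")

needed = read_bases_needed(50, HUMAN_GENOME_BASES)
print(f"50x coverage needs about {needed / 1e9:.0f} Gb of reads")
```

With this genome size, 150-200 Gb of reads works out to somewhere around 50x to 65x, which matches the figure above.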
-
Originally posted by smprince18:
If however you would still like to use the conflict table you can go to the right hand side panel, go to the heading Nucleotide info and choose to show your translation, you can pick any and all frames or tell it to translate ORF and CDS regions. If you have more questions feel free to contact me (shawn prince) at [email protected].
Thank you both for the quick reply!
-
Disclaimer: I work for CLC bio
Johnny,
The Roads gave you a good answer: run the SNP detection, because there you set significance values for the SNP. The conflict table will reflect anywhere you see a difference. You could have 100x coverage with 99 A and 1 C, and that will be a conflict, but it may not be included in the SNP table (depending on your significance values). The SNP table will also reflect any amino acid change within annotated regions.
If, however, you would still like to use the conflict table, you can go to the right-hand side panel, go to the heading Nucleotide info and choose to show your translation. You can pick any and all frames, or tell it to translate ORF and CDS regions. If you have more questions feel free to contact me (Shawn Prince) at [email protected]
-
Conflicts and SNPs are not really the same thing. In CLC a conflict is any variation from the ref seq or consensus. A SNP is a conflict that has passed quality and position requirements.
The minimum coverage for a SNP is debated a lot and obviously depends on your coverage and the type of project you are doing. For absolute read counts, 8 reads that pass SNP calling is often used, but I don't think there is really a gold standard; for percentages it all depends on your coverage. I would be very careful about altering the SNP quality or position criteria too much, as you could easily end up with junk. CLC has excellent help on how SNPs are called. Vector and quality trimming and removal of broken reads should also improve the quality of your assembly before SNP detection. Gapped and ungapped assemblies can also give different SNP calls, particularly when there are conflicts between ref seqs and your consensus seqs. What type of project are you doing?
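To illustrate the conflict vs SNP distinction, here is a toy sketch in Python. It is not how CLC calls SNPs (the real caller also looks at base quality and read position); it only applies two hypothetical thresholds, a minimum coverage of 8 reads and a minimum variant fraction of 35%, both of which are assumptions chosen for the example.

```python
from collections import Counter


def classify_position(ref_base, read_bases,
                      min_coverage=8, min_variant_fraction=0.35):
    """Label one alignment column as 'match', 'conflict' or 'snp'.

    ref_base: reference base at this position
    read_bases: bases observed in the reads covering this position
    Thresholds are illustrative assumptions, not CLC defaults.
    """
    coverage = len(read_bases)
    counts = Counter(read_bases)
    # Keep only bases that differ from the reference.
    variants = {base: n for base, n in counts.items() if base != ref_base}
    if not variants:
        return "match"
    # Most frequent non-reference base at this position.
    top_count = max(variants.values())
    if coverage >= min_coverage and top_count / coverage >= min_variant_fraction:
        return "snp"
    # Some read disagrees with the reference, but not convincingly.
    return "conflict"


print(classify_position("A", "A" * 99 + "C"))       # 99 A, 1 C at 100x
print(classify_position("A", "A" * 50 + "C" * 50))  # 50 A, 50 C at 100x
print(classify_position("A", "CCCC"))               # all C, but only 4 reads
```

A single deviating read in 100 shows up as a conflict but fails the fraction threshold, while a position covered by only 4 reads fails the coverage threshold even when every read disagrees with the reference.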
-
Yes, this is what I did before, but I also had numerous "conflicts" in the consensus sequence and no hits in the SNP detection. After changing the sensitivity I now find these variations as well. So thanks for putting me back on the right track.
What do you think is the minimum coverage to call a SNP?
-
Hi Johnny, I assume you are using CLC GWB. If so, the conflict table is not the place to look. You should run a SNP detection. If you have an annotated ref seq then the table will present you with all the amino acid changes.
-
Hey there,
One more newbie here.
I have a probably easy-to-answer question but can't find the solution on my own....
How can I see the protein sequence corresponding to a nucleotide sequence? In more detail, after a reference assembly, I would like to click through the variations with the "Find Conflict" button and instantly see whether the protein sequence is affected as well.
Thanks for your help!
-
The Roads,
Glad to hear that you are enjoying your experience with CLC Genomics WB. Let me know if you need anything.
Shawn M Prince
Disclaimer: I work at CLC bio
-
Disclaimer: I work for CLC bio
Polsum,
Once you BLAST your sequences within CLC WB, you will be shown a graphic view of the results. If you look in the lower left-hand corner of the working area you will see a table view. Once this is open you will see a default group of columns. Please note that in the right-hand side panel you can toggle on and off the columns you want to see. You will be able to look at the direction of the results. Also, if you are using this to reference map your reads, we show orientation graphically (red = reverse read, green = forward read). The count of forward and reverse reads can also be found in the contig report. Please let me know if this clears anything up for you. If you would like, I can be contacted at the CLC Boston office, 617-444-8765.
Shawn