  • Irsan_Kooi
    replied
    Does anyone have an idea how long it takes to perform a single-end assembly with CLC Assembly Cell 3.2.2 on 24 Gbases of data using a quad-core machine with 16 GB of RAM?

    P.S. I know what they claim on the company website, I'd just like to hear about the experiences of an unbiased user...


  • syambmed
    replied
    Confirmation

    Originally posted by amhein
    Hi syambmed,

    Here are a few comments to your questions:

    1: I think it would probably be preferable to include both control and cell line reads in your de novo assembly.

    2: If the box plots look OK, there is probably no need to normalize your data. Also, there are two types of statistical tests available. The Gaussian-based tests assume a continuous expression measure (such as the RPKM) and require replicates for each condition. The proportion-based tests compare counts (such as a read count, e.g. 'Total gene reads'), and can work with or without replicates. As the proportion-based tests compare proportions, they implicitly normalize samples, so you should not use them on normalized data.

    3: The RNA-seq analysis in the Genomics Workbench allows you to work either (a) with an annotated genome or (b) with a list of reference sequences (e.g. ESTs). Option (b) would be the one you use when you assemble against some reference sequences that you found in your de novo analysis. Option (a) requires that you have a genome sequence which has 'gene', and possibly 'mRNA', annotations. If only gene annotations are available, the reads will be assembled against the gene regions. If 'mRNA' annotations are also available, you can choose between the 'Eukaryote' and 'Prokaryote' modes. If you choose 'Eukaryote', reads will also be assembled against transcripts. If you do not have mRNA annotations on your reference sequence for the cat, the 'Prokaryote' option is the only available option. But it will make sense to use this option; it is our name for the option that is a bit misleading, sorry.

    Hope this helps. If you need further assistance please contact our Support people.

    Anne-Mette (CLC developer)
    On behalf of Roald
    ==============================================================

    Dear Anne-Mette,

    Thank you for your reply. Your reply has shed some light in my tunnel..haha..

    Can I confirm these with you?

    1. So, it is preferable to do 'de novo-ed control reads' vs 'de novo-ed treated reads' RNA-seq rather than control (raw reads) vs treated (raw reads) RNA-seq..? I tried doing 'raw control' vs 'raw treated' reads once, but my computer froze after 10 hours with only 1 percent progress. (16 cores, 47 GB RAM, control = 48 million reads vs treated = 50 million reads)

    2. I don't have any replicates..just 1 control and 1 treated sample sequenced. Thus, based on your suggestion, proportion-based tests are the most appropriate.

    3. The cat's 2x genome has mRNA annotations. So I don't have a problem with this.

    I found that CLC GWB is very user-friendly, especially for a newbie like me. Keep up the good work.


  • Roald
    replied
    Thanks!!

    Thanks for your help Anne-Mette!!


  • amhein
    replied
    Hi syambmed,

    Here are a few comments to your questions:

    1: I think it would probably be preferable to include both control and cell line reads in your de novo assembly.

    2: If the box plots look OK, there is probably no need to normalize your data. Also, there are two types of statistical tests available. The Gaussian-based tests assume a continuous expression measure (such as the RPKM) and require replicates for each condition. The proportion-based tests compare counts (such as a read count, e.g. 'Total gene reads'), and can work with or without replicates. As the proportion-based tests compare proportions, they implicitly normalize samples, so you should not use them on normalized data.

    3: The RNA-seq analysis in the Genomics Workbench allows you to work either (a) with an annotated genome or (b) with a list of reference sequences (e.g. ESTs). Option (b) would be the one you use when you assemble against some reference sequences that you found in your de novo analysis. Option (a) requires that you have a genome sequence which has 'gene', and possibly 'mRNA', annotations. If only gene annotations are available, the reads will be assembled against the gene regions. If 'mRNA' annotations are also available, you can choose between the 'Eukaryote' and 'Prokaryote' modes. If you choose 'Eukaryote', reads will also be assembled against transcripts. If you do not have mRNA annotations on your reference sequence for the cat, the 'Prokaryote' option is the only available option. But it will make sense to use this option; it is our name for the option that is a bit misleading, sorry.

    Hope this helps. If you need further assistance please contact our Support people.

    Anne-Mette (CLC developer)
    On behalf of Roald
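
To make point 2 a bit more concrete, here is a rough sketch of what a proportion-based comparison can look like for a single gene without replicates. It is a plain two-proportion z-test written from scratch, not necessarily what the Workbench implements. The function name and the gene counts are made up for illustration; the library sizes are simply the 48 million and 50 million reads mentioned elsewhere in this thread.

```python
import math


def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Compare the share of reads a gene gets in sample A vs sample B.

    Returns (z, two_sided_p). Because each count is divided by its own
    library size, differences in sequencing depth are accounted for
    without a separate normalization step.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    # Pooled proportion under the null hypothesis of equal expression.
    pooled = (count_a + count_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical gene: 480 reads in control, 1000 reads in treated.
z, p = two_proportion_z_test(480, 48_000_000, 1000, 50_000_000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

Note that the test is fed raw counts and raw library sizes. Feeding it values that were already scaled (RPKM, reads per million) would apply the depth correction twice, which is the reason for the warning in point 2.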


  • syambmed
    replied
    Hi Roald,

    I am new to bioinformatics and have little bioinfo knowledge.

    I got treated and control transcriptome data from a cat cell line. I want to find differentially expressed genes between the two. I am using CLC Genomics Workbench 4.5.1 for the analysis, and I have several questions I hope you can help with. To find differentially expressed genes, I did a de novo assembly on the control reads and then mapped my treated reads back to the assembled control reads with RNA-seq. This way the de novo control assembly acts as the reference.

    Is this the right way to do it..? Or should I de novo assemble the treated reads as well before mapping back to the de novo control assembly..?

    After I did that, I created a box plot for quality control. The mean lines are on the same level.

    So, should I conduct normalization..? If yes, the software provides 3 ways to normalize data: scaling, quantile, and reads per million.

    Which one should I choose..? I read that reads per million is the suitable one for high-throughput RNA sequencing data.

    Or should I use the expression value of a reference gene like GAPDH or beta-actin for normalization..? If yes, how do I do it using this software..?

    FYI, the cat has a 2x annotated genome and a 3x genome without annotation. I already did RNA-seq analysis for the control and treated reads with these genomes to find the genes.

    The problem is I don't know how to compare control and treated, because when I want to compare them by RNA-seq the software always ticks prokaryote instead of eukaryote. I read your CLC bio tutorial on RNA-seq but still have some confusion about this.

    Help me..huhu

    Thank you.
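
For anyone else wondering what the "reads per million" option in the question above amounts to, here is a minimal sketch using the usual textbook definitions of reads per million (RPM) and RPKM. The function names and the example numbers are assumptions for illustration, and the Workbench may differ in details such as which reads count towards the total.

```python
def reads_per_million(gene_reads, total_mapped_reads):
    """Scale a gene's read count to a library of one million mapped reads."""
    return gene_reads * 1e6 / total_mapped_reads


def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000 bp transcript,
# in a library of 50 million mapped reads.
print(reads_per_million(500, 50_000_000))
print(rpkm(500, 2000, 50_000_000))
```

Both measures only correct for sequencing depth (and, for RPKM, transcript length). They do not replace replicates when it comes to judging whether a difference is real.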


  • usad
    replied
    Hi didymos,

    I reckon you want to do de novo? I wouldn't have the values for a full human genome, but 500 Mbases to 1 Gbase should run with <100 GB of RAM with CLC. The recommended RAM size is currently somewhere around 70 GB, and we never went over 100 GB of RAM usage when we assembled up to 1 Gbase (non-human genomes, though).

    Best Wishes


  • didymos
    replied
    Hi,

    I am wondering how much RAM you need to assemble long contigs from human genome sequencing with short Illumina reads. The CLC web page says that you need much less than with SOAP, which was used in the assembly of the panda genome. To get good coverage you need about 150-200 Gb of reads (>50x coverage). In the panda genome project they used a supercomputer with 512 GB RAM....
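
The coverage figure in the post above is simple arithmetic: average coverage is total sequenced bases divided by genome size. A quick sketch, assuming a human genome size of about 3.1 Gbases (that number is my assumption, it is not stated in the thread):

```python
HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size


def average_coverage(total_read_bases, genome_size_bp=HUMAN_GENOME_BP):
    """Average depth of coverage for a given amount of sequence."""
    return total_read_bases / genome_size_bp


def bases_needed(target_coverage, genome_size_bp=HUMAN_GENOME_BP):
    """Total read bases needed to reach a target average coverage."""
    return target_coverage * genome_size_bp


for gbases in (150, 200):
    print(f"{gbases} Gbases -> {average_coverage(gbases * 1e9):.1f}x")
print(f"50x needs about {bases_needed(50) / 1e9:.0f} Gbases")
```

With that assumed genome size, 150 to 200 Gbases works out to roughly 48x to 65x, so the "about 150-200 Gb for around 50x and up" estimate is in the right range.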


  • johnny
    replied
    Originally posted by smprince18
    If however you would still like to use the conflict table you can go to the right hand side panel, go to the heading Nucleotide info and choose to show your translation, you can pick any and all frames or tell it to translate ORF and CDS regions. If you have more questions feel free to contact me (shawn prince) at [email protected]
    This is what I was actually looking for in the beginning. But I think more questions are coming up as I dig deeper into the subject. So I'll probably take your offer and write an email instead of bothering the community.
    Thank you both for the quick reply!


  • smprince18
    replied
    ************************disclaimer I work for CLC bio ***********************

    Johnny,
    The Roads gave you a good answer to run the SNP detection, because there you will have set significance values for the SNP. The conflict table will reflect anywhere you see a difference: you could have 100x coverage with 99 A and 1 C, and that will be a conflict but may not be included in the SNP table (depending on your significance values). The SNP table will also reflect any amino acid change within annotated regions.
    If however you would still like to use the conflict table you can go to the right hand side panel, go to the heading Nucleotide info and choose to show your translation, you can pick any and all frames or tell it to translate ORF and CDS regions. If you have more questions feel free to contact me (shawn prince) at [email protected]


  • The_Roads
    replied
    Conflicts and SNPs are not really the same thing. In CLC a conflict is any variation from the ref seq or consensus. A SNP is a conflict that has passed quality and position requirements.
    The minimum coverage for a SNP is debated a lot and obviously depends on your coverage and the type of project you are doing. For absolute read counts, 8 reads that pass SNP calling is often used, but I don't think there is really a gold standard; a percentage threshold all depends on your coverage. I would be very careful about altering SNP quality or position criteria too much, as you could easily end up with junk. CLC has excellent help on how SNPs are called. Vector and quality trimming and removal of broken reads should also improve the quality of your assembly before SNP detection. Gapped and ungapped assemblies can also give different SNP calls, particularly when there are conflicts between ref seqs and your consensus seqs. What type of project are you doing?
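
To illustrate the conflict vs SNP distinction, here is a toy sketch, not CLC's actual SNP caller. It looks at the base counts at one position and reports whether there is any conflict with the reference and whether the most common variant passes two simple thresholds. The function name and both default thresholds are hypothetical (the 8 only echoes the read count mentioned above), and base quality and position-in-read checks are left out entirely.

```python
def classify_position(ref_base, base_counts, min_coverage=8, min_variant_fraction=0.35):
    """Classify one alignment position from its per-base read counts.

    base_counts maps a base to the number of reads showing it,
    e.g. {"A": 99, "C": 1}. Both thresholds are illustrative only.
    """
    coverage = sum(base_counts.values())
    # Any read disagreeing with the reference makes the position a conflict.
    variants = {b: n for b, n in base_counts.items() if b != ref_base and n > 0}
    is_conflict = bool(variants)

    is_snp = False
    top_variant = None
    if is_conflict:
        top_variant, top_count = max(variants.items(), key=lambda item: item[1])
        is_snp = (coverage >= min_coverage
                  and top_count / coverage >= min_variant_fraction)
    return {"coverage": coverage, "conflict": is_conflict,
            "snp": is_snp, "variant": top_variant}


# The 100x case from this thread: 99 A and 1 C against a reference A.
print(classify_position("A", {"A": 99, "C": 1}))
# A position where half the reads carry the variant.
print(classify_position("A", {"A": 50, "C": 50}))
```

The first position shows up as a conflict but not as a SNP, the second as both, which is why a conflict in the consensus can have no matching entry in the SNP table until the sensitivity settings are changed.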


  • johnny
    replied
    Yes, this is what I did before, but I also had numerous "conflicts" in the consensus sequence with no hit in the SNP detection. After changing the sensitivity I have now found these variations as well. So thanks for bringing me onto the right track again.
    What do you think is the minimum coverage to call a SNP?


  • The_Roads
    replied
    Hi Johnny, I assume you are using CLC GWB. If so, the conflict table is not the place to look. You should run a SNP detection. If you have an annotated ref seq, then the table will present you with all the amino acid changes.


  • johnny
    replied
    Hey there,

    One more newbie here.
    I have a probably easy-to-answer question but can't find the solution on my own....
    How can I see the protein sequence corresponding to a nucleotide sequence? In more detail, after a reference assembly, I would like to click through the variations with the "Find Conflict" button and instantly see if the protein sequence is affected as well.

    Thanks for your help!
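
Outside the Workbench, the underlying check is easy to sketch: translate the reference and the variant coding sequence and compare the affected codon. The snippet below uses the standard genetic code (NCBI translation table 1). The function names and the toy sequence are made up for illustration, and it assumes the sequence is a coding region starting in frame 0 with the variant inside a complete codon.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate a coding sequence in frame 0; unknown codons become 'X'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def protein_effect(ref_seq, pos, alt_base):
    """Return (ref_aa, variant_aa, changed) for a substitution at 0-based pos."""
    var_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    codon_index = pos // 3
    ref_aa = translate(ref_seq)[codon_index]
    var_aa = translate(var_seq)[codon_index]
    return ref_aa, var_aa, ref_aa != var_aa


print(translate("ATGGCC"))
print(protein_effect("ATGGCC", 4, "T"))  # second codon GCC -> GTC
print(protein_effect("ATGGCC", 5, "T"))  # second codon GCC -> GCT
```

A change in the third codon position is often silent, as in the last call, which is why not every conflict in the nucleotide view shows up as an amino acid change.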


  • smprince18
    replied
    The Roads,

    Glad to hear that you are enjoying your experience with CLC Genomics WB. Let me know if you need anything.

    Shawn M Prince

    Disclaimer I work at CLC bio


  • smprince18
    replied
    Disclaimer I work for CLC bio

    Polsum,

    Once you BLAST your sequences within CLC WB, you will be shown a graphic view of the results. If you look in the lower left-hand corner of the working area, you will see a table view. Once this is open you will see a default group of columns. Please note that in the right-hand side panel you can toggle on and off the columns you want to see. You will be able to look at the direction of the results. Also, if you are using this to reference-map your reads, we graphically show orientation (red = reverse read, green = forward read). The count of forward and reverse reads can also be found in the contig report. Please let me know if this straightens anything out for you. If you would like, I can be contacted at the CLC Boston office, 617-444-8765.

    Shawn

    Disclaimer I work for CLC bio
