
  • Coverage Calculation for Whole Genome Sequencing on GA II X

    Hello everybody. How does one calculate the coverage of a paired-end Illumina GA II X run? My FastQC statistics for the reads are as follows:

    Forward reads: 19,571,148; number of bases: 1,056,841,992; file size: 3.9 GB

    Reverse reads: 19,571,148; number of bases: 1,056,841,992; file size: 3.9 GB


    This is a mouse genome we have sequenced. The size of the mouse genome is 2,716,965,481 bases, which corresponds to about 2.5 GB.

    Do I follow the simple calculation C = LN / G, where:
    • C stands for coverage
    • G is the haploid genome length
    • L is the read length
    • N is the number of reads

    Doing this gives a measly 0.77X coverage, which is too low to be acceptable for the experiment. Am I calculating this correctly?
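For reference, the arithmetic behind that figure can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the FastQC counts are read as 19,571,148 reads and 1,056,841,992 bases per file, the genome size as 2,716,965,481 bases, and both mates of each pair are counted toward N. The function name `coverage` is made up for illustration.

```python
def coverage(total_bases: int, genome_size: int) -> float:
    """Expected coverage C = L * N / G, where L * N is the total number of sequenced bases."""
    return total_bases / genome_size


# Figures taken from the FastQC statistics quoted above (per file).
reads_per_file = 19_571_148
bases_per_file = 1_056_841_992
genome_size = 2_716_965_481  # mouse genome size quoted above, in bases

# Read length L is implied by bases / reads.
read_length = bases_per_file / reads_per_file

# Paired-end run: forward and reverse files both contribute bases.
total_bases = 2 * bases_per_file

single_end_cov = coverage(bases_per_file, genome_size)
paired_end_cov = coverage(total_bases, genome_size)

print(f"read length:         {read_length:.0f} bp")
print(f"forward reads only:  {single_end_cov:.2f}X")
print(f"both mates combined: {paired_end_cov:.2f}X")
```

With these inputs the implied read length is 54 bp and the combined figure comes out just under 0.78X, so the 0.77X estimate is consistent with C = LN / G when both read files are counted.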

    I cannot map my reads to a reference mouse genome and use coverageBed or any other tool, because the nature of my experiment won't allow a large number of reads to map, so counting uniquely mapped reads is out of the question. I also cannot do a de novo assembly because of computer memory constraints. Any thoughts on this are appreciated. Thanks in advance!

  • #2
    That file size sounds about right for exome sequencing, but it is likely very low for WGS. You did mention, though, that your experiment differs from regular sequencing projects done for variant calling.

    It might be inaccurate to calculate coverage on the whole genome size if your project is something like variant validation.

    If you could say more about the goal of the project, someone can make a better guess.



    • #3
      Like Vivek said, these numbers are definitely comparable to an exome-seq experiment and thus very small for a regular WGS experiment for variant calling, which expects around 30X coverage of the 3 Gb human genome. It would be a good idea to provide some more details about what you are expecting to get out of this experiment.
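To put the 30X figure in perspective, C = LN / G can be inverted to estimate how many read pairs a given target coverage would need. The sketch below is illustrative only: it assumes the 54 bp read length implied by the FastQC numbers in the first post and the mouse genome size quoted there, and the helper name `read_pairs_needed` is hypothetical.

```python
import math


def read_pairs_needed(target_coverage: float, genome_size: int, read_length: int) -> int:
    """Invert C = L * N / G for paired-end data, where N = 2 * (number of read pairs)."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))


genome_size = 2_716_965_481   # mouse genome size from the first post, in bases
read_length = 54              # bp, implied by bases / reads in the first post
pairs_sequenced = 19_571_148  # read pairs actually delivered

pairs_30x = read_pairs_needed(30, genome_size, read_length)
shortfall = pairs_30x / pairs_sequenced

print(f"read pairs needed for 30X: {pairs_30x:,}")
print(f"that is about {shortfall:.0f} times the delivered data")
```

Under these assumptions, 30X would take roughly 750 million read pairs, nearly 40 times what was delivered.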



      • #4
        Thanks a ton, guys, for the quick reply!

        One thing I would like to point out, as I posted earlier and which everybody seems to have missed, is that this is WGS of a MOUSE genome, not a HUMAN genome.

        This is basically a transfection experiment in which we transfected human oncogenes into the NIH 3T3 cell line, which was established from murine fibroblast cells. The whole genome of this NIH 3T3 cell line was sequenced. I was not involved in the actual sequencing and joined at a much later stage. I have a feeling that the sequencing company has taken us for a ride and not given us quality data, seeing as you seem to agree with me that this is characteristic of a whole exome and not a whole genome. I do not think that the nature of my experiment will actually affect the sequencing, since in effect we have done WGS of a cell line.
        Last edited by ron128; 01-10-2013, 12:07 AM.
