  • westerman
    replied
    Originally posted by Malabady
    Hi all,
    Some papers mention X coverage and some say % coverage. Is there any difference between the two? We get % coverage by dividing the total (reads * length) by the genome size. If these two terms are different, how is the X coverage calculated? Do we use the haploid genome size instead of the genome size?
    From my understanding, yes, they are different, and what you are calculating is actually the 'X' coverage: given the number of raw bases sequenced, how many times (or X) the sequencing could potentially cover the genome.

    % coverage is how well the genome is actually covered after all mapping and assembly is done.

    As an example, let's say we have 30M reads of 50 bases, or 1.5 Gbases total. Our genome is 150M bases. After mapping (or assembly) we have a bunch of non-overlapping contigs totalling 100M bases.

    So our 'X coverage' is 10X (1.5 Gbases / 150 Mbases)
    Our '% coverage' is 66.6% (100 Mbases / 150 Mbases)


    One way to think about this is that percentages generally range from 0% to 100%, so having a percentage greater than 100 can be confusing.


    I use the haploid genome size, or more specifically the C-value times 965 Mbases/pg.
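
    (A minimal R sketch of this arithmetic, using the example numbers above; the C-value shown is hypothetical, and 965 Mbases/pg is simply the conversion factor quoted in this post.)

    Code:
    # X coverage: raw bases sequenced divided by the (haploid) genome size
    n_reads     <- 30e6       # 30M reads
    read_length <- 50         # bases per read
    genome_size <- 150e6      # 150 Mbases, haploid

    x_coverage <- (n_reads * read_length) / genome_size    # 10, i.e. 10X

    # % coverage: bases actually covered by non-overlapping contigs after
    # mapping or assembly, divided by the genome size
    covered_bases <- 100e6                                  # 100 Mbases of contigs
    pct_coverage  <- 100 * covered_bases / genome_size      # ~66.7%

    # If only the C-value (pg of DNA per haploid genome) is known, estimate
    # the haploid genome size from it (the C-value here is hypothetical)
    c_value            <- 0.155                             # pg
    genome_size_from_c <- c_value * 965e6                   # ~150 Mbases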
    Last edited by westerman; 07-13-2009, 10:25 AM. Reason: Corrected my division order. Color me red-faced!



  • Malabady
    replied
    Hi all,
    Some papers mention X coverage and some say % coverage. Is there any difference between the two? We get % coverage by dividing the total (reads * length) by the genome size. If these two terms are different, how is the X coverage calculated? Do we use the haploid genome size instead of the genome size?



  • Jonathan
    replied
    There's an easy way to do it using output from the maq pipeline:

    Code:
    ...
    [maq-steps]
    ...
    maq pileup -p [your bfa] [your map] > pileup.out

    # keep the position, consensus base and read depth columns
    cut -f 2-4 pileup.out > croppedpileup.out

    # then launch R
    R
    # the following are R commands
    data <- read.table(file = "croppedpileup.out", sep = "\t", header = FALSE)
    colnames(data) <- c("pos", "consensus", "coverage")
    depth <- mean(data[, "coverage"])
    # depth now holds the mean (overall) coverage
    # set the smoothing window / bin size (must be odd)
    window <- 101
    # R indexing starts at 1
    rangefrom <- 1
    rangeto <- length(data[, "pos"])
    # running median smooths the per-base depth
    data.smoothed <- runmed(data[, "coverage"], k = window)
    png(file = "cov_out.png", width = 1900, height = 1000)
    plot(x = data[rangefrom:rangeto, "pos"], y = data.smoothed[rangefrom:rangeto],
         xlab = "bp position", ylab = "depth", type = "l")
    dev.off()
    Feel free to leave R afterwards;
    you should (unless some error occurred) find a PNG file containing the coverage plot in your directory.
    Of course the window can be changed (it needs to be odd-numbered, though),
    as can the rangefrom and rangeto values.

    Edit:
    Of course, when using many sequences in maq,
    you will most likely be interested in keeping the first column of pileup.out.
    However, this leads to much bigger files (longer R load times) and requires extra handling in R,
    since you will probably want to slice, dice and plot the data by sequence ID.
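
    (A minimal sketch of that per-sequence handling, assuming the sequence-ID column is kept with cut -f 1-4; the column names and file names here are illustrative.)

    Code:
    # read the pileup with the sequence-ID column kept (cut -f 1-4)
    data <- read.table(file = "croppedpileup.out", sep = "\t", header = FALSE)
    colnames(data) <- c("seqid", "pos", "consensus", "coverage")

    # mean depth per reference sequence
    tapply(data$coverage, data$seqid, mean)

    # one smoothed coverage plot per reference sequence
    window <- 101   # must be odd and no larger than the shortest sequence
    for (id in unique(data$seqid)) {
        sub <- data[data$seqid == id, ]
        png(file = paste0("cov_", id, ".png"), width = 1900, height = 1000)
        plot(x = sub$pos, y = runmed(sub$coverage, k = window),
             xlab = "bp position", ylab = "depth", type = "l", main = id)
        dev.off()
    }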

    Any questions?
    Best
    -Jonathan
    Last edited by Jonathan; 06-09-2009, 02:20 AM.



  • strob
    replied
    Hello all,

    Instead of having a complete chromosome/genome as a reference, I use many gene-scale sequences as my reference. I now want to see what the per-base coverage is when I map my Solexa reads against these many reference sequences. Is there already a tool out there that can do the job?
    Can programs like SOAP, Bowtie, ... provide this type of information?
    Is it also possible to do a BLAST (with stringent parameters) and then parse the BLAST results?

    Any help/comments are more than welcome



  • v_kisand
    replied
    I do not remember exactly, but some time ago I experimented with Velvet and I think it calculated coverage (I must admit I do not remember whether that was per-base coverage),
    and there was a tutorial on how to get nice graphs using R.

    Originally posted by ewilbanks
    Hi! I was working on just this issue a while back and was surprised by the relative lack of tools. Binning should work just fine. I'd recommend Aaron Quinlan's "BEDTools"; the coverageBed and genomeCoverageBed utilities seem applicable, though I haven't used them yet.

    A friend of mine wrote a nifty script in R for me that calculates the coverage at each base pair across the genome, using as input a text file with read genome coordinates. I'd be happy to send it to you, if that seems helpful. The data from this is quite noisy and usually needs smoothing of some sort (I did rolling means, using the R "zoo" package). I don't know what genome size you're working with, but I was using this on microbial genomes (~4 Mb) and the program runs in ~20-30 min.

    Cheers,
    Lizzy



  • ewilbanks
    replied
    Hi! I was working on just this issue a while back and was surprised by the relative lack of tools. Binning should work just fine. I'd recommend Aaron Quinlan's "BEDTools"; the coverageBed and genomeCoverageBed utilities seem applicable, though I haven't used them yet.



    A friend of mine wrote a nifty script in R for me that calculates the coverage at each base pair across the genome, using as input a text file with read genome coordinates. I'd be happy to send it to you, if that seems helpful. The data from this is quite noisy and usually needs smoothing of some sort (I did rolling means, using the R "zoo" package). I don't know what genome size you're working with, but I was using this on microbial genomes (~4 Mb) and the program runs in ~20-30 min.
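
    (A minimal sketch of that kind of per-base coverage calculation plus rolling-mean smoothing; the input file name, its column names, the genome length and the window size are all assumptions.)

    Code:
    library(zoo)

    # tab-delimited read coordinates with 1-based "start" and "end" columns,
    # all mapped to a single ~4 Mb reference (file and column names assumed)
    reads <- read.table("read_coords.txt", sep = "\t", header = TRUE)
    genome_length <- 4e6

    # tally how many reads cover each base
    depth <- integer(genome_length)
    for (i in seq_len(nrow(reads))) {
        idx <- reads$start[i]:reads$end[i]
        depth[idx] <- depth[idx] + 1
    }

    # smooth the noisy per-base signal with a rolling mean (1001 bp window)
    depth_smoothed <- rollmean(depth, k = 1001, fill = NA)

    png("per_base_coverage.png", width = 1900, height = 1000)
    plot(depth_smoothed, type = "l", xlab = "bp position", ylab = "depth")
    dev.off()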

    Cheers,
    Lizzy



  • arendon
    replied
    Terribly sorry, that did not make much sense.

    I would like to know, at each base along the genome, how many reads saw that base. Alternatively, one can ask not at the single-base level but over some interval, say 10 bp. The output could be a WIG file. This does not sound terribly complicated to do if the reads are sorted; I am more wondering whether there are tools that already do this.
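
    (A minimal sketch of that binning step in R, assuming a per-base depth vector is already in hand, e.g. from one of the approaches in this thread; the bin size, chromosome name and the stand-in depth vector are assumptions.)

    Code:
    depth    <- rpois(1e5, lambda = 10)   # stand-in for a real per-base depth vector
    bin_size <- 10

    # average the per-base depth within consecutive fixed-size bins
    n_bins <- length(depth) %/% bin_size
    binned <- colMeans(matrix(depth[1:(n_bins * bin_size)], nrow = bin_size))

    # write a fixedStep WIG track (one value per bin)
    wig <- file("coverage.wig", "w")
    writeLines(sprintf("fixedStep chrom=chr1 start=1 step=%d span=%d", bin_size, bin_size), wig)
    writeLines(sprintf("%.2f", binned), wig)
    close(wig)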

    Many thanks,

    a



  • westerman
    replied
    Maybe it is early in the morning, but I am not able to parse the phrase "I would like to sequence coverage along the genome from aligned reads". Perhaps you meant "... like to determine sequence coverage ...". In which case, yes, binning would work. Or just take the number of reads times the average read length, all divided by the genome length.

    Many of the alignment programs will give you something that you can toss into a spreadsheet to come up with a fancy graph; basically, this is binning with a bin size of 1.

    Really it all depends on exactly what you want in the end. A single "X coverage" number? A graph? A mean with standard deviation? In any case the computational portion does not seem that complex to me.



  • arendon
    started a topic How to calculate coverage

    How to calculate coverage

    Hello,

    I would like to sequence coverage along the genome from aligned reads. Any suggestions or tools to do this efficiently? Maybe also being able to choose different bin sizes?

    Many thanks,

    a
