Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • gringer
    replied
    With a file looking like that, just what you've put should work (i.e. changing the header value to 'TRUE'):

    data <-read.table(file="~/q20snpref/illusmp454merCbed",sep="\t",header=T)
    Is there a particular reason why the reading of the table was killed? How big is the input file? How much memory does your computer have? Did you break out of it manually? How long did it take before the 'Killed' message appeared?

    Leave a comment:


  • msutada@gmail.com
    replied
    hi gringer,

    Thank you very much for suggestion.
    I got this with this code.
    > readLines(con = "~/q20snpref/illusmp454merCbed", n = 5)
    [1] "Scaffold\tsca_position\tcoverage" "Scaffold1\t1\t0"
    [3] "Scaffold1\t2\t0" "Scaffold1\t3\t0"
    [5] "Scaffold1\t4\t0"

    I have also tried
    data <-read.table(file="~/q20snpref/illusmp454merCbed",sep="\t",header=T) and I got
    > data <-read.table(file="~/q20snpref/illusmp454merCbed",sep="\t",header=T)
    Killed
    ^@^@^Killed.

    Should I put # for the header line. Actually illusmp454merCbed is a text file contains 3 column separate by tab. I thought that to run R we need the column name to define the data.

    Any suggestion will be great.
    Best,

    Leave a comment:


  • gringer
    replied
    Based on the options you've given, "~/q20snpref/illusmp454merCbed" [note no extension] should be a headerless file with fields separated by tabs.

    The warning message suggests that you either have a header line in your file (or possibly a comment line), or a non-numeric line in the 'coverage' column.

    Just to make sure the input looks like what is expected, what is the output of this command [which displays the first 5 lines of a file]?
    Code:
    readLines(con = "~/q20snpref/illusmp454merCbed", n = 5)

    Leave a comment:


  • msutada@gmail.com
    replied
    Hi frymor,

    Thank you for the suggestion.
    I have fix the script and now I got another message from coverPlot.Rout.

    > data <-read.table(file="~/q20snpref/illusmp454merCbed",sep="\t",header=F)
    > colnames(data)<-c("Scaffold","sca_position","coverage")
    > depth<-mean(data[,"coverage"])
    Warning message:
    In mean.default(data[, "coverage"]) :
    argument is not numeric or logical: returning NA
    > #depth now has the mean (overall)coverage
    > #set the bin-size
    > window<-1001
    > rangefrom<-0
    > rangeto<-length(data[,"sca_position"])
    > data.1kb<-runmed(data[,"coverage"],k=window)
    > png(file="cov_1k.png",width=400,height=1000)
    coverPlot.Rout

    Any suggestion would be great.

    Best regards,
    Sutada

    Leave a comment:


  • frymor
    replied
    Originally posted by [email protected] View Post
    Hi Jonathan,
    > data <-read.table(file="./illusmp454merCbed",sep="\t",header=F)
    > colnames(data)<-c("Scaffold","sca_position","coverage")
    > depth<-mean(data[,"coverage'])
    + #depth now has the mean (overall)coverage
    + #set the bin-size
    + window<-1001
    + rangefrom<-0
    + rangeto<-length(data[,"sca_position"])
    You have two different types of quoting in coverage
    It should be like that:
    Code:
    depth<-mean(data[,[B][COLOR="Red"]"coverage"[/COLOR][/B]])
    and than you will also define the windows size and the ranges

    Leave a comment:


  • msutada@gmail.com
    replied
    Hi Jonathan,

    About R script above, I have tried and go these error. Any suggestion is great as I'm R beginner.


    > data <-read.table(file="./illusmp454merCbed",sep="\t",header=F)
    > colnames(data)<-c("Scaffold","sca_position","coverage")
    > depth<-mean(data[,"coverage'])
    + #depth now has the mean (overall)coverage
    + #set the bin-size
    + window<-1001
    + rangefrom<-0
    + rangeto<-length(data[,"sca_position"])
    Error: unexpected symbol in:
    "rangefrom<-0
    rangeto<-length(data[,"sca_position"
    > data.1kb<-runmed(data[,"coverage"],k=window)
    Error in as.integer(k) :
    cannot coerce type 'closure' to vector of type 'integer'
    > png(file="cov_1k.png",width=400,height=1000)
    > plot(x=data[rangefrom:rangeto,"sca_position"],y=data.1kb[rangefrom:rangeto],pch=".",cex1,xlab="bp position",ylab="depth",type="p")
    Error in `[.data.frame`(data, rangefrom:rangeto, "sca_position") :
    object 'rangefrom' not found
    > dev.off()

    Leave a comment:


  • fhb
    replied
    Originally posted by quinlana View Post
    Hi,
    As ewilbanks suggested, BEDTools will do this for you.



    If you want to compute coverage for "bins/windows" that march along the genome, you would use coverageBed. Let's say you've created a BED file called windows.bed for 10Kb windows across your genome and it looks like this (note BEDTools uses UCSC 0-based starts):
    chr1 0 10000
    chr1 9999 20000
    ...

    Now, you also have a bed file of sequence reads called reads.bed. The following command with calculate for each window in windows.bed:
    1) The number of reads in reads.bed that overlap the window
    2) Coverage density (i.e. the fraction of base pairs in window.bed that are covered by >= 1 read in reads.bed)

    coverageBed -a reads.bed -b windows.bed | sortBed -i stdin > windows.bed.coverage

    Sample output:
    <chr> <start> <end> <#reads> <# window bases covered> <windowSize> <fraction of window covered>
    chr1 0 10000 0 0 10000 0
    chr1 9999 20000 33 1000 10000 0.10

    I hope this helps. If you need a script to create a BED file with windows of a given size, just let me know.

    Best,
    Aaron
    Hi Dr. Quinlan,

    Thanks for the bedtools. I am using the genomeCoverageBed, but I would like to use the coverageBed.
    I am not sure if you have posted this somewhere else, but I would appreciate if you could provide this code that you offered that creates a BED file with intervals of a given size.
    Thanks in advance,
    Best
    Fernando

    Leave a comment:


  • rdeborja
    replied
    We've started using BED Tools as part of our pipeline and so far it's been good. We calculate 2 types of coverage, non-collapsed (total mapped reads x read length / genome size) and collapsed coverage (total mapped reads with PCR duplicates removed x read length / genome size).

    The tough part is identifying those reads aligning to multiple locations. What do people do in that case? Are people using uniquely aligned reads only and using a mappable genome size (different read lengths will change this number). We use Picard to collapse our reads. If you use BEDTools, you'll need to remove the duplicates since BEDTools doesn't seem to do a flag check for the read.

    Leave a comment:


  • jordi
    replied
    thank you very much for your answer westerman. You're right, I have no idea about the transcriptome size. It's a tissue specific from a non-model specie. I got 62M from a 454 pyrosequencing process from 8 different organisms (same plate) and I suspect that there were too much samples for a single plate. I would like to guess the ideal ratio sample/plate so I thought that coverage could be a good way to estimate this.

    Leave a comment:


  • westerman
    replied
    Originally posted by jamal View Post

    I'm new in this field but I was searching about x covarage to calculate it for my reads and found this qoute. I would like to ask you that in your example 300M* 50 is 15G, so what is the point for this caculation?
    And finally catching up with Jamal's question. Hum. It looks like I made another Monday morning calculation mistake. 300M reads times 50 would be indeed be 15 Gbases or 15000 Mbases. Then the 'X' coverage in my example is 15000/150 or 100x coverage.

    Oh well, numbers being wrong or not, my original post was about 'X coverage' (where you divide how many bases you sequenced by the size of the genome/transcriptome/whatever you wish to sequence) versus 'percentage coverage' (which is how many bases you end up after mapping/assembly by the size of the genome/transcriptome/whatever).

    This doesn't seem to be complex concept to me. Of course determining the size of what you wish to sequence can be complex. And even after you get the average coverage -- what does it really mean? The average does not help if your region of interest does not have good coverage in that region. Variance of coverage can be a helpful metric.

    Leave a comment:


  • westerman
    replied
    jordi:

    6 reads per contig with an average read length of 209 will tell you only that you have 1254 bases of data in a contig. What you really need is know is how many base *should* be covered. For example if your contig is suppose to be 1000 bases then your coverage is 1254/1000 or about 1.25 coverage. If your contig is suppose to be 300 bases then you have 1254/300 or about 4x coverage.

    Unfortunately it appears that you have no idea of how many bases you are suppose to be covering. So you will have to take a wild guess. Fortunately inter-species transcriptomes tend to be a bit more uniform in size than inter-species genomes. I suggest you take the transcriptome size of a related species and use that. It won't be exact but it probably will not be off by more than a factor of 2.

    To be clear what you want to calculate is:

    transcriptome_size divided by (#_of_reads times avg_length_of_reads)

    Leave a comment:


  • jordi
    replied
    Hi all,
    I would like to know the coverage of a 454 tissue specific transcriptome of a non-model specie, so I have not a reference sequence, neither EST nor genome sequence. I understood coverage as number of reads covering a given region (as dePhi said a year ago), that is, reads / bp. If I have a mean of 6 reads per contig and a mean of 209 bp per read, it is correct to say that I have a ~0.03X (6/209) coverage??
    Thanks a lot!

    Leave a comment:


  • mattanswers
    replied
    Using Tablet, a genome viewer program, you can export a file in which the number of reads at each base is printed out. However, the base number is not printed out, just the number of reads.

    Is there a graphing program that could take this txt file with the number of reads at each base and graph it as a line graph ? Or, if I added the appropriate base number along with the number of reads at each base, then is there a graphics program which could print out a line graph from this txt file ?

    Leave a comment:


  • jamal
    replied
    Originally posted by westerman View Post
    That is what I get for posting early Monday morning.

    'X' coverage is raw bases divided by genome size. So the above should be 1.5 Gbases / 150 Mbases or 10X. I will correct my original post.



    I believe that all published C-values for for haploid genomes. Although if someone knows for certain then please chime in.
    thank you for your answer

    I'm new in this field but I was searching about x covarage to calculate it for my reads and found this qoute. I would like to ask you that in your example 300M* 50 is 15G, so what is the point for this caculation?

    Leave a comment:


  • ohofmann
    replied
    Somehow missed Martin's post, strange. Thanks for the pointer! And yes, the UCR pages are a great starting point.

    -- Oliver

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Exploring the Dynamics of the Tumor Microenvironment
    by seqadmin




    The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
    07-08-2024, 03:19 PM
  • seqadmin
    Exploring Human Diversity Through Large-Scale Omics
    by seqadmin


    In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
    06-25-2024, 06:43 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 07-16-2024, 05:49 AM
0 responses
18 views
0 likes
Last Post seqadmin  
Started by seqadmin, 07-15-2024, 06:53 AM
0 responses
28 views
0 likes
Last Post seqadmin  
Started by seqadmin, 07-10-2024, 07:30 AM
0 responses
40 views
0 likes
Last Post seqadmin  
Started by seqadmin, 07-03-2024, 09:45 AM
0 responses
205 views
0 likes
Last Post seqadmin  
Working...
X