simonandrews · 02-11-2013, 09:21 AM

I've just released SeqMonk v0.24.0. I've included the smoothing subtraction quantitation I described above, which should help for DNase type analyses, but there's also lots of other new stuff listed below:

Added the ability to export all probe reports in GFF format
Added a pipeline to detect antisense transcription from directional RNA-Seq libraries.
Added a system which can provide immediate feedback to submitted crash reports if they're ones we've seen before and for which we can offer useful feedback.
Added a chi-square based contingency test filter which is useful for bisulphite sequencing libraries (and possibly others too).
Added an ID field to reports for cases where the name of a feature isn't useful or unique
Added a probe length quantitation option
Added a probe name filter which allows you to specify a large list of names and selects probes which match any of them
Added an option to merge all transcripts in the RNA-Seq pipeline to create a single gene level measure of transcription
Changed the active store parser to a visible stores parser to allow the easy re-import of multiple datasets in a single operation
Added an option to generate raw counts to the RNA-Seq quantitation pipeline to allow for easy interfacing with tools such as DESeq which require this
Added a smoothing subtraction quantitation method which can be used to detect sudden local changes in quantitation
Added the ability to select the order of highlighted probe lists in the scatterplot

Some changes have also been made to address problems in previous versions:

We fixed a bug which would produce incorrect p-values following multiple testing correction, but only affected p-values which were initially very high (p>~0.3)
We fixed an unnecessary level of multiple testing correction in the intensity difference filter which meant that some candidates which could have been reported were not. Typically we see around a 10% increase in the number of candidates in the new correction method over the previous version.
We changed the behaviour of the BAM import filter for paired end data which were mapped with a spliced read mapper. We now show the second read of the pair with the same direction as the first read to indicate the direction of the fragment and preserve the direction in strand specific libraries.
The "load probes from file" probe generator has been removed. It was never very well supported and its functionality is better performed by importing the data into an annotation track and using the feature probe generator.
A couple of timing bugs were fixed which prevented the import of extra annotation on some linux installations.
In HiC analysis we have removed some optimisations in the testing which were leading to unrealistically low p-values for some interactions. We now test against the full set of possible interactions, only making an exception to correct for only cis interactions when all trans interactions have been specifically excluded.

simonandrews · 02-08-2013, 02:54 AM

I've just added a new smoothing subtraction quantitation which should do what you need to allow a better systematic analysis of the DNase data. Drop me an email and I can give you a test release containing this code so you can try it out before I put it into an official release.

simonandrews · 02-07-2013, 12:53 AM

Originally posted by mjp

What I would like to achieve now, is to come up with some kind of systematic way of identifying the depleted regions within the enriched regions. These would correspond to the protected sites where my protein was bound to. I have attached an example of such region.

What you see there is DNase Hypersensitive Sites (DNase_HS), the underlying protein binding site (PBS) and at the very bottom the single-base-pair probes created within DNase_HS. The underlying dip within the probes would ideally correspond to the depleted region (there might be another one to the right of the first one - between 400 and 500 bp in the displayed region).

I have tried to use Z-score re-quantitation to see how different the probes are from the mean but that didn't yield anything informative at the moment.

At the moment I can't think of anything I could use in SeqMonk to annotate probes that have significantly lower values than surrounding probes within a window (which would be what I'm essentially looking for).

Is there any systematic way I could identify such short stretches of depletion in SeqMonk?

Thanks in advance for any input.

Looking at the result you have I think what you'd need would be a new quantitation normalisation method which would do a local subtraction of a smoothed value running through the data. This would remove the large scale effects of the enrichment and leave you with a measure of the local difference to the general enrichment level of the area. Once you had this you could then use the windowed replicate filter to find regions which had a value which was consistently different from 0 over whatever window size you chose. This would then find sets of adjacent depleted probes which would hopefully correspond to your binding sites.

Setting this up would need the addition of a new quantitation method, but it would basically be an adaption of the existing smoothing quantitation so it would be really easy to add. If you want to contact me off list ([email protected]) I can give you a development snapshot with this in to test, and if you could let me have some example data of this type it would be really helpful in making sure it's working properly.
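
The smoothing subtraction described above can be sketched in a few lines of Python. This is an illustration only and not SeqMonk's code: the function names, the centred running mean, the window size and the fixed threshold are all assumptions, and the run search at the end is a much cruder stand-in for the windowed replicate filter, which performs a statistical test rather than a simple cut-off.

```python
def smoothed(values, window):
    """Centred running mean; the window is truncated at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def smoothing_subtraction(values, window):
    """Subtract the local smoothed value from each probe value."""
    return [v - s for v, s in zip(values, smoothed(values, window))]


def depleted_runs(residuals, threshold, min_length):
    """Return (start, end) index pairs, end exclusive, for runs of at
    least min_length adjacent probes whose residual is below threshold."""
    runs = []
    start = None
    for i, r in enumerate(residuals):
        if r < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    if start is not None and len(residuals) - start >= min_length:
        runs.append((start, len(residuals)))
    return runs


# Toy example: an enriched region (10) with a three-probe dip (2).
signal = [10, 10, 10, 10, 2, 2, 2, 10, 10, 10, 10]
residuals = smoothing_subtraction(signal, window=7)
print(depleted_runs(residuals, threshold=-2.0, min_length=3))  # [(4, 7)]
```

The smoothing window needs to be wider than the dip you expect, otherwise the smoothed value follows the dip and the residual shrinks towards 0.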

simonandrews · 02-07-2013, 12:44 AM

Originally posted by sschmidt

I have RNA-seq libraries made from cell lines that express a transgene, and was able to quantitate using the RPKM pipeline for all existing probes. Is it possible to design a probe for the transgene as well, if so how, and can that be done using the RPKM pipeline?

So I'm assuming that you're talking about a novel gene inserted into the main genome which you're also measuring in your data. If that's right then you'll need to find some way to represent the transgene in your genome. You could do this by modifying the existing genome sequence and inserting the novel sequence at the correct position, or you could take a shortcut and make a short extra fake chromosome which just contained the transgene sequence. You'd need to do this for the mapping stage as well as the downstream analysis since otherwise the hits to the transgene won't be mapped. The process for adding a new fake chromosome would be the same as for making a custom genome except that you'd just add the new dat file to the chromosomes in an existing assembly rather than making a new one.

Once you've done that then the transgene should show up the same as any other gene and it would just be a case of picking the value for that gene out of the full set of quantitated data.
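
For the mapping side of the fake chromosome shortcut, a minimal Python sketch might look like the one below. It only appends the transgene as an extra record to the FASTA text used to build the mapper's index; it does not create the dat file that SeqMonk itself needs. The function names, the record name "transgene1" and the toy sequences are all hypothetical.

```python
def fasta_record(name, sequence, width=60):
    """Format one FASTA record with fixed-width sequence lines."""
    cleaned = "".join(sequence.split()).upper()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def add_fake_chromosome(genome_fasta_text, name, sequence, width=60):
    """Return the genome FASTA text with one extra record appended.

    Refuses to add a name that is already used by an existing record,
    since duplicate sequence names would make hits ambiguous.
    """
    existing = set()
    for line in genome_fasta_text.splitlines():
        if line.startswith(">") and len(line) > 1:
            existing.add(line[1:].split()[0])
    if name in existing:
        raise ValueError("sequence name already present: " + name)
    if genome_fasta_text and not genome_fasta_text.endswith("\n"):
        genome_fasta_text += "\n"
    return genome_fasta_text + fasta_record(name, sequence, width)


# Toy genome with one chromosome, plus a made-up transgene sequence.
genome = ">chr1\nACGT\n"
combined = add_fake_chromosome(genome, "transgene1", "acgtacgtac")
print(combined)
```

The name you give the extra record has to match the name of the extra chromosome in SeqMonk, otherwise the mapped hits won't line up with the annotation.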

sschmidt · 02-06-2013, 01:26 PM

I have RNA-seq libraries made from cell lines that express a transgene, and was able to quantitate using the RPKM pipeline for all existing probes. Is it possible to design a probe for the transgene as well, if so how, and can that be done using the RPKM pipeline?

mjp · 02-06-2013, 04:39 AM

DNase I footprints with SeqMonk

So I managed to come up with similar results with SeqMonk as with other software. The one I have in mind here is F-seq, which is fairly commonly used for analysis of DNase data.
In SeqMonk I have used the contig probe generator to do this. Actually, SeqMonk returned more reasonable enriched regions than F-seq, but many of them overlapped.

Furthermore I managed to relate my data in SeqMonk to the protein binding sites (PBS) I'm interested in.

What I would like to achieve now, is to come up with some kind of systematic way of identifying the depleted regions within the enriched regions. These would correspond to the protected sites where my protein was bound to. I have attached an example of such region.

What you see there is DNase Hypersensitive Sites (DNase_HS), the underlying protein binding site (PBS) and at the very bottom the single-base-pair probes created within DNase_HS. The underlying dip within the probes would ideally correspond to the depleted region (there might be another one to the right of the first one - between 400 and 500 bp in the displayed region).

I have tried to use Z-score re-quantitation to see how different the probes are from the mean but that didn't yield anything informative at the moment.

At the moment I can't think of anything I could use in SeqMonk to annotate probes that have significantly lower values than surrounding probes within a window (which would be what I'm essentially looking for).

Is there any systematic way I could identify such short stretches of depletion in SeqMonk?

Thanks in advance for any input.

Attached Files

example_of_footprint.png (11.7 KB, 14 views)

mjp · 01-27-2013, 05:05 AM

DNase I footprints with SeqMonk

To avoid citing too many fragments from this short paper, I will let you read it if you are interested.

The biggest difference between ChIP and DNase, in my opinion, is that while ChIP looks at the underlying peaks of usually well known proteins, DNase looks at the enrichment of much wider regions. There are also other differences between these two methodologies (see the paper).
Overall, with this kind of experiment people would be actually looking for the depleted (from sequencing tags) regions which were protected by bound proteins.

I guess that could be done in SeqMonk by creating continuous probes across the genome and then looking at the depleted regions instead of the enriched. Having said that, I would probably have some problems setting some kind of cut-off level between enrichment and depletion.

At the moment I'm trying to evaluate the options from the paper I mentioned, but it would be nice to have SeqMonk do that as well, as I have already done some simple analysis in it. If I succeed in doing that with SeqMonk I'll let you know.

simonandrews · 01-27-2013, 03:58 AM

We've never tried as far as I'm aware. Since it's (as I understand it) just enrichment data you should be able to use the same sort of methods as for ChIP-Seq. If there's something more specific which applies to this kind of data then if you can provide me with a pointer I can look at adding it.

I'd be interested to hear how you get on.

mjp · 01-27-2013, 03:50 AM

DNase I footprints with SeqMonk

Has anybody been successful in finding DNase I footprints using SeqMonk?

Can we use SeqMonk to do this sort of analysis?

glados · 01-22-2013, 04:34 AM

Dear Simon.

I sent you a private message a few weeks ago. Perhaps you can take a look at it? It was a question regarding installing a custom genome.

simonandrews · 01-11-2013, 02:55 AM

Originally posted by shadow19c

Hello,
I want to know how I can visualise the bedGraph file from Bismark after methylation calling?

SeqMonk is designed to do the quantitation of your data within the program rather than taking in externally quantitated files. Rather than trying to load the bedGraph file from Bismark you'd instead import the raw data from the methylation extractor and then quantitate this however you wanted inside SeqMonk to be able to visualise the methylation levels.

I put up a tutorial video covering some of the basics for working with bisulphite data on our youtube channel which should give you an idea how to get started with this.

shadow19c · 01-11-2013, 02:42 AM

Hello,
I want to know how I can visualise the bedGraph file from Bismark after methylation calling?

Thanks

mjp · 01-08-2013, 02:48 AM

That does sound OK indeed. Thanks!

As another alternative I could do simple read count for all probes and create an annotated probe report #1 (not annotating with anything) for all the stores which would give me a list of probes with '0's for probes not having any reads over them.

Do the difference quantitation of forward as percentage of all, which would give me the list probes with '100' for the probes with only forward reads across all stores. Probe report #2.

Same for the reverse. Probe report #3.

Having these three reports it would be easy to parse it outside of SeqMonk.
If probe in #1 = 0 => no reads.
If probe in #1 > 0 and probe in #2 = 100 => then only forward
If probe in #1 > 0 and probe in #3 = 100 => then only reverse.
If probe in #1 > 0 and probe in #2 and #3 is different from 0 and 100 => both reads
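
The parsing rules above could be written outside SeqMonk roughly as follows. This is only a sketch: it assumes the three reports have already been read into dictionaries keyed by probe name for one store, and the function names and example values are made up.

```python
def classify_probe(read_count, forward_pct, reverse_pct):
    """Apply the four rules to the values of one probe in one store."""
    if read_count == 0:
        return "none"
    if forward_pct == 100:
        return "forward only"
    if reverse_pct == 100:
        return "reverse only"
    return "both"


def classify_store(counts, forward, reverse):
    """Classify every probe of one store.

    counts  -- report #1: probe name -> read count
    forward -- report #2: probe name -> forward reads as % of all
    reverse -- report #3: probe name -> reverse reads as % of all
    """
    return {
        probe: classify_probe(counts[probe], forward[probe], reverse[probe])
        for probe in counts
    }


# Made-up values for four probes in a single store.
counts = {"p1": 0, "p2": 4, "p3": 2, "p4": 6}
forward = {"p1": 0, "p2": 100, "p3": 0, "p4": 50}
reverse = {"p1": 0, "p2": 0, "p3": 100, "p4": 50}

for probe, label in classify_store(counts, forward, reverse).items():
    print(probe, label)
```

The read count is checked first, so whatever percentage is reported for a probe with no reads never gets looked at.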

I think this way I will get what I need the fastest for all stores.

One way or another, your input about difference quantitation was invaluable.

Thanks again.

simonandrews · 01-08-2013, 12:51 AM

Originally posted by mjp

Sorry for this small delay and for not being specific enough. I see that the title of my post was wrong.
I wanted to have a list of probes that have just strand specific reads covering them. So the list would contain probes that have only fwd, only rvrs, none, and both type of reads covering them. Ideally I would like to have such a list for each of my stores independently. Is it possible to create that in single SeqMonk pipeline or does it have to be repeated for each store separately?

I don't think you can avoid having to do some filtering per-store since I think I understand that you want to end up with a separate set of lists for each store? You'll also need to do two quantitations. I reckon the quickest way to get these lists would be:

1) Quantitate using the difference quantitation using the option to quantitate forward reads as percentage of all reads.

2) Use the values filter to select probes with a value of 0 or 100 for each store. This should be pretty quick since you can set the parameters and then just select each store in turn in the filter and re-run it without having to reopen the dialog.

3) Requantitate the data with a simple read count quantitation (no corrections or transformations).

4) Use a values filter to select probes with some reads in them for each store.

5) Use the combine probes filter to select the subset of the 0% results which actually have some data in them. Again you can do this in a single filter session by changing which lists you're using so it shouldn't be too horrible.
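
In terms of probe lists, steps 1 to 5 come down to a few set operations per store. The Python sketch below mirrors that logic outside SeqMonk; it is not how the filters are implemented, and the function name, dictionary inputs and example values are assumptions for illustration.

```python
def strand_lists(forward_pct, read_count):
    """Build the per-store probe lists from the two quantitations.

    forward_pct -- probe name -> forward reads as % of all reads
    read_count  -- probe name -> raw read count
    """
    # Step 2: values filter for exactly 0 or exactly 100.
    pct_0 = {p for p, v in forward_pct.items() if v == 0}
    pct_100 = {p for p, v in forward_pct.items() if v == 100}
    # Step 4: values filter for probes with some reads.
    has_reads = {p for p, n in read_count.items() if n > 0}
    # Step 5: combine the lists.
    return {
        "forward only": pct_100 & has_reads,
        "reverse only": pct_0 & has_reads,
        "both": has_reads - pct_0 - pct_100,
        "none": set(read_count) - has_reads,
    }


# Made-up values for four probes in a single store.
forward_pct = {"p1": 0, "p2": 100, "p3": 0, "p4": 50}
read_count = {"p1": 0, "p2": 4, "p3": 2, "p4": 6}

for name, probes in strand_lists(forward_pct, read_count).items():
    print(name, sorted(probes))
```

Intersecting with the probes that have reads is what separates genuine reverse-only probes from empty ones, which is the point of step 5.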

Does this sound OK?

mjp · 01-06-2013, 11:06 PM

Originally posted by simonandrews

do you want a quick way to determine if a given probe is forward or reverse only in all of a set of stores, or are you looking for a quick way to make separate lists for several stores where you have them?

Sorry for this small delay and for not being specific enough. I see that the title of my post was wrong.
I wanted to have a list of probes that have just strand specific reads covering them. So the list would contain probes that have only fwd, only rvrs, none, and both type of reads covering them. Ideally I would like to have such a list for each of my stores independently. Is it possible to create that in single SeqMonk pipeline or does it have to be repeated for each store separately?

Originally posted by simonandrews

To do the analysis across several stores you could basically repeat the process you outlined but selecting all of the stores, and making your values filters use 'at least 1' rather than 'exactly 1' to pull out probes which had a read in that direction in any of your stores.

I'm not quite sure if that does what I want - this is where the confusion starts. I created 2 sample stores for yeast chr1 and ran the workflow that I outlined previously, but this time selecting 'at least 1' for all selected stores (please see the attached image). Probes were generated using a running window, size 1, step 1.
What I see is that the first visible cluster of probes for the bottom dataset (on the screenshot) is not spread entirely over the read. Instead it covers the section of the read that met similar criteria for the top dataset.

I was thinking that this would produce probes with value 1 for the top dataset as it is currently seen. However for the bottom dataset I would have probes of value 1 for entire read, wider than that of top dataset.

Originally posted by simonandrews

You could also put all of your data into a single data group and then treat it as a single dataset.

These are independent samples. So I would like to avoid doing that.

I hope I didn't make it more complicated.

Attached Files

Screenshot from 2013-01-07 09:37:24.png (22.8 KB, 5 views)
