SeqMonk: Visualisation and analysis for large mapped data sets

mediator replied

03-22-2012, 12:49 PM
Hi Simon,
For BED files (generated by Scripture from RNA-Seq data), which quantification pipeline would you recommend? I am trying to compare BED files between patients and healthy controls in order to find splice variants unique to patients. Thank you!
pbseq replied

03-22-2012, 06:01 AM
Thanks a lot Simon, great hints. SeqMonk really has a lot of features to explore!

pbseq
simonandrews replied

03-22-2012, 04:57 AM
Originally posted by pbseq

Hi Simon,
first of all, lots of compliments again for SeqMonk. I don't feel like I can fully grasp a new RNA-seq experiment until I've viewed it in SeqMonk!

Thanks! It's always great for us to hear feedback from other people using the program.

Originally posted by pbseq

That said, I have a question, maybe a trivial one: is there a way to load a custom set of genes (say, a particular class of genes) to get, e.g., a chromosome overview of their expression and their distribution across chromosomes?

Sure, but I guess this will depend on how you're defining your group. The method we use most commonly is the feature search tool (Edit > Find Feature) to identify a group of genes/transcripts based on their annotation. This would include things like GeneOntology terms or anything else you find in the annotation. Once you have the list of hits visible you can use the option at the bottom to turn the hits into a new annotation track. Once you have a track containing just your features of interest you can either quantitate over these features directly, or do a wider quantitation and then use the feature filter to pull out just the probes which overlap with your selected set of features.
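The logic of the two steps above (search the annotation for a term, then keep only the probes overlapping the hits) can be sketched outside the program as follows. This is only an illustration of the idea, not SeqMonk code: the `Region` class, the function names and the toy annotation strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A feature or probe on a chromosome (1-based, inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    label: str = ""


def overlaps(a, b):
    """True if two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def search_features(features, term):
    """Case-insensitive search of the annotation text, like a feature search."""
    term = term.lower()
    return [f for f in features if term in f.label.lower()]


def filter_probes(probes, selected):
    """Keep only the probes that overlap one of the selected features."""
    return [p for p in probes if any(overlaps(p, f) for f in selected)]


# Toy annotation and probe set (made up for illustration).
features = [
    Region("1", 100, 500, "geneA | GO: DNA repair"),
    Region("1", 2000, 2600, "geneB | GO: translation"),
    Region("2", 50, 900, "geneC | GO: DNA repair"),
]
probes = [Region("1", 450, 550), Region("1", 1000, 1100), Region("2", 900, 950)]

hits = search_features(features, "dna repair")
kept = filter_probes(probes, hits)
```

Here `hits` plays the role of the new annotation track and `kept` the result of the feature filter. The nested loop is fine for a toy example; a real data set would want an indexed overlap search.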

Originally posted by pbseq

If I may also suggest an improvement, I'd like to be able to resize the sample window. For example, if I have lots of samples, I may want to focus on only one interesting sample so that its mapped reads are fully visible; with more than 5-6 samples it's hard to visualize everything, so it's better to select one or a few samples (e.g. for checking alternative splicing claims). I know I can delete a sample, but maybe resizing or hiding one or more samples would be a better solution?

I'm not sure I get what you mean here. You can remove a sample from the main chromosome view without deleting it from your project. Just go to View > Set Data Tracks and you can choose which samples you want to have visible, and in which order. The removed samples are still in your project and can be added back to the view whenever you like.

I suspect I may be missing the point you're making though.

If you're interested in alternative splicing and haven't seen this already, a really neat option is to import just the spliced introns into your project. If you have a spliced mapped SAM/BAM file (e.g. from TopHat), import it and select "Split Spliced Reads" and "Import Introns rather than exons"; you'll then see just the splices which you've observed. You can quantitatively analyse these by using the Read Position Probe Generator followed by the Exact Overlap Count Quantitation. We've found this way of looking at the data to be really helpful in deciding if there really is a change in the splicing pattern between samples.
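To make the idea of importing introns rather than exons concrete, here is a minimal sketch of how the spliced-out regions can be derived from a SAM record's position and CIGAR string, and how identical junctions can then be counted. It illustrates the concept only and is not how SeqMonk implements the import; the function names and the toy alignments are hypothetical.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_alignment(chrom, pos, cigar):
    """Return (chrom, start, end) for every N (skipped region) in the CIGAR.

    pos is the 1-based leftmost mapped position, as in a SAM record.
    Coordinates returned are 1-based and inclusive.
    """
    introns = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((chrom, ref, ref + length - 1))
        if op in "MDN=X":  # only these operations consume the reference
            ref += length
    return introns


def count_junctions(alignments):
    """Count how many reads support each exact intron (same start and end)."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        counts.update(introns_from_alignment(chrom, pos, cigar))
    return counts


# Toy alignments: (chromosome, position, CIGAR).
sample = [
    ("1", 100, "50M200N50M"),    # splice 150-349
    ("1", 120, "30M200N70M"),    # same splice, different read start
    ("1", 100, "100M"),          # unspliced read, contributes nothing
    ("1", 90, "5S60M300N35M"),   # alternative splice 150-449
]
junctions = count_junctions(sample)
```

Counting only reads whose intron matches a junction exactly is the same spirit as an exact overlap count: two samples can then be compared junction by junction.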
pbseq replied

03-22-2012, 04:38 AM
Hi Simon,
first of all, lots of compliments again for SeqMonk. I don't feel like I can fully grasp a new RNA-seq experiment until I've viewed it in SeqMonk!

That said, I have a question, maybe a trivial one: is there a way to load a custom set of genes (say, a particular class of genes) to get, e.g., a chromosome overview of their expression and their distribution across chromosomes?

If I may also suggest an improvement, I'd like to be able to resize the sample window. For example, if I have lots of samples, I may want to focus on only one interesting sample so that its mapped reads are fully visible; with more than 5-6 samples it's hard to visualize everything, so it's better to select one or a few samples (e.g. for checking alternative splicing claims). I know I can delete a sample, but maybe resizing or hiding one or more samples would be a better solution?
Thanks a lot for considering these notes!
pbseq
simonandrews replied

03-06-2012, 12:51 AM
Originally posted by aggp11

Hello Simon,

Can we use SeqMonk to visualize CNVs? I know there are several tools for predicting copy number changes, but am just wondering if there is a way of visualizing these Copy Number changes using SeqMonk from NGS data.

Hi Praful,

SeqMonk should certainly be able to do this. You'd probably want to do a simple read count over tiled probes which are large enough to contain enough data to get a reliable measure of the read depth, but small enough to catch smaller deletions. There are then a number of different tools to allow you to compare different samples and find differences between samples, or outliers from the normal coverage distribution in a single sample.

This isn't something our group works on much, but we've certainly used the program to confirm targeted knockouts that we've made, so the same principles could be used to find novel deletions or duplications.
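As a rough sketch of the approach described above (count reads over fixed-size tiles, then look for tiles whose depth is far from the typical value), something like the following would do. It is not SeqMonk code: the function names, the tile size and the 0.5 / 1.5 ratio thresholds are arbitrary assumptions, and a real analysis would also need to account for mappability and GC bias.

```python
from statistics import median


def tiled_counts(read_starts, chrom_length, tile_size):
    """Count reads per fixed-size tile; read_starts are 1-based positions."""
    n_tiles = -(-chrom_length // tile_size)  # ceiling division
    counts = [0] * n_tiles
    for pos in read_starts:
        counts[(pos - 1) // tile_size] += 1
    return counts


def flag_tiles(counts, low=0.5, high=1.5):
    """Flag tiles whose count is far from the median count.

    The thresholds are illustrative guesses, not recommended values.
    """
    typical = median(counts)
    flagged = {}
    for index, count in enumerate(counts):
        if not typical:
            continue  # no usable baseline on an empty chromosome
        ratio = count / typical
        if ratio <= low:
            flagged[index] = "possible loss"
        elif ratio >= high:
            flagged[index] = "possible gain"
    return flagged


# Toy data: five 1 kb tiles, with low depth in tile 2 and high depth in tile 4.
starts = [50] * 10 + [1500] * 10 + [2500] * 4 + [3500] * 10 + [4500] * 21
counts = tiled_counts(starts, chrom_length=5000, tile_size=1000)
flagged = flag_tiles(counts)
```

The tile size is the trade-off mentioned above: larger tiles give a steadier depth estimate, smaller tiles can resolve smaller deletions.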
simonandrews replied

03-06-2012, 12:37 AM
Originally posted by mediator

Hi Simon,
Do you know if SeqMonk can show the exact base pairs for each read? It would be very helpful for detecting indels and de novo mutations. Thank you!

Sorry, but no, it can't. SeqMonk operates purely on mapped positions. This allows it to analyse a billion plus reads on a normal desktop PC, but does mean that there's no direct connection to the original sequences of the submitted reads. We've thought about allowing it to keep a connection to the original genomic sequence (so you could for example look for trends vs specific motifs, or GC content etc.) but it's very unlikely we're ever going to add mutation information to each read, since this would kill the very optimised data model we have for storing and manipulating these reads.
aggp11 replied

03-05-2012, 12:26 PM
Hello Simon,

Can we use SeqMonk to visualize CNVs? I know there are several tools for predicting copy number changes, but am just wondering if there is a way of visualizing these Copy Number changes using SeqMonk from NGS data.

Thanks,
Praful
mediator replied

03-05-2012, 09:19 AM
Originally posted by simonandrews

For this type of experiment we'd recommend using the intensity difference filter rather than a straight difference filter. The intensity difference filter is a statistical filter where cutoffs are set as p-values, and we'd normally go with the default 0.05 cutoff. Details of how the filter works are in the advanced course.

In your case, as you have 4 x 4 replicates, you could combine the replicate stats filter (for a conventional statistical analysis) with the intensity difference filter between the two replicate groups, which identifies differences that deviate significantly from 0. Do the intensity difference filter first, though, since it relies on seeing the whole distribution of points.

Hi Simon,
Do you know if SeqMonk can show the exact base pairs for each read? It would be very helpful for detecting indels and de novo mutations. Thank you!
mediator replied

02-25-2012, 04:07 PM
Thank you Simon!
simonandrews replied

02-25-2012, 01:16 PM
Originally posted by mediator

Hi Simon,
That advanced course is really helpful, thanks! When using the difference filter to identify differentially expressed genes, do you know what the appropriate interval is for RNA-Seq experiments? I have four KO samples and four WT samples, and I have calculated RPKM for all of them. Thank you in advance!

For this type of experiment we'd recommend using the intensity difference filter rather than a straight difference filter. The intensity difference filter is a statistical filter where cutoffs are set as p-values, and we'd normally go with the default 0.05 cutoff. Details of how the filter works are in the advanced course.

In your case, as you have 4 x 4 replicates, you could combine the replicate stats filter (for a conventional statistical analysis) with the intensity difference filter between the two replicate groups, which identifies differences that deviate significantly from 0. Do the intensity difference filter first, though, since it relies on seeing the whole distribution of points.
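For readers who want an intuition for why such a filter needs the whole distribution, here is a very rough sketch of one way a p-value can be attached to a difference: each probe's difference is compared with the spread of differences among probes of similar average intensity. This is an assumption-laden illustration and not SeqMonk's implementation (the details are in the advanced course); the function name, the window size and the normal approximation centred on 0 are all choices made for the example, and no multiple-testing correction is applied.

```python
from statistics import NormalDist, pstdev


def intensity_difference_pvalues(a, b, window=50):
    """Rough two-sided p-value per probe for the difference a[i] - b[i].

    Probes are ranked by average intensity; each difference is judged against
    the spread (around 0) of the differences of its neighbours in that ranking.
    """
    n = len(a)
    order = sorted(range(n), key=lambda i: (a[i] + b[i]) / 2)
    diffs = [a[i] - b[i] for i in order]
    half = window // 2
    pvalues = [1.0] * n
    for rank, probe in enumerate(order):
        local = diffs[max(0, rank - half):rank + half + 1]
        spread = pstdev(local, mu=0.0)  # spread around "no difference"
        if spread == 0:
            continue  # all neighbours identical; leave p-value at 1.0
        z = abs(diffs[rank]) / spread
        pvalues[probe] = 2 * (1 - NormalDist().cdf(z))
    return pvalues


# Toy log-scale quantitations: small noise everywhere, one real change.
wt = [i * 0.05 for i in range(200)]
ko = [v + (0.1 if i % 2 else -0.1) for i, v in enumerate(wt)]
ko[100] = wt[100] - 3.0  # probe 100 is strongly changed

pvalues = intensity_difference_pvalues(wt, ko)
significant = [i for i, p in enumerate(pvalues) if p < 0.05]
```

Because the spread is estimated from all the other probes, filtering the probe list beforehand would change the result, which is one way to see why this filter should be run before any other.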
mediator replied

02-24-2012, 11:15 AM
Originally posted by simonandrews

After promising to do this for ages I've finally finished writing an Advanced SeqMonk Course. It won't get its first official outing for a couple of weeks, but I've released the course material onto our web site so everyone can have a look.

There are a couple of things in the course which require features which won't be released until v0.21.0 - but that should be coming fairly soon now.

Hi Simon,
That advanced course is really helpful, thanks! When using the difference filter to identify differentially expressed genes, do you know what the appropriate interval is for RNA-Seq experiments? I have four KO samples and four WT samples, and I have calculated RPKM for all of them. Thank you in advance!
colindaven replied

02-10-2012, 05:37 AM
Thanks for that Simon, it's a very nice document.
simonandrews replied

02-10-2012, 03:33 AM
Advanced SeqMonk course

After promising to do this for ages I've finally finished writing an Advanced SeqMonk Course. It won't get its first official outing for a couple of weeks, but I've released the course material onto our web site so everyone can have a look.

There are a couple of things in the course which require features which won't be released until v0.21.0 - but that should be coming fairly soon now.
simonandrews replied

02-06-2012, 01:22 AM
Originally posted by beajorrin

OK! In fact I have an insert size of 500bp, so I have to change it. I also have to check the trimmed fastq files to reduce the mispairing.
Thanks

Even if you are size selecting at 500bp it's probably best to give yourself some leeway for slightly longer inserts. Size selection isn't as exact as you might think and a 1kb cutoff should still remove most of the mapping noise which might otherwise be a problem.
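The effect of the cutoff can be illustrated with a small sketch: the inferred insert is the span covering both mates, and pairs whose span exceeds the limit are dropped. This is not SeqMonk code; the function names are hypothetical, and dropping pairs that map to different chromosomes is an assumption made for the example.

```python
def inferred_insert(mate1, mate2):
    """Span covering both mates, or None if they are on different chromosomes.

    Each mate is (chrom, start, end) with 1-based inclusive coordinates.
    """
    if mate1[0] != mate2[0]:
        return None
    return (mate1[0], min(mate1[1], mate2[1]), max(mate1[2], mate2[2]))


def keep_pair(mate1, mate2, max_insert=1000):
    """Keep a pair only if its inferred insert is no longer than max_insert."""
    span = inferred_insert(mate1, mate2)
    if span is None:
        return False  # assumption: cross-chromosome pairs are discarded
    return span[2] - span[1] + 1 <= max_insert


# A sensible 500 bp insert and a pair mapped 100 kb apart.
good = (("1", 1000, 1049), ("1", 1450, 1499))
noisy = (("1", 1000, 1049), ("1", 101000, 101049))

kept_good = keep_pair(*good)
kept_noisy = keep_pair(*noisy)
```

With a 500bp library, a limit of 1000 keeps the slightly longer inserts while still rejecting pairs like the second one.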
beajorrin replied

02-06-2012, 12:33 AM
Originally posted by simonandrews

Ah, OK. When you import paired end data SeqMonk displays the inferred insert from the paired set of reads. If you have two reads from the same transcript which mapped 100,000 bases apart then you'll see a read which is 100,000 bases long. Because of this SeqMonk sets a limit on how far apart paired end reads can be. The default is 1kb, which is about the limit for insert sizes on the Illumina platform. Unless you're working on a platform which can actually work with much longer insert sizes, you probably don't want to increase this.

Looking at the screenshot you posted you seem to have a big discrepancy between the number of reads mapped before and after trimming your data. This leads me to suspect that something may have gone wrong with your mapping of the trimmed data. When you trim your data you do need to ensure that you keep the sequences in your two fastq files exactly paired - ie if you trim one sequence down to no bases, then you still need to leave it in the file - or remove it completely from both fastq files so that bowtie always sees correctly paired sequences when it does the paired end mapping. My initial guess would be that your fastq files have ended up with different numbers of reads in them causing your data to be mispaired - which will lead to this odd kind of pairing.

OK! In fact I have an insert size of 500bp, so I have to change it. I also have to check the trimmed fastq files to reduce the mispairing.
Thanks
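One quick way to do the check discussed above (that the two trimmed fastq files still contain the same reads in the same order) is to walk both files in step and compare read names. The sketch below assumes plain four-line fastq records and read names that differ only by a trailing /1 or /2; the function names are hypothetical.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read name from each 4-line fastq record, without /1 or /2."""
    for n, line in enumerate(handle):
        if n % 4 == 0 and line.strip():
            name = line.strip()[1:].split()[0]  # drop '@' and any description
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(fq1, fq2):
    """Index of the first record where the files disagree, or None if paired.

    A file that runs out early also counts as a mismatch.
    """
    pairs = zip_longest(read_ids(fq1), read_ids(fq2))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index
    return None


# Toy files: in bad_2 the mate of r1 was removed from one file only.
reads_1 = "@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n"
good_2 = "@r1/2\nTTTT\n+\nIIII\n@r2/2\nGGGG\n+\nIIII\n"
bad_2 = "@r2/2\nGGGG\n+\nIIII\n"

paired_ok = first_mismatch(io.StringIO(reads_1), io.StringIO(good_2))
paired_bad = first_mismatch(io.StringIO(reads_1), io.StringIO(bad_2))
```

For real data the two `io.StringIO` objects would be replaced by open file handles for the two trimmed fastq files; a result other than `None` means the mapper will be given mispaired reads from that record onwards.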