**fkrueger** · 11-23-2011, 01:33 PM

The methylation information can be imported into SeqMonk whereby '+' reads are methylated and '-' reads are non-methylated cytosines. Use the position value for both start and end of the cytosine methylation calls. You can then perform a probe generation over individual C positions (e.g. read position probe generation) and do a relative quantitation of 'FORWARD' reads 'as percentage of' 'ALL READS'. You could also look at other genomic features such as CGIs, promoters etc.
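The "forward reads as percentage of all reads" quantitation described above amounts to a per-position ratio. Here is a minimal sketch of that arithmetic outside SeqMonk, assuming the calls have already been parsed into hypothetical `(chromosome, position, strand)` tuples where `+` means methylated and `-` means non-methylated. It is only an illustration of the calculation, not SeqMonk's own code.

```python
from collections import defaultdict


def methylation_percentages(calls):
    """Return {(chromosome, position): percent of calls that are methylated}.

    `calls` is an iterable of (chromosome, position, strand) tuples, where
    strand '+' marks a methylated call and '-' a non-methylated call.
    """
    counts = defaultdict(lambda: [0, 0])  # [methylated calls, all calls]
    for chrom, pos, strand in calls:
        entry = counts[(chrom, pos)]
        if strand == "+":
            entry[0] += 1
        entry[1] += 1
    return {key: 100.0 * meth / total for key, (meth, total) in counts.items()}


# Made-up example calls: two methylated and two non-methylated at chr1:100,
# and a single methylated call at chr1:250.
example_calls = [
    ("chr1", 100, "+"), ("chr1", 100, "+"),
    ("chr1", 100, "-"), ("chr1", 100, "-"),
    ("chr1", 250, "+"),
]
percentages = methylation_percentages(example_calls)
```

Positions covered by very few calls give noisy percentages, so in practice you would probably want a minimum coverage filter as well.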

If you are not necessarily interested in strand specific methylation you can also import *bismark.txt files directly into SeqMonk using the Bismark import filter where you can select the context you are interested in. Hope this helps.

**simonandrews** · 01-03-2012, 05:45 AM

SeqMonk v0.19.0 released

As a somewhat belated Christmas present I've just put up the release of SeqMonk v0.19.0 onto our project web page. In this release we've made some fairly major changes to the core data model which mean that we get a significant increase in loading and saving speeds (load times are around half what they were before), along with a big decrease in the running memory footprint (down by around 4x) as well as a nice speed increase in many of the analysis functions.

Along with this we've improved some of the plots (aligned probes and probe trend), and have put some new display options into the main chromosome view (greater raw read density, fixed colours for individual datasets).

We've been running this build internally for a while and have seen large increases in the amount of data we've been able to handle - along with a pleasant reduction in the amount of time we spend watching little red bars slowly crawl across the screen.

The updated version is available from our project page. As always, if you don't see the new version try pressing shift+refresh in your browser to bypass the annoying BBSRC proxy server.

If you have any problems, either add a note to this thread, or report them in our bugzilla system.

**beajorrin** · 01-17-2012, 07:20 AM

I'm trying to visualize my data with SeqMonk. My data are Illumina paired-end sequences. I worked first in Galaxy and ran Bowtie there, so now I have SAM and BAM files. I was able to import my reference genome after changing the AC, product and locus_tag entries. I tried the BAM file first, but when I imported it SeqMonk told me "Couldn't extract a valid name from <name>".
So I went back to the reference genome that I used in Galaxy (the same one I used in SeqMonk) and changed the AC/ID in the FASTA file to match the one I use in the SeqMonk reference genome. The result is the same.

I haven't tried the SAM file yet, but I think the problem is the reference genome used in Galaxy.

Thanks

**simonandrews** · 01-17-2012, 07:38 AM

If your reference genome is chromosome based, but the identifiers are not chromosome names but accession numbers or something similar then you need to define some custom chromosome name mappings so SeqMonk can figure out which accession refers to which chromosome. Once you have the mappings set up then the import should work.
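The idea of an accession-to-chromosome mapping can be illustrated on the SAM text itself. The sketch below is not how SeqMonk stores or applies its custom mappings (those are set up inside the program). It only shows what such a mapping does, by rewriting the reference name in `@SQ` header lines and in the RNAME and RNEXT columns of alignment lines. The accession `ACC0001.1` and the function name are made up for the example.

```python
def rename_reference(sam_line, name_map):
    """Return one SAM line with reference names translated via `name_map`.

    Names that are not in the mapping (including '=' and '*') are left alone.
    """
    line = sam_line.rstrip("\n")
    fields = line.split("\t")
    if line.startswith("@"):
        if fields[0] == "@SQ":
            # Header sequence lines carry the reference name in the SN: tag.
            fields = [
                "SN:" + name_map.get(f[3:], f[3:]) if f.startswith("SN:") else f
                for f in fields
            ]
        return "\t".join(fields)
    # Alignment lines: column 3 is RNAME, column 7 is RNEXT (mate reference).
    fields[2] = name_map.get(fields[2], fields[2])
    fields[6] = name_map.get(fields[6], fields[6])
    return "\t".join(fields)


# Hypothetical mapping from an accession to a chromosome name.
name_map = {"ACC0001.1": "1"}
header = "@SQ\tSN:ACC0001.1\tLN:5000"
alignment = "read1\t99\tACC0001.1\t100\t42\t50M\t=\t300\t250\tACGT\tIIII"
new_header = rename_reference(header, name_map)
new_alignment = rename_reference(alignment, name_map)
```

Defining the mapping in SeqMonk, as suggested above, avoids touching the alignment files at all and is usually the simpler route.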

**goofy** · 01-18-2012, 06:37 PM

Total Read Count Difference

Hi,

I'm trying to quantitate the percentage distribution of TF enrichment of my control and treated samples but I got a massive Total Read Count difference between the samples, and I'm just wondering what it means. I've designed probes based on promoter, intron and exon regions and they're OK, but I want to normalize against the total read count. My other ChIP-Seq total read counts are relatively similar between control and treated samples; only this TF ChIP-Seq has a massive difference. Does anyone know what this means?

**beajorrin** · 01-19-2012, 12:28 AM

Originally posted by simonandrews:

If your reference genome is chromosome based, but the identifiers are not chromosome names but accession numbers or something similar then you need to define some custom chromosome name mappings so SeqMonk can figure out which accession refers to which chromosome. Once you have the mappings set up then the import should work.

Thanks, It works!

**simonandrews** · 01-20-2012, 12:51 AM

Originally posted by goofy:

Hi,

I'm trying to quantitate the percentage distribution of TF enrichment of my control and treated samples but I got a massive Total Read Count difference between the samples, and I'm just wondering what it means. I've designed probes based on promoter, intron and exon regions and they're OK, but I want to normalize against the total read count. My other ChIP-Seq total read counts are relatively similar between control and treated samples; only this TF ChIP-Seq has a massive difference. Does anyone know what this means?

Total read count isn't always a great thing to normalise to. In some cases (particularly in ChIP samples) you can get a huge number of sequences mapping to a small number of loci. Often these will be mis-mappings, maybe even of regions which aren't in the assembly (telomeric or centromeric repeats for example). We've seen cases where 40% of reads in a ChIP (a MeDIP actually) came from this kind of sequence and mapped to just 12 locations. This kind of bias can hugely throw off your normalisation.
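One quick way to check for this kind of bias is to ask what fraction of all reads falls into the few most heavily covered loci. A minimal sketch, assuming each read has already been reduced to a hypothetical locus key such as `(chromosome, bin_start)`. The binning itself and the function name are illustrative choices, not something SeqMonk does under that name.

```python
from collections import Counter


def top_loci_fraction(read_loci, n_loci):
    """Fraction of reads that fall into the `n_loci` most populated loci."""
    counts = Counter(read_loci)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n_loci))
    return top / total


# Made-up example: 40 reads pile up on one locus, 60 are spread one per locus.
piled_up = [("chr1", 0)] * 40
spread_out = [("chr2", i * 1000) for i in range(60)]
fraction_in_top_locus = top_loci_fraction(piled_up + spread_out, 1)
```

If a handful of loci account for a large share of the reads in one sample but not in the other, total read count is unlikely to be a fair normalisation factor.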

Within SeqMonk you can use the cumulative distribution plot to look at how well your samples are normalised. If your total count has thrown off the normalisation then you'll probably see lines running parallel to each other. In this case you can then use the percentile normalisation quantitation method to correct your normalisation to a specific point in your distribution where the distributions look to be equivalent, and this should remove any odd biases in the total counts.
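The general idea behind normalising to a percentile can be sketched as follows. This is a generic illustration, not SeqMonk's implementation: it takes the value at a chosen percentile in each sample and rescales every sample so those values coincide. The nearest-rank percentile, the default of 75 and the choice to scale to the mean of the percentile values are all assumptions made for the example.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def percentile_normalise(samples, pct=75.0):
    """Scale each sample so its `pct` percentile matches the shared target.

    `samples` maps a sample name to a list of per-probe values.
    """
    reference = {name: percentile(vals, pct) for name, vals in samples.items()}
    if any(value == 0 for value in reference.values()):
        raise ValueError("percentile value is zero; pick a higher percentile")
    target = sum(reference.values()) / len(reference)
    return {
        name: [v * target / reference[name] for v in vals]
        for name, vals in samples.items()
    }


# Made-up example: the treated sample is a uniformly doubled copy of control.
samples = {"control": [1, 2, 3, 4], "treated": [2, 4, 6, 8]}
normalised = percentile_normalise(samples, pct=75.0)
```

In this toy case the two samples become identical after scaling. With real data you would choose the percentile by looking at where the cumulative distributions run parallel, as described above.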

I'm actually going to be releasing our Advanced SeqMonk course documentation in the next couple of weeks, and there will be a whole section on sorting out data normalisation which will go through these kinds of issues in much more detail.

**simonandrews** · 01-24-2012, 01:59 AM

I've just released SeqMonk v0.20.0 onto our repositories. This addresses a potentially nasty bug in v0.19 which may have truncated some filtered probe lists in any projects saved with that version.

The bug would affect you if your probe set contained multiple probes at exactly the same genomic position. In practice this only really happens if you make feature based probes and don't select the option to remove exact duplicates. If you made probe sets like this in v0.19.0 you should recalculate any filtered lists you have made with that version. Most of these won't actually have been affected, but since we can't spot a truncated list automatically it's better to be safe than sorry.
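If you have your probes available as a table (for example from an exported report), a quick check for probes sharing exactly the same position might look like the sketch below. The `(chromosome, start, end, name)` tuple layout and the probe names are hypothetical, and the check only tells you whether a probe set falls into the affected category, not whether any list was actually truncated.

```python
from collections import Counter


def duplicated_positions(probes):
    """Return {(chromosome, start, end): count} for positions used by 2+ probes.

    `probes` is an iterable of (chromosome, start, end, name) tuples.
    """
    counts = Counter((chrom, start, end) for chrom, start, end, _ in probes)
    return {position: n for position, n in counts.items() if n > 1}


# Made-up example: two feature-based probes share one exact position.
example_probes = [
    ("chr1", 1000, 2000, "geneA-001"),
    ("chr1", 1000, 2000, "geneA-002"),
    ("chr1", 5000, 6000, "geneB-001"),
]
duplicates = duplicated_positions(example_probes)
```

An empty result means no two probes share a position, so by the description above the filtered lists from that probe set should not have been affected.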

The gory details of the bug can be found on our bugzilla server.

Other changes in this release are:

- We fixed a bug in the Intensity Difference Filter which was adding the same hit multiple times. All reported hits were real hits, but some may have been duplicated.
- We fixed a display bug for deduplicated HiC data when it was first imported. Saving and reloading the project would fix the problem.
- We added a new quantitation pipeline to allow you to easily make 'wiggle' type plots.

The new version is now available from our project page and all users of the previous version are strongly advised to upgrade immediately.

**mediator** · 01-30-2012, 11:36 AM

Hi Simon,
I am using SeqMonk to analyze my RNA-Seq data right now. It's very straightforward and intuitive. I just have one question: after I used the quantitation pipeline to perform the RPKM calculation on my data, how do I save the RPKM values for all the probes in an export file? Thank you!

**simonandrews** · 01-31-2012, 12:29 AM

Simply create an annotated probe report (Reports > Create Annotated Probe Report). You don't actually need to add any additional annotation as the probes themselves will be named after the transcript to which they relate.

**beajorrin** · 02-02-2012, 07:31 AM

I really think SeqMonk is very useful, but I have a problem. I'm working with Illumina paired-end reads. I've trimmed my reads by quality, mapped them with Bowtie and finally converted the output from SAM to BAM. I've visualized it with SeqMonk, and I've observed that my reads are assembled (and Bowtie doesn't assemble, it just maps). This doesn't happen if I don't trim my data. What could be the problem?
Thanks

**simonandrews** · 02-02-2012, 07:46 AM

Originally posted by beajorrin:

I really think SeqMonk is very useful, but I have a problem. I'm working with Illumina paired-end reads. I've trimmed my reads by quality, mapped them with Bowtie and finally converted the output from SAM to BAM.

OK, I'm with you so far (but for the record you could have left out the last step since SeqMonk would have read the SAM files directly - and doesn't care whether they're sorted or not).

Originally posted by beajorrin:

I've visualized it with SeqMonk, and I've observed that my reads are assembled (and Bowtie doesn't assemble, it just maps). This doesn't happen if I don't trim my data. What could be the problem?

I'm not sure what you mean here when you say your reads are assembled. SeqMonk will pack your mapped reads together so you can see as many as possible on the screen, but this isn't an assembly - it's just showing the positions of the reads in the existing genome assembly you mapped against with bowtie. You should have got this whether your data was trimmed or not (except that your untrimmed data might have been more spread out since the mapping efficiency might have been much lower). Could you describe (or post small pictures of) exactly what you're seeing which concerns you?

**beajorrin** · 02-02-2012, 08:26 AM

Hi!
First, thanks for your quick answer.

What I see is different read lengths in my data. In my original data I have reads of at least 100 bp, but when I visualized them with SeqMonk I have reads of 9000 bp or more. Could it be because of the maximum insert size for valid paired-end alignments? I've set it to 10000. Could SeqMonk be joining reads that are far away from each other, or is it how I mapped the reads?
Thanks

(I've uploaded an image)

Attached file: Captura de pantalla 2012-02-02 a las 17.20.00.png (16.2 KB)

**simonandrews** · 02-02-2012, 08:35 AM

Ah, OK. When you import paired end data SeqMonk displays the inferred insert from the paired set of reads. If you have two reads from the same transcript which mapped 100,000 bases apart then you'll see a read which is 100,000 bases long. Because of this SeqMonk sets a limit on how far apart paired end reads can be. The default is 1kb which is about the limit for insert sizes on the Illumina platform. Unless you're working on a platform which can actually work with much longer insert sizes then you probably don't want to increase this.
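As a rough illustration of what "inferred insert" means here, the sketch below merges the two mapped ends of a pair into one span and discards the pair if that span exceeds a distance limit. Exactly how SeqMonk measures the distance and what it does with over-length pairs is not stated in this thread, so the span-length test and the `None` return value are assumptions for the example.

```python
def inferred_insert(read1, read2, max_distance=1000):
    """Return the (start, end) span covering both reads of a pair.

    Each read is a (start, end) tuple with 1-based inclusive coordinates on
    the same chromosome. Returns None when the span is longer than
    `max_distance` bases.
    """
    start = min(read1[0], read2[0])
    end = max(read1[1], read2[1])
    if end - start + 1 > max_distance:
        return None
    return (start, end)


# Two 100 bp reads about 500 bp apart give one 500 bp "read" in the display.
close_pair = inferred_insert((100, 199), (500, 599))
# With a 1 kb limit, a pair about 9 kb apart is rejected.
distant_pair = inferred_insert((100, 199), (9000, 9099))
# Raising the limit to 10 kb lets the same pair through as one long span.
distant_pair_allowed = inferred_insert((100, 199), (9000, 9099), max_distance=10000)
```

This is why raising the limit to 10000 produces displayed reads of 9000 bp or more: the span shown is the inferred insert, not the sequenced read.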

Looking at the screenshot you posted you seem to have a big discrepancy between the number of reads mapped before and after trimming your data. This leads me to suspect that something may have gone wrong with your mapping of the trimmed data. When you trim your data you do need to ensure that you keep the sequences in your two fastq files exactly paired, i.e. if you trim one sequence down to no bases, then you still need to leave it in the file, or remove it completely from both fastq files, so that bowtie always sees correctly paired sequences when it does the paired end mapping. My initial guess would be that your fastq files have ended up with different numbers of reads in them, causing your data to be mispaired, which will lead to this odd kind of pairing.
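A simple way to test for this is to walk both fastq files in parallel and compare read names record by record. The sketch below assumes plain 4-line fastq records with no blank lines, and that the two ends of a pair share a name apart from an optional `/1` or `/2` suffix. Both are assumptions about the files, and the function names are made up for the example.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield the read name from each 4-line fastq record in `handle`."""
    for index, line in enumerate(handle):
        if index % 4 == 0:
            parts = line.split()
            name = parts[0] if parts else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # drop the pair-member suffix
            yield name


def first_mispaired_record(handle1, handle2):
    """Return the 1-based number of the first record whose names differ.

    A file that runs out early also counts as a mismatch. Returns None when
    both files stay in step to the end.
    """
    pairs = zip_longest(read_names(handle1), read_names(handle2))
    for number, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:
            return number
    return None


# In-memory stand-ins for two trimmed files; the second lost read "r2".
fastq_1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n@r3/1\nACGT\n+\nIIII\n")
fastq_2 = io.StringIO("@r1/2\nTTTT\n+\nIIII\n@r3/2\nTTTT\n+\nIIII\n")
mismatch_at = first_mispaired_record(fastq_1, fastq_2)
```

With real data you would pass two open file handles instead of the `io.StringIO` objects. A result of `None` means the files are still in step.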

**beajorrin** · 02-06-2012, 12:33 AM

OK! In fact I have an insert size of 500 bp, so I have to change it. I also have to check the trimmed fastq files to fix the mispairing.
Thanks
