
**brentp** · 09-29-2011, 08:30 AM

identical qualities?

Hi, this looks to be quite useful.

I call it like this:

Code:

./Sherman -n 100000 -l 50 -cr 0 --colorspace --error_rate 1 --genome_folder ~/data/hg19/ --quality 30

If I do the following, I get only 1 line of output:

Code:

$ awk '(NR %2 == 0)' simulated_QV.qual | uniq

i.e., there is no randomness in the quality values.
Is this as intended?

thanks,
-Brent

**fkrueger** · 09-29-2011, 11:12 AM

Hi Brent,

It is true that all reads have the same quality values at each position. The qualities are modeled so that, averaged over the entire sequence, there is a certain chance (1% in your case) of incorporating a sequencing error. The randomness comes in at the point where an error is actually introduced: this is decided randomly against the Phred score (which encodes the probability that a basecall is wrong) for each bp individually.
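As a rough illustration of the idea (a minimal Python sketch, not Sherman's actual Perl code), the per-base decision could look like the following. The function names, the flat Phred 20 quality profile, and the choice of substituting a random different base are all assumptions made for this example; the thread does not describe the quality profile Sherman really uses.

```python
import random

def phred_to_error_prob(q):
    """Convert a Phred score to the probability that the basecall is wrong."""
    return 10 ** (-q / 10)

def introduce_errors(read, quals, rng):
    """Return a copy of `read` in which each base is independently swapped
    for a different base with the probability implied by its Phred score."""
    out = []
    for base, q in zip(read, quals):
        if rng.random() < phred_to_error_prob(q):
            # Error: pick one of the three other bases (assumed error model)
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(0)           # seeded only to make the example repeatable
read = "ACGT" * 10
quals = [20] * len(read)         # hypothetical flat profile: Phred 20 = 1% per base
mutated = introduce_errors(read, quals, rng)
```

Every read carries the same `quals` list here, yet the positions at which errors land differ from read to read because each base is tested against a fresh random number.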

Hope this isn't too confusing.

Best,
Felix

**brentp** · 09-29-2011, 11:19 AM

Got it. Thanks for the explanation.

**fkrueger** · 01-09-2012, 08:49 AM

We have just released an updated version of Sherman (v0.1.1) which fixes an issue with the simulation of non-directional paired-end data and improves some other minor aspects.

**fkrueger** · 09-07-2012, 02:22 AM

We have updated Sherman (v0.1.2) so that reads which were simulated from an existing genome carry the genomic coordinates in the sequence ID. This makes it easier to determine the accuracy of different aligners.

**fkrueger** · 07-12-2013, 02:11 PM

We have released a new version of the bisulfite simulator Sherman (v0.1.4). This update fixes the following flaw:

During context-specific cytosine conversion, Sherman until now assumed that a C at the last position of a read was in CH context. This caused a weird blip in the M-bias plots (introduced into the Bismark methylation extractor as of v0.8.0) of simulated data at the end of read 1 and at the start of read 2 whenever that C was actually in CpG context. To account for this, Sherman now determines the sequence context of the last position in a read correctly.
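To make the fix concrete, here is a minimal Python sketch of the general idea: look at the genomic base just past the end of the read instead of assuming CH. This is not Sherman's code. The function names are hypothetical, and the sketch only handles forward-strand reads and only distinguishes CpG from CH.

```python
def cytosine_context(chrom_seq, pos):
    """Classify the base at 0-based `pos` on the forward strand.

    Returns None if the base is not a C, "unknown" if the C is the last base
    of the chromosome (no downstream base to inspect), else "CpG" or "CH".
    """
    if chrom_seq[pos].upper() != "C":
        return None
    if pos + 1 >= len(chrom_seq):
        return "unknown"
    return "CpG" if chrom_seq[pos + 1].upper() == "G" else "CH"

def last_base_context(chrom_seq, read_start, read_len):
    """Context of the final position of a read, taken from the genome
    rather than from the read sequence alone."""
    return cytosine_context(chrom_seq, read_start + read_len - 1)
```

For example, with the toy sequence `TTACGTTACATT`, a 4 bp read starting at position 0 ends in a C followed by G in the genome, so `last_base_context` reports `CpG`, whereas looking only at the read would give no way to tell.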

Sherman is available here: https://www.bioinformatics.babraham....jects/sherman/.

**kerhard** · 07-22-2013, 10:29 PM

Hello,

I'm using Sherman to generate sets of 32 bp genomic sequences for use as random control "libraries" for some transcriptome libraries our lab has made. I compare the distribution of these random "reads" in different annotated genomic categories (how many fall within genes, transposons, etc.) to that of the transcriptome libraries.

So, a question about the --genome_folder option: How random are the sequences generated when this option is chosen? How are, for example, the different 32-mers chosen from the chromosome coordinates given?

This is the command I use:

./Sherman -l 32 -n 51402229 --genome_folder /genome/ZmB73_Refgen/

Just looking at two simulated files generated by using the identical command, I see they're not the same, but I just wanted to get a sense of how different they are.

Thanks,

Karl

**fkrueger** · 07-23-2013, 12:35 AM

Hi Karl,
The starting position in the genome is determined by first concatenating all chromosomes into one big long sequence, and then generating random numbers using the Perl rand() function. Sherman then uses each number to determine which chromosome and starting position it corresponds to, and extracts a 32 bp sequence at that position. So in essence it should be as 'random' as the Perl rand() function is. Hope this helps.
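A minimal Python sketch of this sampling scheme (not Sherman's Perl implementation) might look like the following. The function name and the toy genome are made up for illustration, and redrawing when a read would run past the end of a chromosome is an assumption; the thread does not say how Sherman handles that case.

```python
import bisect
import random

def pick_random_read(chromosomes, read_len, rng):
    """Pick one read from `chromosomes`, a list of (name, sequence) pairs.

    Returns (name, 0-based start, read sequence).
    """
    if all(len(seq) < read_len for _, seq in chromosomes):
        raise ValueError("no chromosome is long enough for the read length")
    # End offset of each chromosome within the virtual concatenation
    ends, total = [], 0
    for _, seq in chromosomes:
        total += len(seq)
        ends.append(total)
    while True:
        offset = rng.randrange(total)             # random point in the concatenation
        idx = bisect.bisect_right(ends, offset)   # chromosome containing that point
        name, seq = chromosomes[idx]
        start = offset - (ends[idx] - len(seq))   # position within that chromosome
        if start + read_len <= len(seq):          # assumed: redraw if the read overruns
            return name, start, seq[start:start + read_len]

rng = random.Random(42)
toy_genome = [("chr1", "ACGTACGTAC"), ("chr2", "TTGCATTGCATTGCATTGCA")]
name, start, read = pick_random_read(toy_genome, 5, rng)
```

Because the draw is made over the concatenated length, longer chromosomes contribute proportionally more reads.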

**kerhard** · 07-23-2013, 09:18 AM

Ah, I was guessing it might be the Perl rand() function generating the coordinates, but wanted to be sure. Thanks very much!


Simulating FastQ libraries for BS-Seq or normal applications using Sherman
