Ah, I was guessing it might be the Perl rand() function generating the coordinates, but wanted to be sure. Thanks very much!
-
Hi Karl,
The starting position in the genome is determined by first concatenating all chromosomes into one long sequence and then generating a random number with the Perl rand() function. Sherman then works out which chromosome and starting position this number corresponds to, and extracts a 32 bp sequence at that position. So in essence it should be as 'random' as the Perl rand() function is. Hope this helps.
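The procedure described above can be sketched in Python as follows. This is only an illustration of the idea, not Sherman's actual Perl code: the function name `pick_read`, the dictionary input format, and the decision to redraw when a read would run off the end of a chromosome are all assumptions made for the example.

```python
import bisect
import random
from itertools import accumulate


def pick_read(chromosomes, read_length, rng=random):
    """Pick one read start uniformly along the concatenated genome.

    chromosomes: dict mapping name -> sequence string (hypothetical format).
    Returns (chromosome name, 0-based start within that chromosome, sequence).
    Assumes at least one chromosome is as long as read_length.
    """
    names = list(chromosomes)
    lengths = [len(chromosomes[n]) for n in names]
    # End coordinate of each chromosome within the concatenated sequence.
    ends = list(accumulate(lengths))
    total = ends[-1]
    while True:
        # Analogue of Perl's int(rand($total)): uniform integer in [0, total).
        pos = rng.randrange(total)
        # First chromosome whose end coordinate lies beyond pos.
        idx = bisect.bisect_right(ends, pos)
        offset = pos - (ends[idx] - lengths[idx])
        # Assumption: redraw if the read would cross a chromosome boundary.
        if offset + read_length <= lengths[idx]:
            name = names[idx]
            return name, offset, chromosomes[name][offset:offset + read_length]
```

Under this scheme the chance that a read starts on a given chromosome is roughly proportional to that chromosome's length, which is what one would want for a random genomic control set.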
-
Hello,
I'm using Sherman to generate sets of 32 bp genomic sequences for use as random control "libraries" for some transcriptome libraries our lab has made. I compare the distribution of these random "reads" across different annotated genomic categories (how many fall within genes, transposons, etc.) to that of the transcriptome libraries.
So, a question about the --genome_folder option: How random are the sequences generated when this option is chosen? How are, for example, the different 32-mers chosen from the chromosome coordinates given?
This is the command I use:
./Sherman -l 32 -n 51402229 --genome_folder /genome/ZmB73_Refgen/
Just looking at two simulated files generated by using the identical command, I see they're not the same, but I just wanted to get a sense of how different they are.
Thanks,
Karl
-
We have released a new version of the bisulfite simulator Sherman (v.0.1.4). This update fixes the following flaw:
During context-specific cytosine conversion, Sherman has until now assumed that a C at the last position of a read was in CH context. This caused a weird blip in the M-bias plots (introduced into the Bismark methylation extractor as of v0.8.0) of simulated data at the end of read 1 and at the start of read 2 whenever that last C was actually in CpG context. To account for this, Sherman now determines the sequence context of the last position in a read correctly.
Sherman is available here: https://www.bioinformatics.babraham....jects/sherman/.
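To illustrate the kind of fix described in this post, here is a minimal Python sketch that classifies each C in a read by looking up the following base in the chromosome rather than in the read itself, so that the last position also gets its real context. The function name `cytosine_contexts` is hypothetical, only the forward strand is handled, and the thread does not say how Sherman implements this internally.

```python
def cytosine_contexts(chrom_seq, start, read_length):
    """Classify each C of a forward-strand read as 'CG' or 'CH'.

    The base after each C is taken from the chromosome, not the read, so a
    C at the last read position is classified from the genome as well.
    Returns a list of (position_in_read, context). The context is None when
    there is no usable next base (chromosome end, or an N).
    """
    contexts = []
    for i in range(read_length):
        pos = start + i
        if chrom_seq[pos].upper() != "C":
            continue
        next_base = chrom_seq[pos + 1].upper() if pos + 1 < len(chrom_seq) else None
        if next_base == "G":
            contexts.append((i, "CG"))
        elif next_base in ("A", "C", "T"):
            contexts.append((i, "CH"))
        else:
            contexts.append((i, None))
    return contexts
```

For the chromosome `ACGTTCGA` and a 5 bp read starting at index 1 (`CGTTC`), the final C is followed by a G in the genome, so it is reported as CG context even though the read itself ends there.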
-
Hi Brent,
It is true that all reads have the same quality value at each position. This is modeled so that, on average, there is a certain chance (1% in your case) of incorporating a sequencing error, spread over the entire sequence. A degree of randomness comes in at the point where the error is actually introduced, because this is decided randomly against the Phred score (= the probability that a basecall is wrong) for each bp individually.
Hope this isn't too confusing.
Best,
Felix
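The per-base decision described in this reply can be sketched as below. This is a minimal illustration that assumes the standard Phred definition, Q = -10 * log10(p); the function names are hypothetical, and how Sherman combines --error_rate and --quality internally is not stated in the thread.

```python
import random


def phred_to_error_prob(q):
    """Standard Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def add_basecall_errors(seq, quals, rng=random):
    """Return seq with each base independently replaced at its Phred-implied rate.

    Every read can carry identical quality values, yet the errors still land
    on different bases because each position gets its own random draw.
    """
    out = []
    for base, q in zip(seq, quals):
        if rng.random() < phred_to_error_prob(q):
            # Substitute any base other than the original one.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

With this definition a Phred score of 20 corresponds to a 1% chance of a wrong basecall and a score of 30 to 0.1%.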
-
identical qualities?
Hi, this looks to be quite useful.
I call like:
./Sherman -n 100000 -l 50 -cr 0 --colorspace --error_rate 1 --genome_folder ~/data/hg19/ --quality 30
$ awk '(NR %2 == 0)' simulated_QV.qual | uniq
Is this as intended?
thanks,
-Brent
-
Simulating FastQ libraries for BS-Seq or normal applications using Sherman
We have just made available a FastQ simulation script, termed Sherman, for high-throughput bisulfite (or standard genomic) sequencing datasets. It can generate single-end or paired-end data in both nucleotide-/base-space (such as from the Illumina platform) and color-space (such as from the SOLiD platform).
Sherman was designed to assess the influence of common problems observed in many Next-Gen Sequencing libraries on the primary analysis of BS-Seq data. Thus, it allows the user to introduce various 'contaminants' into the simulated libraries, including basecall errors (following an exponential decay model), SNPs, Illumina adapter fragments and more.
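As a rough illustration of what an exponential decay model for qualities can look like, the Python sketch below generates per-position Phred scores whose error probabilities rise exponentially along the read and average out to a chosen error rate. The exact formula Sherman uses is not given in this post, so the function names and the `growth` and `max_phred` parameters are purely illustrative assumptions.

```python
import math


def decaying_qualities(read_length, mean_error_rate, growth=0.1, max_phred=40):
    """Per-position Phred scores that fall off along the read.

    Error probabilities grow as exp(growth * i) and are scaled so that their
    mean equals mean_error_rate (a fraction > 0, e.g. 0.01 for 1%), before
    capping. growth and max_phred are illustrative, not Sherman options.
    """
    weights = [math.exp(growth * i) for i in range(read_length)]
    scale = mean_error_rate * read_length / sum(weights)
    quals = []
    for w in weights:
        p = min(scale * w, 1.0)  # a probability cannot exceed 1
        quals.append(min(max_phred, round(-10 * math.log10(p))))
    return quals


def to_phred33(quals):
    """Encode integer Phred scores as a Sanger (Phred+33) quality string."""
    return "".join(chr(q + 33) for q in quals)
```

A call such as `to_phred33(decaying_qualities(50, 0.01))` gives one quality string that could be reused for every simulated read, with the actual errors then drawn per base.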
These are the main features:
• Generate any number of sequences of any length
• Generate either completely random sequences or use genomic sequences (genome can be specified)
• Generate single-end or paired-end data with variable fragment sizes
• Adjustable bisulfite conversion rate from 0-100% for either all cytosines or cytosines in CH and CG context individually
• Generate directional or non-directional libraries
• Generate sequences in base-space or SOLiD color-space format
• Adjustable default Phred quality score (Sanger encoding, Phred+33 format)
• Sequences can have constant Phred qualities throughout the read or can have quality scores following an exponential decay curve, which will eventually result in basecall errors (note that this is handled slightly differently for base- and color-space data)
• Introduce a variable number of random SNPs into each read
• Introduce a fixed amount of adapter sequence at the 3' end of all sequences
• Introduce a variable amount of adapter sequence at various positions at the 3' end of reads
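To make the context-specific conversion feature from the list above more concrete, here is a minimal Python sketch of converting cytosines to thymines at separate rates for CG and CH context. It handles a forward-strand read only; the function name and the `next_base_after` parameter (the genomic base following the read, if known) are assumptions for the example and not part of Sherman's interface.

```python
import random


def bisulfite_convert(seq, cg_rate, ch_rate, next_base_after="", rng=random):
    """Convert C to T with context-specific probabilities.

    cg_rate, ch_rate: conversion rates in percent (0-100).
    next_base_after: genomic base following the read; when it is missing,
    a C at the last position is treated as CH context.
    """
    padded = seq + next_base_after
    out = []
    for i, base in enumerate(seq):
        if base == "C":
            following = padded[i + 1] if i + 1 < len(padded) else ""
            rate = cg_rate if following == "G" else ch_rate
            # rng.random() is in [0, 1), so 0 never converts and 100 always does.
            if rng.random() * 100 < rate:
                base = "T"
        out.append(base)
    return "".join(out)
```

For example, `bisulfite_convert("ACGTCA", 0, 100)` leaves the CG cytosine untouched and converts the CH cytosine, returning `ACGTTA`.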
While adding the paired-end option, Sherman received a major overhaul, so it should now run much more quickly and be less memory-intensive. Initially, Sherman was designed to generate the kinds of library contamination we were interested in, but if you have any ideas or suggestions which could be implemented (_easily_) we would love to hear from you.
Sherman can be found at www.bioinformatics.bbsrc.ac.uk/projects/