Announcement
Collapse
No announcement yet.
X
-
You could try it out and see if it works (I have a hunch it won't be perfect).
If that does not work acceptably then I suggest you run mutate.sh multiple times with new values and create new datasets.
Leave a comment:
-
Hi GenoMax! Yes, you are right, the fraction of mutations may change in the new pool, but - based on the mixing percentages (in my scenario) - I'll be able to control the level of mutations per position. So, say on position 100, go from 100% MUT to 80% MUT if I mix my original reference fastq and mutated reference fastq using 80/20 percent of reads respectively.
Leave a comment:
-
Perhaps I am missing a subtle point but since you can control mutation type/rates with mutate.sh would it not be better to go with just that data (#2 in list above)? If you mix reads from mutated and un-mutated reference you can't be sure that you will maintain the fraction of mutations at the same level in new pool?
Leave a comment:
-
Originally posted by GenoMax View PostIt may be best to use "mutate.sh" (to look at in-line help) to introduce the mutations after you generate the reads with randomreads.sh. You will get a VCF files of changes.
I think using "local=t" is causing the alignment issues. It is meant for local alignments when you expect errors at end of reads.
1) First mutate my original sequence:
Code:./mutate.sh in=reference.fa out=reference_Mut.fa vcf=Variants.vcf subrate=0.01 indelrate=0.005 maxindel=3 overwrite=t
2) Then generate reads based on the new sequence, without any additional flags (only min quality):
Code:./randomreads.sh ref=reference_Mut.fa out1=Sample1_R1.fastq out2=Sample1_R2.fastq coverage=10000 paired=t minq=12 -Xmx10g
Code:./bbmap.sh ref=reference_Mut.fa in1=Sample1_R1.fastq in2=Sample1_R2.fastq covstats=Covstats.cov
4) Map the reads to the original reference.
Code:./bbmap.sh ref=reference.fa in1=Sample1_R1.fastq in2=Sample1_R2.fastq covstats=Covstats.cov
My follow-up question then is can I safely assume that the fastq files I'm generating, using the method above, contain the variants at a rate of 100%?
If that assumption is correct, what would be the best way to regulate the variation rate? I'm thinking along the lines of:
1) Generate perfect reads from un-mutated reference.
2) Generate perfect reads from mutated sequence.
3) Mix the file sets in various percentages to get the desired effect and make a new set of files.
Is that correct logic or is there a way to already do that?
Thank you in advance, I really appreciate it!
Leave a comment:
-
It may be best to use "mutate.sh" (to look at in-line help) to introduce the mutationsafter you generate the reads with randomreads.sh. You will get a VCF files of changes.
I think using "local=t" is causing the alignment issues. It is meant for local alignments when you expect errors at end of reads.Last edited by GenoMax; 04-30-2020, 05:49 AM.
Leave a comment:
-
BBmap generation and mapping of artificial paired-end fastq
Hello everyone.
I have discovered BBmap and have just started using it. I want to run a benchmark on some variant callers and I was thinking of making a set of paired-end Illumina fastq files where I know exactly the sample composition per position (e.g. position 100, 80% A (REF), 20% T (MUT)) and then I can run the file through the pipelines and assess how they perform.
So, I generate my set of files with:
Code:./randomreads.sh ref=reference.fa out1=Sample1_R1.fastq out2=Sample1_R2.fastq coverage=10000 paired=t mininsert=1 maxinsert=12 snprate=0.1 insrate=0.1 delrate=0.1 minq=12 -Xmx10g
Then my other question is about BBmap.sh. I tried to map the generated fastq files to my reference, using the command:
Code:/bbmap.sh ref=reference.fa local=t in1=Sample1_R1.fastq in2=Sample1_R2.fastq covstats=Covstats.cov
The variant rates are very high,
HTML Code:Read 1 data: pct reads num reads pct bases num bases mapped: 2.2502% 22429 2.2502% 3364350 unambiguous: 2.2502% 22429 2.2502% 3364350 ambiguous: 0.0000% 0 0.0000% 0 low-Q discards: 0.0000% 0 0.0000% 0 perfect best site: 0.2408% 2400 0.2408% 360000 semiperfect site: 0.2408% 2400 0.2408% 360000 rescued: 0.4520% 4505 Match Rate: NA NA 95.1746% 3202707 Error Rate: 86.5130% 19404 4.1666% 140208 Sub Rate: 60.4441% 13557 0.8741% 29415 Del Rate: 0.1338% 30 0.0219% 736 Ins Rate: 65.3707% 14662 3.2706% 110057 N Rate: 14.2583% 3198 0.6589% 22171 Read 2 data: pct reads num reads pct bases num bases mapped: 2.2421% 22348 2.2421% 3352200 unambiguous: 2.2421% 22348 2.2421% 3352200 ambiguous: 0.0000% 0 0.0000% 0 low-Q discards: 0.2039% 2032 0.2039% 304800 perfect best site: 0.4693% 4678 0.4693% 701700 semiperfect site: 0.4693% 4678 0.4693% 701700 rescued: 0.3604% 3592 Match Rate: NA NA 96.8958% 3248505 Error Rate: 76.8301% 17170 2.7130% 90955 Sub Rate: 60.3454% 13486 0.8803% 29513 Del Rate: 0.1253% 28 0.0112% 376 Ins Rate: 39.6680% 8865 1.8215% 61066 N Rate: 10.8198% 2418 0.3912% 13116
I also ran a second test where I lowered the rates drasticallyCode:snprate=0.1 insrate=0.01 delrate=0.01
I appreciate any pointers, thanks for reading!Tags: None
Latest Articles
Collapse
-
by seqadmin
At the intersection of cytogenetics and genomics lies the exciting field of cytogenomics. It focuses on studying chromosomes at a molecular scale, involving techniques that analyze either the whole genome or particular DNA sequences to examine variations in structure and behavior at the chromosomal or subchromosomal level. By integrating cytogenetic techniques with genomic analysis, researchers can effectively investigate chromosomal abnormalities related to diseases, particularly...-
Channel: Articles
09-26-2023, 06:26 AM -
-
by seqadmin
Cancer research has been transformed through numerous molecular techniques, with RNA sequencing (RNA-seq) playing a crucial role in understanding the complexity of the disease. Maša Ivin, Ph.D., Scientific Writer at Lexogen, and Yvonne Goepel Ph.D., Product Manager at Lexogen, remarked that “The high-throughput nature of RNA-seq allows for rapid profiling and deep exploration of the transcriptome.” They emphasized its indispensable role in cancer research, aiding in biomarker...-
Channel: Articles
09-07-2023, 11:15 PM -
-
by seqadmin
Ribonucleic acid (RNA) represents a range of diverse molecules that play a crucial role in many cellular processes. From serving as a protein template to regulating genes, the complex processes involving RNA make it a focal point of study for many scientists. This article will spotlight various methods scientists have developed to investigate different RNA subtypes and the broader transcriptome.
Whole Transcriptome RNA-seq
Whole transcriptome sequencing...-
Channel: Articles
08-31-2023, 11:07 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 09:38 AM
|
0 responses
9 views
0 likes
|
Last Post
by seqadmin
Yesterday, 09:38 AM
|
||
Started by seqadmin, 09-27-2023, 06:57 AM
|
0 responses
11 views
0 likes
|
Last Post
by seqadmin
09-27-2023, 06:57 AM
|
||
Started by seqadmin, 09-26-2023, 07:53 AM
|
1 response
23 views
0 likes
|
Last Post
![]() |
||
Multiplexed Biomarker Detection with Nanopore Technology: A Leap in Precision Diagnostics
by seqadmin
Started by seqadmin, 09-25-2023, 07:42 AM
|
0 responses
17 views
0 likes
|
Last Post
by seqadmin
09-25-2023, 07:42 AM
|
Leave a comment: