  • BBmap generation and mapping of artificial paired-end fastq

    Hello everyone.
    I have discovered BBMap and have just started using it. I want to benchmark some variant callers, and I was thinking of making a set of paired-end Illumina fastq files where I know exactly the sample composition per position (e.g. position 100: 80% A (REF), 20% T (MUT)), so I can run the files through the pipelines and assess how they perform.

    So, I generate my set of files with:
    Code:
    ./randomreads.sh ref=reference.fa out1=Sample1_R1.fastq out2=Sample1_R2.fastq coverage=10000 paired=t mininsert=1 maxinsert=12 snprate=0.1 insrate=0.1 delrate=0.1 minq=12 -Xmx10g
    The command works just fine and I get a set of fastq files. I can see (and also read somewhere in the documentation) that the read headers actually contain the info about the changes relative to the reference. So, I can write a small python script to parse them and gather that info, but my first question is whether there is any tool in the software suite that can do this for me. I thought that maybe reformat.sh could, but its bhist option seems to output statistics by read position for the R1 and R2 files.
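
    In case it helps to show what I mean, here is a minimal sketch that just dumps the read headers for inspection (this is not a dedicated BBTools feature, and the exact header layout depends on the randomreads.sh version, so I would check a few headers before writing a parser around specific fields):
    Code:
    # dump the FASTQ header lines (every 4th line) so the embedded mutation info can be inspected
    awk 'NR % 4 == 1' Sample1_R1.fastq | head
    # save all headers to a text file for a custom parser
    awk 'NR % 4 == 1' Sample1_R1.fastq > R1_headers.txt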

    Then my other question is about bbmap.sh. I tried to map the generated fastq files to my reference using the command:

    Code:
    ./bbmap.sh ref=reference.fa local=t in1=Sample1_R1.fastq in2=Sample1_R2.fastq covstats=Covstats.cov
    but I got a very low average coverage per position (225, versus the ~10K expected from the input), and the mapping percentage is very low at 2.25%.
    The reported variant rates are also very high,
    HTML Code:
    Read 1 data:      	pct reads	num reads 	pct bases	   num bases
    
    mapped:          	  2.2502% 	    22429 	  2.2502% 	     3364350
    unambiguous:     	  2.2502% 	    22429 	  2.2502% 	     3364350
    ambiguous:       	  0.0000% 	        0 	  0.0000% 	           0
    low-Q discards:  	  0.0000% 	        0 	  0.0000% 	           0
    
    perfect best site:	  0.2408% 	     2400 	  0.2408% 	      360000
    semiperfect site:	  0.2408% 	     2400 	  0.2408% 	      360000
    rescued:         	  0.4520% 	     4505
    
    Match Rate:      	      NA 	       NA 	 95.1746% 	     3202707
    Error Rate:      	 86.5130% 	    19404 	  4.1666% 	      140208
    Sub Rate:        	 60.4441% 	    13557 	  0.8741% 	       29415
    Del Rate:        	  0.1338% 	       30 	  0.0219% 	         736
    Ins Rate:        	 65.3707% 	    14662 	  3.2706% 	      110057
    N Rate:          	 14.2583% 	     3198 	  0.6589% 	       22171
    
    
    Read 2 data:      	pct reads	num reads 	pct bases	   num bases
    
    mapped:          	  2.2421% 	    22348 	  2.2421% 	     3352200
    unambiguous:     	  2.2421% 	    22348 	  2.2421% 	     3352200
    ambiguous:       	  0.0000% 	        0 	  0.0000% 	           0
    low-Q discards:  	  0.2039% 	     2032 	  0.2039% 	      304800
    
    perfect best site:	  0.4693% 	     4678 	  0.4693% 	      701700
    semiperfect site:	  0.4693% 	     4678 	  0.4693% 	      701700
    rescued:         	  0.3604% 	     3592
    
    Match Rate:      	      NA 	       NA 	 96.8958% 	     3248505
    Error Rate:      	 76.8301% 	    17170 	  2.7130% 	       90955
    Sub Rate:        	 60.3454% 	    13486 	  0.8803% 	       29513
    Del Rate:        	  0.1253% 	       28 	  0.0112% 	         376
    Ins Rate:        	 39.6680% 	     8865 	  1.8215% 	       61066
    N Rate:          	 10.8198% 	     2418 	  0.3912% 	       13116
    but that seems contradictory given my input. So, my second question is: what am I missing here? I was aiming for files with only a few events (substitutions, indels), but the end result seems quite extreme.

    I also ran a second test where I lowered the rates drastically
    Code:
    snprate=0.1 insrate=0.01 delrate=0.01
    , but that yielded an even poorer mapping percentage of 0.2%.

    I appreciate any pointers, thanks for reading!

  • #2
    It may be best to use "mutate.sh" (see its in-line help) to introduce the mutations after you generate the reads with randomreads.sh. You will get a VCF file of the changes.

    I think using "local=t" is causing the alignment issues. It is meant for local alignments, for cases where you expect errors at the ends of reads.
    Last edited by GenoMax; 04-30-2020, 05:49 AM.

    • #3
      Originally posted by GenoMax:
      It may be best to use "mutate.sh" (see its in-line help) to introduce the mutations after you generate the reads with randomreads.sh. You will get a VCF file of the changes.

      I think using "local=t" is causing the alignment issues. It is meant for local alignments, for cases where you expect errors at the ends of reads.
      Thank you for your ultra-fast reply, GenoMax! I have a question about your comment: did you mean to use "mutate.sh" before generating the reads? I actually tried that approach:
      1) First mutate my original sequence:
      Code:
      ./mutate.sh in=reference.fa out=reference_Mut.fa vcf=Variants.vcf subrate=0.01 indelrate=0.005 maxindel=3 overwrite=t
      The vcf file indeed looks super!
      2) Then generate reads based on the new sequence, without any additional flags (only min quality):
      Code:
      ./randomreads.sh ref=reference_Mut.fa out1=Sample1_R1.fastq out2=Sample1_R2.fastq coverage=10000 paired=t minq=12 -Xmx10g
      3) Map the reads to the mutated reference (exclude "local" option):
      Code:
      ./bbmap.sh ref=reference_Mut.fa in1=Sample1_R1.fastq in2=Sample1_R2.fastq covstats=Covstats.cov
      This indeed yields 99.905% mapping, which makes sense (I assume the small remainder is lost to some quality filter).
      4) Map the reads to the original reference.
      Code:
      ./bbmap.sh ref=reference.fa in1=Sample1_R1.fastq in2=Sample1_R2.fastq covstats=Covstats.cov
      This yielded 99.875% mapping, which also makes sense.

      My follow-up question, then: can I safely assume that the fastq files I'm generating with the method above contain the variants at a rate of 100%?

      If that assumption is correct, what would be the best way to regulate the variation rate? I'm thinking along the lines of:
      1) Generate perfect reads from un-mutated reference.
      2) Generate perfect reads from mutated sequence.
      3) Mix the two file sets in various proportions to get the desired effect and make a new set of files (roughly as sketched below).
      Is that logic correct, or is there already a way to do that?
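
      For step 3, I imagine something like the following, using reformat.sh's samplerate option to subsample each pair of files and then concatenating the results. The 0.8/0.2 split and the file names are just placeholders, and since the subsampling is random the per-position fractions will only be approximate:
      Code:
      # keep ~80% of the un-mutated-reference read pairs and ~20% of the mutated-reference read pairs
      ./reformat.sh in1=Ref_R1.fastq in2=Ref_R2.fastq out1=Ref80_R1.fastq out2=Ref80_R2.fastq samplerate=0.8 sampleseed=7
      ./reformat.sh in1=Mut_R1.fastq in2=Mut_R2.fastq out1=Mut20_R1.fastq out2=Mut20_R2.fastq samplerate=0.2 sampleseed=7
      # combine into the final mixed sample
      cat Ref80_R1.fastq Mut20_R1.fastq > Mix_R1.fastq
      cat Ref80_R2.fastq Mut20_R2.fastq > Mix_R2.fastq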

      Thank you in advance, I really appreciate it!

      • #4
        Perhaps I am missing a subtle point, but since you can control mutation types/rates with mutate.sh, would it not be better to go with just that data (#2 in the list above)? If you mix reads from the mutated and un-mutated references, can you be sure that you will maintain the fraction of mutations at the same level in the new pool?

        • #5
          Hi GenoMax! Yes, you are right, the fraction of mutations may change in the new pool, but, based on the mixing percentages in my scenario, I will be able to control the level of mutation per position. So, say at position 100, I could go from 100% MUT to 20% MUT by mixing my original-reference fastq and the mutated-reference fastq at 80/20 percent of reads respectively.

          • #6
            You could try it out and see if it works (I have a hunch it won't be perfect).

            If that does not work acceptably, then I suggest you run mutate.sh multiple times with new values and create new datasets, roughly as sketched below.
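
            For example, something along these lines; the rates and file names are only placeholders, and the flags are just the ones already used earlier in this thread:
            Code:
            # one mutated reference plus one synthetic read set per substitution rate
            for rate in 0.001 0.005 0.01; do
                ./mutate.sh in=reference.fa out=reference_Mut_${rate}.fa vcf=Variants_${rate}.vcf subrate=${rate} indelrate=0.005 maxindel=3 overwrite=t
                ./randomreads.sh ref=reference_Mut_${rate}.fa out1=Sample_${rate}_R1.fastq out2=Sample_${rate}_R2.fastq coverage=10000 paired=t minq=12 -Xmx10g
            done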

            • #7
              Thank you, I'll try it and report back!
