I am trying to build a comprehensive database of prokaryotic (bacterial and archaeal) and fungal genomes to be used for screening ancient DNA reads for contamination. Unfortunately, I found that many of the genomes in the NCBI and EMBL databases contain a lot of poly-N inserts, which obviously need to be eliminated. This can be done either by removing the inserts from each FASTA record, which may be difficult, or by splitting records at the poly-N inserts and trimming Ns from the ends. Is there a tool/script to do this? Alternatively, I may have to abandon whole genomes and just concatenate the relevant GenBank records, but I would first have to extract FASTA from them. Any advice?
-
Well, the downloaded data files need to be preprocessed with BWA-SW to make databases for a local install of DeconSeq. The author removed Ns by splitting the sequences, because BWA-SW replaces each N with a random A, G, C, or T (according to the paper). But the paper does not say how the splitting was done...
-
I am also not convinced that you should remove N's, but if you must, you can with Biopieces (www.biopieces.org):
Code:
read_fasta -i in.fna | transliterate_seq -d 'nN' | write_fasta -o out.fna -x
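For the other option raised in the question (splitting records at poly-N inserts and trimming Ns from the ends), here is a minimal sketch in plain Python. It is not the method used by the DeconSeq author, which the paper apparently does not describe. The function names, the `min_run` and `min_len` parameters, and the `name_start_end` naming of the output records are all assumptions made for illustration.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_at_n_runs(seq, min_run=1, min_len=1):
    """Split seq at runs of at least min_run N/n characters.

    Returns a list of (start, piece) tuples, where start is the 0-based
    offset of the piece in seq. Ns left at the ends of a piece (runs shorter
    than min_run) are trimmed; pieces shorter than min_len are dropped.
    """
    gap = re.compile(r"[Nn]{%d,}" % min_run)
    # Collect the stretches between gaps, including the tail after the last gap.
    spans, pos = [], 0
    for match in gap.finditer(seq):
        spans.append((pos, match.start()))
        pos = match.end()
    spans.append((pos, len(seq)))

    pieces = []
    for begin, end in spans:
        piece = seq[begin:end]
        trimmed = piece.lstrip("Nn")
        start = begin + len(piece) - len(trimmed)
        trimmed = trimmed.rstrip("Nn")
        if trimmed and len(trimmed) >= min_len:
            pieces.append((start, trimmed))
    return pieces


def split_records(lines, min_run=1, min_len=1):
    """Yield (new_header, piece) for every N-free piece of every record."""
    for header, seq in read_fasta(lines):
        fields = header.split()
        name = fields[0] if fields else "seq"
        for start, piece in split_at_n_runs(seq, min_run, min_len):
            # 1-based inclusive coordinates of the piece in the original record
            yield "%s_%d_%d" % (name, start + 1, start + len(piece)), piece


def write_fasta(records, out, width=60):
    """Write (header, sequence) records to a file handle, wrapping lines."""
    for header, seq in records:
        out.write(">%s\n" % header)
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


# Example usage (file names are placeholders):
# with open("in.fna") as fin, open("out.fna", "w") as fout:
#     write_fasta(split_records(fin, min_run=10, min_len=50), fout)
```

With `min_run` greater than 1, isolated Ns inside a piece are kept, so use `min_run=1` if every N has to go. Setting `min_len` to something near the read length avoids filling the database with fragments too short to align against.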