Does anyone have the SNP_in_ORF_nonsyn.pl script that was described in this thread? The link to the script no longer works (http://users.ugent.be/~slvbelle/NGS/).
Any info would be great!
-
Hi Steven,
I sent you those three input files. Thanks a lot.
Maoshigua
-
Hi maoshigua,
can you send me ([email protected]) a sample of your data? I will try to fix it.
Cheers,
Steven
-
JackieBadger, I tried the script and got this error:
Error, Reference nucleotide does not equal the one in the original sequence at ./SNP_in_ORF_nonsyn_multiSNP.pl line 85, <GEN0> line 6.
Any suggestions, please?
Maoshigua
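The error text suggests that the script checks the REF allele of each VCF record against the base at the same position in the unigene FASTA and stops at the first disagreement. Typical causes are a VCF called against a different assembly than the FASTA given to the script, or contig names that differ between the two files. A minimal Python sketch for listing the offending records, assuming an uncompressed VCF whose CHROM values match the first word of the FASTA headers (`read_fasta` and `ref_mismatches` are hypothetical names, not part of the script):

```python
# Sketch: list VCF records whose REF allele disagrees with the FASTA.
# Function names are hypothetical; they are not part of SNP_in_ORF_nonsyn.pl.


def read_fasta(text):
    """Parse FASTA text into {id: sequence}, using the first word of each header."""
    seqs, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def ref_mismatches(seqs, vcf_text):
    """Yield (chrom, pos, vcf_ref, fasta_ref) for records that do not match.

    fasta_ref is None when the VCF contig is absent from the FASTA.
    """
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref = line.split("\t")[:4]
        pos = int(pos)
        seq = seqs.get(chrom)
        if seq is None:
            yield chrom, pos, ref, None
            continue
        # VCF POS is 1-based; slice the same number of bases as the REF allele.
        fasta_ref = seq[pos - 1 : pos - 1 + len(ref)]
        if fasta_ref != ref.upper():
            yield chrom, pos, ref, fasta_ref


# Toy data: the second record claims an A where the FASTA has a T.
fasta = ">comp1 len=12\nACGTACGTACGT\n"
vcf = "#CHROM\tPOS\tID\tREF\tALT\ncomp1\t3\t.\tG\tA\ncomp1\t4\t.\tA\tC\n"
print(list(ref_mismatches(read_fasta(fasta), vcf)))  # [('comp1', 4, 'A', 'T')]
```

If every record is reported with `None`, the contig names differ between the files; if only some positions disagree, the VCF was most likely called against a different version of the assembly.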
-
Originally posted by fanx:
JackieBadger, I tried the script and got these warnings:
Use of uninitialized value $countSyn in concatenation (.) or string at SNP_in_ORF_nonsyn.pl line 101, <GEN0> line 39393.
Use of uninitialized value $countNonSyn in concatenation (.) or string at SNP_in_ORF_nonsyn.pl line 102, <GEN0> line 39393.
Any advice, please?
I made some modifications; it should work now.
-
SNPdat
SNPdat can be used for this
Background Single nucleotide polymorphisms (SNPs) are the most abundant genetic variant found in vertebrates and invertebrates. SNP discovery has become a highly automated, robust and relatively inexpensive process allowing the identification of many thousands of mutations for model and non-model organisms. Annotating large numbers of SNPs can be a difficult and complex process. Many tools available are optimised for use with organisms densely sampled for SNPs, such as humans. There are currently few tools available that are species non-specific or support non-model organism data. Results Here we present SNPdat, a high throughput analysis tool that can provide a comprehensive annotation of both novel and known SNPs for any organism with a draft sequence and annotation. Using a dataset of 4,566 SNPs identified in cattle using high-throughput DNA sequencing we demonstrate the annotations performed and the statistics that can be generated by SNPdat. Conclusions SNPdat provides users with a simple tool for annotation of genomes that are either not supported by other tools or have a small number of annotated SNPs available. SNPdat can also be used to analyse datasets from organisms which are densely sampled for SNPs. As a command line tool it can easily be incorporated into existing SNP discovery pipelines and fills a niche for analyses involving non-model organisms that are not supported by many available SNP annotation tools. SNPdat will be of great interest to scientists involved in SNP discovery and analysis projects, particularly those with limited bioinformatics experience.
(There is also a short tutorial in the downloads section.)
You only need a VCF file as input, plus an annotation file (GTF) and the reference sequence (FASTA file). The annotation and sequence information can come from your own assembly and don't require any preprocessing.
-
Originally posted by JackieBadger:
Look at this publication: "De novo Transcriptome Assembly and SNP Discovery in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae)"
Below is a quote from the primary author; please cite their paper if you use the script.
"The script for finding amino acid changes uses several data files.
- I searched for ORFs in the unigenes with this program: http://proteomics.ysu.edu/tools/OrfPredictor.html
→ Output: a CDS file (DNA sequences of the ORFs) and a PEP file (amino acid sequences of the ORFs, which also contains the START, STOP and READINGFRAME of each ORF)
- SNP calling with SAMtools
→ Output: a VCF file (the SNPs and their positions)
- A Perl script (SNP_in_ORF_nonsyn.pl) infers whether SNPs are located within an ORF and whether each SNP results in an amino acid change. The script gets the SNP position from the VCF file, mutates that position in the original sequence from the unigene FASTA file, translates the sequence according to its ORF (from the PEP file), and then checks whether the original translation differs from the mutated one. The script uses BioPerl.
→ Output: each line of the VCF file that contains a nonsynonymous SNP. At the end, the numbers of synonymous and nonsynonymous SNPs are also reported.
I made the script and data available here: http://users.ugent.be/~slvbelle/NGS/
(I added example PEP and VCF files, which should work.)
The script should be used as follows:
./SNP_in_ORF_nonsyn.pl Trinity_GC018ALL_unique.fasta PEP.fasta SNP.vcf > output"
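The mutate, translate and compare step described in the quote can be sketched in Python. This is not the original Perl/BioPerl script: it assumes the ORF has already been cut out as a coding-strand sequence (for reverse-strand ORFs, reverse-complement the sequence and the ALT base first) and that the VCF position has been converted to a 1-based offset within that ORF. Translating only the affected codon gives the same answer as translating the whole mutated ORF, because a single-base substitution changes exactly one codon. `classify_snp` is a hypothetical name.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def classify_snp(cds, pos, alt):
    """Return 'synonymous', 'nonsynonymous' or 'unknown' for one substitution.

    cds: ORF on the coding strand, starting with the first base of codon 1
    pos: 1-based position of the SNP within cds
    alt: alternative base, already on the coding strand
    """
    cds = cds.upper()
    i = pos - 1
    start = i - i % 3                      # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if i < 0 or len(ref_codon) < 3:
        return "unknown"                   # SNP falls outside a complete codon
    k = i - start                          # offset of the SNP inside the codon
    alt_codon = ref_codon[:k] + alt.upper() + ref_codon[k + 1:]
    ref_aa, alt_aa = CODON_TABLE.get(ref_codon), CODON_TABLE.get(alt_codon)
    if ref_aa is None or alt_aa is None:
        return "unknown"                   # ambiguous base such as N
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


# Toy ORF ATG GCT TAA (M A *): GCT->GCC keeps Ala, GCT->ACT gives Thr.
counts = Counter(classify_snp("ATGGCTTAA", p, b) for p, b in [(6, "C"), (4, "A")])
print(counts["synonymous"], counts["nonsynonymous"])  # 1 1
```

Tallying with `Counter` starts every category at zero, so a run in which no SNP lands in an ORF reports 0 rather than an undefined count.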
-
Originally posted by bioinfun:
Hi,
Does anyone have ideas on how to determine (programmatically) which SNPs in a VCF file are synonymous and which are non-synonymous? I used mpileup on several hundred bacterial genomes to get the VCF file.
Thanks
-
What I've done is use the coordinates from the VCF to get the sequence around and including the SNP. Then I run blastx with those sequences against a database of the proteins from that bacterium, and parse the blastx output to find out which changes cause amino acid differences.
But yes, ANNOVAR is easier, if you can get an annotation file for ANNOVAR to compare against.
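The first step of the blastx approach, pulling out the sequence around each SNP, might look like this in Python. The function names and the 30-base default flank are illustrative choices, not taken from the post; the reference and variant windows it writes could then be searched with blastx against the bacterium's proteins, as described above.

```python
# Sketch: reference and variant windows around a SNP, written as FASTA for blastx.
# flanking_pair/to_fasta and the 30-base default flank are illustrative choices.


def flanking_pair(seq, pos, alt, flank=30):
    """Return (ref_window, alt_window) around a 1-based SNP position.

    Windows are clipped at the sequence ends; the SNP sits at the same
    offset in both windows.
    """
    i = pos - 1
    start = max(0, i - flank)
    ref_window = seq[start:i + flank + 1]   # slicing past the end just clips
    offset = i - start
    alt_window = ref_window[:offset] + alt + ref_window[offset + 1:]
    return ref_window, alt_window


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{s}\n" for name, s in records)


ref_w, alt_w = flanking_pair("AAAACCCCGGGGTTTT", 8, "A", flank=3)
print(to_fasta([("snp1_ref", ref_w), ("snp1_alt", alt_w)]))
```

Keeping the SNP at the same offset in both windows makes it easier to line up the two blastx alignments and spot the residue that changed.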
-
You can compare the variant coordinates in the VCF with the coding-region start and end positions in RefSeq to see where each variant falls, and make a determination based on that.
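A minimal Python sketch of that coordinate comparison, assuming the coding regions are available as a GTF file (for example, an annotation converted from RefSeq) whose sequence names match the VCF CHROM column. `load_cds` and `find_cds` are hypothetical names, and the toy GTF line is invented.

```python
import re
from collections import defaultdict


def load_cds(gtf_text):
    """Collect CDS rows per sequence: {seqname: [(start, end, strand, gene_id)]}.

    GTF coordinates are 1-based and inclusive, like VCF POS.
    """
    cds = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        gene = match.group(1) if match else "."
        cds[fields[0]].append((int(fields[3]), int(fields[4]), fields[6], gene))
    return cds


def find_cds(cds, chrom, pos):
    """Return every CDS interval on chrom that contains the 1-based position pos."""
    return [iv for iv in cds.get(chrom, []) if iv[0] <= pos <= iv[1]]


gtf = 'chr1\tsrc\tCDS\t100\t200\t.\t+\t0\tgene_id "geneA"; transcript_id "t1";\n'
cds = load_cds(gtf)
print(find_cds(cds, "chr1", 150), find_cds(cds, "chr1", 250))
```

The linear scan is fine for one bacterial genome; for many genomes or many variants, sorting the intervals and using `bisect` (or an interval tree) would scale better. Knowing the strand and the CDS start is what then lets you locate the affected codon.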
-
Thanks, guys, but I am trying to program it myself, and I hoped to get some leads on how to do this from a VCF file.
What do you think of this quick approach:
1- get the nucleotide sequence of the CDS that contains the SNP
2- perform a 6-frame translation
3- compare it with the translated reference sequence
4- if the sequences differ, the SNP from step 1 is non-synonymous; if they are the same, it is synonymous.
It's not exact, but it will give you an idea. What do you guys think?
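Steps 2 to 4 can be sketched in Python as below (function names are hypothetical). One caveat the sketch makes visible: comparing all six frames flags a SNP whenever any frame changes, and a single substitution often alters some frame even when it is silent in the real reading frame. In the invented example, GCT→GCC is synonymous in frame +1 (Ala to Ala) but changes frames +2, -1 and -3, so the call should be based on the frame of the annotated ORF only.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return {frame: protein} for frames +1, +2, +3 and -1, -2, -3."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]          # reverse complement
    frames = {f"+{k + 1}": translate(seq[k:]) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc[k:]) for k in range(3)})
    return frames


def changed_frames(seq, pos, alt):
    """List the frames whose protein changes when base pos (1-based) becomes alt."""
    mutated = seq[:pos - 1] + alt + seq[pos:]
    before, after = six_frames(seq), six_frames(mutated)
    return [frame for frame in before if before[frame] != after[frame]]


# Invented toy ORF ATG GCT TAA; the SNP turns GCT (Ala) into GCC (Ala).
print(changed_frames("ATGGCTTAA", 6, "C"))  # ['+2', '-1', '-3']
```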
-
Or Ensembl's VEP (http://www.ensembl.org/tools.html) or snpEff (http://snpeff.sourceforge.net/) or...
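If snpEff is used, its output VCF carries the predicted effects in the INFO column, and tallying them takes a few lines. This sketch assumes the ANN field written by recent snpEff versions, where each comma-separated entry is pipe-delimited and the second field holds Sequence Ontology terms joined by `&` (older releases wrote a differently formatted EFF field, which this code does not parse). Treating missense, stop-gained, stop-lost and start-lost as non-synonymous is an assumption you may want to change; the VCF records are invented and shortened.

```python
from collections import Counter

# Which SO terms count as non-synonymous is a choice made for this sketch.
NONSYN_TERMS = {"missense_variant", "stop_gained", "stop_lost", "start_lost"}


def ann_terms(info):
    """Return the set of effect terms found in the ANN entry of a VCF INFO column."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            terms = set()
            for entry in field[4:].split(","):        # one entry per allele/transcript
                parts = entry.split("|")
                if len(parts) > 1:
                    terms.update(parts[1].split("&"))  # terms may be joined by '&'
            return terms
    return set()


def count_effects(vcf_text):
    """Count a record as non-synonymous if any annotation is, else synonymous."""
    counts = Counter()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        terms = ann_terms(line.split("\t")[7])
        if terms & NONSYN_TERMS:
            counts["nonsynonymous"] += 1
        elif "synonymous_variant" in terms:
            counts["synonymous"] += 1
    return counts


# Toy records with shortened ANN values (not real snpEff output).
vcf = (
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=9;ANN=G|missense_variant|MODERATE|geneA\n"
    "chr1\t106\t.\tT\tC\t50\tPASS\tANN=C|synonymous_variant|LOW|geneA\n"
    "chr1\t900\t.\tG\tT\t50\tPASS\tDP=4\n"
)
print(count_effects(vcf))
```

A variant can have different effects on overlapping transcripts; this sketch counts it as non-synonymous if any annotation says so.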