parsing gff3 from NCBI
Hi,
I need some help parsing a GFF3 file. Essentially, I am trying to pull specific fields out of the info section (column 9) using awk. I can't simply rely on column position because the lines don't all carry the same attributes, so I need to match on the attribute name. Here are a couple of lines from the file:
HTML Code:
NC_015011.2 Gnomon gene 18691 26481 . + . ID=gene0;Dbxref=GeneID:100538868;Name=LOC100538868;gbkey=Gene;gene=LOC100538868;partial=true;start_range=.,18691
NC_015011.2 Gnomon mRNA 18691 26481 . + . ID=rna0;Parent=gene0;Dbxref=GeneID:100538868,Genbank:XM_010707932.1;Name=XM_010707932.1;gbkey=mRNA;gene=LOC100538868;partial=true;product=hematopoietic lineage cell-specific protein-like;start_range=.,18691;transcript_id=XM_010707932.1
NC_015011.2 Gnomon exon 18691 18743 . + . ID=id1;Parent=rna0;Dbxref=GeneID:100538868,Genbank:XM_010707932.1;gbkey=mRNA;gene=LOC100538868;partial=true;product=hematopoietic lineage cell-specific protein-like;start_range=.,18691;transcript_id=XM_010707932.1
NC_015011.2 Gnomon exon 18865 18994 . + . ID=id2;Parent=rna0;Dbxref=GeneID:100538868,Genbank:XM_010707932.1;gbkey=mRNA;gene=LOC100538868;partial=true;product=hematopoietic lineage cell-specific protein-like;transcript_id=XM_010707932.1
Code:
awk -F "\t" '{ print $9 }' mga_ref_Turkey_5.0_NCBI_FINAL_no_GI_no_region.gff3.txt |
  grep product |
  awk -F ";" '{ gsub(";","\t",$0);print $0 }' |
  awk -F "\t" '{for(i=0;i<NF;i++){if($i~/gene\=/){printf $i};if($i~/product\=/){printf $i }}printf "\n"}' |
  head
HTML Code:
ID=rna0 Parent=gene0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 Name=XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like start_range=.,18691 transcript_id=XM_010707932.1ID=rna0 Parent=gene0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 Name=XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like start_range=.,18691 transcript_id=XM_010707932.1gene=LOC100538868product=hematopoietic lineage cell-specific protein-like
ID=id1 Parent=rna0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like start_range=.,18691 transcript_id=XM_010707932.1ID=id1 Parent=rna0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like start_range=.,18691 transcript_id=XM_010707932.1gene=LOC100538868product=hematopoietic lineage cell-specific protein-like
ID=id2 Parent=rna0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like transcript_id=XM_010707932.1ID=id2 Parent=rna0 Dbxref=GeneID:100538868,Genbank:XM_010707932.1 gbkey=mRNA gene=LOC100538868 partial=true product=hematopoietic lineage cell-specific protein-like transcript_id=XM_010707932.1gene=LOC100538868product=hematopoietic lineage cell-specific protein-like
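The doubled output appears consistent with the loop starting at `i=0`. In awk, `$0` is the entire record, so on any line containing both `gene=` and `product=` the whole line is printed twice before the individual fields are reached. Two smaller issues: `i<NF` never tests the last field, and `printf $i` uses the data itself as the format string. Below is a minimal sketch of one possible fix, not a definitive solution. It assumes the file is tab-delimited with `;`-separated attributes in column 9; the function name `extract_gene_product` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the gene= and product= attributes from column 9 of a GFF3 file.
# Assumptions: tab-delimited columns, ';'-separated attributes in column 9.
# extract_gene_product is a hypothetical helper name, not part of any tool.
extract_gene_product() {
    awk -F '\t' '
        $9 ~ /product=/ {                      # only records that carry a product attribute
            n = split($9, attr, ";")           # break column 9 into individual attributes
            gene = ""; product = ""
            for (i = 1; i <= n; i++) {         # start at 1 and include the last element
                if (attr[i] ~ /^gene=/)    gene = attr[i]
                if (attr[i] ~ /^product=/) product = attr[i]
            }
            printf "%s\t%s\n", gene, product   # explicit format string, data passed as arguments
        }
    ' "$@"
}
```

It could be called as `extract_gene_product mga_ref_Turkey_5.0_NCBI_FINAL_no_GI_no_region.gff3.txt | head`. Anchoring the patterns with `^` keeps `gene=` from matching inside some other attribute's value, and doing the split inside a single awk call avoids the intermediate `grep` and `gsub` steps.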