snpEff and extracting EFF field

  • snpEff and extracting EFF field

    Hi,
    I am currently using snpEff to annotate VCF files. The output is also a VCF, with the INFO field populated by several annotations, including the gene names.

    As a sample, here are the annotations for five consecutive rows:

    DOWNSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000469563|),DOWNSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000487214|),INTRON(LOW||||NOC2L|processed_transcript|CODING|ENST00000327044|),TRANSCRIPT(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000477976|)
    INTRON(LOW||||KLHL17|protein_coding|CODING|ENST00000455747|),INTRON(LOW||||KLHL17|protein_coding|CODING|ENST00000540863|),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Gcc/Acc|A111T|KLHL17|protein_coding|CODING|ENST00000338591|exon_1_896673_896932),TRANSCRIPT(MODIFIER||||KLHL17|protein_coding|CODING|ENST00000463212|),TRANSCRIPT(MODIFIER||||KLHL17|protein_coding|CODING|ENST00000473277|),UPSTREAM(LOW||||KLHL17|protein_coding|CODING|ENST00000466300|),UPSTREAM(LOW||||KLHL17|protein_coding|CODING|ENST00000481067|),UPSTREAM(LOW||||NOC2L|processed_transcript|CODING|ENST00000327044|),UPSTREAM(LOW||||NOC2L|processed_transcript|CODING|ENST00000469563|),UPSTREAM(LOW||||NOC2L|processed_transcript|CODING|ENST00000477976|),UPSTREAM(LOW||||NOC2L|processed_transcript|CODING|ENST00000487214|),UPSTREAM(LOW||||PLEKHN1|protein_coding|CODING|ENST00000379407|),UPSTREAM(LOW||||PLEKHN1|protein_coding|CODING|ENST00000379409|),UPSTREAM(LOW||||PLEKHN1|protein_coding|CODING|ENST00000379410|)
    DOWNSTREAM(MODIFIER||||DVL1|protein_coding|CODING|ENST00000263743|),DOWNSTREAM(MODIFIER||||DVL1|protein_coding|CODING|ENST00000345100|),DOWNSTREAM(MODIFIER||||DVL1|protein_coding|CODING|ENST00000378888|),DOWNSTREAM(MODIFIER||||DVL1|protein_coding|CODING|ENST00000378891|),DOWNSTREAM(MODIFIER||||GLTPD1|processed_transcript|CODING|ENST00000343938|),DOWNSTREAM(MODIFIER||||GLTPD1|processed_transcript|CODING|ENST00000464957|),SYNONYMOUS_CODING(LOW|SILENT|ggG/ggA|G384|TAS1R3|protein_coding|CODING|ENST00000339381|exon_1_1267404_1268186)
    TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000317673|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000340677|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000341832|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000407249|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000513088|)
    TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000317673|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000340677|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000341832|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000407249|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000513088|)

    My question is: how do I extract specific subfields, such as the gene names, for all the rows? vcftools doesn't help, as it can only extract the whole INFO field with all of these annotations.

    Thanks
    -Kasthuri

  • #2
    SnpSift, from the same author, is able to do this step.



    • #3
      Originally posted by Jon_Keats
      SnpSift, from the same author, is able to do this step.
      The SnpSift tool can run a query and return the lines that match it, but I just want a subfield of the EFF annotation. It looks like I may have to do the text processing myself. Not too bad, though!
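For anyone wanting to do that text processing, here is a minimal Python sketch. It assumes the annotations sit in the INFO column under an `EFF=` key and that the gene name is the fifth pipe-separated subfield inside the parentheses, as in the sample rows above. The function names (`eff_value`, `gene_names`, `genes_per_record`) and the `GENE_INDEX` constant are made up for illustration and are not part of snpEff or SnpSift; check the subfield position against your own snpEff version before relying on it.

```python
import re

# Matches one annotation such as INTRON(LOW||||NOC2L|...|ENST00000327044|)
# and captures the effect name and the text inside the parentheses.
EFFECT_RE = re.compile(r"([A-Za-z0-9_]+)\(([^)]*)\)")

# Assumption taken from the sample rows: the gene name is the fifth
# pipe-separated subfield (index 4) inside the parentheses.
GENE_INDEX = 4


def eff_value(info):
    """Return the value of the EFF key from a VCF INFO string, or ''."""
    for entry in info.split(";"):
        if entry.startswith("EFF="):
            return entry[len("EFF="):]
    return ""


def gene_names(eff):
    """Return the distinct gene names in an EFF value, in order of appearance."""
    genes = []
    for _effect, body in EFFECT_RE.findall(eff):
        subfields = body.split("|")
        if len(subfields) > GENE_INDEX:
            gene = subfields[GENE_INDEX]
            if gene and gene not in genes:
                genes.append(gene)
    return genes


def genes_per_record(vcf_lines):
    """Yield (chrom, pos, genes) for each data line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        yield cols[0], cols[1], gene_names(eff_value(cols[7]))
```

To use it on a file, something like `for chrom, pos, genes in genes_per_record(open("annotated.vcf")): print(chrom, pos, ",".join(genes))` should print one line per variant (the file name here is a placeholder). Changing `GENE_INDEX` pulls out a different subfield, for example the transcript ID.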

      Thanks anyway.

      -K
