  • BM7
    Junior Member
    • Jul 2015
    • 1

    Annotate gene list

    I would like to annotate the results output file from DESeq2 so that it contains gene names and symbols. The RNA-seq count file I used comes from DEXSeq and contains Ensembl gene IDs:
    ENSMUSG00000000001:001
    ENSMUSG00000000001:002
    ENSMUSG00000000001:003
    etc.
    These refer to the different exons of the gene.
    I cannot annotate the results file because it lists the exons separately. How can I combine or merge the exon counts for the same gene into one count per gene?
    Thanks in advance
  • cmccabe
    Senior Member
    • Jul 2012
    • 355

    #2
    I am not sure I understand completely, but if you have a file like this:

    Code:
    ENSMUSG00000000001:001
    ENSMUSG00000000001:002
    ENSMUSG00000000001:003
    ENSMUSG00000000002:001
    ENSMUSG00000000002:002
    ENSMUSG00000000002:002

    you could use:

    Code:
    awk -F':' -v OFS='\t' '{sum[$1]+=$2} END{for (key in sum) print key, sum[key]}' file.txt
    ENSMUSG00000000001      6
    ENSMUSG00000000002      5
    Hope this helps.
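
Note that the command above adds up the exon numbers after the colon (001 + 002 + 003 = 6), not read counts, because the example file has no count column. If the real file has the count in a second whitespace-separated column (an assumption here, the thread does not show the full file layout), a minimal sketch along the same lines could look like this. The function name sum_exon_counts is hypothetical and only used for illustration.

```shell
# Sum per-exon counts into one count per gene.
# Assumes each line looks like "GENE_ID:EXON_NUMBER<whitespace>COUNT";
# that column layout is an assumption, not something shown in the thread.
sum_exon_counts() {
    awk -v OFS='\t' '
        # Only process lines whose first field has the GENE:EXON form,
        # so summary lines without a colon are skipped.
        $1 ~ /:/ {
            split($1, id, ":")      # id[1] = gene ID, id[2] = exon number
            sum[id[1]] += $2        # add this exon count to the gene total
        }
        END { for (gene in sum) print gene, sum[gene] }
    ' "$1" | sort                   # awk array order is unspecified, so sort
}
```

It could be called as sum_exon_counts counts.txt > gene_counts.txt, which should give one tab-separated line per gene ID with the summed count.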
    Last edited by cmccabe; 09-11-2015, 06:10 AM. Reason: added awk
