  • Best way to find host organism for protein IDs

    Hello,

    I inherited an RNA-Seq bioinformatics analysis from someone else, and I'm trying to make sense of one of the steps. The project was looking for differentially expressed genes using DESeq, which works nicely. The DESeq output is then compared to a GTF file for the reference genome, and the protein IDs for each of the hits are extracted. A program called fastacmd is then used to get the amino acid sequences for those protein IDs. That all makes sense to me.

    However, those protein sequences are then BLASTed against the KEGG database, and the three-letter organism code of the best hit for each sequence is used to assign the host organism to that gene. This doesn't really make sense to me, since the header of the FASTA file generated by fastacmd already contains the organism name. The work was originally done about a year ago, so no one can quite remember the reasoning behind it, and the BLAST results from KEGG gave some interesting output, so I want to be able to validate it.

    Can anyone offer some insight into their logic, or perhaps suggest a better way?

    Thanks

  • #2
    Perhaps you could install the taxdb database as indicated in this post: http://seqanswers.com/forums/showthread.php?t=30669

    You could also use the information in this blog post:

    This is an open letter to the NCBI BLAST+ team to request two simple enhancements which I think would be extremely useful - first and foremo...
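
One way to validate the KEGG-based assignment is to read the organism straight from the fastacmd headers and compare it with the organism prefix of the best KEGG hit. The following is only a minimal sketch under assumptions that are not confirmed in this thread: that the deflines follow the common NCBI style with the organism in trailing square brackets, and that the KEGG hit IDs look like `org:gene`. The function names and the example records are hypothetical.

```python
import re

# Assumption: NCBI-style defline ending in "[Organism name]", e.g.
# ">gi|0000001|ref|XP_000001.1| hypothetical protein [Escherichia coli]"
_ORGANISM_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def organism_from_defline(defline):
    """Return the trailing bracketed organism name, or None if absent."""
    match = _ORGANISM_RE.search(defline.strip())
    return match.group(1) if match else None


def kegg_org_code(hit_id):
    """Return the organism prefix of an 'org:gene' style hit ID, else None."""
    code, sep, _ = hit_id.partition(":")
    return code if sep and code else None


def organisms_by_id(fasta_lines):
    """Map the first defline token (the sequence ID) to its organism name."""
    result = {}
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            result[seq_id] = organism_from_defline(line)
    return result


if __name__ == "__main__":
    # Made-up records, for illustration only.
    example = [
        ">gi|0000001|ref|XP_000001.1| hypothetical protein [Escherichia coli]",
        "MKTAYIAKQR",
    ]
    print(organisms_by_id(example))
    print(kegg_org_code("eco:b0001"))
```

Deflines that merge several database entries into one header would need extra handling, since this sketch only looks at the last bracketed name. Any gene where the header organism and the KEGG prefix disagree would be a candidate for manual inspection.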
