  • Is blastx reliable enough to annotate transcripts from RNA-seq?

    Hi everybody!

    I am new to in silico functional annotation of transcripts assembled from RNA-seq reads. My question is about using blastx for this purpose.

    Many tools, such as blast2go and argot2, are known to use blastx as the first step in a pipeline in which the final goal is to assign gene ontology (GO) terms to transcripts. These tools use some kind of algorithm to transfer the GO terms from the blast hits to the query sequence. So far so good.

    But is it OK to transfer terms from proteins that would be translated from the minus strand of a transcript? Blastx searches all 6 reading frames for potential proteins, so it should not be rare for these tools to assign GO terms from such minus-strand-encoded proteins to the transcript of interest.
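To make the six-frame search concrete, here is a minimal Python sketch (not what blastx itself runs) that translates a nucleotide sequence in all six frames with the standard genetic code. The function names, the toy sequence, and the frame numbering (+1 to +3 on the sequence as given, -1 to -3 on its reverse complement, each offset counted from the start of that strand) are assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translations(seq):
    """Map frame number (+1..+3, -1..-3) to the translated protein string."""
    frames = {}
    for strand_sign, strand in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            frames[strand_sign * (offset + 1)] = translate(strand[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCC").items():
        print(frame, protein)
```

A hit reported in a negative frame simply means the matching ORF was found in one of the three translations of the reverse complement.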

    But is this procedure biologically correct? A transcript is translated from 5' to 3' by ribosomes, so I believe it is not correct to consider proteins from the minus strand. Or am I missing something about the RNA-seq approach itself?

    Best regards!

  • #2
    Blastx to associate GO terms with transcripts

    Hi everyone!

    I think no one has replied to me so far because I made my question hard to understand. So I'll try to simplify it now!

    Let's suppose I have the following transcript assembled from reads generated by RNA-seq:

    >transcript_A
    tccgcaatgagtcaatactccaccaattgcagggtgtgaaagtataagcacttgaggagcccatcctctaatcaaaactcctctttctttaattctttgctcaaatccattctcaagaatccatttctctaaatcatttaatttatcacctcctcctaatacccatacaaaaggcctatttgactcttctaaaccaagaccaagttccaccatttgcaataatgtcaaacgagataaacttccaagacttgcataaaccacagattctgtttcaaaattatctaaccatttcaagcaatcttgattatcaattgcagttttattaccccttgtaaccaaatcttcaatttccttattacacaaagaaacaggaccaacacaccaaacttttttccctctagctttcctatattctttctcatacacttgctccaactcctcaaaactattaacaattacaccatatgatgattcctcggctaatctgatttgctcagtaacttctttcaatacagaagaactaacagaagtagtatttttcgtcgatcctgaaacctgagctttcgttagttcaactctatcgggtaaatcaggaacaacaaaatactctgaatctgaggttatattttcaagaatgttggaggaaagtattttataggaacataaaagtgagaaacaacaagtaccatgaaaaacaattcttgggatattaaaattttgtgcaatttgagtagtccaaggaaatcccatatctgaaataacacaacttggacttggatttattccttctaagagattttcaacttgttgtttcagcatactaattgcagcaaaaaactttgaagccaagtcaagagaaggaagcatg

    Now, I want to use blast2go or argot2 to assign gene ontology (GO) terms to this transcript. What's the first step? Blastx!

    In this case, blastx finds an ORF highly similar to other proteins in reading frame -2, i.e. in the reverse complement of this sequence. Below is a piece of the XML output showing the description of hit 1:

    <Hit_num>1</Hit_num>
    <Hit_id>gi|697139547|ref|XP_009623864.1|</Hit_id>
    <Hit_def>PREDICTED: UDP-glycosyltransferase 73C3-like [Nicotiana tomentosiformis] &gt;gi|62241063|dbj|BAD93688.1| glucosyltransferase [Nicotiana tabacum]</Hit_def>
    <Hit_accession>XP_009623864</Hit_accession>
    <Hit_len>496</Hit_len>
    <Hit_hsps>
    <Hsp>
    <Hsp_num>1</Hsp_num>
    <Hsp_bit-score>562.377</Hsp_bit-score>
    <Hsp_score>1448</Hsp_score>
    <Hsp_evalue>0</Hsp_evalue>
    <Hsp_query-from>1</Hsp_query-from>
    <Hsp_query-to>867</Hsp_query-to>
    <Hsp_hit-from>87</Hsp_hit-from>
    <Hsp_hit-to>375</Hsp_hit-to>
    <Hsp_query-frame>-2</Hsp_query-frame>
    <Hsp_hit-frame>0</Hsp_hit-frame>
    <Hsp_identity>289</Hsp_identity>
    <Hsp_positive>289</Hsp_positive>
    <Hsp_gaps>0</Hsp_gaps>
    <Hsp_align-len>289</Hsp_align-len>
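
As a rough illustration of how the strand can be read from such output, the Python sketch below parses the `Hsp_query-frame` field with the standard library. The XML string is a hand-closed subset of the excerpt above (closing tags were added so that it parses), and the function name `summarize_hsps` is hypothetical. It is a sketch of the idea, not how blast2go or argot2 read BLAST results.

```python
import xml.etree.ElementTree as ET

# Hand-closed subset of the excerpt above; closing tags added for parsing.
FRAGMENT = """<Hit>
  <Hit_accession>XP_009623864</Hit_accession>
  <Hit_hsps>
    <Hsp>
      <Hsp_query-from>1</Hsp_query-from>
      <Hsp_query-to>867</Hsp_query-to>
      <Hsp_query-frame>-2</Hsp_query-frame>
      <Hsp_align-len>289</Hsp_align-len>
    </Hsp>
  </Hit_hsps>
</Hit>"""


def summarize_hsps(xml_text):
    """Return one dict per HSP with its frame, strand, and aligned query span."""
    root = ET.fromstring(xml_text)
    rows = []
    for hsp in root.iter("Hsp"):
        # Collect child tags into a dict so hyphenated tag names are looked up literally.
        fields = {child.tag: child.text for child in hsp}
        frame = int(fields["Hsp_query-frame"])
        start, end = sorted((int(fields["Hsp_query-from"]), int(fields["Hsp_query-to"])))
        span = end - start + 1
        rows.append({
            "frame": frame,
            "strand": "minus" if frame < 0 else "plus",
            "query_nt": span,          # nucleotides of the query covered by the HSP
            "codons": span // 3,       # the same span counted in codons
            "align_len": int(fields["Hsp_align-len"]),
        })
    return rows


if __name__ == "__main__":
    for row in summarize_hsps(FRAGMENT):
        print(row)
```

For this hit, the 867 query nucleotides correspond to 289 codons, which matches the reported alignment length of 289 residues.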


    Now let's suppose (again!) that this protein (XP_009623864) is annotated with the GO term "metabolic process". Blast2go or argot2 will, based on their algorithms, transfer this term to my transcript_A and, in the end, my transcript_A will be associated with metabolic process.

    However, is this really OK? This GO term belongs to a protein that is highly similar to an ORF in the REVERSE COMPLEMENT of my transcript. But here we are considering a transcript and, to the best of my knowledge, a transcript is the "final" gene product that will be translated into a protein by ribosomes. In this regard, it makes no sense to consider the reverse complement and, accordingly, it makes no sense to assign a GO term transferred from a protein encoded in the reverse complement of this transcript.

    Please, I kindly ask someone to let me know if I am right or wrong.

    Best regards,

    Marcio



    • #3
      Did you follow a "stranded" RNAseq library protocol for this set of samples? If you did not then you had a 50-50 chance of sequencing either strand.



      • #4
        Dear GenoMax,

        In fact, other people prepared the library. I am only the "bioinformatics guy"! But, anyway, they didn't follow a stranded library protocol. They used the TruSeq RNA Sample Preparation Kit v2. So, in our case, we have a 50-50 chance of sequencing either strand.

        In this case are the final assembled transcripts mixtures of reads from both strands? Is this the reason why I can consider ORFs in the reverse complement when I use blastx?



        • #5
          "In this case are the final assembled transcripts mixtures of reads from both strands? Is this the reason why I can consider ORFs in the reverse complement when I use blastx?"
          Yes. Since the prep was not strand-specific, your contig's orientation is arbitrary.
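
One practical consequence, sketched below in Python under stated assumptions: if a contig's best blastx hit falls in a negative frame, the contig can be reverse-complemented so that its putative coding strand reads 5' to 3'. The function `orient_contig` and the idea of trusting a single best-hit frame are illustrative assumptions, not a step taken from blast2go or argot2.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_contig(seq, best_hit_frame):
    """Return the contig in the orientation implied by its best blastx hit.

    best_hit_frame is the signed query frame of the hit (e.g. -2);
    a negative value means the ORF lies on the reverse complement.
    """
    if best_hit_frame == 0:
        raise ValueError("query frame must be a nonzero signed frame")
    return reverse_complement(seq) if best_hit_frame < 0 else seq


if __name__ == "__main__":
    print(orient_contig("tccgca", -2))  # reverse-complemented
    print(orient_contig("tccgca", 1))   # left unchanged
```

Either orientation carries the same information, so the annotation itself does not change; reorienting only makes the coding strand explicit.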



          • #6
            Kopi-o and GenoMax,

            Thank you for your answers.

            So, if I use a non-stranded protocol to generate my RNA-seq library and, after sequencing and assembly, I find that some pairs of my assembled transcripts align with high identity and coverage but in reverse-complement orientation, should I conclude that something went wrong during the assembly?

            I ask because, if contig orientation is arbitrary given how the library was prepared, two reverse-complement sequences could have originated from one common sequence.
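
A simplified way to look for such pairs, sketched in Python: reduce every sequence to a strand-independent "canonical" form (the lexicographically smaller of the sequence and its reverse complement) and group transcripts that share it. This only catches exact reverse-complement duplicates; pairs that match with high but imperfect identity would need a real alignment. The function names and toy records are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq):
    """Strand-independent key: the smaller of the sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def reverse_complement_groups(records):
    """Group transcript names whose sequences are identical up to orientation.

    records maps transcript name -> nucleotide sequence.
    Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for name, seq in records.items():
        groups[canonical(seq)].append(name)
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    toy = {"t1": "ATGGCC", "t2": "GGCCAT", "t3": "AAAA"}
    print(reverse_complement_groups(toy))
```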

            Best,

            Marcio
