  • Get fasta amino-acid BLAST result

    Hi everybody,

    I come here with a beginner question. Sorry.

    I'm learning Perl and I'm starting to use BioPerl.

    This is what I want to do: get the full amino-acid sequences of the hits obtained by tblastn against a nucleotide database. Let me explain:

    I run tblastn with a protein sequence of interest (let's say HumanProt) against a local EST database (let's say FishEST). The BLAST output is then an alignment of the query protein with the matching portions of the FishEST sequences (the classic default BLAST output). Of course this output contains many hits, each containing one or several HSPs.

    In this output, each HSP is shown as a translation in the frame that gives the best alignment with the query.


    Score = 51.2 bits (121), Expect = 2e-08, Method: Compositional matrix adjust.
    Identities = 25/81 (30%), Positives = 44/81 (54%), Gaps = 0/81 (0%)
    Frame = +1

    Query 513 QIEGPEGCNLFIYHLPQEFTDTDLASTFLPFGNVISAKVFIDKQTSLSKCFGFVSFDNPD 572

    Sbjct 607 QEEATKFSRIYVSSIHGDLTDRDVKSVFEAFGHIVSIDLAPDNVPGKHRGWGYVEYDNPK 786

    Query 573 SAQVAIKAMNGFQVGTKRLKV 593

    Sbjct 787 SAADAIASXNLFDLGGQXLRV 849


    At this point, I'm able to display only the sequence IDs (-outfmt "6 sallacc") and to extract them from my EST database (blastdbcmd). But what I extract are nucleotide sequences. If I want amino-acid sequences, I have to translate them in all 6 frames and pick out the translation corresponding to my BLAST result... I have lost so much time doing this.

    This is my dream: I would like the output of the tblastn search to be the translation of the retrieved sequences (I mean the entire sequences, not only the matching portion), in the frame corresponding to the best HSP of each hit... in FASTA format, of course.

    Do you think that's possible? Is it necessary to use BioPerl to do that, or is there a BLAST output format corresponding to what I want? (I checked the BLAST User Manual but haven't found one.)

    Thank you for your help.

    Alex

  • #2
    Hi Alex,

    I just wanted to say: it seems we have very similar problems, and I have no real solution either.

    I didn't understand your problem clearly enough, but I have some general advice:

    Check all the output format options available in tblastn and blastdbcmd. You can list them by typing tblastn -help and blastdbcmd -help.

    Beyond that, there are Perl scripts in the BLAST book written by Korf, Yandell and Bedell. With these you can process your output, for example to keep only hits with more than ninety percent identity.
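
The identity filter mentioned above can also be done without the book's Perl scripts. Here is a minimal Python sketch, assuming the results were saved in the default 12-column tabular format (-outfmt 6). The function name `filter_by_identity` and the two example lines are made up for illustration; they are not from the thread.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_by_identity(handle, min_pident=90.0):
    """Yield hits (as dicts) whose percent identity is at least min_pident."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if float(hit["pident"]) >= min_pident:
            yield hit

# Two invented result lines, for illustration only.
example = (
    "HumanProt\tEST_001\t95.00\t80\t4\t0\t1\t80\t10\t249\t1e-40\t150\n"
    "HumanProt\tEST_002\t30.86\t81\t56\t0\t513\t593\t607\t849\t2e-08\t51.2\n"
)
kept = list(filter_by_identity(io.StringIO(example)))
print([hit["sseqid"] for hit in kept])
```

With a real file you would pass `open("results.tsv")` instead of the `io.StringIO` object; the file name is a placeholder.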


    • #3
      What do you mean by "I would like the output of TblastN research"?
      If I understand correctly, you are querying a protein sequence against a nucleotide database with tblastn and would like to extract the corresponding amino acid sequences in FASTA format?

      Every flavour of BLAST can give you an XML output format (the -m option in legacy blastall, -outfmt 5 in BLAST+), which is what most of the 'Bio' utilities rely on.
      You could write a quick Perl script to read all the XML tags that you want and write them out in any fashion you want.
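
As a rough illustration of reading the XML tags, here is a small Python sketch (Python rather than Perl, only to keep it dependency-free). It assumes the BLAST XML layout with `Hit`, `Hit_id`, `Hsp`, `Hsp_bit-score` and `Hsp_hit-frame` elements. The embedded fragment is a heavily trimmed, invented sample, and `best_frame_per_hit` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Heavily trimmed, invented fragment in the shape of BLAST XML output.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>EST_002</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>51.2</Hsp_bit-score>
              <Hsp_hit-frame>1</Hsp_hit-frame>
            </Hsp>
            <Hsp>
              <Hsp_bit-score>30.0</Hsp_bit-score>
              <Hsp_hit-frame>-2</Hsp_hit-frame>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

def best_frame_per_hit(xml_text):
    """Map each hit ID to the subject frame of its highest-scoring HSP."""
    root = ET.fromstring(xml_text)
    frames = {}
    for hit in root.iter("Hit"):
        hsps = list(hit.iter("Hsp"))
        if not hsps:
            continue
        # The best HSP is taken to be the one with the largest bit score.
        best = max(hsps, key=lambda h: float(h.findtext("Hsp_bit-score")))
        frames[hit.findtext("Hit_id")] = int(best.findtext("Hsp_hit-frame"))
    return frames

print(best_frame_per_hit(SAMPLE_XML))
```

This only recovers the ID and frame of each hit. The XML still contains just the aligned portion of each subject sequence, so the full sequences have to be fetched from the database separately.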


      • #4
        Aparna,

        you understood correctly. (I'm sorry about my horrible English, I'm a French frog ;-))

        I agree with you about the XML format followed by parsing with Perl. However, if I do that (and I did), I can only retrieve the portion of the sequence that matches my query, I mean the portion of the sequence that appears in the BLAST output.

        What I want is a translation of the entire sequence. I didn't find any script to do that (please keep in mind that I'm a beginner and not able to write a script by myself).

        Thanks for your advice.

        Alex


        • #5
          Hey Alex,
          That's all right.

          BLAST only outputs the extent of the match to your sequences. If you want the entire sequences from the database, the only workaround I know of is to use the -I T option (legacy blastall), which gives you the 'gi' accessions of the database sequences.
          You can use these accessions to fetch the complete sequences from nr.
          You could use E-utils, but that is a little complicated for a beginner. Instead, you could copy and paste the space-delimited gi accessions into the search bar and get the sequences.

          Thx
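
For a local database like the one in the first post, the same "collect the IDs, then fetch the full sequences" idea might look like the sketch below. It assumes tab-separated BLAST output with the subject ID in the second column, and it only builds a `blastdbcmd` command line using the `-db`, `-entry_batch` and `-out` options. The helper names, file names and example lines are invented. Lookups by ID generally require that the database was built with `makeblastdb -parse_seqids`.

```python
import io

def unique_subject_ids(handle, column=1):
    """Collect subject IDs from tab-separated BLAST output, keeping first-seen order."""
    seen = {}
    for line in handle:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:
            seen.setdefault(fields[column], None)
    return list(seen)

def blastdbcmd_command(db, id_file, out_fasta):
    """Build (but do not run) a blastdbcmd call that extracts the listed entries."""
    return ["blastdbcmd", "-db", db, "-entry_batch", id_file, "-out", out_fasta]

# Invented example lines: query ID, subject ID, percent identity.
example = (
    "HumanProt\tEST_002\t30.86\n"
    "HumanProt\tEST_007\t41.00\n"
    "HumanProt\tEST_002\t28.10\n"
)
ids = unique_subject_ids(io.StringIO(example))
print(ids)

cmd = blastdbcmd_command("FishEST", "hit_ids.txt", "hits.fasta")
print(" ".join(cmd))
```

To actually run it, write `ids` one per line into the ID file and pass `cmd` to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.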


          • #6
            Alex, did you ever find a solution to this? If so, would you mind sharing? I have the exact same problem, and I am also a beginner. I used tblastn to compare my own de novo assembled contigs to a protein database. I would like to extract the full translated sequences that come up as hits, ideally from the BLAST results themselves. It would be great to avoid doing the translations myself and then finding the correct reading frame.


            • #7
              Hello,

              I am trying to translate my DNA sequences to protein with fasta files that can contain as many as 7,000 sequences. Instead of translating to all 6 reading frames I would like to perform a tBLASTx and extract the best protein sequence with the best reading frame according to blast results. Does anyone know the best way of doing this? It sounds very much like what everyone in this thread has done or has tried to do.

              Thanks so much for any help!
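
One possible way to pick "the best reading frame according to blast results" is to keep, for each query, the frame of its highest-scoring HSP. The sketch below assumes the search was saved with a custom tabular format such as -outfmt "6 qseqid sseqid qframe bitscore". That column choice, the function name and the example lines are assumptions made for illustration.

```python
import io

def best_frame_per_query(handle):
    """Map each query ID to (qframe, bitscore, sseqid) of its highest-scoring HSP.

    Expects four tab-separated columns: qseqid, sseqid, qframe, bitscore.
    """
    best = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        qseqid, sseqid, qframe, bitscore = line.split("\t")
        score = float(bitscore)
        # Keep this HSP only if it beats the best one seen so far for the query.
        if qseqid not in best or score > best[qseqid][1]:
            best[qseqid] = (int(qframe), score, sseqid)
    return best

# Invented example lines, for illustration only.
example = (
    "contig1\tsubjA\t2\t88.5\n"
    "contig1\tsubjB\t-1\t120.0\n"
    "contig2\tsubjA\t3\t45.1\n"
)
best = best_frame_per_query(io.StringIO(example))
for query, (frame, score, subject) in best.items():
    print(query, frame, score, subject)
```

Queries with no hit at all will simply be missing from the result, so they still need a fallback (for example, translating in all six frames).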


              • #8
                dude,

                1. Run tblastx with your query against the database sequences, and save the output in the standard format (most important: the ID and the frame).
                2. Retrieve those sequences by ID and make a FASTA file.
                3. Translate these sequences in all frames.
                4. Compare with the tblastx results.

                Depending on the number of sequences, it may take a couple of minutes or hours :s

                greets
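
Once the ID and frame of the best HSP are known, steps 3 and 4 can be shortened by translating each full sequence only in that frame. Below is a minimal Python sketch of that idea, assuming the standard genetic code and BLAST's frame convention (+1 to +3 on the given strand, -1 to -3 on the reverse complement). Stop codons are written as `*`, and codons containing ambiguous bases as `X`. The function names and the example sequence are invented.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def translate_in_frame(seq, frame):
    """Translate the whole sequence in one BLAST-style frame (+1..+3 or -1..-3)."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1, +2, +3, -1, -2, -3")
    seq = seq.upper()
    if frame < 0:
        seq = reverse_complement(seq)
    start = abs(frame) - 1
    # Step through complete codons only; a trailing partial codon is dropped.
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def fasta_record(seq_id, frame, protein):
    """Format one translated sequence as a FASTA record."""
    return ">{} frame={:+d}\n{}\n".format(seq_id, frame, protein)

# Invented example sequence, for illustration only.
dna = "ATGGCCTAAGG"
print(fasta_record("EST_example", 1, translate_in_frame(dna, 1)), end="")
print(fasta_record("EST_example", -1, translate_in_frame(dna, -1)), end="")
```

Translating the whole sequence in a single frame does not guarantee a clean open reading frame: ESTs can contain frameshifts and stop codons outside the aligned region, so it is worth checking the result against the HSP shown by BLAST.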


                • #9
                  Hello everyone!
                  It seems this thread is several years old. So, @Alex, did you find a way to solve your problem? If you did, would you mind sharing the process?
                  I blasted my nucleotide FASTA file against a protein database FASTA file and got the BLAST result. Now I also want to extract, from the blastx result file, the translated amino acid sequences corresponding to the blasted nucleotide sequences. It would be great to have those extracted sequences in FASTA format.
