  • Generate .xml file from BLAST (Blast2Go)

    Hello,

    I need to generate an .xml file from BLAST, starting with a list of genes (currently in .txt format). I need it as input for Blast2Go.

    Do I need a script for this, or is there an easier way to do it directly on the online BLAST website?

    Many many thanks....

  • #2
    When you run the BLAST search at NCBI, the results page gives you the option of saving the alignments as an XML file. See this post for a screenshot: http://seqanswers.com/forums/showpos...92&postcount=7



    • #3
      Thank you!

      Thanks so much for the reply!

      I'm actually unsure how to put my whole list into NCBI BLAST as well!

      If you know how to do that, I will hopefully be able to reach the results page you kindly provided the screenshot for! Thank you!!!!



      • #4
        Those numbers you posted in this thread (http://seqanswers.com/forums/showthread.php?t=45711) are not going to work at NCBI. You will need to parse out the protein sequences from this file: http://marinegenomics.oist.jp/genome...0.1.prot.fa.gz

        If the IDs you have match ones in this protein FASTA file, then you can use the faSomeRecords program from the Kent utilities to extract the protein sequences you are interested in (http://seqanswers.com/forums/showpos...0&postcount=13).

        You can then use them for the Blast search at NCBI.



        • #5
          Thanks so much for your help!

          I am not sure how to parse out the proteins, though. I need to do this pretty fast (unfortunately). Is this a simple process? Any scripts online that I could borrow?

          This would make things much easier, because then I could just work with the proteins in BLAST.... Thanks again...



          • #6
            I posted the procedure in #4 above. Here is a step-wise version. You will need access to a Linux machine (or OS X) for this to work, though.

            1. Download the protein sequence file and then gunzip (uncompress) it.

            2. Have your IDs of interest in a text file.

            3. Run this program http://hgdownload.soe.ucsc.edu/admin.../faSomeRecords (this is the Linux version) as shown below.

            Code:
            $ faSomeRecords protein_file.fa yourlistFile output_with_proteins_of_interest.fa
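
The extraction in step 3 can also be approximated with a short Python script. This is only a minimal sketch, not the faSomeRecords implementation: the function names (`read_ids`, `filter_fasta`, `extract_records`) are made up for illustration, and it assumes that the list file holds one ID per line and that each ID equals the first word after `>` on a FASTA header line.

```python
from typing import Iterable, Iterator, Set


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect one ID per non-empty line, ignoring surrounding whitespace."""
    return {line.strip() for line in lines if line.strip()}


def filter_fasta(fasta_lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield every line of each record whose ID is in `wanted`.

    Assumption: the record ID is the first whitespace-delimited word
    after '>' on the header line.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            keep = bool(parts) and parts[0] in wanted
        if keep:
            yield line


def extract_records(fasta_path: str, ids_path: str, out_path: str) -> int:
    """Write the matching records to `out_path`; return how many were written."""
    with open(ids_path) as ids_file:
        wanted = read_ids(ids_file)
    written = 0
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in filter_fasta(fasta, wanted):
            out.write(line)
            if line.startswith(">"):
                written += 1
    return written
```

With the file names from the command above, a call would look like `extract_records("protein_file.fa", "yourlistFile", "output_with_proteins_of_interest.fa")`. A return value of 0 means that none of the IDs in the list matched a header in the FASTA file.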



            • #7
              Thanks. Does MacOSX count as Linux?



              • #8
                Originally posted by PurplePancake:
                "Thanks. Does MacOSX count as Linux?"
                OS X is a certified variant of Unix.

                In that case, use the Mac version of the faSomeRecords program in step 3 above: http://hgdownload.soe.ucsc.edu/admin.../faSomeRecords



                • #9
                  Thanks so much! I did as you said, but get the error

                  "-bash: faSomeRecords: command not found"

                  I am *really bad* with Linux, and especially installation. I hope to take a course in two semesters so I can do this... because this happens a lot, and I never figure it out...
                  Last edited by PurplePancake; 08-10-2014, 06:35 PM. Reason: clarity



                  • #10
                    Basically, all I did was download the faSomeRecords file and move the other two files into the Download folder. Then I typed:

                    faSomeRecords prot.fa geneList.rtf outProteins.fa

                    I know there are additional steps to do to "prepare the command"? But I always mess things up, and really freeze when it comes to installation....
                    Last edited by PurplePancake; 08-10-2014, 06:34 PM. Reason: clarity



                    • #11
                      Hello,

                      I don't feel that I really changed much, but for some reason, there is no error any more. I just did this instead:

                      ./faSomeRecords prot.fa geneList.odt outProteins.fa

                      There is no error, but there is also nothing in outProteins.fa.

                      Does this mean there is not enough information in geneList to identify the proteins and genes?

                      Thanks!!
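
One possible cause of an empty output file is that the ID list is not plain text. An .rtf file begins with RTF markup and an .odt file is a ZIP archive, so neither holds bare IDs, one per line, and a tool that reads the list line by line will likely find no matches. The sketch below is a hypothetical helper (`sniff_list_format` and `check_list_file` are illustrative names, not part of any tool mentioned in this thread) that inspects the first bytes of the list file.

```python
def sniff_list_format(first_bytes: bytes) -> str:
    """Guess the file type from its leading bytes.

    Returns "rtf", "zip" (the container used by .odt and .docx),
    "binary", or "text".
    """
    if first_bytes.startswith(b"{\\rtf"):
        return "rtf"
    if first_bytes.startswith(b"PK\x03\x04"):
        return "zip"
    if b"\x00" in first_bytes:
        return "binary"
    return "text"


def check_list_file(path: str) -> str:
    """Read the first 512 bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return sniff_list_format(handle.read(512))
```

If `check_list_file("geneList.odt")` returns anything other than `"text"`, re-saving the list from the editor as plain text (.txt) and rerunning the extraction is worth trying before looking for other causes.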



                      • #12
                        Also, is there any reason why nothing would match the proteins? I had asked similar questions elsewhere, and was told that since coral is so old and branched off before mammals, some of this RNAseq work can be difficult or impossible?

                        Okay, I will stop asking so many questions now



                        • #13
                          The IDs you posted in the other thread do not seem to match the protein IDs in either of the two sets here: http://marinegenomics.oist.jp/genome...s?project_id=3 Which file did you get your IDs from?
                          Last edited by GenoMax; 08-11-2014, 05:01 AM.
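
A quick way to test whether a list of IDs matches a protein FASTA file is to compare the two sets directly. The following is a minimal sketch under stated assumptions: `fasta_ids` and `compare_ids` are hypothetical names, and the ID of a record is taken to be the first word after `>` on its header line.

```python
from typing import Dict, Iterable, List, Set


def fasta_ids(fasta_lines: Iterable[str]) -> Set[str]:
    """Return the first word after '>' from every non-empty header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def compare_ids(wanted: Iterable[str], available: Set[str]) -> Dict[str, List[str]]:
    """Split the wanted IDs into those present in the FASTA file and those absent."""
    cleaned = {w.strip() for w in wanted if w.strip()}
    return {
        "matched": sorted(cleaned & available),
        "missing": sorted(cleaned - available),
    }
```

For example, `compare_ids(open("geneList.txt"), fasta_ids(open("prot.fa")))` (file names assumed) returns the IDs that were found and those that were not. If every ID ends up under `"missing"`, the list and the FASTA file probably use different naming schemes.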
