  • Generate .xml file from BLAST (Blast2Go)

    Hello,

    I need to generate an .xml file from BLAST starting with a list of genes (currently in .txt format). I need this to use it as input to Blast2Go.

    Do I require a script to do this, or is there an easier way to do this directly on the online BLAST website?

    Many many thanks....

  • #2
    When you run the BLAST search at NCBI, the results page gives you the option of saving the alignments as an XML file. See this post for a screenshot: http://seqanswers.com/forums/showpos...92&postcount=7



    • #3
      Thank you!

      Thanks so much for the reply!

      I'm actually unsure how to put my whole list into NCBI BLAST as well!

      If you know how to do that, I will hopefully be able to reach the results page you kindly provided the screenshot for! Thank you!



      • #4
        Those numbers you posted in this thread (http://seqanswers.com/forums/showthread.php?t=45711) are not going to work at NCBI. You will need to parse out the protein sequences from this file http://marinegenomics.oist.jp/genome...0.1.prot.fa.gz.

        If the IDs you have match the ones in this protein FASTA file, then you can use the faSomeRecords program from the Kent utilities to extract the protein sequences you are interested in (http://seqanswers.com/forums/showpos...0&postcount=13).

        You can then use them for the Blast search at NCBI.



        • #5
          Thanks so much for your help!

          I am not sure how to parse out the proteins, though. I need to do this pretty fast (unfortunately). Is this a simple process? Any scripts online that I could borrow?

          This would make things much easier, because then I could just work with the proteins in BLAST. Thanks again.



          • #6
            I posted the procedure in #4 above. Here is a step-wise version. You would need to have access to a linux machine (or OS X) for this to work though.

            1. You will need to download the protein sequence file and then gunzip it (uncompress).

            2. Put your IDs of interest in a plain-text file.

            3. Run this program http://hgdownload.soe.ucsc.edu/admin.../faSomeRecords (this is the Linux version) as shown below.

            Code:
            $ faSomeRecords protein_file.fa yourlistFile output_with_proteins_of_interest.fa
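
            For anyone who cannot run the binary, the same kind of extraction can be sketched in plain Python. This is a minimal illustration, not the faSomeRecords implementation: the function names `read_ids` and `extract_records` are hypothetical, and the sketch assumes that each record's ID is the header text between `>` and the first whitespace, and that the list file is plain text with one ID per line.

```python
import gzip


def read_ids(list_path):
    """Return the set of IDs in a plain-text file, one ID per line."""
    with open(list_path) as handle:
        return {line.strip() for line in handle if line.strip()}


def extract_records(fasta_path, ids, out_path):
    """Copy FASTA records whose ID is in `ids` to out_path; return how many were kept."""
    # Read .gz files directly so the gunzip step is optional (a convenience of this sketch).
    opener = gzip.open if fasta_path.endswith(".gz") else open
    kept = 0
    keep = False
    with opener(fasta_path, "rt") as fasta, open(out_path, "w") as out:
        for line in fasta:
            if line.startswith(">"):
                # Assumption: the ID is the header text up to the first whitespace.
                fields = line[1:].split()
                keep = bool(fields) and fields[0] in ids
                if keep:
                    kept += 1
            if keep:
                # Header and sequence lines of a wanted record are copied unchanged.
                out.write(line)
    return kept
```

            With the file names from the command above, a call would look like `extract_records("protein_file.fa", read_ids("yourlistFile"), "output_with_proteins_of_interest.fa")`. A return value of 0 means none of the listed IDs appeared in the FASTA headers.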



            • #7
              Thanks. Does MacOSX count as Linux?



              • #8
                Originally posted by PurplePancake:
                "Thanks. Does MacOSX count as Linux?"

                OS X is a certified variant of Unix.

                In that case, use the Mac version of the faSomeRecords program in step 3 above: http://hgdownload.soe.ucsc.edu/admin.../faSomeRecords



                • #9
                  Thanks so much! I did as you said, but get the error

                  "-bash: faSomeRecords: command not found"

                  I am *really bad* with Linux, and especially with installation. I hope to take a course in two semesters so I can do this, because this happens a lot and I never figure it out.
                  Last edited by PurplePancake; 08-10-2014, 06:35 PM. Reason: clarity



                  • #10
                    Basically, all I did was download the faSomeRecords file and move the other two files into the Download folder. Then I typed:

                    faSomeRecords prot.fa geneList.rtf outProteins.fa

                    I know there are additional steps needed to "prepare the command", but I always mess things up and really freeze when it comes to installation.
                    Last edited by PurplePancake; 08-10-2014, 06:34 PM. Reason: clarity



                    • #11
                      Hello,

                      I don't feel that I really changed much, but for some reason, there is no error any more. I just did this instead:

                      ./faSomeRecords prot.fa geneList.odt outProteins.fa

                      There is no error, but there is also nothing in outProteins.fa.

                      Does this mean there is not enough information in geneList to identify the proteins and genes?

                      Thanks!!



                      • #12
                        Also, is there any reason why nothing would match the proteins? I asked similar questions elsewhere and was told that, because coral is such an old lineage and branched off before mammals, some of this RNA-seq analysis can be difficult or impossible.

                        Okay, I will stop asking so many questions now.



                        • #13
                          The IDs you posted in the other thread do not seem to match the protein IDs in either of the sets here: http://marinegenomics.oist.jp/genome...s?project_id=3 What file did you get your IDs from?
                          Last edited by GenoMax; 08-11-2014, 05:01 AM.
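
                          One way to check whether an ID list and a protein FASTA file agree is sketched below in Python. It is only an illustration under stated assumptions: the function names `fasta_ids` and `check_id_list` are hypothetical, the FASTA file is assumed to be uncompressed, and an ID is taken to be the header text between `>` and the first whitespace. The sketch also flags list files saved as RTF (which begin with `{\rtf`) or as zip-based documents such as .odt (which begin with `PK`), since those are not plain text.

```python
def fasta_ids(fasta_path):
    """Collect the ID (first word after '>') of every record in a FASTA file."""
    ids = set()
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def check_id_list(list_path, fasta_path):
    """Return a one-line report on how well the ID list matches the FASTA headers."""
    with open(list_path, "rb") as handle:
        raw = handle.read()
    # Rich-text and word-processor files contain markup, not just the IDs.
    if raw.startswith(b"{\\rtf"):
        return "list file is RTF, not plain text"
    if raw.startswith(b"PK"):
        return "list file is a zip container (for example .odt), not plain text"
    text = raw.decode("utf-8", errors="replace")
    wanted = {line.strip() for line in text.splitlines() if line.strip()}
    found = wanted & fasta_ids(fasta_path)
    return f"{len(found)} of {len(wanted)} IDs found in the FASTA headers"
```

                          A report of "0 of N IDs found" would point to the list and the protein file using different identifiers, in which case an extraction would be expected to produce an empty output file.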
