  • Generate .xml file from BLAST (Blast2Go)

    Hello,

    I need to generate an .xml file from BLAST starting with a list of genes (currently in .txt format). I need this to use it as input to Blast2Go.

    Do I require a script to do this, or is there an easier way to do this directly on the online BLAST website?

    Many many thanks....

  • #2
    When you run the BLAST search at NCBI, the results page gives you the option of saving the alignments as an XML file. See this post for a screenshot: http://seqanswers.com/forums/showpos...92&postcount=7



    • #3
      Thank you!

      Thanks so much for the reply!

      I'm actually unsure how to enter my whole list into NCBI BLAST as well!

      If you know how to do that, I will hopefully be able to reach the results page you kindly provided the screenshot for! Thank you!!!!



      • #4
        Those numbers you posted in this thread (http://seqanswers.com/forums/showthread.php?t=45711) are not going to work at NCBI. You will need to parse out the protein sequences from this file: http://marinegenomics.oist.jp/genome...0.1.prot.fa.gz

        If the IDs you have match ones in this protein FASTA file, then you can use the faSomeRecords program from the Kent utilities to extract the protein sequences of interest (http://seqanswers.com/forums/showpos...0&postcount=13).

        You can then use them for the Blast search at NCBI.



        • #5
          Thanks so much for your help!

          I am not sure how to parse out the proteins, though. I need to do this pretty fast (unfortunately). Is this a simple process? Any scripts online that I could borrow?

          This would make things much easier, because then I could just work with the proteins in BLAST.... Thanks again...



          • #6
            I posted the procedure in #4 above. Here is a step-wise version. You will need access to a Linux machine (or OS X) for this to work.

            1. Download the protein sequence file and then gunzip (uncompress) it.

            2. Put your IDs of interest in a plain-text file, one ID per line.

            3. Run this program http://hgdownload.soe.ucsc.edu/admin.../faSomeRecords (this is the Linux version) as shown below.

            Code:
            $ faSomeRecords protein_file.fa yourlistFile output_with_proteins_of_interest.fa
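
For readers who cannot run faSomeRecords, the extraction in step 3 can be approximated with standard tools. The following is only a minimal sketch using awk, not the Kent utility itself. The FASTA records and IDs in it are made-up placeholders, and it assumes the ID is the header text after ">" up to the first space and that the list file has one ID per line.

```shell
set -eu

# Work in a scratch directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for the uncompressed protein FASTA file.
cat > protein_file.fa <<'EOF'
>prot1 example description
MKVLAA
GGTW
>prot2
MSSPQR
>prot3 another record
MTTLLK
EOF

# Hypothetical ID list: plain text, one ID per line.
printf 'prot1\nprot3\n' > yourlistFile

# First file (NR == FNR): remember every non-empty ID.
# Second file: at each header, decide whether the record is wanted,
# then print lines for as long as the current record is wanted.
awk 'NR == FNR { if ($1 != "") want[$1] = 1; next }
     /^>/      { id = substr($1, 2); keep = (id in want) }
     keep' yourlistFile protein_file.fa > output_with_proteins_of_interest.fa

cat output_with_proteins_of_interest.fa
```

With the placeholder data, the output file holds the prot1 and prot3 records and skips prot2. The file names mirror the ones in the command above, so the awk line can be pointed at real files unchanged.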



            • #7
              Thanks. Does MacOSX count as Linux?



              • #8
                Originally posted by PurplePancake
                Thanks. Does MacOSX count as Linux?
                OS X is not Linux, but it is a certified variant of Unix, so the same procedure works.

                Use the Mac version of the faSomeRecords program in that case in step 3 above: http://hgdownload.soe.ucsc.edu/admin.../faSomeRecords



                • #9
                  Thanks so much! I did as you said, but get the error

                  "-bash: faSomeRecords: command not found"

                  I am *really bad* with Linux, and especially installation. I hope to take a course in two semesters so I can do this... because this happens a lot, and I never figure it out...
                  Last edited by PurplePancake; 08-10-2014, 06:35 PM. Reason: clarity



                  • #10
                    Basically, all I did was download the faSomeRecords file and move the other two files into the Downloads folder. Then I typed:

                    faSomeRecords prot.fa geneList.rtf outProteins.fa

                    I know there are additional steps to do to "prepare the command"? But I always mess things up, and really freeze when it comes to installation....
                    Last edited by PurplePancake; 08-10-2014, 06:34 PM. Reason: clarity
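
A "command not found" error like the one above usually means the shell only searched the directories listed in PATH, and the folder holding the download is not one of them. A downloaded program often also lacks execute permission. The sketch below illustrates both points with a hypothetical stand-in script called `mytool`; it is not the real faSomeRecords binary, and the file names passed to it are placeholders.

```shell
set -eu

# Scratch directory standing in for the folder that holds the download.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for a freshly downloaded program.
printf '#!/bin/sh\necho "tool ran with $# arguments"\n' > mytool

# The bare name is only found if its folder is listed in PATH.
if ! command -v mytool >/dev/null 2>&1; then
    echo "mytool is not on PATH, so typing just 'mytool' fails"
fi

# Give the file execute permission.
chmod +x mytool

# Name the current directory explicitly with "./" so the shell can find it.
result=$(./mytool prot.fa geneList.txt outProteins.fa)
echo "$result"
```

The same two steps (`chmod +x`, then a `./` prefix) would apply to a downloaded faSomeRecords file when the terminal is in the folder that contains it.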



                    • #11
                      Hello,

                      I don't feel that I really changed much, but for some reason, there is no error any more. I just did this instead:

                      ./faSomeRecords prot.fa geneList.odt outProteins.fa

                      There is no error, but there is also nothing in outProteins.fa.

                      Does this mean there is not enough information to determine protein and gene identification in geneList?

                      Thanks!!



                      • #12
                        Also, is there any reason why nothing would match the proteins? I had asked similar questions elsewhere, and was told that since coral was so old, and branched off before mammals, then some of this RNAseq can be difficult/impossible?

                        Okay, I will stop asking so many questions now



                        • #13
                          The IDs you posted in the other thread do not seem to match the protein IDs in either of the sets here: http://marinegenomics.oist.jp/genome...s?project_id=3 What file did you get your IDs from?
                          Last edited by GenoMax; 08-11-2014, 05:01 AM.
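
An empty output file is consistent with none of the listed IDs matching a FASTA header. Note too that .rtf and .odt files are not plain text (RTF carries markup and ODT is a zip archive), so an ID list saved in either format is unlikely to match anything. One way to check the overlap before running the extraction is sketched below. The file contents are made-up placeholders, and the sketch assumes the ID is the header text after ">" up to the first whitespace.

```shell
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical protein FASTA and ID list (plain text, one ID per line).
cat > prot.fa <<'EOF'
>geneA.t1 hypothetical protein
MKV
>geneB.t1
MSS
EOF
printf 'geneA.t1\n12345\n' > geneList.txt

# Header IDs: the text after ">" up to the first whitespace.
sed -n 's/^>\([^[:space:]]*\).*/\1/p' prot.fa | sort -u > fasta_ids.txt

# Clean the list: strip Windows carriage returns and blank lines.
tr -d '\r' < geneList.txt | sed '/^$/d' | sort -u > list_ids.txt

# comm -12 keeps lines present in both sorted files;
# comm -23 keeps lines present only in the first one.
matched=$(comm -12 list_ids.txt fasta_ids.txt | wc -l | tr -d ' ')
missing=$(comm -23 list_ids.txt fasta_ids.txt | wc -l | tr -d ' ')
echo "matched: $matched, not found: $missing"

# Show a few real header IDs to compare against the list by eye.
head -n 3 fasta_ids.txt
```

If the matched count is zero, comparing a few lines of `fasta_ids.txt` with the list usually shows whether the IDs come from a different file or naming scheme.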
