
  • Problems with blastx and thousands of sequences

    Hi all, I'm running blastx on my server. It has worked every time so far, but now, with a large file of sequences, it is doing something strange.

    The command is the following:

    nohup ../ncbi-blast-2.2.27+/bin/blastx -db ../bin/data/uniprot_kb_2012_06.fasta -query 42000seq.fa -evalue 0.05 -max_target_seqs 5 -outfmt 5 -num_threads 10 -out ouBlastxXML &

    where uniprot_kb_2012... is a dataset containing all the proteins taken from NCBI, and
    42000seq.fa is a file containing 42 thousand sequences in FASTA format.

    The output I want is in XML...

    I ran it one week ago and obtained a completely empty file!

    Now, with nohup, it is writing this:

    Selenocysteine (U) at position 73 replaced by X
    Selenocysteine (U) at position 40 replaced by X
    Selenocysteine (U) at position 52 replaced by X
    Selenocysteine (U) at position 48 replaced by X
    Selenocysteine (U) at position 37 replaced by X
    Selenocysteine (U) at position 40 replaced by X
    Selenocysteine (U) at position 40 replaced by X

    ...and other similar lines...

    and the XML file is still empty...

    What's happening?

    The same command on a query of ten sequences works well.

    Does anyone know what I am doing wrong?

    Bye and thanks,
    Angelo

  • #2
    I don't know what is wrong, but your best bet is to run BLAST on small groups of sequences and then put all of the XML files together.


    • #3
      It seems BLAST+ is a bit silly with the XML output and doesn't write it out incrementally.

      I personally split the input into batches of 1000 queries (works well for spreading the work over a cluster).
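The batching described above could be sketched as follows. This is only an illustration, not the poster's actual script: the function names, the `batch_NNNN.fa` naming scheme, and the output directory are assumptions made for the example.

```python
from pathlib import Path
from typing import Iterator, List


def read_fasta_records(path: str) -> Iterator[str]:
    """Yield each FASTA record (header line plus its sequence lines) as one string."""
    record: List[str] = []
    with open(path) as handle:
        for line in handle:
            # A new header closes the record collected so far.
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            if line.strip():
                # Normalise line endings so records can be concatenated safely.
                record.append(line.rstrip("\r\n") + "\n")
    if record:
        yield "".join(record)


def _write_batch(out_dir: Path, index: int, batch: List[str]) -> Path:
    """Write one batch to a numbered file (hypothetical naming scheme)."""
    target = out_dir / f"batch_{index:04d}.fa"
    target.write_text("".join(batch))
    return target


def split_fasta(path: str, out_dir: str, batch_size: int = 1000) -> List[Path]:
    """Split a FASTA file into files holding at most batch_size records each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    batch: List[str] = []
    for record in read_fasta_records(path):
        batch.append(record)
        if len(batch) == batch_size:
            written.append(_write_batch(out, len(written) + 1, batch))
            batch = []
    if batch:  # leftover records that did not fill a whole batch
        written.append(_write_batch(out, len(written) + 1, batch))
    return written
```

Calling `split_fasta("42000seq.fa", "batches")` would then give 42 files of 1000 queries each, and the original blastx command can be run once per file with `-query` and `-out` pointed at that batch.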


      • #4
        Could the reason be the fact that I use an XML file to extract information?
        Last edited by angeloulivieri; 09-28-2012, 12:24 AM.


        • #5
          Originally posted by angeloulivieri
          Could the reason be the fact that I use an XML file to extract information?
          If you use the text or tabular output, you should see results written to the file while BLAST+ is running.
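Because tabular output grows while the search runs, it can be read line by line at any point. The sketch below assumes the default 12-column layout of `-outfmt 6` (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the `Hit` type and the function names are made up for the example and are not part of BLAST+.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """A subset of the default tabular columns (hypothetical helper type)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLAST hits, skipping comments and partial lines."""
    for line in lines:
        line = line.strip()
        # Lines starting with '#' are the comment headers of -outfmt 7.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            # Possibly a line that BLAST has not finished writing yet.
            continue
        yield Hit(
            qseqid=fields[0],
            sseqid=fields[1],
            pident=float(fields[2]),
            length=int(fields[3]),
            evalue=float(fields[10]),
            bitscore=float(fields[11]),
        )


def count_hits_per_query(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many hits each query has so far."""
    return dict(Counter(hit.qseqid for hit in parse_tabular(lines)))
```

Passing an open file handle of the tabular output to `count_hits_per_query` gives a quick view of how many queries have been processed while the job is still running.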


          • #6
            So only with those options, 6, 7 and 8?
            With the other formats, will my output only be filled in at the end of the computation?


            • #7
              I've only noticed a problem of delayed output with the XML output format.


              • #8
                OK. The first time I used blastx with these 40 thousand sequences it gave me nothing, which seemed very strange to me. Now I'm trying a non-XML output file, because I only need to use some BioPerl functions to look at the results.

                Thanks


                • #9
                  Originally posted by angeloulivieri

                  Selenocysteine (U) at position 73 replaced by X
                  Selenocysteine (U) at position 40 replaced by X
                  Selenocysteine (U) at position 52 replaced by X
                  Selenocysteine (U) at position 48 replaced by X
                  Selenocysteine (U) at position 37 replaced by X
                  Selenocysteine (U) at position 40 replaced by X
                  Selenocysteine (U) at position 40 replaced by X
                  Hi,
                  Just for information: according to http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml, this is due to the following:
                  "For protein code, U is replaced by X first before the search since it is not specified in any scoring matrices."
                  So there is nothing wrong there. I don't know about the rest...

                  Best
                  Dario


                  • #10
                    Thank you Dario
