  • Problems with blastx and thousands of sequences

    Hi all, I'm running blastx on my server. It has always worked, but now, with a large file of sequences, it is doing something strange.

    The query is the following:

    nohup ../ncbi-blast-2.2.27+/bin/blastx -db ../bin/data/uniprot_kb_2012_06.fasta -query 42000seq.fa -evalue 0.05 -max_target_seqs 5 -outfmt 5 -num_threads 10 -out ouBlastxXML &

    where uniprot_kb_2012... is a dataset containing all the proteins taken from NCBI, and
    42000seq.fa is a file containing 42 thousand sequences in FASTA format.

    The output I want is in XML...

    I ran it one week ago and obtained a completely empty file!

    Now with nohup it's writing this:

    Selenocysteine (U) at position 73 replaced by X
    Selenocysteine (U) at position 40 replaced by X
    Selenocysteine (U) at position 52 replaced by X
    Selenocysteine (U) at position 48 replaced by X
    Selenocysteine (U) at position 37 replaced by X
    Selenocysteine (U) at position 40 replaced by X
    Selenocysteine (U) at position 40 replaced by X

    ...and other similar lines...

    and the XML file is still empty...

    What's happening?

    The same command on a query of ten sequences works well.

    Does anyone know where I could have gone wrong?

    bye and thanks
    Angelo

  • #2
    I don't know what is wrong, but your best bet is to run BLAST on small groups of sequences and then put all of the XML files together.
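
A rough sketch of the "put the XML files together" step, in Python with only the standard library. It assumes each batch was written with -outfmt 5, so that every file has a `BlastOutput` root holding a `BlastOutput_iterations` element with one `Iteration` per query. The function name `merge_blast_xml` is made up for this illustration. The header is taken from the first file only, and the DOCTYPE line is not carried over, so treat the result as a convenience file rather than an exact equivalent of a single run.

```python
import xml.etree.ElementTree as ET


def merge_blast_xml(paths, out_path):
    """Append the <Iteration> elements of every file to the first file's tree.

    Hypothetical helper: returns the number of iterations written.
    """
    base_tree = ET.parse(paths[0])
    base_iters = base_tree.getroot().find("BlastOutput_iterations")

    for path in paths[1:]:
        other_iters = ET.parse(path).getroot().find("BlastOutput_iterations")
        for iteration in other_iters.findall("Iteration"):
            base_iters.append(iteration)

    # Renumber so that iteration numbers are unique across the merged file.
    iterations = base_iters.findall("Iteration")
    for number, iteration in enumerate(iterations, start=1):
        num = iteration.find("Iteration_iter-num")
        if num is not None:
            num.text = str(number)

    base_tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return len(iterations)
```

Per-batch query identifiers and statistics are left untouched here. If the downstream parser can simply loop over several files, reading each batch's XML separately avoids the merge altogether.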


    • #3
      It seems BLAST+ is a bit silly with the XML output and doesn't write it out incrementally.

      I personally split the input into batches of 1000 queries (works well for spreading the work over a cluster).
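
One way the batching could look, as a minimal Python sketch using only the standard library. It assumes a plain FASTA file in which every record starts with a `>` header line. The function name `split_fasta` and the `batch_0001.fa` naming scheme are invented for this example; only the batch size of 1000 comes from the post above.

```python
def split_fasta(path, batch_size=1000, prefix="batch"):
    """Write consecutive groups of `batch_size` records to numbered files.

    Hypothetical helper: returns the list of files it created.
    """
    out_paths = []
    out = None
    n_records = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Start a new output file every `batch_size` records.
                if n_records % batch_size == 0:
                    if out is not None:
                        out.close()
                    name = f"{prefix}_{len(out_paths) + 1:04d}.fa"
                    out = open(name, "w")
                    out_paths.append(name)
                n_records += 1
            # Lines before the first header (if any) are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return out_paths
```

Each resulting file can then be passed to blastx as its own `-query`, with its own `-out` file, so a batch that finishes leaves a complete result behind even if a later one fails.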


      • #4
        Could the reason be that I use an XML file to extract information?
        Last edited by angeloulivieri; 09-28-2012, 12:24 AM.


        • #5
          Originally posted by angeloulivieri
          Could the reason be that I use an XML file to extract information?
          If you use the text or tabular output, you should see results written to the file while BLAST+ is running.
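
If the goal is just to pull out hits, the tabular output is also easy to read from a script. Below is a small Python sketch, assuming the file was produced with `-outfmt 6` or `-outfmt 7` and the default twelve columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name `read_tabular_hits` is made up for this example. Rows with the wrong number of fields are skipped, on the assumption that the last line may be incomplete while BLAST+ is still writing.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6 / 7).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_tabular_hits(path):
    """Group tabular hits by query ID (hypothetical helper)."""
    hits = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and the '#' comment lines written by -outfmt 7.
            if not row or row[0].startswith("#"):
                continue
            # Skip rows that do not have the expected twelve fields.
            if len(row) != len(COLUMNS):
                continue
            hit = dict(zip(COLUMNS, row))
            hit["evalue"] = float(hit["evalue"])
            hit["bitscore"] = float(hit["bitscore"])
            hits.setdefault(hit["qseqid"], []).append(hit)
    return hits
```

If a custom column list is given to `-outfmt`, `COLUMNS` would need to be changed to match it.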


          • #6
            So only with these options? The 6, 7 and 8...
            With the other types, will my output be complete only at the end of the computation?


            • #7
              I've only noticed a problem of delayed output with the XML output format.


              • #8
                OK. The first time I used blastx with these 40 thousand sequences it gave me nothing, which seemed very strange to me. Now I'm trying a non-XML output file, because I only need to use some BioPerl functions to look at the results.

                Thanks


                • #9
                  Originally posted by angeloulivieri

                  Selenocysteine (U) at position 73 replaced by X
                  Selenocysteine (U) at position 40 replaced by X
                  Selenocysteine (U) at position 52 replaced by X
                  Selenocysteine (U) at position 48 replaced by X
                  Selenocysteine (U) at position 37 replaced by X
                  Selenocysteine (U) at position 40 replaced by X
                  Selenocysteine (U) at position 40 replaced by X
                  Hi,
                  Just for information: according to http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml, this happens because
                  "For protein code, U is replaced by X first before the search since it is not specified in any scoring matrices."
                  So there is nothing wrong with those messages. I don't know about the rest...

                  Best
                  Dario


                  • #10
                    Thank you Dario
