  • Problems with BLAST+ HSP

    Dear Community

    I have a very annoying problem. In our laboratory we’re going to launch automated BLAST searches, so we’re using BLAST+ and its output files. We have chosen the old XML format (-outfmt 5). Everything worked fine until we set “max_target_seqs” higher than 50. Then the “<hsp>” sections became so numerous that our Java software crashed while reading more than 80k lines. Is there a way to limit the number of <hsp> results per hit? Maybe one HSP per hit?

    Our BLAST+ command looks something like this:

    …\blastn.exe -db nt -remote -task blastn-short -outfmt 5 -max_target_seqs 500 -evalue 10.00 -word_size 7 -gapopen 5 -gapextend 2 -reward 1 -penalty -3 -out xml.gb -query seq.fas

    Greetings from Germany
    Jesfreric

  • #2
    Assuming you are using the latest BLAST+, it has this option. Note how it is scoped: for each query, it limits the number of HSPs saved per subject sequence.

    -max_hsps <Integer, >=1>
    Set maximum number of HSPs per subject sequence to save for each query
    Additionally, you could experiment with:

    -qcov_hsp_perc <Real, 0..100>
    Percent query coverage per hsp
    Last edited by GenoMax; 10-26-2015, 04:02 AM.
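For an automated pipeline, the option can be added to the command assembled by the calling program. The Python sketch below is a hypothetical helper (`build_blastn_args` and `run_blastn` are illustrative names, not part of BLAST+) that rebuilds the original poster's command with `-max_hsps` added. It assumes `blastn` is on the PATH and that the installed BLAST+ release supports `-max_hsps`; the value is passed as a plain integer, since `<Integer, >=1>` in the help text is only a placeholder.

```python
import shlex
import subprocess
from typing import List


def build_blastn_args(query: str, out: str, max_target_seqs: int = 500,
                      max_hsps: int = 1) -> List[str]:
    """Hypothetical helper: the thread's blastn command plus -max_hsps."""
    return [
        "blastn",                              # assumes blastn is on PATH
        "-db", "nt", "-remote",
        "-task", "blastn-short",
        "-outfmt", "5",                        # BLAST XML
        "-max_target_seqs", str(max_target_seqs),
        "-max_hsps", str(max_hsps),            # a plain integer, no <...> placeholder
        "-evalue", "10.00",
        "-word_size", "7",
        "-gapopen", "5", "-gapextend", "2",
        "-reward", "1", "-penalty", "-3",
        "-out", out,
        "-query", query,
    ]


def run_blastn(args: List[str]) -> None:
    """Run the search; raises CalledProcessError if blastn exits non-zero."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    # Print the exact command line instead of running the remote search.
    print(shlex.join(build_blastn_args("seq.fas", "xml.gb")))
```

Passing an argument list (rather than one shell string) avoids quoting problems with values such as `-penalty -3`.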



    • #3
      Dear GenoMax

      Thanks for your answer. Do I need to write "Integer" and the angle brackets literally?
      -max_hsps <Integer, >=1>

      I ran it with "-max_hsps 1" a few minutes ago and it didn't work.

      Have a nice day
      Jesfreric



      • #4
        Try splitting up the input sequences into 1 Mbp chunks.

        The problem here is that the limit on the number of target sequences (-max_target_seqs) only limits the number of FASTA entries in the database for which hits are reported.
        If a whole chromosome has 10K HSPs below the given E-value, then all of them are reported...
        There are several possible solutions:

        1. Change the program so that it parses the results on the fly, HSP by HSP, without reading the entire results file into RAM.
        2. Increase the RAM available to the JVM: java -Xmx4G or more.
        3. Lower the E-value and increase the word size (-word_size in BLAST+, -W in legacy BLAST).
        4. Chop the nt database into 1 Mbp chunks (take nt.fasta, parse the sequences, and cut every sequence longer than 1 Mbp into 1 Mbp segments).
        5. Write a Perl script that skips the second and subsequent HSPs and reports only the first HSP per hit.
        6. Keep a catalog of repetitive sequences: search against it first, and search the whole nt only if no hits are found.
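Suggestions 1 and 5 can be combined in a single pass. The sketch below is in Python rather than the Java or Perl mentioned in the thread. It streams the XML with the standard library's `xml.etree.ElementTree.iterparse`, so the whole file is never loaded at once, and yields only the first `<Hsp>` of each `<Hit>`. It assumes the standard NCBI BLAST XML element names (`Hit`, `Hit_id`, `Hsp`, `Hsp_evalue`, `Hsp_bit-score`); the function name and the toy input are illustrative only.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Optional, Tuple, Union

# Toy, hand-written BLAST XML: two hits, the first one with two HSPs.
TOY = b"""<?xml version="1.0"?>
<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_id>hit1</Hit_id><Hit_hsps>
<Hsp><Hsp_bit-score>40.1</Hsp_bit-score><Hsp_evalue>0.01</Hsp_evalue></Hsp>
<Hsp><Hsp_bit-score>30.2</Hsp_bit-score><Hsp_evalue>0.5</Hsp_evalue></Hsp>
</Hit_hsps></Hit>
<Hit><Hit_id>hit2</Hit_id><Hit_hsps>
<Hsp><Hsp_bit-score>35.0</Hsp_bit-score><Hsp_evalue>0.1</Hsp_evalue></Hsp>
</Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""


def first_hsp_per_hit(
    source: Union[str, BinaryIO]
) -> Iterator[Tuple[Optional[str], float, float]]:
    """Stream BLAST XML (-outfmt 5) and yield (Hit_id, evalue, bit score)
    for the first HSP of each hit; later HSPs of the same hit are skipped."""
    hit_id: Optional[str] = None
    seen_hsp = False
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "Hit":
                hit_id, seen_hsp = None, False   # new hit: reset state
            continue
        if elem.tag == "Hit_id":
            hit_id = elem.text
        elif elem.tag == "Hsp":
            if not seen_hsp:
                yield (hit_id,
                       float(elem.findtext("Hsp_evalue")),
                       float(elem.findtext("Hsp_bit-score")))
                seen_hsp = True
            elem.clear()  # drop this HSP's contents to keep memory use low
        elif elem.tag == "Hit":
            elem.clear()  # drop the finished hit's contents as well


if __name__ == "__main__":
    for row in first_hsp_per_hit(io.BytesIO(TOY)):
        print(row)
```

For a real file, pass the path instead, e.g. `first_hsp_per_hit("xml.gb")`. Cleared `<Hit>` shells stay attached to the tree, so memory still grows slightly with the number of hits, but far less than when every HSP is kept.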
        BTW: legacy BLAST has the same problem.
        Last edited by Markiyan; 10-26-2015, 04:30 AM. Reason: Typo fix/clarification.
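Suggestion 4 can be sketched in Python as below. It reads a FASTA file record by record and cuts every sequence longer than 1 Mbp into consecutive 1 Mbp segments. The thread does not specify how the segments should be named; the `_start-end` suffix (1-based, inclusive coordinates) is a hypothetical convention chosen so that hits can be mapped back to the original sequence. All function names are illustrative.

```python
import io
from typing import Iterable, Iterator, Optional, TextIO, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)
CHUNK = 1_000_000          # 1 Mbp, the size suggested in the thread


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header: Optional[str] = None
    parts = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chop(records: Iterable[Record], size: int = CHUNK) -> Iterator[Record]:
    """Split every sequence longer than `size` into consecutive segments,
    tagging each segment ID with a hypothetical '_start-end' suffix."""
    for header, seq in records:
        if len(seq) <= size:
            yield header, seq
            continue
        seq_id, _, desc = header.partition(" ")
        for start in range(0, len(seq), size):
            end = min(start + size, len(seq))
            yield f"{seq_id}_{start + 1}-{end} {desc}".rstrip(), seq[start:end]


def write_fasta(records: Iterable[Record], out: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


def chop_file(in_path: str, out_path: str, size: int = CHUNK) -> None:
    """Stream in_path (e.g. nt.fasta) to out_path, chopping long records."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        write_fasta(chop(read_fasta(fin), size), fout)


if __name__ == "__main__":
    # In-memory demo: one toy 25-bp record cut into 10-bp pieces.
    demo = io.StringIO(">chr1 toy\n" + "ACGTA" * 5 + "\n")
    out = io.StringIO()
    write_fasta(chop(read_fasta(demo), size=10), out)
    print(out.getvalue(), end="")
```

The chopped file would then be formatted as a new database (for example with `makeblastdb -in nt_chunks.fasta -dbtype nucl`, file name hypothetical). Non-overlapping chunks can split an alignment that spans a boundary; overlapping the segments slightly is one possible refinement.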



        • #5
          Originally posted by Jesfreric
          Dear GenoMax

          Thanks for your answer. Do I need to write "Integer" and the angle brackets literally?
          -max_hsps <Integer, >=1>

          I ran it with "-max_hsps 1" a few minutes ago and it didn't work.

          Have a nice day
          Jesfreric
          Just the integer should be fine. Did you get an error?



          • #6
            Hi

            No, I'm not getting an error. But the number of <hsp> sections is still as high as before.



            • #7
              Is it possible that -max_hsps only works on 64-bit systems?



              • #8
                -max_hsps is working for me (tried a blastn search with nt). I don't think it would work only on 64-bit. Are you running 32-bit blast?
                Last edited by GenoMax; 10-26-2015, 04:55 AM.



                • #9
                  At the moment yes. But we're going to update the system soon.
                  Did you try it with -task blastn-short too?



                  • #10
                    Is it possible that using -task blastn-short causes all other user-set parameters to be ignored, because the task adjusts the parameters automatically?

                    Greetings
                    Jesfreric



                    • #11
                      I am not sure. You can email blast support staff at NCBI to check. It may take a couple of days but they are good about responding.

                      Edit: I tried a regular DNA fasta file with blastn-short and max_hsps did work as advertised.

                      Are you using blastn-short with actual NGS reads?
                      Last edited by GenoMax; 10-26-2015, 09:11 AM.



                      • #12
                        Dear GenoMax

                        At first I used blastn-short, but now I have switched to -task blastn and set the parameters to match blastn-short. Unfortunately, that did not work either. Could you run BLAST with the following sequence?

                        XXX XXX XXX XXX XXX XXX XXX

                        And could you use the following command?
                        …\blastn.exe -db nt -task blastn -query seq.fsa -out xml2.gb -remote -outfmt 5 -evalue 1000 -word_size 7 -max_hsps 1

                        With these settings I get an XML file with 499 <hit> sections and 5153 <hsp> sections.
                        It would be very interesting to know whether you get the same results.

                        I’m using blast-2.2.31+ on a (still) 32-bit version of Windows.

                        Greetings
                        Richard
                        Last edited by Jesfreric; 10-27-2015, 05:58 AM.
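Counts like the ones reported in this post can be checked programmatically. With 5153 HSPs spread over 499 hits, at least some hits must carry more than one HSP, which `-max_hsps 1` should prevent. The Python sketch below (function name hypothetical) streams a BLAST XML file, counts `<Hit>` and `<Hsp>` elements, and reports the largest number of HSPs under any single hit; if the option took effect, that maximum should be 1. It assumes the standard NCBI element names `Hit` and `Hsp`.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO, Dict, Union


def hsp_stats(source: Union[str, BinaryIO]) -> Dict[str, int]:
    """Stream BLAST XML and count <Hit> and <Hsp> elements, plus the
    largest number of HSPs found under any single hit."""
    hsps_per_hit: Counter = Counter()
    hit_index = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "Hit":
            hit_index += 1                  # number hits in file order
        elif event == "end" and elem.tag == "Hsp":
            hsps_per_hit[hit_index] += 1
            elem.clear()                    # keep memory use low
    return {
        "hits": hit_index,
        "hsps": sum(hsps_per_hit.values()),
        "max_hsps_in_one_hit": max(hsps_per_hit.values(), default=0),
    }


if __name__ == "__main__":
    # Toy input: two hits, the first with two HSPs (as if -max_hsps were ignored).
    toy = io.BytesIO(b"<BlastOutput><Iteration><Hit><Hsp/><Hsp/></Hit>"
                     b"<Hit><Hsp/></Hit></Iteration></BlastOutput>")
    print(hsp_stats(toy))
```

On a real result file, call `hsp_stats("xml2.gb")` (the output name used in the command above).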



                        • #13
                          @Richard: the -max_hsps option works as expected with your sequence, with both blastn and blastn-short (using a local copy of nt and -outfmt 6, since it is easier to read at a glance).

                          With -outfmt 5 I see 500 <Hit> and <Hit_num> sections. So BLAST on Windows appears to be doing something different.
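The same check can be made on tabular output. In `-outfmt 6` each row is one HSP, and with the default columns the first two fields are the query and subject IDs, so counting rows per (query, subject) pair shows how many HSPs were kept per hit. The Python sketch below is illustrative; the function name and toy rows are hypothetical.

```python
import io
from collections import Counter


def hsps_per_subject(lines) -> Counter:
    """Count rows per (qseqid, sseqid) pair in BLAST tabular output
    (-outfmt 6); each row is one HSP, and the first two columns are
    the query and subject IDs in the default column layout."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. -outfmt 7)
        qseqid, sseqid = line.split("\t")[:2]
        counts[(qseqid, sseqid)] += 1
    return counts


if __name__ == "__main__":
    # Toy rows (hypothetical IDs and values), 12 default columns each.
    toy = io.StringIO(
        "q1\tsubjA\t100.0\t20\t0\t0\t1\t20\t5\t24\t0.01\t40.1\n"
        "q1\tsubjB\t95.0\t20\t1\t0\t1\t20\t9\t28\t0.1\t35.0\n"
    )
    counts = hsps_per_subject(toy)
    print(max(counts.values()))  # 1 means at most one HSP per subject
```

For a file on disk, pass an open handle, e.g. `hsps_per_subject(open("hits.tsv"))` (file name hypothetical).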



                          • #14
                            With -outfmt 5, do you also see all ~5000 <hsp> sections?



                            • #15
                              No. Only 500 <Hsp> or </Hsp> sections.
