  • Standalone Blast+ Database Help

    Hello,

    I have installed Blast+ on my local computer; however, I am confused about the database creation. I have a binary of COGS that I want to blast against different alignments generated from Abyss to select the best one. Therefore I have the following questions:

    1. When creating the database, is there a way to create it from the whole binary, or do I have to do it file by file?

    2. How do I batch blast all the COGS against the different alignments generated from Abyss?

    I will really appreciate your help.

    Thank you

  • #2
    What do you mean by a 'binary of COGS'? You need a (plain text) FASTA file to make a BLAST database, or to use as BLAST queries.


    • #3
      Originally posted by maubp:
      What do you mean by a 'binary of COGS'? You need a (plain text) FASTA file to make a BLAST database, or to use as BLAST queries.
      Hi maubp,

      By binary of COGS I mean a folder containing 350 FASTA files that I want to blast against 20 different contigs.fa assembly files generated by Abyss, and then based on the results pick the best alignment. Is there a way to batch BLAST all COGS against all assembly files? I would really appreciate your help.


      • #4
        I guess what you want to know is which assembly has all 350 of your COGs? If that's the case, then you'd want to make 20 BLAST databases, one for each assembly. Then concatenate (join) all of the COG sequences into one FASTA file (easy to do on the Linux or Mac command line). Then on the command line you can run one batch BLAST of all 350 sequences at once. Do that for each BLAST database and you will have 20 huge BLAST reports...
        You can maybe limit the BLAST report to show only one hit per COG sequence and also format it as tab-delimited text, which you could import into a spreadsheet to examine.
        Or you could use a desktop application like Geneious to do this.
        If it were me I would either use Geneious or write a script to analyze the BLAST reports.
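
One way to organise the per-assembly runs described above is to generate the commands first and review them before running anything. The sketch below only prints commands and does not call BLAST itself. It assumes nucleotide queries (hence `blastn`; `tblastn` would be the counterpart for protein queries). The function name `plan_commands`, the layout `assemblies/<name>/contigs.fa`, the query file `all_cogs.fasta`, and the `asmdb_`/`hits_` output names are all hypothetical.

```shell
# Print (do not run) one makeblastdb and one blastn command per assembly.
# Usage: plan_commands QUERY.fasta ASSEMBLY.fa [ASSEMBLY.fa ...]
plan_commands() {
    local query=$1 asm name
    shift
    for asm in "$@"; do
        # Use the parent directory as a label, e.g. k40 for assemblies/k40/contigs.fa
        name=$(basename "$(dirname "$asm")")
        printf 'makeblastdb -in %s -dbtype nucl -out asmdb_%s\n' "$asm" "$name"
        printf 'blastn -query %s -db asmdb_%s -outfmt 6 -max_target_seqs 1 -out hits_%s.tsv\n' \
            "$query" "$name" "$name"
    done
}

# Hypothetical layout with two of the assemblies
plan_commands all_cogs.fasta assemblies/k40/contigs.fa assemblies/k50/contigs.fa
```

Redirecting the output to a file and running that file with `sh` would execute the plan. Paths containing spaces would need quoting before doing so.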


        • #5
          Yes you can do the blast in one go.

          1. Combine your 350 FASTA files into one file. On the command line you can simply use the 'cat' command. Make sure the header for each sequence is unique.

          2. Do the same for your assemblies from Abyss. Note, however, that contigs in different assemblies might have the same header name if the assemblies were created separately. You will need to rename the headers so that they are specific to each assembly,
          e.g. from >k40_000001 to something like >asm1_k40_000001 for 1st assembly, >asm2_....
          etc.

          If however the headers are already unique don't worry about this.

          3. Create a blast database on the combined assemblies from Abyss

          4. Run BLAST. You might want to make it output only 1 match per sequence with the '-max_target_seqs' parameter (set it to -max_target_seqs 1), and output a table format for easy parsing using the '-outfmt' parameter (set it to -outfmt 6).
          (Note, however, that if you make it output only the best hit, you might be missing out on other information. Play around with the outfmt parameter to get a format you like.)

          You can get a full explanation of the blastn commands by typing 'blastn -help'
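
Steps 1 and 2 above can be sketched with standard tools. This is a minimal illustration, not a tested pipeline. The helper names `prefix_headers` and `duplicate_headers`, the labels `asm1`/`asm2`, and the tiny demo files are made up for the example, and the label is assumed to contain no characters that are special to `sed` (such as `/` or `&`).

```shell
# Prefix every FASTA header in a file with a label,
# e.g. >k40_000001 becomes >asm1_k40_000001
prefix_headers() {
    local label=$1 fasta=$2
    sed "s/^>/>${label}_/" "$fasta"
}

# Print sequence IDs (first word of the header) that occur more than once
duplicate_headers() {
    grep '^>' "$1" | awk '{ print $1 }' | sort | uniq -d
}

# Demo: two assemblies that share a contig name
tmp=$(mktemp -d)
printf '>k40_000001\nACGT\n' > "$tmp/asm1.fa"
printf '>k40_000001\nGGCC\n' > "$tmp/asm2.fa"

# Plain concatenation keeps the clash
cat "$tmp/asm1.fa" "$tmp/asm2.fa" > "$tmp/naive.fa"
duplicate_headers "$tmp/naive.fa"

# Concatenation with per-assembly prefixes removes it (nothing is printed)
{ prefix_headers asm1 "$tmp/asm1.fa"; prefix_headers asm2 "$tmp/asm2.fa"; } > "$tmp/combined.fa"
duplicate_headers "$tmp/combined.fa"

rm -rf "$tmp"
```

The combined file would then be the input for the database in step 3, and the same duplicate check can be run on the concatenated query file from step 1.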


          • #6
            Thank you all for your help. I will give it a try and let you know if I have any problems.


            • #7
              Hello,

              So I am having issues running the following blastx commands:

              1.) "blastx -db ../gjhk34 -query verifyCOSIIcombined.fasta -evalue 0.00001 -max_target_seqs 1 -num_threads 8 -outfmt '10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score' -out output.blastx.csv", but I get the following error:

              BLAST Database error: No alias or index file found for protein database [../gjhk34] in search path [/home/youngsook/Documents/blast::]


              2.) "blastx -db "../gjhsk34/gjk34-contigs.fa" -query verifyCOSIIcombined.fasta -evalue 0.00001 -max_target_seqs 1 -num_threads 8 -outfmt '10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score' -out output.blastx.csv", which runs without any error.

              My question is: after creating the database, do I run it against the .fa file or against any of the ".phr", ".pin", ".psq", ".pal" database files? I noticed in other examples that the database named on the command line has no file extension (for example "nr") and still works.

              I really appreciate your help.

              Thank you,

              -Milo


              • #8
                If your database files are named gjhk34.phr, gjhk34.pin, etc, the database name is just gjhk34 only.

                If your database files are named gjhk34.fa.phr, gjhk34.fa.pin, etc, the database name is gjhk34.fa instead.

                You can have either of these situations from a FASTA file gjhk34.fa depending on the options you used for makeblastdb.
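
To see which name a set of database files corresponds to, the index files can be listed and their final extension stripped. The sketch below does that for `.pin` (protein) and `.nin` (nucleotide) index files. The function name `list_blast_dbs` and the demo file names are illustrative, and multi-volume databases or alias files (`.pal`/`.nal`) are not handled.

```shell
# Print the value to pass to -db for each BLAST index file found in a directory
list_blast_dbs() {
    local dir=${1:-.} f
    for f in "$dir"/*.pin "$dir"/*.nin; do
        [ -e "$f" ] || continue      # skip glob patterns that matched nothing
        printf '%s\n' "${f%.*}"      # drop only the last extension
    done
}

# Demo with empty placeholder files standing in for real index files
tmp=$(mktemp -d)
touch "$tmp/gjhk34.phr" "$tmp/gjhk34.pin" "$tmp/gjhk34.psq"
touch "$tmp/gjhk34.fa.nhr" "$tmp/gjhk34.fa.nin" "$tmp/gjhk34.fa.nsq"
list_blast_dbs "$tmp"    # lists .../gjhk34 and then .../gjhk34.fa
rm -rf "$tmp"
```

By default makeblastdb names the database after the file given to `-in`, and its `-out` option sets a different name, which is how both situations can arise from the same FASTA file.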


                • #9
                  Hello,

                  The command that I used to create the database is:

                  makeblastdb -in gjhk34-contigs.fa -dbtype prot -parse_seqids

                  However, I am running the following command:

                  "blastx -db "../gjhsk34/gjk34-contigs.fa" -query verifyCOSIIcombined.fasta -evalue 0.00001 -max_target_seqs 1 -num_threads 8 -outfmt '10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score' -out output.blastx.csv"

                  After blastx is done running, I get a csv file but it is empty, even if I change the output to a .out file. Do you know what I am doing wrong or why it is generating an empty output?

                  Once again, thank you for your help.


                  • #10
                    Why are you including the quotes here?

                    -db "../gjhsk34/gjk34-contigs.fa"
                    Just checking to confirm that "gjk34-contigs.fa" contains protein sequences (since you are doing blastx).

                    When debugging a problem like this, make a test file with just one query sequence. This way you can debug problems rapidly instead of waiting for the full set to go through.
                    Last edited by GenoMax; 10-07-2013, 11:38 AM.
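
A one-sequence test file like the one suggested above can be cut from the full query file with awk. This is a small sketch. The function name `first_record` and the demo file are made up, and the input is assumed to be ordinary FASTA with `>` header lines.

```shell
# Print only the first FASTA record (header plus its sequence lines)
first_record() {
    awk '/^>/ { n++ }        # count header lines
         n == 1 { print }    # print while inside the first record
         n > 1  { exit }     # stop at the second header
        ' "$1"
}

# Demo with a two-sequence query file
tmp=$(mktemp -d)
printf '>query1\nACGTACGT\nTTGA\n>query2\nGGGGCCCC\n' > "$tmp/queries.fasta"
first_record "$tmp/queries.fasta" > "$tmp/test_query.fasta"
cat "$tmp/test_query.fasta"
rm -rf "$tmp"
```

The resulting file can be passed to `-query` in place of the full set while the command line is being debugged.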


                    • #11
                      Hi GenoMax,

                      I am including quotes to point to my database location, but even if I don't include the quotes I still get an empty output.

                      I am pretty sure "gjk34-contigs.fa" is a protein sequence. I created the database with the "-dbtype prot." So basically what you are saying is that if my contigs.fa file is not a protein sequence, I first need to translate it to protein and then create the database? Below is a screenshot of the gjk34-contigs.fa file...


                      • #12
                        The screenshot did not come through. Use the "Go advanced" button as you are editing the message. That will allow you to attach PNG files to your post.

                        If "gjk34-contigs.fa" has DNA sequence (which, looking at the name, may be the case) you will need to use "tblastx" if you want to do a translated query/db search.

                        NOTE: Just checked the screenshot link in your post (https://www.dropbox.com/s/3uznelmn1d...contigs.fa.png). That is indeed DNA sequence, so that is the reason you are not getting anything in the output. You can't do a "blastx" search against a DNA database: blastx translates the nucleotide query and compares it with protein sequences.
                        Last edited by GenoMax; 10-07-2013, 12:16 PM.
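
A quick check of what a FASTA file contains, without opening it, can be done by looking for characters that are not plain nucleotide codes. The sketch below is a rough heuristic only: the function name `guess_seq_type` is made up, and a DNA file that uses IUPAC ambiguity codes other than N (R, Y, K and so on) would be reported as protein.

```shell
# Report "nucl" if all sequence lines use only A, C, G, T, U, N
# (either case, plus '*' and '-'), otherwise report "prot"
guess_seq_type() {
    if grep -v '^>' "$1" | tr -d '\r' | grep -qi '[^ACGTUN*-]'; then
        echo prot
    else
        echo nucl
    fi
}

# Demo with two tiny made-up files
tmp=$(mktemp -d)
printf '>contig1\nACGTTGCANNACGT\n' > "$tmp/contigs.fa"
printf '>protein1\nMKVLAAGIVGLLLA\n' > "$tmp/proteins.fa"
guess_seq_type "$tmp/contigs.fa"
guess_seq_type "$tmp/proteins.fa"
rm -rf "$tmp"
```

For a file reported as `nucl`, the database would be built with `-dbtype nucl` and searched with a program that accepts a nucleotide database, such as blastn, tblastn, or tblastx.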


                        • #13
                          Originally posted by milo0615:
                          After blastx is done running, I get a csv file but it is empty, even if I change the output to a .out file. Do you know what I am doing wrong or why it is generating an empty output?
                          If there are no BLAST hits, then the tabular and CSV output will be empty.

                          Try asking for commented tabular output (-outfmt 7) or the default plain text output to double check this.
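
With commented tabular output, a run without hits still produces a file containing comment lines, so "no hits" can be told apart from "nothing was written". The sketch below counts hit lines by ignoring lines that start with `#`. The function name `count_hits` and the sample file contents are illustrative, not real BLAST output.

```shell
# Count hit lines in tabular BLAST output, ignoring '#' comment lines and blank lines
count_hits() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Demo: a comment-only report and a report with one hit line
tmp=$(mktemp -d)
printf '# Query: query1\n# 0 hits found\n' > "$tmp/no_hits.tsv"
printf '# Query: query1\nquery1\tcontig7\t98.5\n' > "$tmp/one_hit.tsv"
count_hits "$tmp/no_hits.tsv"
count_hits "$tmp/one_hit.tsv"
rm -rf "$tmp"
```

A count of 0 on a file that does contain comment lines points to a search that ran but found nothing, which is a different problem from a failed run.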


                          • #14
                            Hi All,

                            Yes, I had to re-create my database and now it works perfectly. However, I do have a few more questions:

                            - What would be the best way or the best practice to analyze all of the blast results to check for the assembly with the most hits?

                            - Is there a free application that would help with the analysis?

                            I was thinking about exporting all the blast results into excel and then analyze them from there....

                            Thank you


                            • #15
                              Originally posted by milo0615:
                              Hi All,

                              Yes, I had to re-create my database and now it works perfectly. However, I do have a few more questions:

                              - What would be the best way or the best practice to analyze all of the blast results to check for the assembly with the most hits?

                              - Is there a free application that would help with the analysis?

                              I was thinking about exporting all the blast results into excel and then analyze them from there....

                              Thank you
                              The CLI is by far the most efficient way to handle large tables. Google: man sort, man awk, man sed, man grep, man cut, and man paste. There are related threads in this forum too. Then use R for statistical analysis and plotting.
                              Last edited by rhinoceros; 10-13-2013, 04:35 AM.
                              savetherhino.org
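
As a rough illustration of the command-line approach, the sketch below counts how many distinct queries have at least one hit in each assembly. It assumes tab-delimited output (`-outfmt 6`, where column 1 is the query ID and column 2 the subject ID) and subject IDs that start with an assembly label followed by an underscore, such as `asm1_k40_000001`. The function name `hits_per_assembly` and the demo table are made up.

```shell
# Count distinct queries with at least one hit, per assembly label
hits_per_assembly() {
    cut -f1,2 "$1" |
        awk -F'\t' '{ split($2, p, "_"); print p[1] "\t" $1 }' |  # label<TAB>query
        sort -u |          # one line per (assembly, query) pair
        cut -f1 |          # keep the assembly label
        uniq -c |          # count queries per assembly
        sort -k1,1nr       # largest count first
}

# Demo table: COG1 hits asm1 twice and asm2 once, COG2 hits asm1 only
tmp=$(mktemp -d)
printf 'COG1\tasm1_k40_000001\nCOG1\tasm1_k40_000007\nCOG2\tasm1_k40_000002\nCOG1\tasm2_k50_000003\n' \
    > "$tmp/hits.tsv"
hits_per_assembly "$tmp/hits.tsv"    # asm1 is listed with 2 queries, asm2 with 1
rm -rf "$tmp"
```

If each assembly was searched separately instead, `cut -f1 FILE | sort -u | wc -l` on each result file gives the same count per file.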
