  • milo0615
    Member
    • Dec 2012
    • 39

    Standalone Blast+ Database Help

    Hello,

    I have installed Blast+ on my local computer; however, I am confused about the database creation. I have a binary of COGS that I want to blast against different alignments generated from Abyss to select the best one. Therefore I have the following questions:

    1. When creating the database, is there a way to create the whole binary, or do I have to do it file by file?

    2. How do I batch blast all the COGS against the different alignments generated from Abyss?

    I would really appreciate your help.

    Thank you
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    What do you mean by a 'binary of COGS'? You need a (plain text) FASTA file to make a BLAST database, or to use as BLAST queries.


    • milo0615
      Member
      • Dec 2012
      • 39

      #3
      Originally posted by maubp
      What do you mean by a 'binary of COGS'? You need a (plain text) FASTA file to make a BLAST database, or to use as BLAST queries.
      Hi maubp,

      By binary of COGS I mean a folder containing 350 FASTA files that I want to blast against 20 different contigs.fa assembly files generated by Abyss, and then, based on the results, pick the best alignment. Is there a way to batch BLAST all COGS against all assembly files? I would really appreciate your help.


      • mike.t
        Member
        • Mar 2010
        • 36

        #4
        I guess what you want to know is which assembly contains all 350 of your COGs? If so, you'd want to make 20 BLAST databases, one for each assembly. Then concatenate (join) all of the COG sequences into one FASTA file (easy to do on the Linux or Mac command line). You can then run one batch BLAST of all 350 sequences at once from the command line. Do that for each BLAST database and you will have 20 huge BLAST reports...
        You could limit each BLAST report to show only one hit per COG sequence, and format it as tab-delimited text that you can import into a spreadsheet to examine.
        Or you could use a desktop application like Geneious to do this.
        If it were me, I would either use Geneious or write a script to analyze the BLAST reports.
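A minimal Python sketch of the "concatenate all COG sequences into one FASTA file" step, assuming the 350 COG files sit in one directory and end in ".fasta" (both the pattern and the directory name in the usage line are assumptions). It also stops if two records share an ID, since BLAST reports could not tell such queries apart.

```python
from pathlib import Path


def concatenate_fasta(input_dir: str, output_file: str, pattern: str = "*.fasta") -> int:
    """Join every FASTA file matching `pattern` in `input_dir` into one query file.

    Raises ValueError if two records share the same sequence ID (the first word
    of the header), since BLAST output could not tell those queries apart.
    Returns the number of records written.
    """
    out_path = Path(output_file).resolve()
    seen_ids = set()
    with open(out_path, "w") as out:
        for path in sorted(Path(input_dir).glob(pattern)):
            if path.resolve() == out_path:
                continue  # never read the output file back into itself
            text = path.read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep the next file's header on its own line
            for line in text.splitlines():
                if line.startswith(">"):
                    seq_id = (line[1:].split() or [""])[0]
                    if seq_id in seen_ids:
                        raise ValueError(f"duplicate sequence ID {seq_id!r} in {path.name}")
                    seen_ids.add(seq_id)
            out.write(text)
    return len(seen_ids)
```

Usage (hypothetical paths): n = concatenate_fasta("cogs", "all_cogs.fasta"). On Linux or Mac, cat cogs/*.fasta > all_cogs.fasta does the joining part without the duplicate check.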


        • Kennels
          Senior Member
          • Feb 2011
          • 149

          #5
          Yes you can do the blast in one go.

          1. Combine your 350 FASTA files into one file. On the command line you can simply use the 'cat' command. Make sure the header of each sequence is unique.

          2. Do the same for your assemblies from Abyss. Note, however, that contigs in different assemblies might have the same header name if the assemblies were created separately. You will need to rename the headers of each assembly so they are specific to that assembly,
          e.g. from >k40_000001 to something like >asm1_k40_000001 for the 1st assembly, >asm2_... for the 2nd, etc.

          If however the headers are already unique don't worry about this.

          3. Create a blast database on the combined assemblies from Abyss

          4. Run BLAST. You might want to make it output only 1 match per sequence with the '-max_target_seqs' parameter (set it to -max_target_seqs 1), and output a table for easy parsing with the '-outfmt' parameter (set it to -outfmt 6).
          (Note, however, that if you make it output only the best hit, you might miss out on other information. Play around with the outfmt parameter to get a format you like.)
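Steps 1 and 2 could be sketched in Python as below, assuming each Abyss assembly is a separate contigs FASTA file and that short labels such as "asm1" (hypothetical) are used as prefixes. The label should not itself contain "_", so the assembly can later be recovered by splitting a subject ID at its first underscore.

```python
def combine_assemblies(assemblies: dict, output_file: str) -> int:
    """Merge several assembly FASTA files, prefixing every header with a label.

    `assemblies` maps a short label (e.g. "asm1") to the path of that
    assembly's contigs file, so ">k40_000001" becomes ">asm1_k40_000001".
    Returns the number of contigs written.
    """
    n_contigs = 0
    with open(output_file, "w") as out:
        for label, path in assemblies.items():
            with open(path) as handle:
                for line in handle:
                    if not line.endswith("\n"):
                        line += "\n"  # last line of a file without a trailing newline
                    if line.startswith(">"):
                        line = f">{label}_{line[1:]}"
                        n_contigs += 1
                    out.write(line)
    return n_contigs
```

Usage (hypothetical paths): combine_assemblies({"asm1": "k40/contigs.fa", "asm2": "k48/contigs.fa"}, "all_assemblies.fa"), extended to all 20 assemblies.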

          You can get a full explanation of the blastn commands by typing 'blastn -help'
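Steps 3 and 4 can be driven from Python with the standard subprocess module. This sketch only assembles the argument lists (file names are placeholders) and defaults to nucleotide queries against a nucleotide database; pass program and dbtype to change that. Passing a list rather than a shell string means an -outfmt value containing spaces, such as '6 qseqid sseqid evalue', is a single list element and needs no extra quoting.

```python
import subprocess


def blast_commands(db_fasta: str, query_fasta: str, out_file: str,
                   program: str = "blastn", dbtype: str = "nucl") -> list:
    """Build the makeblastdb and BLAST+ command lines for steps 3 and 4.

    With no -out option, makeblastdb names the database after the -in file,
    so the same path is passed to -db.
    """
    make_db = ["makeblastdb", "-in", db_fasta, "-dbtype", dbtype, "-parse_seqids"]
    search = [program, "-db", db_fasta, "-query", query_fasta,
              "-max_target_seqs", "1", "-outfmt", "6", "-out", out_file]
    return [make_db, search]


def run_commands(commands: list) -> None:
    """Run each command in turn, stopping with an error if one fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage (hypothetical file names): run_commands(blast_commands("all_assemblies.fa", "all_cogs.fasta", "hits.tsv")).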


          • milo0615
            Member
            • Dec 2012
            • 39

            #6
            Thank you all for your help. I will give it a try and let you know if I have any problems.


            • milo0615
              Member
              • Dec 2012
              • 39

              #7
              Hello,

              So I am having issues running the following blastx commands:

              1.) "blastx -db ../gjhk34 -query verifyCOSIIcombined.fasta -evalue 0.00001 -max_target_seqs 1 -num_threads 8 -outfmt '10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score' -out output.blastx.csv", but I get the following error:

              BLAST Database error: No alias or index file found for protein database [../gjhk34] in search path [/home/youngsook/Documents/blast::]


              2.) "blastx -db "../gjhsk34/gjk34-contigs.fa" -query verifyCOSIIcombined.fasta -evalue 0.00001 -max_target_seqs 1 -num_threads 8 -outfmt '10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score' -out output.blastx.csv" runs perfectly.

              My question is: after creating the database, do I run BLAST against the .fa file or against one of the ".phr", ".pin", ".psq", ".pal" database files? I noticed in other examples that the database name on the command line has no file extension (e.g. just "nr") and it still works.

              I really appreciate your help.

              Thank you,

              -Milo


              • maubp
                Peter (Biopython etc)
                • Jul 2009
                • 1544

                #8
                If your database files are named gjhk34.phr, gjhk34.pin, etc, the database name is just gjhk34 only.

                If your database files are named gjhk34.fa.phr, gjhk34.fa.pin, etc, the database name is gjhk34.fa instead.

                You can have either of these situations from a FASTA file gjhk34.fa depending on the options you used for makeblastdb.
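To see which names a directory actually offers to -db, a short Python check can look for the index (.pin/.nin) and alias (.pal/.nal) files that the "No alias or index file found" error refers to. This is a sketch under that assumption; databases split into numbered volumes may also show per-volume names alongside the main one.

```python
from pathlib import Path

# Index (.pin/.nin) and alias (.pal/.nal) files are what the
# "No alias or index file found" error refers to.
DB_SUFFIXES = {".pin": "prot", ".pal": "prot", ".nin": "nucl", ".nal": "nucl"}


def find_blast_databases(directory: str) -> dict:
    """Map each BLAST database name found in `directory` to "prot" or "nucl".

    Only the last suffix is dropped, so gjhk34.pin -> "gjhk34" while
    gjhk34.fa.pin -> "gjhk34.fa"; that name (with its directory) goes to -db.
    """
    found = {}
    for path in Path(directory).iterdir():
        dbtype = DB_SUFFIXES.get(path.suffix)
        if dbtype is not None and path.is_file():
            found[path.stem] = dbtype
    return found
```

The returned type also shows at a glance whether a database was built with -dbtype prot or -dbtype nucl.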


                • milo0615
                  Member
                  • Dec 2012
                  • 39

                  #9
                  Hello,

                  The command that I used to create the database is:

                  makeblastdb -in gjhk34-contigs.fa -dbtype prot -parse_seqids

                  However, I am running the following command:

                  "blastx -db "../gjhsk34/gjk34-contigs.fa" -query verifyCOSIIcombined.fasta -evalue 0.00001 -max_target_seqs 1 -num_threads 8 -outfmt '10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score' -out output.blastx.csv"

                  After blastx is done running, I get a csv file but it is empty, even if I change the output to a .out file. Do you know what I am doing wrong or why it is generating an empty output?

                  Once again, thank you for your help.


                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #10
                    Why are you including the quotes here?

                    -db "../gjhsk34/gjk34-contigs.fa"
                    Just checking to confirm that "gjk34-contigs.fa" contains protein sequences (since you are doing blastx).

                    When debugging a problem like this, make a test file with just one query sequence. This way you can debug problems rapidly instead of waiting for the full set to go through.
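A Python sketch of that suggestion: copy the first record (or first few) of the query file into a small test file, then point -query at it. The output file name in the usage line is hypothetical.

```python
def write_first_records(fasta_in: str, fasta_out: str, n: int = 1) -> int:
    """Copy the first `n` records of a FASTA file into a small test query file.

    Returns how many records were copied (fewer than `n` if the file is short).
    """
    written = 0
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if written == n:
                    break  # the (n + 1)th header: stop copying
                written += 1
            if written:  # skip any text before the first header
                dst.write(line)
    return written
```

Usage: write_first_records("verifyCOSIIcombined.fasta", "test_query.fasta").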
                    Last edited by GenoMax; 10-07-2013, 11:38 AM.


                    • milo0615
                      Member
                      • Dec 2012
                      • 39

                      #11
                      Hi GenoMax,

                      I am including quotes to point to my database location, but even if I don't include the quotes I still get an empty output.

                      I am pretty sure "gjk34-contigs.fa" is a protein sequence; I created the database with "-dbtype prot". So basically what you are saying is that if my contigs.fa file is not a protein sequence, I first need to translate it to protein and then create the database? Below is a screenshot of the gjk34-contigs.fa file...


                      • GenoMax
                        Senior Member
                        • Feb 2008
                        • 7142

                        #12
                        The screenshot did not come through. Use the "Go advanced" button as you are editing the message. That will allow you to attach PNG files to your post.

                        If "gjk34-contigs.fa" has DNA sequence (which, judging by the name, may be the case) you will need to use "tblastx" if you want to do a translated query/db search.

                        NOTE: Just checked the screenshot link in your post (https://www.dropbox.com/s/3uznelmn1d...contigs.fa.png). That is indeed DNA sequence, so that is the reason you are not getting anything in the output. You can't do a "blastx" search against a DNA database.
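Before choosing -dbtype and the BLAST program, a rough Python check can tell whether a FASTA file looks like DNA or protein. This is a heuristic, not part of BLAST+: it assumes a file is nucleotide when at least 90% of its residues are A, C, G, T, U or N. The lookup table shows which BLAST+ program fits each query/database combination.

```python
NUCLEOTIDE_CHARS = set("ACGTUN")


def guess_sequence_type(fasta_path: str, threshold: float = 0.9) -> str:
    """Heuristically label a FASTA file as "nucl" or "prot".

    Counts residues (ignoring headers, gaps and case); if at least `threshold`
    of them are A/C/G/T/U/N the file is treated as nucleotide.
    """
    total = nucleotide = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                continue
            seq = line.strip().upper().replace("-", "")
            total += len(seq)
            nucleotide += sum(1 for c in seq if c in NUCLEOTIDE_CHARS)
    if total == 0:
        raise ValueError(f"no sequence data in {fasta_path}")
    return "nucl" if nucleotide / total >= threshold else "prot"


# (query type, database type) -> BLAST+ program for that combination.
BLAST_PROGRAM = {
    ("nucl", "nucl"): "blastn",   # or tblastx for a translated vs translated search
    ("nucl", "prot"): "blastx",   # needs a genuine protein database
    ("prot", "prot"): "blastp",
    ("prot", "nucl"): "tblastn",
}
```

For DNA contigs the check returns "nucl", which means the database should be built with -dbtype nucl and searched with blastn or tblastx rather than blastx.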
                        Last edited by GenoMax; 10-07-2013, 12:16 PM.


                        • maubp
                          Peter (Biopython etc)
                          • Jul 2009
                          • 1544

                          #13
                          Originally posted by milo0615
                          After blastx is done running, I get a csv file but it is empty, even if I change the output to a .out file. Do you know what I am doing wrong or why it is generating an empty output?
                          If there are no BLAST hits, then the tabular and CSV output will be empty.

                          Try asking for commented tabular, commented CSV, or the default plain text output to double-check this.


                          • milo0615
                            Member
                            • Dec 2012
                            • 39

                            #14
                            Hi All,

                            Yes, I had to re-create my database and now it works perfectly. However, I do have a few more questions:

                            - What would be the best way or the best practice to analyze all of the blast results to check for the assembly with the most hits?

                            - Is there a free application that would help with the analysis?

                            I was thinking about exporting all the BLAST results into Excel and then analyzing them from there....

                            Thank you


                            • rhinoceros
                              Senior Member
                              • Apr 2013
                              • 372

                              #15
                              Originally posted by milo0615
                              - What would be the best way or the best practice to analyze all of the blast results to check for the assembly with the most hits?

                              - Is there a free application that would help with the analysis?
                              CLI is by far the most efficient way to handle large tables. Google: man sort, man awk, man sed, man grep, man cut, and man paste. There are related threads in this forum too. Then use R for statistical analysis and plotting.
                              Last edited by rhinoceros; 10-13-2013, 04:35 AM.
                              savetherhino.org
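The same tally can also be done in a few lines of Python: count how many distinct COGs hit each assembly and rank the assemblies. This sketch assumes -outfmt 6 output from the combined database, with "asm1_"-style prefixes on contig IDs (hypothetical labels). For the custom -outfmt '10 ...' format used earlier in this thread, sseqid is the 8th column, so use delimiter="," and scol=7.

```python
import csv
from collections import defaultdict


def cogs_per_assembly(table_path: str, delimiter: str = "\t",
                      qcol: int = 0, scol: int = 1) -> list:
    """Rank assemblies by how many distinct queries (COGs) have a hit in them.

    Defaults fit -outfmt 6 (qseqid in column 0, sseqid in column 1). Subject IDs
    are assumed to carry an assembly prefix such as "asm1_" added before
    makeblastdb; any "lcl|"-style prefix before the last "|" is dropped.
    """
    queries_by_assembly = defaultdict(set)
    with open(table_path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and -outfmt 7 comment lines
            subject = row[scol].rsplit("|", 1)[-1]
            assembly = subject.split("_", 1)[0]  # "asm1_k40_000001" -> "asm1"
            queries_by_assembly[assembly].add(row[qcol])
    ranking = [(asm, len(queries)) for asm, queries in queries_by_assembly.items()]
    return sorted(ranking, key=lambda item: (-item[1], item[0]))
```

Usage: cogs_per_assembly("hits.tsv") returns (assembly, number of COGs hit) pairs, most hits first; the list can then be written out for R or a spreadsheet.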
