Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BLAST filtering out 'no hits' in output

    Hi,

    I am using a local install of blastn to look for hits to a custom database from a bunch of short reads. In my final output file, I really only care about reads that found a 'hit' and so want to filter out all reads with 'no hits'. My strategy is as follows:

    Read 1
    Read 2
    .
    .
    Read X

    -> Put into one big FASTA file
    -> Run blastn against the custom database
    -> Save to file reads with hits only.

    Right now (the standard blast output) would be:
    Read 1 -> hit 1
    Read 2 -> no hit
    . -> no hit
    . -> no hit
    Read X -> hit X

    But I want my output to only show:
    Read 1 -> hit 1
    Read x -> hit X

    Since I have many reads and very few hits, my output file very rapidly gets filled with all the 'no hits' reads slowing down the whole process. Is there a way to filter my output directly via the command line, or do I need to build a script to do this?

    Thanks in advance,
    Kristian

  • #2
    I would use the tabular output, which just writes nothing at all for no hits.

    Comment


    • #3
      Hmm, tried that and I get the following error:
      BLAST query/options error: No hits are being saved

      I run the following command:
      blastn -query reads.fasta -db nt -num_descriptions 1 -num_alignments 0 -out reads_blastNT.txt -num_threads 8 -outfmt 6

      Instead of "-outfmt 6", I have also used:
      "7 sseqid ssac qstart qend sstart send qseq evalue bitscore"

      but I get the same result. Any thoughts? The output file is also empty.

      Thanks very much.

      Comment


      • #4
        Okay, so I solved the error, which was caused by my "-num_alignments 0" handle. However, the output still contains my reads with no hits - it just writes '0 hits found' under the read. This is not what I am after - I want to completely ignore '0 hits' reads so they don't even get written to the file.

        Comment


        • #5
          What exactly is the new command line you are running?

          Comment


          • #6
            Hi maubp.

            Sorry, I don't understand that question - I'm not running a new command line - I'm just trying to figure out whether I can tell blastn not to store my 'no hits' hits in the output file.

            Comment


            • #7
              I would expect you to use something like this command line:

              blastn -query reads.fasta -db nt -out reads_blastNT.txt -num_threads 8 -outfmt 6

              Rather than messing about with -num_descriptions and/or -num_alignments which only really apply to the plain text and HTML output, from memory I would use -max_target_seqs instead.

              Comment


              • #8
                I tried that one, but I still have the same problem - the '0 hits' hits are still recorded in the output file, rather than discarded.

                I guess I just have to pipe it and make a script - but it would have been easier if it could have been done directly from the command line.

                Comment


                • #9
                  I am surprised. What version of BLAST+ do you have? And could you post a bit of the example output?

                  Comment


                  • #10
                    I should have the newest version:
                    blastn: 2.2.25+
                    Package: blast 2.2.25, build Mar 21 2011 12:21:32

                    An example output looks like this (tabular format):
                    # BLASTN 2.2.25+
                    # Query: ILLUMINA:203:C07LRACXX:5:1101:1348:1952 1:N:0:TAGCTT
                    # Database: custom
                    # 0 hits found
                    # BLASTN 2.2.25+
                    # Query: ILLUMINA:203:C07LRACXX:5:1101:1366:1952 1:Y:0:TAGCTT
                    # Database: custom
                    # 0 hits found
                    # BLASTN 2.2.25+
                    # Query: ILLUMINA:203:C07LRACXX:5:1101:1472:1955 1:N:0:TAGCTT
                    # Database: custom
                    # 0 hits found
                    # BLASTN 2.2.25+
                    # Query: ILLUMINA:203:C07LRACXX:5:1101:1456:1981 1:N:0:TAGCTT
                    # Database: custom
                    # 0 hits found
                    # BLASTN 2.2.25+
                    # Query: ILLUMINA:203:C07LRACXX:5:1101:1414:1981 1:N:0:TAGCTT
                    # Database: custom
                    # 0 hits found

                    Basically, these are all the reads I don't want listed in the file as they have no hits.

                    Thanks a bunch.

                    Comment


                    • #11
                      All those lines starting with a hash (#) are comments which you asked BLAST to produce by using -outfmt 7,
                      Code:
                      $ blastn -help
                      ...
                       *** Formatting options
                       -outfmt <String>
                         alignment view options:
                           0 = pairwise,
                           1 = query-anchored showing identities,
                           2 = query-anchored no identities,
                           3 = flat query-anchored, show identities,
                           4 = flat query-anchored, no identities,
                           5 = XML Blast output,
                           6 = tabular,
                           7 = tabular with comment lines,
                           8 = Text ASN.1,
                           9 = Binary ASN.1,
                          10 = Comma-separated values,
                          11 = BLAST archive format (ASN.1)
                      Use -outfmt 6 as I suggested, and they'll go away.

                      P.S. That does mean for this example as there are no hits at all, the output file will be empty.

                      Comment


                      • #12
                        Yes! That worked - thanks so much. I skipped the -outfmt 6 handle when it initially failed due to my "-num_alignments 0".

                        Thanks for your help - that's exactly what I needed.

                        Comment


                        • #13
                          Great

                          Thanks for confirming it worked.

                          Comment


                          • #14
                            I made the same mistake just now by choosing -outfmt 7

                            However, it's a rather easy fix to remove the comments:

                            Code:
                            awk '!/^\#/' file > newfile
                            Should do the trick.
                            Last edited by jmjensen; 06-01-2013, 12:50 AM.

                            Comment


                            • #15
                              This is maybe not the right thread, but I am interested in only the queries with 'no hits'. So far I can only extract the reads with hits, and then compare the differences between the original query file and the file with hits only.

                              Any good tips on how to directly extract the queries with no hits?

                              Thanks, Jon

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Exploring the Dynamics of the Tumor Microenvironment
                                by seqadmin




                                The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
                                07-08-2024, 03:19 PM
                              • seqadmin
                                Exploring Human Diversity Through Large-Scale Omics
                                by seqadmin


                                In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
                                06-25-2024, 06:43 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 07-19-2024, 07:20 AM
                              0 responses
                              40 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 07-16-2024, 05:49 AM
                              0 responses
                              52 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 07-15-2024, 06:53 AM
                              0 responses
                              64 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 07-10-2024, 07:30 AM
                              0 responses
                              43 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X