  • How to concatenate BLAST results (m8) by setting a distance threshold between hits

    Hi everyone,

    I ran blastn (-m 8) with a query file containing many sequences. For each query sequence, the output contains many significant but fragmentary hits.

    However, these hits do not overlap, and interestingly, most of the gaps between them are < 300 bp (much shorter than the full length of the query sequence).

    So, how can I concatenate closely spaced hits into one by setting a threshold (e.g. 300 bp) when the hits match different regions of the same subject? This would also reduce the number of output hits per query.

    for example:



    Are there any scripts or tools for this purpose?

    All replies are welcome!
    Attached Files

  • #2
    It looks like you're blasting against a protein database and getting hits to CDS (i.e. the gaps represent introns or intergenic regions). If this is the case, you could just extract the ORFs and concatenate them with a simple command like cat. In any case, there must be a one-liner using cat, grep, cut, and awk that would solve your problem.
    savetherhino.org



    • #3
      Originally posted by rhinoceros
      It looks like you're blasting against a protein database and getting hits to CDS (i.e. the gaps represent introns or intergenic regions). If this is the case, you could just extract the ORFs and concatenate them with a simple command like cat. In any case, there must be a one-liner using cat, grep, cut, and awk that would solve your problem.
      Thank you for your reply!
      What you describe may well be one of the cases. Here, though, I ran tblastx with two genome sequences and found that some of the hits for a query were closely related (showing clear colinearity), even though each component hit on its own seemed less related. So if I set an arbitrary E-value/score cutoff, I think I will lose some informative hits.
      I am asking a general question about this kind of application, and I hope there is a versatile script for it...
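
One possible approach to the question above, as a minimal sketch rather than an established tool: read the 12-column tabular output (-m 8), group hits by query, subject, and relative strand, sort each group by query start, and chain consecutive hits whose gap on both the query and the subject is at most a threshold. The function names (`parse_m8`, `interval_gap`, `merge_hits`), the `max_gap` parameter, and the choice to report a summed bit score are all assumptions made for illustration; the sample rows are made up.

```python
from collections import defaultdict


def parse_m8(lines):
    """Yield (query, subject, qstart, qend, sstart, send, bitscore) per hit.

    Assumes the standard 12 tab-separated columns of BLAST tabular output.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield f[0], f[1], int(f[6]), int(f[7]), int(f[8]), int(f[9]), float(f[11])


def interval_gap(a, b):
    """Number of bases between two closed intervals; 0 if they touch or overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]) - 1)


def merge_hits(hits, max_gap=300):
    """Chain neighbouring hits of the same query/subject/strand.

    Returns tuples of
    (query, subject, strand, qlo, qhi, slo, shi, summed_bitscore, n_hits).
    """
    groups = defaultdict(list)
    for q, s, qs, qe, ss, se, score in hits:
        # Relative orientation: "-" when exactly one of the two ranges is reversed.
        strand = "+" if (qs <= qe) == (ss <= se) else "-"
        groups[(q, s, strand)].append(
            (min(qs, qe), max(qs, qe), min(ss, se), max(ss, se), score)
        )

    merged = []
    for (q, s, strand), group in groups.items():
        group.sort()  # by query start
        cur = None
        for qlo, qhi, slo, shi, score in group:
            close = (
                cur is not None
                and interval_gap((cur[0], cur[1]), (qlo, qhi)) <= max_gap
                and interval_gap((cur[2], cur[3]), (slo, shi)) <= max_gap
            )
            if close:
                # Extend the current chain to cover the new hit.
                cur = [min(cur[0], qlo), max(cur[1], qhi),
                       min(cur[2], slo), max(cur[3], shi),
                       cur[4] + score, cur[5] + 1]
            else:
                if cur is not None:
                    merged.append((q, s, strand, *cur))
                cur = [qlo, qhi, slo, shi, score, 1]
        if cur is not None:
            merged.append((q, s, strand, *cur))
    return merged


if __name__ == "__main__":
    # Made-up example rows in m8 column order.
    example = [
        "q1\ts1\t95.0\t100\t5\t0\t1\t100\t1001\t1100\t1e-30\t150",
        "q1\ts1\t92.0\t100\t8\t0\t301\t400\t1300\t1399\t1e-25\t130",
        "q1\ts1\t90.0\t100\t10\t0\t2001\t2100\t3000\t3099\t1e-20\t110",
    ]
    for row in merge_hits(parse_m8(example), max_gap=300):
        print("\t".join(str(x) for x in row))
```

Two caveats on this sketch. The summed bit score is only a rough ranking aid, not a statistically meaningful score for the merged region. The proximity test also does not strictly enforce colinearity: it only requires that neighbouring hits lie within `max_gap` of each other on both sequences, so for tblastx results it may be worth additionally checking that subject coordinates move in a consistent direction along each chain.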
