  • illinu
    Member
    • Jul 2013
    • 55

    Identify duplicated transcripts from a BLAST report

    I am working with a transcriptome assembled de novo. After a blastx analysis I kept the best hits (one hit per isoform). From these I isolated the annotations that appear more than once; they belong to isoforms that are either duplicates or not full-length (one transcript assembled as more than one contig).

    How can I tell which isoforms are duplicates and which are multiassembled, based on the start-end positions of the annotation?

    Any tools or scripts out there?

    Thanks
    Il.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    What format is your data in (GTF)?

    • illinu
      Member
      • Jul 2013
      • 55

      #3
      The BLAST report is a tab-separated file and the transcripts are in FASTA, but I am also producing a GTF3 file with another annotator.

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Depending on what your files look like, you may be able to use some combination of Unix utilities (grep/sort/uniq/awk etc.), but otherwise this may need custom code.

        Have you tried doing some basic sorting in Excel on the GTF file?

        • illinu
          Member
          • Jul 2013
          • 55

          #5
          Yes. When I sort by the subject hit I can clearly see whether two or more isoforms are duplicates (their subject start-end positions overlap) or multiassembled (each isoform hits a different part of the subject), but there are 40k isoforms and I want to automate the analysis.

          Using grep/sort/uniq/awk would be great. I've done all the previous filtering with those tools, but I don't see how I could use them for what I want to do now.

          Il.
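
One way the manual check described above might be automated is sketched here in Python rather than awk. It is only a sketch under assumptions that the thread does not confirm: the report is taken to be the standard 12-column tabular BLAST layout (query in column 1, subject in column 2, subject start and end in columns 9 and 10), and two isoforms are called duplicates when their subject ranges overlap by at least half of the shorter range. The function names and the `min_overlap` threshold are hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def read_best_hits(handle):
    """Group best hits by subject as (isoform, subject_start, subject_end).

    Assumes 12-column tabular BLAST output with one best hit per isoform.
    """
    hits = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        sstart, send = int(row[8]), int(row[9])
        # Normalise so that start <= end regardless of hit orientation.
        hits[subject].append((query, min(sstart, send), max(sstart, send)))
    return hits


def classify_pairs(hits, min_overlap=0.5):
    """Label each pair of isoforms that share a subject.

    'duplicate'      : subject ranges overlap by >= min_overlap of the shorter range
    'multiassembled' : ranges overlap less than that, or not at all
    min_overlap is an arbitrary illustrative threshold, not a recommended value.
    """
    results = []
    for subject, isoforms in hits.items():
        ordered = sorted(isoforms, key=lambda h: (h[1], h[2]))
        for (q1, s1, e1), (q2, s2, e2) in combinations(ordered, 2):
            overlap = min(e1, e2) - max(s1, s2) + 1  # <= 0 means no overlap
            shorter = min(e1 - s1, e2 - s2) + 1
            if overlap > 0 and overlap / shorter >= min_overlap:
                label = "duplicate"
            else:
                label = "multiassembled"
            results.append((subject, q1, q2, label))
    return results


# Made-up example rows, not real data.
example = (
    "iso1\tprotA\t90.0\t100\t0\t0\t1\t300\t1\t100\t1e-50\t200\n"
    "iso2\tprotA\t90.0\t95\t0\t0\t1\t285\t5\t99\t1e-48\t190\n"
    "iso3\tprotB\t88.0\t150\t0\t0\t1\t450\t1\t150\t1e-60\t250\n"
    "iso4\tprotB\t87.0\t141\t0\t0\t1\t423\t160\t300\t1e-55\t230\n"
)

for record in classify_pairs(read_best_hits(io.StringIO(example))):
    print("\t".join(record))
```

For a real report the `io.StringIO(example)` handle would be replaced by an open file. Subjects hit by only one isoform produce no pairs, so they drop out on their own. With many isoforms per subject the pairwise labels can conflict (A overlaps B, B overlaps C, but A does not overlap C), so merging the pairs into groups would need extra logic that this sketch does not attempt.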
