  • How to merge the annotations

    Hello,

    I have lincRNA annotation files from UCSC NR*, GENCODE, and other published lincRNA collections, and I want to merge them into one larger lincRNA collection. What pipeline can do this?

  • #2
    cat file1 >combinedset.txt
    cat file2 >>combinedset.txt
    cat file3 >>combinedset.txt

    If you need to reformat:
    # Columns 1, 2, 3
    cat file1 | awk -v "OFS=\t" '{ print $1, $2, $3; }' >combinedset.txt
    # Columns 3, 4, 5
    cat file2 | awk -v "OFS=\t" '{ print $3, $4, $5; }' >>combinedset.txt
    # Columns 1, 2, 3: change column 2 from 1-based to 0-based
    cat file3 | awk -v "OFS=\t" '{ print $1, int($2)-1, $3; }' >>combinedset.txt
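The reformatting steps above can be tried end to end on toy data. This is only a sketch: the three input files and their column layouts are made up for illustration (file1 already chrom/start/end, file2 with coordinates in columns 3 to 5, file3 with a 1-based start), so check your real files before reusing the column numbers.

```shell
# Scratch directory so the toy files do not touch real data.
workdir=$(mktemp -d)

# Hypothetical inputs, tab-separated.
printf 'chr1\t1000\t2000\n' > "$workdir/file1"          # chrom, start, end
printf 'idA\tx\tchr2\t500\t900\n' > "$workdir/file2"    # coordinates in columns 3-5
printf 'chr3\t101\t200\n' > "$workdir/file3"            # start is 1-based

# Bring every source to chrom/start/end and write one combined file.
{
  awk -v OFS='\t' '{ print $1, $2, $3 }' "$workdir/file1"
  awk -v OFS='\t' '{ print $3, $4, $5 }' "$workdir/file2"
  awk -v OFS='\t' '{ print $1, $2 - 1, $3 }' "$workdir/file3"   # shift start to 0-based
} > "$workdir/combinedset.txt"

cat "$workdir/combinedset.txt"
```

Grouping the three awk calls in braces means the output file is opened once, so there is no need to mix `>` and `>>`.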


    • #3
      Originally posted by masylichu
      Hello,

      I have lincRNA annotation files from UCSC NR*, GENCODE, and other published lincRNA collections, and I want to merge them into one larger lincRNA collection. What pipeline can do this?
      Could you post the links to those lincRNA annotation files? I would like them too.
      Last edited by zinky; 12-05-2012, 11:49 PM.


      • #4
        When merging, I assume you may need to check for entries that are duplicated across the separate annotations. Is that the case?

        If not, then 'catting' them together is the right thing to do (assuming you're on a *nix-based system, or Cygwin on Windows). Just a dorky note: you can do those cats in one line:

        Code:
        cat file1 file2 file3 > combinedset.txt
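        You could also do the column selection in one line (this needs a shell with process substitution, such as bash, and it does not apply the 1-based to 0-based shift to file3):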

        Code:
        cat <(cut -f1,2,3 file1) <(cut -f3,4,5 file2) <(cut -f1,2,3 file3) > combinedset.txt
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */


        • #5
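          OK, if you end up with something like:

          chr1 1002 9005 linRNA1 . + (BED format)

          then you can run:

          cat combinedfile.bed | sort -k1,1 -k2,2n | uniq >combined.sorted.collapsed.bed

          The output is sorted by chromosome and start position, and exact duplicate lines are removed. Entries that share coordinates but differ in any other column (for example, the name) are kept.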
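Because `uniq` only drops byte-identical lines, the same interval listed under two different names by two sources will survive. If that matters, one possible approach is to deduplicate on chromosome, start, end and strand only. The sketch below assumes a tab-separated 6-column BED file; the sample entries and names are invented for illustration, and which name is kept for a duplicated interval simply depends on sort order.

```shell
# Scratch directory with a hypothetical combined BED6 file.
workdir=$(mktemp -d)
{
  printf 'chr1\t1002\t9005\tlincRNA1\t.\t+\n'
  printf 'chr1\t1002\t9005\tGENCODE_X\t.\t+\n'   # same interval, different name
  printf 'chr1\t1002\t9005\tlincRNA1\t.\t+\n'    # exact duplicate
  printf 'chr1\t500\t800\tlincRNA2\t.\t-\n'
} > "$workdir/combinedfile.bed"

# Sort by chromosome, then numerically by start and end; keep only the
# first line seen for each chrom/start/end/strand combination.
sort -k1,1 -k2,2n -k3,3n "$workdir/combinedfile.bed" |
  awk -F'\t' '!seen[$1 FS $2 FS $3 FS $6]++' \
  > "$workdir/combined.sorted.collapsed.bed"

cat "$workdir/combined.sorted.collapsed.bed"
```

This still treats partially overlapping intervals as distinct entries; collapsing those is a separate decision about what counts as the same lincRNA.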
