Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • AlexReynolds
    replied
    The BEDOPS suite contains the bedmap application, which is perfect for annotation tasks involving BED files.

    The bedmap application takes three inputs:
    1. Operators
    2. Reference BED file (e.g., your intervals)
    3. Map BED file (e.g., your regions denoting "UTR" elements)


    One or more operations are applied on elements of the Map file that overlap elements in the Reference file by one or more bases. These operations can calculate a numerical result, or summarize ID, range or other information about mapped elements.

    I'll show you below how you can use bedmap with the --echo-map-id operator to grab UTR IDs.

    As an example, let's say your intervals are the following regions, adjacent 2000 bp windows that are stored in the tab-delimited file Intervals.bed:

    $ more Intervals.bed
    chr1 10000 12000
    chr1 12000 14000
    chr1 14000 16000
    ...


    Let's say your UTRs are stored in a file called UTRs.bed.

    (If you don't have this yet, you can grab these for the entire genome from the UCSC Genome Browser's table browser, through the Genes and Gene Prediction Tracks group and the knownGene table. Change the output format to BED and click get output, selecting the desired UTR category. Save the resulting BED file to your file system.)

    For hg19, here are a few of the first 5' UTRs over the genome, taken from the UCSC Genome Browser:

    $ more UTRs.bed
    chr1 11873 12189 uc010nxq.1_utr5_0_0_chr1_11874_f 0 +
    chr1 14361 14829 uc009vis.3_utr5_0_0_chr1_14362_r 0 -
    chr1 14969 15038 uc009vis.3_utr5_1_0_chr1_14970_r 0 -
    chr1 15795 15942 uc009vis.3_utr5_2_0_chr1_15796_r 0 -
    chr1 16606 16765 uc009vis.3_utr5_3_0_chr1_16607_r 0 -
    ...


    We can find the IDs of the UTRs that overlap our Intervals.bed elements with the following command:

    $ bedmap --echo --echo-map-id --delim '\t' Intervals.bed UTRs.bed > UTR_IDs_overlapping_Intervals.bed

    Let's take a look at the results:

    $ more UTR_IDs_overlapping_Intervals.bed
    chr1 10000 12000 uc010nxq.1_utr5_0_0_chr1_11874_f
    chr1 12000 14000
    chr1 14000 16000 uc009vis.3_utr5_0_0_chr1_14362_r;uc009vis.3_utr5_1_0_chr1_14970_r;uc009vis.3_utr5_2_0_chr1_15796_r
    ...


    In other words, the UTR uc010nxq.1_utr5_0_0_chr1_11874_f overlaps interval chr1:10000-12000, no UTRs overlap the interval chr1:12000-14000, three UTRs overlap the interval chr1:14000-16000 and so on...

    Let's explain what the command did. We used three operators with this application of bedmap:
    1. --echo
    2. --echo-map-id
    3. --delim


    The --echo operator prints each of the interval elements.

    The --echo-map-id operator prints each of the UTR IDs that overlap the specified interval element. If there are multiple IDs, they are delimited with a semi-colon.

    The --delim operator separates the --echo and --echo-map-id results with a tab character. This allows the output to remain a relaxed, three-column UCSC BED file, so you can process this with any downstream tools that take in BED data (like bedmap, bedops, or other BEDOPS utilities).

    You don't have to just grab IDs from the UTR file. If you want the entire UTR element, use --echo-map. If you want just the UTR regions, use --echo-map-range.

    There are several other --echo-map-* operators which summarize different information from mapped elements. Check out the Echo section of the bedmap documentation for more detail.

    Note: BEDOPS utilities run very fast and use very little memory, compared with alternative toolkits which do not yet use our design optimizations. This is because we require that BED inputs are sorted (at most, only once), which adds structure that we can take advantage of. Data you get through the UCSC Genome Browser will be sorted. However, you might need to sort data in your intervals file, if you do not know its sort status. This is easy and relatively quick, making use of the BEDOPS sort-bed tool:

    $ sort-bed Intervals.bed > Sorted_Intervals.bed

    Again, this only needs to be done once. BEDOPS tools read and write sorted data, so any results from, say, bedmap do not need any further sorting before using results with downstream tools.
    Last edited by AlexReynolds; 02-21-2013, 01:47 PM.

    Leave a comment:


  • oiiio
    started a topic Annotating a BED file for UTRs -- which data to use?

    Annotating a BED file for UTRs -- which data to use?

    I've got a BED of intervals that I need to annotate for 3' or 5' UTRs. I've been trying to work with biomart and UCSC table browser to produce a relevant GFF/GTF/BED file containing the annotated UTR regions, but have not managed to get what I want yet.

    Does anyone know how to use either UCSC/biomart to get this information, or perhaps another way? Any recommendations useful!

Latest Articles

Collapse

  • seqadmin
    Investigating the Gut Microbiome Through Diet and Spatial Biology
    by seqadmin




    The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
    02-24-2025, 06:31 AM
  • seqadmin
    Quality Control Essentials for Next-Generation Sequencing Workflows
    by seqadmin




    Like all molecular biology applications, next-generation sequencing (NGS) workflows require diligent quality control (QC) measures to ensure accurate and reproducible results. Proper QC begins at nucleic acid extraction and continues all the way through to data analysis. This article outlines the key QC steps in an NGS workflow, along with the commonly used tools and techniques.

    Nucleic Acid Quality Control
    Preparing for NGS starts with isolating the...
    02-10-2025, 01:58 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 02-24-2025, 02:48 PM
0 responses
295 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-21-2025, 02:46 PM
0 responses
213 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-18-2025, 03:06 PM
0 responses
73 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-07-2025, 09:30 AM
0 responses
81 views
0 likes
Last Post seqadmin  
Working...
X