  • bedtools, vcf file with structural variants

    Hi,

    There is a bug in bedtools when it is used on VCF files that contain structural variants.
    Here is a report of the bug and a way to fix it:


    I am using bedtools version v2.23.0-29-gab600e6-dirty.

    I ran the "bedtools subtract" command on a VCF file containing structural variants (long deletions). My VCF version is: ##fileformat=VCFv4.1

    Bedtools calculates the end position of a deletion from the POS field (the start position) and the SVLEN key in the INFO field of the VCF record.

    Earlier bedtools versions (I tried v2.17.0) do not support structural variants: they do not look for the SVLEN key and treat the deletion as one base long.

    However, I also found a bug in bedtools v2.23.0-29-gab600e6-dirty:
    when the SVLEN key is the last entry in the INFO field (so the next character is a tab), bedtools fails to find its value, because it searches for the ";" character after the SVLEN value.

    Unfortunately, some structural variant callers (like Delly) do not write the SVLEN key in their VCF output files. So I used the Python vcf package to add the SVLEN key to the INFO field (with the Record.add_info() function). However, the vcf package writes the SVLEN key at the last position of the INFO field.

    To fix this bug, you can add one line to the bedtools source code:

    1) Open the file: your_source_code_bedtools_path/bedtools2/src/utils/FileRecordTools/FileReaders/SingleLineDelimTextFileReader.cpp

    2) Add the following line to the function int SingleLineDelimTextFileReader::getVcfSVlen():
    if(endPtr == NULL) {endPtr = strchr(startPtr, '\t');}
    after the line:
    const char *endPtr = strchr(startPtr, ';');

    3) Compile the source code:
    make install -C your_source_code_bedtools_path/bedtoolsSV/bedtools2/
    or, if you do not have permission to write to /usr/local/bin:
    make -C your_source_code_bedtools_path/bedtoolsSV/bedtools2/
    and use the bedtools binary in the your_source_code_bedtools_path/bedtoolsSV/bedtools2/bin folder.

    Now bedtools will find the SVLEN key even when it is written at the last position of the INFO field.
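
    To see the idea outside of bedtools, here is a minimal standalone sketch of the lookup. It is not the bedtools function: svlenFromVcfLine is a hypothetical helper name, and returning the absolute value (and 0 when SVLEN is absent) is an assumption made for the example. Instead of searching for ';' and then falling back to a tab, it stops at whichever delimiter comes first, which covers SVLEN in the middle of INFO, at the end of INFO, and at the end of the line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper (not part of bedtools): returns |SVLEN| parsed from a
// tab-delimited VCF line, or 0 if the line has no "SVLEN=" entry.
long svlenFromVcfLine(const char *line) {
    assert(line != nullptr);

    // Locate the key. Note: a plain substring search would also match a
    // longer key that merely ends in "SVLEN"; that case is ignored here.
    const char *key = std::strstr(line, "SVLEN=");
    if (key == nullptr) {
        return 0;
    }
    const char *startPtr = key + std::strlen("SVLEN=");

    // The value ends at the next ';' (more INFO entries follow), at a tab
    // (SVLEN is the last INFO entry), or at the end of the line.
    std::size_t len = std::strcspn(startPtr, ";\t\r\n");
    std::string value(startPtr, len);

    // Deletions usually carry a negative SVLEN; the sign is dropped here
    // (an assumption of this sketch).
    return std::labs(std::strtol(value.c_str(), nullptr, 10));
}
```

    Calling it on a record whose INFO column ends with "SVLEN=-350" followed by a tab gives the same result as on a record where "SVLEN=-350;" sits in the middle of INFO.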

    Refael.
    Last edited by refael.kohen; 04-06-2016, 04:53 AM.

  • #2
    Another bug:

    I now see that other bedtools sub-programs, like "bedtools cluster", do not use the SVLEN key. Instead they take the length of the deletion from another field (column 4 of the VCF, the REF column).

    But VCF version 4.2 uses the 'SVLEN' or 'END' keys in the INFO field to give the end point of the deletion.

    You can patch bedtools cluster (and other bedtools sub-programs that do not use the SVLEN key) so that bedtools takes the end position from the 'END' key in the 'INFO' field:

    1) Open the file: your_source_code_bedtools_path/bedtools2/src/utils/bedFile/bedFile.h

    2) Add the following lines to the function parseVcfLine:
    unsigned int end_index_s = fields[7].find(";END")+5;
    unsigned int end_index_e = fields[7].find(";", end_index_s);
    bed.end = atoi(fields[7].substr(end_index_s, end_index_e-end_index_s).c_str())-1;

    after the line:
    bed.end = bed.start + fields[3].size();

    3) Compile the source code:
    make install -C your_source_code_bedtools_path/bedtoolsSV/bedtools2/
    or, if you do not have permission to write to /usr/local/bin:
    make -C your_source_code_bedtools_path/bedtoolsSV/bedtools2/
    and use the bedtools binary in the your_source_code_bedtools_path/bedtoolsSV/bedtools2/bin folder.
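
    One caveat about the lines in step 2: the search for ";END" does not match when END is the first key in INFO, and the code does not check whether the key was found at all. Below is a more defensive standalone sketch of the same lookup. endFromInfo is a hypothetical helper, not a bedtools function, and returning -1 when END is absent is a choice made for the example. It returns the raw END value, so any coordinate adjustment (such as the -1 above) is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of bedtools): returns the value of the END
// key in a VCF INFO string, or -1 if there is no END key.
long endFromInfo(const std::string &info) {
    const std::string key = "END=";
    std::size_t pos = 0;

    while ((pos = info.find(key, pos)) != std::string::npos) {
        // Accept the match only at the start of INFO or right after ';',
        // so that keys such as CIEND= are skipped.
        if (pos == 0 || info[pos - 1] == ';') {
            std::size_t valueStart = pos + key.size();
            assert(valueStart <= info.size());

            // find() returns npos when END is the last entry; in that case
            // substr() simply takes the rest of the string.
            std::size_t valueEnd = info.find(';', valueStart);
            std::size_t count = (valueEnd == std::string::npos)
                                    ? std::string::npos
                                    : valueEnd - valueStart;
            std::string value = info.substr(valueStart, count);
            return std::strtol(value.c_str(), nullptr, 10);
        }
        pos += key.size();  // keep searching after the rejected match
    }
    return -1;  // no END key present
}
```

    A caller could fall back to the existing REF-length calculation whenever the helper returns -1, so records without an END key behave as before.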

    Refael.



    • #3
      Please report these via the BEDTools discussion list: https://groups.google.com/forum/#!fo...dtools-discuss Dr. Quinlan probably monitors that list regularly.
