  • bedtools, vcf file with structural variants

    Hi,

    There is a bug in bedtools when it is used on VCF files with structural variants.
    Here is a report of the bug and how you can fix it:


    I am using bedtools version v2.23.0-29-gab600e6-dirty.

    I tried to run the "bedtools subtract" command on a VCF file with structural variants (long deletions). My VCF version is: ##fileformat=VCFv4.1

    Bedtools calculates the end position of the deletion from the POS field (start position) and the SVLEN key in the INFO field of each VCF record.
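
A minimal sketch of that calculation, assuming the interval is stored BED-style (0-based start, end-exclusive) and spans |SVLEN| bases. The Interval struct and the deletionInterval name are hypothetical and are not taken from the bedtools source; the exact convention bedtools uses may differ.

```cpp
#include <cstdlib>

// Hypothetical BED-style interval: 0-based start, end-exclusive.
struct Interval {
    long start;
    long end;
};

// Hypothetical helper: derive an interval from the 1-based VCF POS and SVLEN.
// Assumes the interval covers |SVLEN| bases starting at POS.
Interval deletionInterval(long pos, long svlen) {
    Interval iv;
    iv.start = pos - 1;                    // VCF POS is 1-based
    iv.end = iv.start + std::labs(svlen);  // SVLEN may be negative for deletions
    return iv;
}
```

Under these assumptions a record with POS=100 and SVLEN=-500 would map to the interval [99, 599).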

    Previous bedtools versions (I tried v2.17.0) do not support structural variants: they do not look for the SVLEN key and treat the deletion as one base long.

    However, I also found a bug in bedtools version v2.23.0-29-gab600e6-dirty:
    when the SVLEN key is written in the last position of the INFO field (so the next character is a tab), bedtools fails to find it, because it looks for the ";" character after the SVLEN value.

    Unfortunately, some structural variant callers (like Delly) do not write the SVLEN key in their VCF output files. So I use the vcf package for Python to add the SVLEN key to the INFO field (with the Record.add_info() function). However, the vcf package writes the SVLEN key in the last position of the INFO field.

    To fix this bug, you can add one line to the bedtools source code:

    1) Open the file: your_source_code_bedtools_path/bedtools2/src/utils/FileRecordTools/FileReaders/SingleLineDelimTextFileReader.cpp

    2) Add the following line to the function int SingleLineDelimTextFileReader::getVcfSVlen():
    if(endPtr == NULL) {endPtr = strchr(startPtr, '\t');}
    after the line:
    const char *endPtr = strchr(startPtr, ';');

    3) Compile the source code:
    make install -C your_source_code_bedtools_path/bedtoolsSV/bedtools2/
    or, if you do not have permission to write to /usr/local/bin:
    make -C your_source_code_bedtools_path/bedtoolsSV/bedtools2/
    and use the bedtools binary in the your_source_code_bedtools_path/bedtoolsSV/bedtools2/bin folder.

    Now bedtools will find the SVLEN key even if it is written in the last position of the INFO field.
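
The patched lookup can be sketched as a standalone function. This is only an illustration under stated assumptions, not the bedtools code: findSvLen is a hypothetical name, it takes the whole raw VCF line, and it returns the absolute value because deletions are conventionally written with a negative SVLEN.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical standalone helper (not the bedtools function itself):
// returns |SVLEN| from a raw, tab-delimited VCF line, or -1 if the key is absent.
int findSvLen(const char *line) {
    const char *key = std::strstr(line, "SVLEN=");
    if (key == NULL) return -1;

    const char *startPtr = key + 6;  // first character after "SVLEN="

    // Usual case: another INFO key follows, so the value ends at ';'.
    const char *endPtr = std::strchr(startPtr, ';');

    // SVLEN is the last INFO key: the value ends at the tab before the next column.
    if (endPtr == NULL) endPtr = std::strchr(startPtr, '\t');

    // INFO is the last column of the line: the value runs to the end of the string.
    if (endPtr == NULL) endPtr = startPtr + std::strlen(startPtr);

    std::string value(startPtr, endPtr);
    return std::abs(std::atoi(value.c_str()));
}
```

Without the tab fallback, the second strchr call is never made and a line whose INFO field ends in SVLEN has no terminator to stop at, which matches the failure described above.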

    Refael.
    Last edited by refael.kohen; 04-06-2016, 04:53 AM.

  • #2
    Another bug:

    I now see that other bedtools sub-commands, like "bedtools cluster", do not use the SVLEN key, but instead take the length of the deletion from another field (column number 4 in the VCF).

    But VCF version 4.2 uses the 'SVLEN' or 'END' keys in the INFO field for the end point of the deletion.

    You can fix bedtools cluster (and other bedtools sub-commands that do not use the SVLEN key) so that bedtools takes the end position from the 'END' key in the 'INFO' field:

    1) Open the file: your_source_code_bedtools_path/bedtools2/src/utils/bedFile/bedFile.h

    2) Add the following lines to the function parseVcfLine:
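    size_t end_index_s = fields[7].find(";END=");
    if (end_index_s != std::string::npos) {  // only override the end when an END key is present
        end_index_s += 5;  // skip past ";END="
        size_t end_index_e = fields[7].find(";", end_index_s);
        bed.end = atoi(fields[7].substr(end_index_s, end_index_e-end_index_s).c_str())-1;
    }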

    after the line:
    bed.end = bed.start + fields[3].size();

    3) Compile the source code:
    make install -C your_source_code_bedtools_path/bedtoolsSV/bedtools2/
    or, if you do not have permission to write to /usr/local/bin:
    make -C your_source_code_bedtools_path/bedtoolsSV/bedtools2/
    and use the bedtools binary in the your_source_code_bedtools_path/bedtoolsSV/bedtools2/bin folder.
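
A more defensive version of the END lookup could look like the sketch below. It is an illustration only, not part of bedtools: parseInfoEnd is a hypothetical name, it works on the INFO string alone, and it returns the raw END value, leaving any coordinate adjustment (such as the -1 in the patch above) to the caller.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of the END key in a VCF INFO string,
// or -1 if there is no END key.
long parseInfoEnd(const std::string &info) {
    const std::string key = "END=";
    std::string::size_type pos = 0;
    while ((pos = info.find(key, pos)) != std::string::npos) {
        // Accept the match only at the start of INFO or right after ';',
        // so that keys such as CIEND= are not mistaken for END=.
        if (pos == 0 || info[pos - 1] == ';') {
            std::string::size_type valueStart = pos + key.size();
            std::string::size_type valueEnd = info.find(';', valueStart);
            // If END is the last key, valueEnd is npos and substr takes the rest.
            return std::atol(info.substr(valueStart, valueEnd - valueStart).c_str());
        }
        pos += key.size();
    }
    return -1;
}
```

This handles END as the first, middle, or last key of the INFO field, and reports a missing key instead of computing an end position from unrelated text.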

    Refael.



    • #3
      Please report these via the BEDTools discussion list: https://groups.google.com/forum/#!fo...dtools-discuss Dr. Quinlan probably monitors it regularly.
