  • strange behavior in vcftools

    Hi all,

    I want to split my VCF file into two files, one with SNPs and another with INDELs. To do that I am using vcftools with the following commands:

    # to keep only SNPs
    vcftools --vcf myvariants.vcf --remove-indels --recode-INFO-all --out only_SNPs --recode

    # to keep only INDELs
    vcftools --vcf myvariants.vcf --keep-only-indels --recode-INFO-all --out only_INDELs --recode

    but when I check the files, I get this:

    INDELs:

    Code:
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
    CM003279.1      1274    C       A       999     .
    CM003279.1      3637    A       C       157     .
    CM003279.1      3788    GCCCC   GCCCCC  130     .
    CM003279.1      3879    A       C       999     .
    .
    .
    .
    SNPs:

    Code:
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
    CM003279.1      25370   TAAA    TAA     999     .
    CM003279.1      75537   TACAC   TAC     999     .
    CM003279.1      77780   ACATCA  ACA     999     .
    CM003279.1      3177577 CTTT    CTT     999     .
    .
    .
    .
    The split doesn't make any sense: I have SNPs and INDELs in both files. (I left the genotype data out here because it would make the excerpt very hard to read.)

    The first lines of my original VCF file are attached.

    I am pretty sure that the problem comes from my VCF file, not from vcftools, but I can't see what it is.

    Is there a tool to check whether a VCF file is malformed?

    Thanks in advance
    Last edited by diego diaz; 07-20-2015, 05:51 PM.

  • #2
    I forgot to mention that the variants were called on the scaffolds, not on the chromosomes, so I had to write a custom script to convert scaffold coordinates into chromosome coordinates. Maybe I left something out during that conversion, but I can't see what.
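
On the question of checking for a malformed VCF: the VCFtools Perl utilities include a `vcf-validator` script, which is probably the first thing to try. As a complement, here is a minimal Python sketch of a column sanity check. It is only an illustration under stated assumptions (tab-separated fields, plain nucleotide alleles, no symbolic alleles such as `<DEL>`), and the function name `check_vcf_lines` is hypothetical, not part of vcftools. It may be relevant here because, in the excerpts above, the rows shown appear to have no ID value between POS and REF. If the real file looks the same, a parser would read ALT as REF and QUAL as ALT. That is a reading of the excerpt, not a confirmed diagnosis.

```python
import re

# Assumption: alleles are plain nucleotide strings. Symbolic alleles
# (e.g. <DEL>) and breakends would be reported as suspicious by this sketch.
ALLELE_RE = re.compile(r"^[ACGTNacgtn]+$")


def check_vcf_lines(lines):
    """Return a list of (line_number, message) for suspicious data lines."""
    problems = []
    n_header_cols = None
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            n_header_cols = len(line.split("\t"))
            continue
        fields = line.split("\t")
        # Every data line should have as many columns as the #CHROM header.
        if n_header_cols is not None and len(fields) != n_header_cols:
            problems.append(
                (lineno, f"expected {n_header_cols} columns, found {len(fields)}")
            )
        # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO are mandatory.
        if len(fields) < 8:
            problems.append((lineno, "fewer than the 8 mandatory columns"))
            continue
        pos, ref, alt = fields[1], fields[3], fields[4]
        if not pos.isdigit():
            problems.append((lineno, f"POS '{pos}' is not an integer"))
        if not ALLELE_RE.match(ref):
            problems.append((lineno, f"REF allele '{ref}' is not a nucleotide string"))
        for allele in alt.split(","):
            # '.' (no alternate allele) and '*' (spanning deletion) are tolerated.
            if allele not in (".", "*") and not ALLELE_RE.match(allele):
                problems.append(
                    (lineno, f"ALT allele '{allele}' is not a nucleotide string")
                )
    return problems
```

A possible way to run it on the file from the first post:

```python
with open("myvariants.vcf") as handle:
    for lineno, message in check_vcf_lines(handle):
        print(f"line {lineno}: {message}")
```

A row whose ID field is missing would typically show up with a numeric value such as `999` reported as an invalid ALT allele, since the QUAL value lands in the ALT column.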
