  • coco90417
    replied
    Varscan vcf output for indel

    Hi,

    I am also having issues with the VCF output for indels. I get indels that look like this:

    1 984171 . CAG AG .
    1 1588744 . AGCG GCG .

    I checked the genome browser for the context of both mutations (http://genome.ucsc.edu/cgi-bin/hgTra...A984170-984180 and http://genome.ucsc.edu/cgi-bin/hgTra...588740-1588750). It seems that the first one is supposed to be a simple deletion of the first base and should look like this:

    1 984170 . GC G .

    The second one can be represented either as a block substitution that looks like this:

    1 1588743 . AAG AG .

    or as a deletion (if you left-align the deletion) that looks like this:

    1 1588742 . GA G .

    So I do not know whether I did something wrong or whether VarScan uses a different VCF output format for indels.

    Please help me. Many thanks.
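    For comparison, VCF writes a simple indel with one padding base: POS is the base just before the event, REF is that base plus any deleted bases, and ALT is that base plus any inserted bases. Tools such as `bcftools norm` additionally left-align indels inside repeats. Below is a minimal, hypothetical Python sketch of that left-alignment and trimming for a single REF/ALT pair; the function name and toy sequences are made up, and in practice `seq` would come from the reference FASTA, which this sketch does not read.

```python
def left_normalize(seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim one REF/ALT pair (hypothetical helper).

    seq is the reference sequence as a string where seq[0] is position 1,
    and pos is the 1-based VCF POS of the first REF base.
    """
    if seq[pos - 1:pos - 1 + len(ref)] != ref:
        raise ValueError("REF does not match the reference sequence")
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    changed = True
    while changed:
        changed = False
        # Drop a shared last base while both alleles are non-empty.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele is now empty, prepend the preceding reference base to both.
        if not ref or not alt:
            if pos < 2:
                raise ValueError("cannot pad: no reference base before position 1")
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases as long as both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

    With the toy sequence `GAAAT`, a deletion written as POS 4, REF `AT`, ALT `T` comes out as POS 1, REF `GA`, ALT `G`: the deleted A is shifted to the leftmost position of the run of As and anchored on the preceding G.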

  • IsmailM
    replied
    If you are using VarScan mpileup2snp or mpileup2indel, why does the QUAL column not have a number in it?

  • bw.
    replied
    I'm also seeing the slashes with VarScan v2.3.6.
    I wrote this script to convert the slashes to commas:

    Code:
    import sys

    if len(sys.argv) < 2:
        sys.exit("Usage: " + sys.argv[0] + " vcf_filename")

    in_fname = sys.argv[1]
    # Drop a trailing ".vcf" (if present) before adding the new suffix.
    out_fname = (in_fname[:-4] if in_fname.endswith(".vcf") else in_fname) + ".fixed.vcf"
    print("Writing to: " + out_fname)
    with open(in_fname) as vcf_in, open(out_fname, "w") as out:
        for line in vcf_in:
            if line.startswith("#") or not line.strip():
                out.write(line)  # copy header and blank lines unchanged
            else:
                fields = line.split("\t")
                # Replace slashes with commas in REF (column 4) and ALT (column 5).
                # VCF does not allow commas in REF, so records changed there may still be rejected.
                fields[3] = fields[3].replace("/", ",").replace("\\", ",")
                fields[4] = fields[4].replace("/", ",").replace("\\", ",")
                out.write("\t".join(fields))
    To use, just copy-paste into a file (let's say script.py) and run:

    python script.py file.vcf


    Also, this version of the script simply removes the VCF records that contain slashes:

    Code:
    import sys

    if len(sys.argv) < 2:
        sys.exit("Usage: " + sys.argv[0] + " vcf_filename")

    in_fname = sys.argv[1]
    # Drop a trailing ".vcf" (if present) before adding the new suffix.
    out_fname = (in_fname[:-4] if in_fname.endswith(".vcf") else in_fname) + ".fixed.vcf"
    print("Writing to: " + out_fname)
    with open(in_fname) as vcf_in, open(out_fname, "w") as out:
        for line in vcf_in:
            if line.startswith("#") or not line.strip():
                out.write(line)  # copy header and blank lines unchanged
            else:
                fields = line.split("\t")
                alleles = fields[3] + fields[4]
                # Keep the record only if neither REF nor ALT contains a slash.
                if "/" not in alleles and "\\" not in alleles:
                    out.write(line)
    Last edited by bw.; 02-05-2014, 02:16 PM. Reason: Turns out slashes also sometimes appear in the REF field, so added checks for that.

  • rnahar
    replied
    I am also facing the +/- issue in the VarScan indel notation. However, I do not use the VCF output; I prefer VarScan's regular tabular output. Is there a way to change this indel notation so that it is compatible with ANNOVAR? I use VarScan 2.3.6.
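    One way to rewrite the +/- notation is to anchor each indel on the reported reference base, which is the convention Olivia describes further down the thread (an insertion of T after C becomes REF C, ALT CT). Below is a minimal Python sketch of that conversion. It assumes, without having checked VarScan's source, that the reported position and reference base belong to the base immediately before the insertion or deletion, as in samtools pileup, so POS can be kept as-is. The function name is hypothetical, and the result is VCF-style REF/ALT, not ANNOVAR's own input format.

```python
def plus_minus_to_vcf(ref_base: str, var: str) -> tuple[str, str]:
    """Convert a +/- indel call such as ("C", "+AAAG") or ("G", "-AT")
    into VCF-style (REF, ALT) strings anchored on ref_base."""
    if var.startswith("+"):
        # Insertion: REF is the anchor base, ALT is the anchor plus the inserted bases.
        return ref_base, ref_base + var[1:]
    if var.startswith("-"):
        # Deletion: REF is the anchor plus the deleted bases, ALT is the anchor alone.
        return ref_base + var[1:], ref_base
    # Anything else (for example a SNV) is passed through unchanged.
    return ref_base, var
```

    For the examples quoted in this thread, ("C", "+AAAG") becomes ("C", "CAAAG") and ("G", "-AT") becomes ("GAT", "G").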

  • IsmailM
    replied
    Solved indel VCF format with awk command

    Here is an awk command that converts the indel records into the correct VCF format.

    cat Original_VCF | awk 'BEGIN {OFS="\t"} NR <= 24' > FINAL_VCF && cat Original_VCF | awk 'BEGIN {OFS="\t"} NR >= 25 { if (length($4)>length($5)) {$5 = substr($4, 1, 1)}; print }' >> FINAL_VCF

    Since awk strings are indexed from 1, the first base of REF is substr($4, 1, 1).


    It uses two awk commands because the second command would also alter the header if it were run on the whole file. The first awk command copies the header (assumed to be 24 lines) to the output, and the second processes the records from line 25 onward, rewriting the deletions (records where REF is longer than ALT) into the correct format.
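    The same rule can be applied without assuming a 24-line header by recognising header lines from their leading `#`. The sketch below is a hypothetical Python equivalent of the awk command (the function name is made up). Like the awk version, it assumes every record whose REF is longer than its ALT is a VarScan deletion whose correct ALT is the first REF base; it would mangle other records of that shape, such as complex substitutions, so it is only meant for output affected by this issue.

```python
from typing import Iterable, Iterator


def fix_deletion_alts(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, setting ALT to the first REF base whenever REF is longer than ALT."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            # Header lines ("##..." and "#CHROM...") and blank lines pass through untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4]
        if len(ref) > len(alt):
            # Same rule as the awk command: keep only the leading (anchor) base as ALT.
            fields[4] = ref[0]
        yield "\t".join(fields) + "\n"
```

    Wrapped in a small script, `sys.stdout.writelines(fix_deletion_alts(sys.stdin))` processes the file one line at a time, so it could be run as `python fix_deletions.py < Original_VCF > FINAL_VCF` (script name hypothetical). Applied to eeyun's chr1 6529182 record, it turns REF TTCC, ALT TCC into REF TTCC, ALT T.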

  • eeyun
    replied
    Originally posted by eeyun:
        As far as I can tell, it should be ref = TTCC and alt = T

    Attachment included here to show the variant in question.

  • eeyun
    replied
    Originally posted by eeyun:
        We are having the same problem with 2.3.5
        (quoted VCF record: chr1 6529182, REF TTCC, ALT TCC)

    As far as I can tell, it should be ref = TTCC and alt = T.

  • eeyun
    replied
    We are having the same problem with 2.3.5:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1
    chr1 6529182 . TTCC TCC . PASS ADP=314;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:255:322:314:178:138:43.67%:1.1101E-50:34:31:88:90:70:68

  • sophiespo
    replied
    Originally posted by NestorNotabilis:
        To add to Olivia's comment, I'm also using 2.3.4 and still getting the +/- issue when using mpileup2snp.

    I am as well, when using the somatic/processSomatic functions.

    ANNOVAR doesn't like this. Can anyone help?

  • NestorNotabilis
    replied
    Originally posted by tommivat:
        Olivia, +/- issue is fixed in the latest version 2.3.4.

    To add to Olivia's comment, I'm also using 2.3.4 and still getting the +/- issue when using mpileup2snp.

  • tommivat
    replied
    Originally posted by oliviajm:
        Is it fixed in all VarScan tools?
        Which one do you use?

        I use VarScan.v2.3.4.jar mpileup2indel and I still get some "C +AAAG" or "G -AT" in my vcf output file.

    That explains it. I use somatic for tumor-normal pairs.

    Tommi

  • oliviajm
    replied
    Is it fixed in all VarScan tools?

    Which one do you use?

    I use VarScan.v2.3.4.jar mpileup2indel and I still get some "C +AAAG" or "G -AT" in my vcf output file.


    Olivia

  • tommivat
    replied
    Olivia, +/- issue is fixed in the latest version 2.3.4.

    However, the way variant alleles are coded is still unconventional. The VCF format uses commas to separate alleles, whereas VarScan uses slashes, so I hope this can be fixed in future releases:

    Code:
    A/C -> A,C
    ACG/CG -> ACG,CG
    br,
    Tommi

  • oliviajm
    replied
    Hello all,

    I realised that the missing QUAL field was added in one of the recent versions of VarScan. As I use VarScan in a pipeline, I had not updated it recently, to avoid compatibility problems.

    But after a few tests, it seems to me that insertions and deletions are still coded with + and - in the REF and ALT columns, which does not match the VCF specification. For example, I think an insertion of a T after a C should be written as C in the REF field and CT in the ALT field, not as +T.

    Regards,

    Olivia

  • tommivat
    replied
    Hello Dan and others,

    First, thanks for the great piece of software! It would save us some work if somaticFilter supported .vcf files as well, though I don't know whether that would be tedious to implement.

    Another thing I wanted to ask, not related to VCF, concerns false-positive filtering (fpfilter.pl). I'm using bam-readcount to produce input for the script, but even when I run it chromosome by chromosome, the files are too big (>50 GB) and my computer (8 GB of memory) just gets jammed when running fpfilter.pl. Is there a way to modify the script to support piping? Please tell me if it already does.
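    If fpfilter.pl's memory use is driven by the size of the readcount file (its internals are not shown in this thread, so that is an assumption), one workaround is to shrink that file before the script sees it, keeping only the lines for positions that appear in the variant list. Because bam-readcount writes one tab-separated line per position, starting with the chromosome and position, a pre-filter can read its output from a pipe one line at a time. The Python sketch below is hypothetical (function names made up) and assumes the variant file also has the chromosome and position in its first two tab-separated columns, as VarScan's native output does.

```python
from typing import Iterable, Iterator, Set, Tuple

Site = Tuple[str, str]  # (chromosome, position as text)


def load_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect (chromosome, position) from the first two tab-separated columns."""
    sites: Set[Site] = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t", 2)
        if line.startswith("#") or len(fields) < 2:
            continue  # skip VCF-style headers and blank or malformed lines
        sites.add((fields[0], fields[1]))
    return sites


def filter_readcounts(lines: Iterable[str], sites: Set[Site]) -> Iterator[str]:
    """Yield only the readcount lines whose chromosome and position are in sites.

    Lines are read and yielded one at a time, so memory use depends on the
    number of sites rather than on the size of the readcount input.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) >= 2 and (fields[0], fields[1]) in sites:
            yield line
```

    For example, a script could call `load_sites(open(sys.argv[1]))` on the variant file and then `sys.stdout.writelines(filter_readcounts(sys.stdin, sites))`, so bam-readcount's output is piped straight in and only the matching lines are written out for fpfilter.pl.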

    br,
    Tommi
