

Help with Varscan somatic bug report and interpret mpileup2cns result




  • Help with Varscan somatic bug report and interpret mpileup2cns result

    I'm getting the following crash report when running VarScan somatic:
    # A fatal error has been detected by the Java Runtime Environment:
    #  SIGSEGV (0xb) at pc=0x00007f4a3bf04fe8, pid=21559, tid=139956786239248
    # JRE version: 6.0_17-b17
    # Java VM: OpenJDK 64-Bit Server VM (14.0-b16 mixed mode linux-amd64 )
    # Derivative: IcedTea6 1.7.4
    # Distribution: Custom build (Thu Jul 29 16:49:18 EDT 2010)
    # Problematic frame:
    # V  [libjvm.so+0x57dfe8]
    # If you would like to submit a bug report, please include
    # instructions how to reproduce the bug and visit:
    #   http://icedtea.classpath.org/bugzilla
    ---------------  T H R E A D  ---------------
    Current thread (0x00007f4a34012000):  GCTaskThread [stack: 0x00007f4a3a771000,0x00007f4a3a872000] [id=21561]
    siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), si_addr=0x0000000000000000
    7fff101ff000-7fff10200000 r-xp 00000000 00:00 0                          [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
    VM Arguments:
    java_command: java -jar VarScan.jar somatic normal_tissue.mpileup infected_tissue.mpileup normal_infected_comparison --mpileup 1 --min-var-freq 0.08 --p-value 0.10 --somatic-p-value 0.05 --output-vcf 1
    Launcher Type: SUN_STANDARD
    Environment Variables:
    log file:
    Min coverage:	8x for Normal, 6x for Tumor
    Min reads2:	2
    Min strands2:	1
    Min var freq:	0.08
    Min freq for hom:	0.75
    Normal purity:	1.0
    Tumor purity:	1.0
    Min avg qual:	15
    P-value thresh:
    Somatic p-value:	0.05
    Reading input from normal_tissue.mpileup
    Reading mpileup input...
    Parsing Exception on line:
    normal_tissue_seq1_630	286	A	40	^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.^~.	[email protected]@@C<@[email protected]@[email protected]@@@CCCCC<@@[email protected]@<@
    The commands I ran were:
    samtools mpileup -f reference.fasta normal.bam > normal_tissue.mpileup
    samtools mpileup -f reference.fasta infected.bam > infected_tissue.mpileup
    java -jar VarScan.jar somatic normal_tissue.mpileup infected_tissue.mpileup normal_infected_comparison --mpileup 1 --min-var-freq 0.08 --p-value 0.10 --somatic-p-value 0.05 --output-vcf 1
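    The commands above run samtools mpileup on one BAM at a time, so each mpileup file holds a single sample. For reference on the mpileup layout, here is a minimal Python sketch (the function name is hypothetical) that splits a line into its three fixed columns (sequence name, 1-based position, reference base) and three columns per sample (depth, read bases, base qualities). Counting the per-sample groups is a quick way to see how many samples a file contains; the line in the parsing exception above, for instance, has six columns and therefore one sample.

```python
def parse_mpileup_line(line):
    """Split one samtools mpileup line into fixed columns and per-sample columns.

    Assumes the standard tab-separated layout: sequence name, 1-based position,
    reference base, then (depth, read bases, base qualities) for each sample.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    per_sample = fields[3:]
    if len(per_sample) % 3 != 0:
        raise ValueError(
            f"expected 3 columns per sample, found {len(per_sample)} after the fixed columns"
        )
    samples = []
    for i in range(0, len(per_sample), 3):
        depth, bases, quals = per_sample[i:i + 3]
        samples.append({"depth": int(depth), "bases": bases, "quals": quals})
    return chrom, pos, ref, samples


# Hypothetical, shortened single-sample line (real quality strings are longer).
chrom, pos, ref, samples = parse_mpileup_line("seq1\t286\tA\t3\t^~.^~.^~.\tIII")
print(chrom, pos, ref, len(samples))  # seq1 286 A 1
```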

    Separately, here is the output I get from this command:
    samtools mpileup -f reference.fasta normalA.bam infectedA.bam normalB.bam infectedB.bam | java -jar VarScan.jar mpileup2cns --min-var-freq 0.08 --p-value 0.05 --output-vcf 1 >cross-sample.varScan.vcf
    ##FORMAT=<ID=ADR,Number=1,Type=Integer,Description="Depth of variant-supporting bases on reverse strand (reads2minus)">
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1 Sample2 Sample3 Sample4
    normal_tissue_seq1_630     101     .       A       .       .       PASS    ADP=0;WT=0;HET=0;HOM=0;NC=4     GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    ./.:.:0 ./.:.:1 ./.:.:0 ./.:.:0
    normal_tissue_seq5_580      532     .       A       .       .       PASS    ADP=1548;WT=4;HET=0;HOM=0;NC=0  GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/0:2147483647:1957:1820:1817:2:0.11%:5E-1:33:23:923:894:0:2    0/0:2147483647:1987:1894:1893:1:0.05%:7.5007E-1:34:17:1189:704:0:1
    normal_tissue_seq10_950      533     .       C       T       .       PASS    ADP=1611;WT=3;HET=1;HOM=0;NC=0  GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/0:303:1969:1843:1820:23:1.25%:1.3987E-6:33:24:880:940:4:19    0/0:2147483647:1981:1916:1908:8:0.42%:1.9421E-2:35:23:1162:746:2:6
    I'm not sure how to interpret the mpileup2cns output.
    Thanks for any advice.
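    As a starting point for reading mpileup2cns lines, here is a minimal Python sketch (the function name is hypothetical) that pairs each sample column with the colon-separated FORMAT keys, so that every value is labelled (GT, SDP, RD, AD, FREQ, and so on). It assumes the nine fixed VCF columns come before the sample columns.

```python
def parse_vcf_samples(line):
    """Return one dict per sample column, keyed by the FORMAT field names.

    Assumes the nine fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL,
    FILTER, INFO, FORMAT) come first. Splitting on any whitespace also
    copes with lines pasted with spaces instead of tabs.
    """
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected nine fixed columns plus at least one sample")
    format_keys = fields[8].split(":")
    samples = []
    for sample in fields[9:]:
        # VCF allows trailing FORMAT fields to be dropped (e.g. "./.:.:0"),
        # so zip() simply stops at the shorter list.
        samples.append(dict(zip(format_keys, sample.split(":"))))
    return samples


line = (
    "normal_tissue_seq10_950 533 . C T . PASS ADP=1611;WT=3;HET=1;HOM=0;NC=0 "
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR "
    "0/0:303:1969:1843:1820:23:1.25%:1.3987E-6:33:24:880:940:4:19 "
    "0/0:2147483647:1981:1916:1908:8:0.42%:1.9421E-2:35:23:1162:746:2:6"
)
first = parse_vcf_samples(line)[0]
print(first["GT"], first["RD"], first["AD"], first["FREQ"])  # 0/0 1820 23 1.25%
```

    For the first sample of the normal_tissue_seq10_950 line, the strand columns add up: RDF + RDR = 880 + 940 = 1820 = RD and ADF + ADR = 4 + 19 = 23 = AD, consistent with ADR being the reverse-strand part of AD as the ##FORMAT header line describes.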

  • #2
    Which version of VarScan are you using?
    I haven't come across the --mpileup option before. What is it for?


    • #3
      I'm using the latest version of VarScan.
      The mpileup format has now replaced the older pileup format.
      I'm able to run VarScan now; the error above was caused by a problem with my Java version.

      Separately, here is one line of output from VarScan somatic:
      read9786_577      111     .       G       A       .       PASS    DP=951;SS=3;SSC=32;GPV=1E0;SPV=5.8927E-4        GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:8:4:2:33.33%:3,1,1,1      0/0:.:943:859:4:0.46%:611,248,2,2
      As I understand it, 8 is the read depth, 4 is the number of reference-supporting reads, and 2 is the number of variant-supporting reads.
      Why is the sum of reference and variant reads lower than the total depth?
      Is it because only 6 bases pass the base-quality filter?
      This pattern appears quite frequently in my data set.

      Also, could you share a simple example of how to interpret the genotypes in the output?
      As I understand it, 0/0 is homozygous reference, 1/1 is homozygous alternate, 0/1 is heterozygous, and ./. is a no-call.
      But I'm a bit unsure how to tell these three cases apart, especially 1/1.


      • #4

        I'm glad you figured out the Java JRE issue behind that exception. As for your second question, the differences in read depth are because of the minimum base quality requirement. DP reflects the SAMtools depth (no base quality requirement), but RD/AD are VarScan's readcounts (by default, qual>15).
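        Following that explanation, a short Python sketch with the per-sample numbers from the read9786_577 line: DP minus (RD + AD) gives the reads that are in the SAMtools depth but were counted by VarScan as neither reference nor variant, which per the reply are mostly bases below the quality cutoff (reads showing some third allele would presumably also fall here). It also checks that DP4 adds up to RD and AD, assuming DP4 is ordered ref/fwd, ref/rev, var/fwd, var/rev. The function names are hypothetical.

```python
def reads_not_in_counts(sample):
    """Reads included in DP but counted in neither RD nor AD."""
    return sample["DP"] - (sample["RD"] + sample["AD"])


def dp4_consistent(sample):
    """Check that the DP4 strand counts add up to RD and AD.

    Assumes DP4 is ordered ref/fwd, ref/rev, var/fwd, var/rev.
    """
    ref_fwd, ref_rev, var_fwd, var_rev = sample["DP4"]
    return ref_fwd + ref_rev == sample["RD"] and var_fwd + var_rev == sample["AD"]


# Per-sample values copied from the read9786_577 position 111 line.
first = {"DP": 8, "RD": 4, "AD": 2, "DP4": (3, 1, 1, 1)}
second = {"DP": 943, "RD": 859, "AD": 4, "DP4": (611, 248, 2, 2)}

for name, sample in (("first", first), ("second", second)):
    print(name, reads_not_in_counts(sample), dp4_consistent(sample))
# first 2 True
# second 80 True
```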

        I'm confused by your question about the genotype... its interpretation is spelled out quite clearly in the VCF specification. In your example:

        Sample 1 is 0/1, or heterozygous-variant, with genotype GA.
        Sample 2 is 0/0, or wildtype, with genotype GG.

        If there were a third sample that was 1/1, its genotype would be AA.
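        The genotype reading above can be written as a small lookup: following the VCF specification, allele index 0 is REF and 1, 2, ... are the comma-separated ALT alleles. A minimal Python sketch (the function name is hypothetical):

```python
def genotype_bases(gt, ref, alt):
    """Translate a VCF GT value such as '0/1' into allele bases.

    Index 0 is the REF allele; 1, 2, ... are the comma-separated ALT alleles.
    Returns None for a no-call ('./.').
    """
    alleles = [ref] if alt == "." else [ref] + alt.split(",")
    # '/' separates unphased and '|' phased allele indices.
    indices = gt.replace("|", "/").split("/")
    if "." in indices:
        return None
    try:
        return "".join(alleles[int(i)] for i in indices)
    except IndexError:
        raise ValueError(f"GT {gt!r} refers to an allele not listed in ALT {alt!r}") from None


for gt in ("0/0", "0/1", "1/1"):
    print(gt, genotype_bases(gt, "G", "A"))
# 0/0 GG
# 0/1 GA
# 1/1 AA
```

        With REF G and ALT A this reproduces the examples: 0/1 is GA, 0/0 is GG, and 1/1 is AA. A GT that points at an allele not listed in ALT (for example 1/1 when ALT is ".") raises an error rather than guessing.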


        • #5
          Hi Edge,

          In relation to the genotypes, I am using VarScan v2.3.6. I found several lines in which the genotypes are marked as 1/1 even though both samples match the reference (i.e. should be 0/0).

          Here are a few examples:
          chr1 721668 . C . PASS DP=168;SS=0;SSC=0;GPV=1E0;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:78:78:0:0%:34,44,0,0 1/1:.:90:90:0:0%:38,52,0,0

          REFERENCE: chr1 721687 . C . PASS DP=139;SS=0;SSC=0;GPV=1E0;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:71:71:0:0%:22,49,0,0 1/1:.:68:67:0:0%:22,45,0,0

          Do you know why these genotypes have been classified as 1/1?
          Thank you in advance,
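          To find lines like these, a minimal Python sketch (the function name is hypothetical) that flags samples with a 1/1 genotype but zero variant-supporting reads, assuming the GT:GQ:DP:RD:AD:FREQ:DP4 layout that VarScan somatic uses (as in the read9786_577 line earlier in the thread). It only detects the pattern; it does not explain why VarScan assigns it.

```python
def hom_alt_without_variant_reads(format_field, sample_fields):
    """Return the indices of samples whose GT is 1/1 although AD is 0."""
    keys = format_field.split(":")
    flagged = []
    for idx, sample in enumerate(sample_fields):
        values = dict(zip(keys, sample.split(":")))
        ad = values.get("AD", ".")
        # Flag only when AD is present and equal to zero.
        if values.get("GT") in ("1/1", "1|1") and ad != "." and int(ad) == 0:
            flagged.append(idx)
    return flagged


fmt = "GT:GQ:DP:RD:AD:FREQ:DP4"
print(hom_alt_without_variant_reads(
    fmt, ["1/1:.:78:78:0:0%:34,44,0,0", "1/1:.:90:90:0:0%:38,52,0,0"]
))  # [0, 1]
```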

