  • VarScan copynumber error when filtering with copyCaller, plus GC content > 100

    Hi,

    I'm using VarScan v2.3.2 to do CNV analysis on HiSeq exome data from tumor-normal pairs. Running copynumber appears to work, but when I filter the resulting .copynumber file with copyCaller, I get an error ("Parsing Exception"; see below).

    I ran VarScan copynumber like this:
    java -jar /VarScan.v2.3.2.jar copynumber $NOR $TUM $BASENAME

    I got the following output:
    ########
    Normal Pileup: /177_1N.prmdup.realign.recal_sorted.mpileup
    Tumor Pileup: /177_1T.prmdup.realign.recal_sorted.mpileup
    Min coverage: 10
    Min avg qual: 15
    P-value thresh: 0.01
    Not resetting normal file because chrM < chrY
    561343988 positions in tumor
    557785077 positions shared in normal
    38214383 had sufficient coverage for comparison
    482476 raw copynumber segments with size > 10
    474997 good copynumber segments with depth > 10
    ##########

    So the output includes the message "Not resetting normal file because chrM < chrY".

    I saw an answer dkoboldt gave about this message: "This is just a warning printed by VarScan as it's simultaneously parsing normal and tumor files. As long as your output files contain all of the chromosomes that you expect, you can safely ignore it."

    So I double-checked, and all chromosomes are present in the copynumber file.
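The same check can be scripted by collecting the distinct values of the chrom column. Below is a minimal Python sketch, not part of VarScan, assuming a whitespace-delimited copynumber file whose first line is the header shown in the sample further down. The EXPECTED set (chr1 to chr22, chrX, chrY) is a hypothetical hg19-style choice that you would adjust to your own reference.

```python
# Hypothetical expected set for an hg19-style human reference; adjust as needed.
EXPECTED = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}


def chromosomes_in_copynumber(lines):
    """Return the distinct values of the 'chrom' column.

    Assumes the first line is the header (chrom, chr_start, ...) and that
    fields are separated by tabs or spaces.
    """
    it = iter(lines)
    header = next(it).split()
    chrom_idx = header.index("chrom")
    found = set()
    for line in it:
        fields = line.split()
        if fields:  # skip blank lines
            found.add(fields[chrom_idx])
    return found


def missing_chromosomes(lines, expected=EXPECTED):
    """Return the expected chromosomes that never appear in the file, sorted by name."""
    return sorted(expected - chromosomes_in_copynumber(lines))


if __name__ == "__main__":
    # Two rows copied from the sample output in this thread.
    sample = [
        "chrom chr_start chr_stop num_positions normal_depth tumor_depth log2_ratio gc_content",
        "chr1 10010 10109 100 30,4 28,4 -0,097 51,0",
        "chr1 10110 10209 100 23,4 21,4 -0,132 51,0",
    ]
    print(missing_chromosomes(sample, expected={"chr1", "chrY"}))  # ['chrY']
```

For a real file, open it and pass the file handle to missing_chromosomes; an empty list means every expected chromosome appears at least once.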

    Then I ran copyCaller like this:
    java -jar /VarScan.v2.3.2.jar copyCaller $IN --output-file ${IN}.called

    copyCaller produces the following output:
    #####################
    Min coverage: 20
    Reading input from /177_1T.copynumber
    Parsing Exception on line:
    chr1 10010 10109 100 30,4 28,4 -0,097 51,0
    For input string: "30,4"
    Error parsing input: null
    java.lang.NullPointerException
    at net.sf.varscan.CopyCaller.<init>(CopyCaller.java:293)
    at net.sf.varscan.VarScan.copyCaller(VarScan.java:344)
    at net.sf.varscan.VarScan.main(VarScan.java:173)

    ##################
    I am not sure what is wrong with this line. However, looking at the output from copynumber (sample below), I noticed that the GC content sometimes exceeds 100 (see attached picture).

    ##################
    chrom chr_start chr_stop num_positions normal_depth tumor_depth log2_ratio gc_content
    chr1 10010 10109 100 30,4 28,4 -0,097 51,0
    chr1 10110 10209 100 23,4 21,4 -0,132 51,0
    chr1 10210 10240 31 14,3 11,9 -0,260 51,6
    chr1 10359 10458 100 20,7 14,1 -0,556 51,0
    chr1 12202 12226 25 10,2 2,0 -2,350 48,0
    chr1 13425 13438 14 10,0 5,9 -0,754 50,0
    chr1 69005 69104 100 22,5 24,1 0,098 40,0
    chr1 69105 69204 100 41,9 38,2 -0,133 77,0
    chr1 69205 69304 100 74,6 70,9 -0,074 127,0
    chr1 69305 69404 100 42,5 45,8 0,108 171,0
    chr1 69405 69504 100 20,0 20,1 0,003 216,0
    chr1 69505 69604 100 26,9 22,2 -0,277 265,0
    chr1 69605 69704 100 66,4 64,2 -0,049 308,0
    chr1 69705 69804 100 86,8 83,1 -0,064 42,0
    chr1 69805 69904 100 73,5 71,1 -0,047 78,0
    chr1 69905 70004 100 55,7 49,0 -0,185 119,0
    chr1 70005 70043 39 22,4 23,9 0,096 25,6
    chr1 367991 368039 49 11,2 5,3 -1,078 51,0
    ################
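Because GC content is a percentage, any value above 100 indicates a problem in how it was computed rather than in the sequencing data. A minimal Python sketch for flagging such rows, assuming the whitespace-delimited layout of the sample above and accepting either a comma (as in this file) or a period as the decimal separator; the function names are hypothetical and not part of VarScan.

```python
def parse_decimal(field):
    """Parse a number that uses either ',' (as in the sample above) or '.' as decimal separator."""
    return float(field.replace(",", "."))


def gc_outliers(lines):
    """Yield (chrom, start, stop, gc) for rows whose gc_content lies outside 0-100.

    Assumes the first line is the header, fields are tab- or space-separated,
    and the first three columns are chrom, chr_start, chr_stop.
    """
    it = iter(lines)
    header = next(it).split()
    gc_idx = header.index("gc_content")
    for line in it:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        gc = parse_decimal(fields[gc_idx])
        if not 0.0 <= gc <= 100.0:
            yield fields[0], int(fields[1]), int(fields[2]), gc


if __name__ == "__main__":
    # Three rows copied from the sample output above.
    sample = [
        "chrom chr_start chr_stop num_positions normal_depth tumor_depth log2_ratio gc_content",
        "chr1 69105 69204 100 41,9 38,2 -0,133 77,0",
        "chr1 69205 69304 100 74,6 70,9 -0,074 127,0",
        "chr1 69705 69804 100 86,8 83,1 -0,064 42,0",
    ]
    for row in gc_outliers(sample):
        print(row)  # ('chr1', 69205, 69304, 127.0)
```

Applied to the full sample above, it would flag six windows, those with GC values 127,0, 171,0, 216,0, 265,0, 308,0 and 119,0.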

    Any feedback would be greatly appreciated!

    Thank you in advance.

  • #2
    Hello,

    Thank you for posting this message. I have seen this issue before and thought it was fixed in v2.3.2. You're encountering a "locale error" caused by the European representation of floating-point (decimal) numbers with a comma (e.g. 3,1415926) rather than a decimal point (3.1415926).

    If it's possible for you to change the locale preference in your Java settings, that's one way to address the problem. Another is a global search-and-replace of commas with periods: perl -pi -e 's/,/./g' output.copynumber
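The replacement can also be done more conservatively, converting only fields that look like a comma-decimal number so that commas elsewhere in a line are left untouched. Python's float(), like Java's Double.parseDouble, accepts only "." as the decimal separator, which is consistent with the 'For input string: "30,4"' message in the original post. A minimal Python sketch, assuming tab-separated fields (pass sep=" " for a space-separated copy); normalize_decimal_commas is a hypothetical helper, not part of VarScan.

```python
import re

# An optional sign, digits, a single comma, digits: e.g. "30,4" or "-0,097".
_COMMA_DECIMAL = re.compile(r"[-+]?\d+,\d+")


def normalize_decimal_commas(line, sep="\t"):
    """Return line with decimal commas turned into periods, in numeric fields only."""
    ending = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split(sep)
    fixed = [f.replace(",", ".") if _COMMA_DECIMAL.fullmatch(f) else f for f in fields]
    return sep.join(fixed) + ending


if __name__ == "__main__":
    # The row copyCaller failed on, with tab separators.
    row = "chr1\t10010\t10109\t100\t30,4\t28,4\t-0,097\t51,0\n"
    fixed = normalize_decimal_commas(row)
    print(fixed, end="")
    # float("30,4") would raise ValueError; the converted fields now parse.
    print([float(x) for x in fixed.split("\t")[4:]])  # [30.4, 28.4, -0.097, 51.0]
```

To convert a whole file, write normalize_decimal_commas(line) for each input line to a new file and run copyCaller on that file.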

    I will look again in the code to see if I can determine why the locale parsing correction isn't working.



    • #3
      Hello Dan,

      I have a question about VarScan. It is posted at

      Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc



      I would be very grateful if you could answer it.

      Thank you very much.



      • #4
        Thank you again for following up. I'd thought that the locale-parsing issue was fixed, but using your file I was able to reproduce the problem and fix it. VarScan copyCaller should now work for you even with European-style floating-point numbers.

        Also, the GC content values > 100 were traced to a bug in GC counting that has also been fixed.

        These fixes are all in VarScan v2.3.3, which was posted today.



        • #5
          Thanks a thousand!

