
  • GATK GermlineCNVCaller & PostprocessGermlineCNVCalls

    Hi, I was wondering whether anyone here has experience running GATK GermlineCNVCaller and PostprocessGermlineCNVCalls to call CNVs in germline samples.

    The VCF files I'm getting always have ALT set to "<DEL>,<DUP>". Shouldn't ALT be just one of them, or neither? Both the interval and the segment VCF files I'm looking at have every position marked as "<DEL>,<DUP>".

    If anyone here has experience with this, I would really appreciate some feedback. Thanks!
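
In standard VCF, the per-sample GT field is an index into the allele list (0 for REF, 1 for the first ALT, 2 for the second), so a record can list both <DEL> and <DUP> in ALT while each sample's GT selects one of them or neither. The sketch below assumes the gCNV output follows that convention and carries GT and CN in the FORMAT column; the record shown is made up for illustration, and `called_allele` is a hypothetical helper, not part of GATK.

```python
def called_allele(ref, alt, sample_gt):
    """Translate a GT string into allele symbols using standard VCF indexing.

    Index 0 is REF, index 1 is the first ALT entry, index 2 the second, etc.
    """
    alleles = [ref] + alt.split(",")
    symbols = []
    for idx in sample_gt.replace("|", "/").split("/"):
        symbols.append("." if idx == "." else alleles[int(idx)])
    return symbols


# Made-up single-sample record (tab-separated), for illustration only.
line = "1\t1000\tCNV_1_1000_2000\tN\t<DEL>,<DUP>\t.\t.\tEND=2000\tGT:CN\t1:1"
fields = line.split("\t")

# Pair the FORMAT keys with the sample's values.
sample = dict(zip(fields[8].split(":"), fields[9].split(":")))

print(called_allele(fields[3], fields[4], sample["GT"]), "CN =", sample["CN"])
```

If this holds for your files, the thing to check is the GT (and CN) value in each sample column rather than the ALT column itself.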

  • #2
    Hi rajitz,

    I am trying to run the germline CNV caller and ran into a problem at the DetermineGermlineContigPloidy step.
    Since you have evidently reached the final step, I was wondering whether you could give me some advice on this one.

    I used the following command for this step:


    /data/NGS/Reanalysis-Package/gatk-4.1.4.0/gatk DetermineGermlineContigPloidy \
        -L Filtered_annotated_preprocessed_intervals_Twist.interval_list \
        --interval-merging-rule OVERLAPPING_ONLY \
        -I /data/NGS/Reanalysis-Package/CNV/HDF5-200/S1071Nr10.counts.hdf5 \
        -I /data/NGS/Reanalysis-Package/CNV/HDF5-200/S1071Nr11.counts.hdf5 \
        -I /data/NGS/Reanalysis-Package/CNV/HDF5-200/S1071Nr12.counts.hdf5 \
        -I /data/NGS/Reanalysis-Package/CNV/HDF5-200/S1071Nr13.counts.hdf5 \
        --contig-ploidy-priors /data/NGS/Reanalysis-Package/CNV/Bed-Files/contig_ploidy_priors.tsv \
        --output . \
        --output-prefix ploidy \
        --verbosity DEBUG \
        --mapping-error-rate 0.01 \
        --global-psi-scale 0.001 \
        --sample-psi-scale 1.0E-4 \
        --mean-bias-standard-deviation 0.01

    (I passed 200 samples as -I inputs; the remaining -I lines are omitted here to save space.)

    I installed the conda environment following https://gatk.broadinstitute.org/hc/e...de44460155fb6#

    Everything ran properly until I got the following error, which I do not understand and do not know how to solve:

    16:54:47.473 DEBUG ScriptExecutor - --contig_ploidy_prior_table=/data/NGS/Reanalysis-Package/CNV/Bed-Files/contig_ploidy_priors.tsv
    16:54:47.473 DEBUG ScriptExecutor - --output_model_path=/data/NGS/Reanalysis-Package/CNV/ploidy-model
    /homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
    from ._conv import register_converters as _register_converters
    Traceback (most recent call last):
    File "/tmp/cohort_determine_ploidy_and_depth.1941148667013278511.py", line 79, in <module>
    args.contig_ploidy_prior_table)
    File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_ploidy.py", line 182, in get_contig_ploidy_prior_map_from_tsv_file
    delimiter=delimiter)
    File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/gcnvkernel/io/io_commons.py", line 50, in read_csv
    input_pd = pd.read_csv(fh, delimiter=delimiter, dtype=dtypes_dict) # dtypes_dict keys may not be present
    File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 705, in parser_f
    return _read(filepath_or_buffer, kwds)
    File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 451, in _read
    data = parser.read(nrows)
    File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 1065, in read
    ret = self._engine.read(nrows)
    File "/homefolder/zfatahi/miniconda3/envs/gatk/lib/python3.6/site-packages/pandas/io/parsers.py", line 1828, in read
    data = self._reader.read(nrows)
    File "pandas/_libs/parsers.pyx", line 894, in pandas._libs.parsers.TextReader.read
    File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._read_low_memory
    File "pandas/_libs/parsers.pyx", line 970, in pandas._libs.parsers.TextReader._read_rows
    File "pandas/_libs/parsers.pyx", line 957, in pandas._libs.parsers.TextReader._tokenize_rows
    File "pandas/_libs/parsers.pyx", line 2200, in pandas._libs.parsers.raise_parser_error
    pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 58, saw 7

    16:54:55.812 DEBUG ScriptExecutor - Result: 1
    16:54:55.813 INFO DetermineGermlineContigPloidy - Shutting down engine
    [February 3, 2020 4:54:55 PM IRST] org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy done. Elapsed time: 0.78 minutes.
    Runtime.totalMemory()=3370123264
    org.broadinstitute.hellbender.utils.python.PythonScriptExecutorException:
    python exited with 1
    Command Line: python /tmp/cohort_determine_ploidy_and_depth.1941148667013278511.py --sample_coverage_metadata=/tmp/samples-by-coverage-per-contig3314282489028474630.tsv --output_calls_path=/data/NGS/Reanalysis-Package/CNV/ploidy-calls --mapping_error_rate=1.000000e-02 --psi_s_scale=1.000000e-04 --mean_bias_sd=1.000000e-02 --psi_j_scale=1.000000e-03 --learning_rate=5.000000e-02 --adamax_beta1=9.000000e-01 --adamax_beta2=9.990000e-01 --log_emission_samples_per_round=2000 --log_emission_sampling_rounds=100 --log_emission_sampling_median_rel_error=5.000000e-04 --max_advi_iter_first_epoch=1000 --max_advi_iter_subsequent_epochs=1000 --min_training_epochs=20 --max_training_epochs=100 --initial_temperature=2.000000e+00 --num_thermal_advi_iters=5000 --convergence_snr_averaging_window=5000 --convergence_snr_trigger_threshold=1.000000e-01 --convergence_snr_countdown_window=10 --max_calling_iters=1 --caller_update_convergence_threshold=1.000000e-03 --caller_internal_admixing_rate=7.500000e-01 --caller_external_admixing_rate=7.500000e-01 --disable_caller=false --disable_sampler=false --disable_annealing=false --interval_list=/tmp/intervals2626211694091496982.tsv --contig_ploidy_prior_table=/data/NGS/Reanalysis-Package/CNV/Bed-Files/contig_ploidy_priors.tsv --output_model_path=/data/NGS/Reanalysis-Package/CNV/ploidy-model
    at org.broadinstitute.hellbender.utils.python.PythonExecutorBase.getScriptException(PythonExecutorBase.java:75)
    at org.broadinstitute.hellbender.utils.runtime.ScriptExecutor.executeCuratedArgs(ScriptExecutor.java:126)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeArgs(PythonScriptExecutor.java:170)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:151)
    at org.broadinstitute.hellbender.utils.python.PythonScriptExecutor.executeScript(PythonScriptExecutor.java:121)
    at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.executeDeterminePloidyAndDepthPythonScript(DetermineGermlineContigPloidy.java:411)
    at org.broadinstitute.hellbender.tools.copynumber.DetermineGermlineContigPloidy.doWork(DetermineGermlineContigPloidy.java:288)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)


    So it seems that the error is:

    pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 58, saw 7

    I googled a lot but could not figure out what the problem is. (I have no experience working with Python; I am just following the steps here: https://gatkforums.broadinstitute.or...scussion/11684)
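
The pandas message means that one row of the file being parsed has more delimiter-separated fields than the first row does (7 instead of 5). In the traceback that file is the contig ploidy priors table. Here is a minimal sketch for locating such rows in any tab-separated file. `find_ragged_rows` is a hypothetical helper, not part of GATK or pandas, and the sample rows are made up. The line number pandas reports may be offset from the one printed here if comment lines were skipped before parsing.

```python
def find_ragged_rows(lines, delimiter="\t"):
    """Return (expected_field_count, mismatches) for an iterable of text lines.

    The first non-empty, non-comment line defines the expected field count.
    Each mismatch is a tuple (line_number, field_count, line_text).
    """
    expected = None
    mismatches = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and comment/header lines (assumed to start with # or @).
        if not line or line.startswith(("#", "@")):
            continue
        count = len(line.split(delimiter))
        if expected is None:
            expected = count
        elif count != expected:
            mismatches.append((lineno, count, line))
    return expected, mismatches


# Made-up example: the last row has two stray trailing tabs, giving 7 fields.
example = [
    "CONTIG_NAME\tPLOIDY_PRIOR_0\tPLOIDY_PRIOR_1\tPLOIDY_PRIOR_2\tPLOIDY_PRIOR_3\n",
    "1\t0.01\t0.01\t0.97\t0.01\n",
    "Y\t0.50\t0.50\t0.00\t0.00\t\t\n",
]

expected, bad = find_ragged_rows(example)
for lineno, count, text in bad:
    # repr() makes invisible trailing tabs and spaces visible.
    print(f"line {lineno}: expected {expected} fields, saw {count}: {text!r}")
```

To check a real file, pass an open file handle instead of the list, for example `with open("contig_ploidy_priors.tsv") as fh: find_ragged_rows(fh)`. Trailing tabs or extra columns left behind by a spreadsheet export are one possible cause, but that is only a guess without seeing the file.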

    I am looking forward to hearing from you or anyone else with experience in this.

    Cheers,
    Zohreh
