Seqanswers Leaderboard Ad

**etal** · 03-12-2015, 01:20 PM

Hi Ies,

Sorry I missed your post earlier. There isn't a verbose logging mode in CNVkit, but the messages on standard error are fairly verbose already and should always report when there is an error. In particular, if something crashes then you'll see a Python traceback message. However, if you parallelized the batch run (-p >1), the messages from each process will be interleaved, which makes them somewhat harder to intepret.

The scatter and diagram PDFs should always be generated in a batch run; there isn't a special code path where they would be skipped. Does the log say something like "Wrote MySample-scatter.pdf" for the missing PDFs, or not?

The --scatter option uses matplotlib to generate a PDF. On a cluster, the default matplotlib backend (e.g. Wx or Gtk) might not be available, and so I guess it's possible the plotting engine of matplotlib gets confused and silently fails to write the file. You could address that by setting a different backend on your cluster -- create a file called "matplotlibrc" in the current working directory or your home folder, with the contents:

backend : pdf

The --diagram option uses a different backend, Reportlab, which always generates a PDF from scratch and does not have an interactive mode. I can't think of a reason why this one would occasionally fail to write a file. Can you suggest anything unusual about your system's configuration? Outdated software versions, maybe?

If the diagram is showing labels for hundreds of genes, that means:
(a) you did exome sequencing, so there's lots to show;
(b) significant copy number alterations cover large regions of chromosomes in your sample; and/or
(c) the purity of your tumor samples is fairly high.

You can:

- Thin out the labeling to some extent by specifying a higher threshold (-t) log2 ratio value in the diagram command; the default is 0.6, so try 0.8 or 0.9 to only show the higher-amplitude CNAs.
- Drop the labels altogether by specifying a high value for -t or just passing the .cns segment file (with -s), without the .cnr.
- Use the "heatmap" command instead to view the unlabeled CNA regions for many samples at once.
- Use the "gainloss" command to list all genes with log2 ratio amplitudes beyond a given threshold, essentially the labels you're currently seeing on the diagram but in a more manageable plain-text, tabular format.

The diagram is based on Biopython's Graphics/BasicChromosome module. If you're handy with Python and have a specific modification in mind, you could edit cnvlib/diagram.py (202 lines) to do it. For example, you can change PAGE_SIZE to much larger dimensions like 22x17" and the chromosomes will scale proportionally, but the gene labels stay the same size and will be more readable if they were overlapping before.

Thanks for the suggestion on sample.bai, I'll look into it.

Cheers,
Eric

**lethalfang** · 04-06-2015, 09:44 AM

When I tried to run more tumor samples using the reference .cnn file from a past normal run, there is no .cnr output.

This gives me .cnr files:
cnvkit.py batch T1.bam T2.bam T3.bam T4.bam T5.bam T6.bam T7.bam T8.bam T9.bam T10.bam --normal Na.bam Nb.bam Nc.bam --targets captures.bed --fasta b37_decoy.fasta --split --access b37.decoy.accessibles.bed --scatter --diagram --output-reference N.cnn --output-dir .

This does not output .cnr pr antitargetcoverage.cnn file:
cnvkit.py batch T1.bam T2.bam T3.bam T4.bam T5.bam T6.bam T7.bam T8.bam T9.bam T10.bam -r N.cnn --targets captures.bed --fasta b37_decoy.fasta --split --access b37.decoy.accessibles.bed --scatter --diagram --output-dir .

What is missing? Thanks.

**etal** · 04-06-2015, 10:05 AM

What files were generated by the second run, if any? Can you show me the status messages or any errors that were printed?

When you run CNVkit with a reference, you don't need the "--targets", "--fasta", "--split" and "--access" arguments as that information has already been captured in the reference file. The default output directory is the current directory ("."). Try this instead:

cnvkit.py batch T1.bam T2.bam T3.bam T4.bam T5.bam T6.bam T7.bam T8.bam T9.bam T10.bam -r N.cnn --scatter --diagram

**lethalfang** · 04-06-2015, 10:15 AM

Originally posted by etal View Post

What files were generated by the second run, if any? Can you show me the status messages or any errors that were printed?

When you run CNVkit with a reference, you don't need the "--targets", "--fasta", "--split" and "--access" arguments as that information has already been captured in the reference file. The default output directory is the current directory ("."). Try this instead:

cnvkit.py batch T1.bam T2.bam T3.bam T4.bam T5.bam T6.bam T7.bam T8.bam T9.bam T10.bam -r N.cnn --scatter --diagram

Thanks. For the second run where I was having question, there is no error message, and a bunch of .targetcoverage.cnn files are generated.

The status messages are:

Code:

Summary: #bins=292, #reads=7633921, mean=26143.5653, min=212.35, max=224646.99
On-target percentage: 37.877 (of 20154547 mapped)
Wrote ./T4.targetcoverage.cnn
Running the CNVkit pipeline on T9.bam ...
Processing reads in T9.bam
Time: 52.775 seconds (154291 reads/sec, 6 bins/sec)
Summary: #bins=292, #reads=8142711, mean=27885.9981, min=268.94, max=279225.27
On-target percentage: 38.472 (of 21165340 mapped)
Wrote ./T3.targetcoverage.cnn
Running the CNVkit pipeline on T10.bam ...
Processing reads in T10.bam
Time: 55.564 seconds (154063 reads/sec, 5 bins/sec)
Summary: #bins=292, #reads=8560291, mean=29316.0670, min=326.38, max=293055.07
On-target percentage: 38.593 (of 22181003 mapped)
Wrote ./T6.targetcoverage.cnn
Time: 57.101 seconds (149544 reads/sec, 5 bins/sec)
Summary: #bins=292, #reads=8539037, mean=29243.2782, min=302.54, max=292123.56
Time: 57.092 seconds (155098 reads/sec, 5 bins/sec)
Summary: #bins=292, #reads=8854878, mean=30324.9281, min=281.23, max=295576.58
Time: 57.155 seconds (147061 reads/sec, 5 bins/sec)
Summary: #bins=292, #reads=8405294, mean=28785.2540, min=254.16, max=285135.1
On-target percentage: 38.485 (of 22188207 mapped)
On-target percentage: 39.087 (of 22654458 mapped)
On-target percentage: 38.633 (of 21757000 mapped)
Wrote ./T2.targetcoverage.cnn
Wrote ./T8.targetcoverage.cnn
Wrote ./T1.targetcoverage.cnn
Time: 58.704 seconds (143289 reads/sec, 5 bins/sec)
Summary: #bins=292, #reads=8411624, mean=28806.9335, min=241.63, max=282191.54
On-target percentage: 38.845 (of 21654563 mapped)
Wrote ./T7.targetcoverage.cnn
Time: 60.050 seconds (150608 reads/sec, 5 bins/sec)
Summary: #bins=292, #reads=9044023, mean=30972.6832, min=298.46, max=306083.6
On-target percentage: 38.815 (of 23300051 mapped)
Wrote ./T5.targetcoverage.cnn
Time: 50.180 seconds (165905 reads/sec, 6 bins/sec)
Summary: #bins=292, #reads=8325083, mean=28510.5614, min=309.7, max=283705.95
On-target percentage: 38.622 (of 21555485 mapped)
Wrote ./T9.targetcoverage.cnn
Time: 51.261 seconds (173281 reads/sec, 6 bins/sec)
Summary: #bins=292, #reads=8882609, mean=30419.8946, min=299.88, max=296973.17
On-target percentage: 38.693 (of 22956916 mapped)
Wrote ./T10.targetcoverage.cnn

I ran again with the command you suggested, and the results are now as expected, i.e., identical to the first run.

Thanks.

**wisekh** · 05-11-2015, 01:33 PM

calling LOH from a pair of tumor and normal exome bams

Dear Eric,

I wonder if you can provide an instruction/example on how to generate LOH calls from a pair of tumor and normal exom bams.
I assume this tool can generate them, but I couldn't find a tuturoal in the cnvkit website. probably, I missed it?

thank you very much in advance,

wisekh

**etal** · 05-11-2015, 03:15 PM

Hi wisekh,

The LOH functionality in CNVkit is described here:

Plots and graphics — CNVkit 0.9.6.dev0 documentation

http://cnvkit.readthedocs.org/en/latest/plots.html#scatter

However, the "calls" are simply displayed visually -- the variant allele frequencies are plotted alongside the copy ratios, and a shift in VAF from 0.5 indicates LOH. I'm currently working on expanding this functionality to make it more useful.

To run the complete pipeline with a tumor-normal pair and make a plot of the copy number and LOH shifts together, follow the quick start guide here:

Quick start — CNVkit 0.9.6.dev0 documentation

http://cnvkit.readthedocs.org/en/latest/quickstart.html

Separately from running the CNVkit "batch" pipeline, you'll need to call SNPs in the tumor sample in VCF format. Then use that VCF file as input to the CNVkit "scatter" command along with the .cnr and .cns files from the CNVkit pipeline to make the plot.

Hope that helps,
Eric

**wisekh** · 05-11-2015, 08:08 PM

I got it. Thank you very much for your quick response.

hoon

**wisekh** · 05-12-2015, 07:24 PM

using a mapping quality threshold

Dear Eric,
I have another question. Is it possible to use a mapping quality threshold to, for example, a tumor bam or multiple tumor bams that I would infer copy number from?

thank you,

Hoon

**etal** · 05-14-2015, 09:08 AM

The mapping quality threshold is hard-coded to only exclude unmapped or ambiguously mapped reads, see here:

cnvkit/cnvlib/coverage.py at master · etal/cnvkit

https://github.com/etal/cnvkit/blob/master/cnvlib/coverage.py#L115

Copy number variant detection from targeted DNA sequencing - etal/cnvkit

Just change the -Q value to another integer if you'd like to try it yourself. However, I recall reading (can't find the reference at the moment) that keeping low-MAPQ reads did not harm copy number estimation and may have improved it.

In any case, in CNVkit the script genome2access.py can be used to directly exclude poorly mappable genomic regions. This is done already for hg19 in the bundled file access-5k-mappable.hg19.bed.

**oxydeepu** · 03-08-2016, 04:25 AM

Sequence IDs don't match with bed Error CNVkit.

Hi all,

I trying to run CNVkit with tumor and normal samples on exome sequencing. But I tried all possible things mentioned in the docs. I tried different bed file input as the target region and also the annotations but I always end up getting the error.
ValueError: BED file 'results/S04380110_Covered.target.bed' sequence IDs don't match any in BAM file.

Anybody came across this issue. Please help.

Thank you in advance.

Deepak

**etal** · 03-08-2016, 07:33 AM

Hi Deepak,

The sequence IDs are the chromosome names in your reference genome and the first column of your BED file. For the human genome the first chromosome might be "1" or "chr1" depending on where you got your reference genome.

Check that the name schemes match between your BED and BAM files, either "1" or "chr1". You can use "samtools view -H" to see the BAM header. If the names don't match, then you should edit your BED file, either adding or removing the "chr" prefix, so that they do match.

Hope that helps,
Eric

**DrWorm** · 03-09-2016, 01:30 PM

errors in production of cns file

Hello all,

I'm having issues working through the install of CNVkit. I've pinpointed my issues to problems with the segment sub-command. When I try to run the test procedure, it dies with this error:

Traceback (most recent call last):
File "../cnvkit.py", line 11, in <module>
args.func(args)
File "/home/snmcnulty/bin/cnvkit-master/cnvlib/commands.py", line 666, in _cmd_segment
rlibpath=args.rlibpath)
File "/home/snmcnulty/bin/cnvkit-master/cnvlib/segmentation/__init__.py", line 58, in do_segmentation
segarr = cnarr.as_dataframe(seg2cns(seg_out))
File "/home/snmcnulty/bin/cnvkit-master/cnvlib/segmentation/__init__.py", line 143, in seg2cns
+ seg_text)
ValueError: Segmentation output is not valid SEG format:

[then a big table]

Makefile:57: recipe for target 'build/p2-5_5.cns' failed
make: *** [build/p2-5_5.cns] Error 1

If anyone could help me pinpoint the problem, I would really appreciate it!

**etal** · 03-09-2016, 03:25 PM

Sorry for the trouble. Could you post the first few lines of the output table and the version of CNVkit you're using (try "cnvkit.py version")?

If you're able to load the PSCBS package in R, could you post that package's version too?

If the default CBS method is failing, as an alternative you can try the "haar" method, which does not rely on R at all:

cnvkit.py segment MySample.cnr -m haar

(Use the -t parameter to specify the significance threshold, which might need to be very small for exome sequencing data to segment well.)

**oxydeepu** · 03-10-2016, 07:39 AM

Hi Eric,

I have solved the problems with the reference. But now the problem seems to be different at the step of calculating coverage. This is the error I get.

#######################################

cnvkit.py coverage /media/ae5197fd-42a8-4d8f-96d6-1c8762b4d8ae/MM02-N_MM02-020200/colossus_analysis/BAM/MM02-020200_final.bam my_targets.bed -o .targetcoverage.cnn
Processing reads in MM02-020200_final.bam
Traceback (most recent call last):
File "/usr/local/bin/cnvkit.py", line 4, in <module>
__import__('pkg_resources').run_script('CNVkit==0.7.9.dev0', 'cnvkit.py')
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 726, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1491, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/CNVkit-0.7.9.dev0-py2.7.egg/EGG-INFO/scripts/cnvkit.py", line 11, in <module>

File "build/bdist.linux-x86_64/egg/cnvlib/commands.py", line 434, in _cmd_coverage

File "build/bdist.linux-x86_64/egg/cnvlib/commands.py", line 457, in do_coverage

File "build/bdist.linux-x86_64/egg/cnvlib/coverage.py", line 37, in interval_coverages
File "build/bdist.linux-x86_64/egg/cnvlib/coverage.py", line 110, in interval_coverages_pileup
File "build/bdist.linux-x86_64/egg/cnvlib/coverage.py", line 143, in bedcov
RuntimeError: Bad line from bedcov:
1

############################

Can you shed some light into the issue?

Thank you.

Regards,
Deepak

**DrWorm** · 03-10-2016, 12:20 PM

segmentation, ctd.

M CNVKit version is 0.7.8. R is installed, and I'm using PSCBS v0.61.0. Here's the output I get from the segmentation command.

to connect to socket /tmp/dbus-Hufo1aFYF5: Connection refused
Dropped 1 outlier bins:
chromosome start end gene log2 weight
0 chr4 46638882 46738865 Background -4.1499 0.167756
Traceback (most recent call last):
File "/opt/apps/cnvkit/0.7.8/bin/cnvkit.py", line 11, in <module>
args.func(args)
File "/opt/apps/cnvkit/0.7.8/lib/python2.7/site-packages/cnvlib/commands.py", line 666, in _cmd_segment
rlibpath=args.rlibpath)
File "/opt/apps/cnvkit/0.7.8/lib/python2.7/site-packages/cnvlib/segmentation/__init__.py", line 58, in do_segmentation
segarr = cnarr.as_dataframe(seg2cns(seg_out))
File "/opt/apps/cnvkit/0.7.8/lib/python2.7/site-packages/cnvlib/segmentation/__init__.py", line 143, in seg2cns
+ seg_text)
ValueError: Segmentation output is not valid SEG format:
WARNING: ignoring environment value of R_HOME
"sampleName" "chromosome" "start" "end" "nbrOfLoci" "mean"
"19987_main" "chr1" 93708 2582834 32 0.0317406246246301
"19987_main" "chr1" 2684720 6239917 34 -0.392436669374812
"19987_main" "chr1" 6239917 26177167 253 0.0239173699112572
"19987_main" "chr1" 26177167 29477537 33 0.379337119869004

Also, you were right -- using a non-R workflow completed perfectly.

Topics	Statistics	Last Post
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Yesterday, 08:47 AM	0 responses 14 views 0 likes	Last Post by seqadmin Yesterday, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 54 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News