Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DrWorm
    replied
    errors in production of cns file

    Hello all,

    I'm having issues working through the install of CNVkit. I've pinpointed my issues to problems with the segment sub-command. When I try to run the test procedure, it dies with this error:

    Traceback (most recent call last):
    File "../cnvkit.py", line 11, in <module>
    args.func(args)
    File "/home/snmcnulty/bin/cnvkit-master/cnvlib/commands.py", line 666, in _cmd_segment
    rlibpath=args.rlibpath)
    File "/home/snmcnulty/bin/cnvkit-master/cnvlib/segmentation/__init__.py", line 58, in do_segmentation
    segarr = cnarr.as_dataframe(seg2cns(seg_out))
    File "/home/snmcnulty/bin/cnvkit-master/cnvlib/segmentation/__init__.py", line 143, in seg2cns
    + seg_text)
    ValueError: Segmentation output is not valid SEG format:

    [then a big table]

    Makefile:57: recipe for target 'build/p2-5_5.cns' failed
    make: *** [build/p2-5_5.cns] Error 1

    If anyone could help me pinpoint the problem, I would really appreciate it!

    Leave a comment:


  • etal
    replied
    Hi Deepak,

    The sequence IDs are the chromosome names in your reference genome and the first column of your BED file. For the human genome the first chromosome might be "1" or "chr1" depending on where you got your reference genome.

    Check that the name schemes match between your BED and BAM files, either "1" or "chr1". You can use "samtools view -H" to see the BAM header. If the names don't match, then you should edit your BED file, either adding or removing the "chr" prefix, so that they do match.

    Hope that helps,
    Eric

    Leave a comment:


  • oxydeepu
    replied
    Sequence IDs don't match with bed Error CNVkit.

    Hi all,

    I trying to run CNVkit with tumor and normal samples on exome sequencing. But I tried all possible things mentioned in the docs. I tried different bed file input as the target region and also the annotations but I always end up getting the error.
    ValueError: BED file 'results/S04380110_Covered.target.bed' sequence IDs don't match any in BAM file.

    Anybody came across this issue. Please help.

    Thank you in advance.

    Deepak

    Leave a comment:


  • etal
    replied
    The mapping quality threshold is hard-coded to only exclude unmapped or ambiguously mapped reads, see here:


    Just change the -Q value to another integer if you'd like to try it yourself. However, I recall reading (can't find the reference at the moment) that keeping low-MAPQ reads did not harm copy number estimation and may have improved it.

    In any case, in CNVkit the script genome2access.py can be used to directly exclude poorly mappable genomic regions. This is done already for hg19 in the bundled file access-5k-mappable.hg19.bed.

    Leave a comment:


  • wisekh
    replied
    using a mapping quality threshold

    Dear Eric,
    I have another question. Is it possible to use a mapping quality threshold to, for example, a tumor bam or multiple tumor bams that I would infer copy number from?

    thank you,

    Hoon

    Leave a comment:


  • wisekh
    replied
    I got it. Thank you very much for your quick response.

    hoon

    Leave a comment:


  • etal
    replied
    Hi wisekh,

    The LOH functionality in CNVkit is described here:


    However, the "calls" are simply displayed visually -- the variant allele frequencies are plotted alongside the copy ratios, and a shift in VAF from 0.5 indicates LOH. I'm currently working on expanding this functionality to make it more useful.

    To run the complete pipeline with a tumor-normal pair and make a plot of the copy number and LOH shifts together, follow the quick start guide here:


    Separately from running the CNVkit "batch" pipeline, you'll need to call SNPs in the tumor sample in VCF format. Then use that VCF file as input to the CNVkit "scatter" command along with the .cnr and .cns files from the CNVkit pipeline to make the plot.

    Hope that helps,
    Eric

    Leave a comment:


  • wisekh
    replied
    calling LOH from a pair of tumor and normal exome bams

    Dear Eric,

    I wonder if you can provide an instruction/example on how to generate LOH calls from a pair of tumor and normal exom bams.
    I assume this tool can generate them, but I couldn't find a tuturoal in the cnvkit website. probably, I missed it?

    thank you very much in advance,

    wisekh

    Leave a comment:


  • lethalfang
    replied
    Originally posted by etal View Post
    What files were generated by the second run, if any? Can you show me the status messages or any errors that were printed?

    When you run CNVkit with a reference, you don't need the "--targets", "--fasta", "--split" and "--access" arguments as that information has already been captured in the reference file. The default output directory is the current directory ("."). Try this instead:

    cnvkit.py batch T1.bam T2.bam T3.bam T4.bam T5.bam T6.bam T7.bam T8.bam T9.bam T10.bam -r N.cnn --scatter --diagram
    Thanks. For the second run where I was having question, there is no error message, and a bunch of .targetcoverage.cnn files are generated.

    The status messages are:
    Code:
    Summary: #bins=292, #reads=7633921, mean=26143.5653, min=212.35, max=224646.99
    On-target percentage: 37.877 (of 20154547 mapped)
    Wrote ./T4.targetcoverage.cnn
    Running the CNVkit pipeline on T9.bam ...
    Processing reads in T9.bam
    Time: 52.775 seconds (154291 reads/sec, 6 bins/sec)
    Summary: #bins=292, #reads=8142711, mean=27885.9981, min=268.94, max=279225.27
    On-target percentage: 38.472 (of 21165340 mapped)
    Wrote ./T3.targetcoverage.cnn
    Running the CNVkit pipeline on T10.bam ...
    Processing reads in T10.bam
    Time: 55.564 seconds (154063 reads/sec, 5 bins/sec)
    Summary: #bins=292, #reads=8560291, mean=29316.0670, min=326.38, max=293055.07
    On-target percentage: 38.593 (of 22181003 mapped)
    Wrote ./T6.targetcoverage.cnn
    Time: 57.101 seconds (149544 reads/sec, 5 bins/sec)
    Summary: #bins=292, #reads=8539037, mean=29243.2782, min=302.54, max=292123.56
    Time: 57.092 seconds (155098 reads/sec, 5 bins/sec)
    Summary: #bins=292, #reads=8854878, mean=30324.9281, min=281.23, max=295576.58
    Time: 57.155 seconds (147061 reads/sec, 5 bins/sec)
    Summary: #bins=292, #reads=8405294, mean=28785.2540, min=254.16, max=285135.1
    On-target percentage: 38.485 (of 22188207 mapped)
    On-target percentage: 39.087 (of 22654458 mapped)
    On-target percentage: 38.633 (of 21757000 mapped)
    Wrote ./T2.targetcoverage.cnn
    Wrote ./T8.targetcoverage.cnn
    Wrote ./T1.targetcoverage.cnn
    Time: 58.704 seconds (143289 reads/sec, 5 bins/sec)
    Summary: #bins=292, #reads=8411624, mean=28806.9335, min=241.63, max=282191.54
    On-target percentage: 38.845 (of 21654563 mapped)
    Wrote ./T7.targetcoverage.cnn
    Time: 60.050 seconds (150608 reads/sec, 5 bins/sec)
    Summary: #bins=292, #reads=9044023, mean=30972.6832, min=298.46, max=306083.6
    On-target percentage: 38.815 (of 23300051 mapped)
    Wrote ./T5.targetcoverage.cnn
    Time: 50.180 seconds (165905 reads/sec, 6 bins/sec)
    Summary: #bins=292, #reads=8325083, mean=28510.5614, min=309.7, max=283705.95
    On-target percentage: 38.622 (of 21555485 mapped)
    Wrote ./T9.targetcoverage.cnn
    Time: 51.261 seconds (173281 reads/sec, 6 bins/sec)
    Summary: #bins=292, #reads=8882609, mean=30419.8946, min=299.88, max=296973.17
    On-target percentage: 38.693 (of 22956916 mapped)
    Wrote ./T10.targetcoverage.cnn


    I ran again with the command you suggested, and the results are now as expected, i.e., identical to the first run.

    Thanks.

    Leave a comment:


  • etal
    replied
    What files were generated by the second run, if any? Can you show me the status messages or any errors that were printed?

    When you run CNVkit with a reference, you don't need the "--targets", "--fasta", "--split" and "--access" arguments as that information has already been captured in the reference file. The default output directory is the current directory ("."). Try this instead:

    cnvkit.py batch T1.bam T2.bam T3.bam T4.bam T5.bam T6.bam T7.bam T8.bam T9.bam T10.bam -r N.cnn --scatter --diagram

    Leave a comment:


  • lethalfang
    replied
    When I tried to run more tumor samples using the reference .cnn file from a past normal run, there is no .cnr output.

    This gives me .cnr files:
    cnvkit.py batch T1.bam T2.bam T3.bam T4.bam T5.bam T6.bam T7.bam T8.bam T9.bam T10.bam --normal Na.bam Nb.bam Nc.bam --targets captures.bed --fasta b37_decoy.fasta --split --access b37.decoy.accessibles.bed --scatter --diagram --output-reference N.cnn --output-dir .


    This does not output .cnr pr antitargetcoverage.cnn file:
    cnvkit.py batch T1.bam T2.bam T3.bam T4.bam T5.bam T6.bam T7.bam T8.bam T9.bam T10.bam -r N.cnn --targets captures.bed --fasta b37_decoy.fasta --split --access b37.decoy.accessibles.bed --scatter --diagram --output-dir .

    What is missing? Thanks.

    Leave a comment:


  • etal
    replied
    Hi Ies,

    Sorry I missed your post earlier. There isn't a verbose logging mode in CNVkit, but the messages on standard error are fairly verbose already and should always report when there is an error. In particular, if something crashes then you'll see a Python traceback message. However, if you parallelized the batch run (-p >1), the messages from each process will be interleaved, which makes them somewhat harder to intepret.

    The scatter and diagram PDFs should always be generated in a batch run; there isn't a special code path where they would be skipped. Does the log say something like "Wrote MySample-scatter.pdf" for the missing PDFs, or not?

    The --scatter option uses matplotlib to generate a PDF. On a cluster, the default matplotlib backend (e.g. Wx or Gtk) might not be available, and so I guess it's possible the plotting engine of matplotlib gets confused and silently fails to write the file. You could address that by setting a different backend on your cluster -- create a file called "matplotlibrc" in the current working directory or your home folder, with the contents:

    backend : pdf

    The --diagram option uses a different backend, Reportlab, which always generates a PDF from scratch and does not have an interactive mode. I can't think of a reason why this one would occasionally fail to write a file. Can you suggest anything unusual about your system's configuration? Outdated software versions, maybe?

    If the diagram is showing labels for hundreds of genes, that means:
    (a) you did exome sequencing, so there's lots to show;
    (b) significant copy number alterations cover large regions of chromosomes in your sample; and/or
    (c) the purity of your tumor samples is fairly high.

    You can:

    - Thin out the labeling to some extent by specifying a higher threshold (-t) log2 ratio value in the diagram command; the default is 0.6, so try 0.8 or 0.9 to only show the higher-amplitude CNAs.
    - Drop the labels altogether by specifying a high value for -t or just passing the .cns segment file (with -s), without the .cnr.
    - Use the "heatmap" command instead to view the unlabeled CNA regions for many samples at once.
    - Use the "gainloss" command to list all genes with log2 ratio amplitudes beyond a given threshold, essentially the labels you're currently seeing on the diagram but in a more manageable plain-text, tabular format.

    The diagram is based on Biopython's Graphics/BasicChromosome module. If you're handy with Python and have a specific modification in mind, you could edit cnvlib/diagram.py (202 lines) to do it. For example, you can change PAGE_SIZE to much larger dimensions like 22x17" and the chromosomes will scale proportionally, but the gene labels stay the same size and will be more readable if they were overlapping before.

    Thanks for the suggestion on sample.bai, I'll look into it.

    Cheers,
    Eric

    Leave a comment:


  • inijman
    replied
    HI Etal,

    Very nice work on a promising tool.

    I'm giving it a go as well and what strikes me is that I don't always get the PDF files with the diagram and scatter option in a batch run. Is there a way to get verbose (error) logging?

    When I do generate them with the individual scatter/diagram option, they are there, but the diagram is very difficult to read as all the labels overlap each other. Is there a way to influence this, or break all chromosomes to sepperate pages?

    If i'm on a X11 terminal I can generate the plots, but when we run analyses on the cluster, there is no display available and we would need the pdf files.

    Additionally, it would be great if you could use both sample.bam.bai als sample.bai file as indices.

    Best, Ies
    ps: and Hi to all the other posters!
    Last edited by inijman; 03-02-2015, 03:42 AM.

    Leave a comment:


  • Mulos
    replied
    Works like a charm, thank you so much.

    Also, very pleased with the results!
    Last edited by Mulos; 02-27-2015, 07:25 AM.

    Leave a comment:


  • etal
    replied
    I've added the "-c/--count" option from the "coverage" command to the "batch" options as well. If you installed CNVkit from the GitHub repo, you can get the latest by either pulling the new commits or downloading the latest source code Zip file from the master branch and installing that.

    Please let me know if that works for you.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM
  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
30 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
32 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 09:21 AM
0 responses
28 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-04-2024, 09:00 AM
0 responses
52 views
0 likes
Last Post seqadmin  
Working...
X