  • etal
    Member
    • Oct 2013
    • 23

    CNVkit: Copy number detection and visualization for targeted sequencing using off-tar

    CNVkit is a software toolkit to infer and visualize copy number from targeted DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina.

    This method uses the nonspecifically captured off-target reads to supplement read depth information from on-target regions. With relatively simple normalization steps to make these read depths comparable across the genome, CNVkit can produce copy ratio estimates extremely close to those by array CGH.
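To make the idea of a copy ratio concrete, here is a minimal sketch of turning per-bin read depths into log2 ratios against a reference. It is not CNVkit's actual implementation: the function name `log2_ratios`, the `pseudocount` parameter, and the plain median-centering step are assumptions for illustration, and the bias corrections (GC content, target size and spacing, repeats) are left out entirely.

```python
import math
from statistics import median


def log2_ratios(sample_depths, reference_depths, pseudocount=0.01):
    """Return median-centered log2(sample / reference) for each bin.

    Illustrative only: the pseudocount avoids log(0) for empty bins, and
    median-centering assumes most of the genome is copy-number neutral.
    """
    if len(sample_depths) != len(reference_depths):
        raise ValueError("sample and reference must cover the same bins")
    raw = [
        math.log2((s + pseudocount) / (r + pseudocount))
        for s, r in zip(sample_depths, reference_depths)
    ]
    center = median(raw)
    return [value - center for value in raw]


# Hypothetical depths for five bins: one doubled bin and one halved bin.
ratios = log2_ratios([100.0, 100.0, 200.0, 100.0, 50.0], [100.0] * 5)
print([round(r, 2) for r in ratios])
```

A doubled bin comes out near +1 and a halved bin near -1, which is the same scale that array CGH log2 ratios are usually reported on.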

    Manuscript preprint:
    Germline copy number variants (CNVs) and somatic copy number alterations (SCNAs) are of significant importance in syndromic conditions and cancer. Massively parallel sequencing is increasingly used to infer copy number information from variations in the read depth in sequencing data. However, this approach has limitations in the case of targeted re-sequencing, which leaves gaps in coverage between the regions chosen for enrichment and introduces biases related to the efficiency of target capture and library preparation. We present a method for copy number detection, implemented in the software package CNVkit, that uses both the targeted reads and the nonspecifically captured off-target reads to infer copy number evenly across the genome. This combination achieves both exon-level resolution in targeted regions and sufficient resolution in the larger intronic and intergenic regions to identify copy number changes. In particular, we successfully inferred copy number at the equivalent of 100-kilobase resolution genome-wide from a platform targeting as few as 293 genes. After normalizing read counts to a pooled reference, we evaluated and corrected for three sources of bias that explain most of the extraneous variability in the sequencing read depth: GC content, target footprint size and spacing, and repetitive sequences. We compared the performance of CNVkit to copy number changes identified by array comparative genomic hybridization. We packaged the components of CNVkit so that it is straightforward to use and provides visualizations, detailed reporting of significant features, and export options for compatibility with other software. CNVkit is freely available from <http://github.com/etal/cnvkit>.


    Source code:


    Documentation:


    I've attempted to make CNVkit compatible with other software and easy to integrate into sequencing analysis pipelines. (Currently supported or under development: bcbio-nextgen, Galaxy, THetA2, IGV, BioDiscovery Nexus Copy Number, Java TreeView, probably others.) If you would like to see CNVkit play nicely with another existing program, please let me know.
    Last edited by etal; 02-04-2015, 09:41 AM.
  • jaspersaris
    Member
    • Nov 2008
    • 6

    #2
    Hi etal,

    As I am not fluent in Python (and my colleagues certainly are not), I wondered what progress you have made with running CNVkit in Galaxy (is there already a Tool Shed version?) and/or in combination with BioDiscovery's Nexus. Would the latter be a report in a txt format readable by Nexus?

    With regards, Jasper


    • etal
      Member
      • Oct 2013
      • 23

      #3
      Hi Jasper,

      If you would like to quickly get started without doing any technical work, I recommend the DNAnexus app, which is essentially complete and free to use initially after you've set up a DNAnexus account.

      The Galaxy tool is not feature-complete, but I've made it available for testing in the Test Tool Shed (not the production one) in order to learn how people would like to use it. I'll spend more time on this if there's interest, but at the moment it's not the easiest way to use CNVkit.

      The BioDiscovery Nexus Copy Number support in CNVkit is simple: CNVkit can export bin-level log2 copy ratio data to a tabular file which can then be loaded in Nexus Copy Number (e.g. "cnvkit.py export nexus-basic MySample.cnr -o MySample.nexus"). The DNAnexus applet also emits these files. Then, within Nexus Copy Number, load the generated file and specify the "basic" format. The copy number data should then appear similarly to array CGH.

      Also, please note that CNVkit does not require Python programming to use, only the Unix/Linux command line. Feel free to let me know if you find any of the commands difficult to use or poorly documented.
      Last edited by etal; 02-04-2015, 09:40 AM.


      • jaspersaris
        Member
        • Nov 2008
        • 6

        #4
        Hi Etal,

        For the moment it appears that using Galaxy is the better solution for us. I am running an instance in VMware and installed CNVkit this morning. Now I am uploading some .bam files to work with.

        With regards, Jasper


        • jaspersaris
          Member
          • Nov 2008
          • 6

          #5
          Hi Etal,

          In the current Galaxy wrapper, all three .bam files appear in both the 'sample' list and the 'Normal' list, without a selection or adjustment option.


          • etal
            Member
            • Oct 2013
            • 23

            #6
            The BAM files available on your server or under your account (however you've set it up) should be visible under either field in the input form, and you should be able to select the files you want for each field through the usual means, e.g. Ctrl+click. Choose the tumor samples for the "sample" list, and any normal or germline files that were sequenced under the same protocol for the "normal" list.

            You can also run it without any "normal" BAM inputs and the results should be reasonably good.


            • shrutimish@gmail.com
              Member
              • Dec 2012
              • 12

              #7
              Hi etal, I have a more basic question about CNV detection: if I have a custom target panel sequenced through AmpliSeq multiplex PCR, is there a way to detect CNVs in that data?

              Does CNVkit work only for hybrid capture target panels like TSCA (Illumina) and HaloPlex (Agilent)? Am I understanding that correctly?

              Thanks


              • etal
                Member
                • Oct 2013
                • 23

                #8
                Yes, CNVkit is designed for hybrid capture target panels like those of Illumina, Agilent and NimbleGen. For best results from amplicon-based targeted resequencing, you should probably use another program like OncoCNV.

                I've recently added a little bit of support for running the CNVkit pipeline without using any off-target bins. You can try it by downloading release 0.3.4 and following the instructions here. However, this new feature isn't thoroughly tested and I'm told it doesn't work for everyone, and in any case another layer of gene-level normalizations would be needed to make it perform well on targeted amplicon sequencing data. I'll let SeqAnswers know when that mode is ready for broader use.
                Last edited by etal; 02-23-2015, 09:58 PM.


                • shrutimish@gmail.com
                  Member
                  • Dec 2012
                  • 12

                  #9
                  Thanks etal for your reply and this information. I will look into OncoCNV.


                  • Mulos
                    Junior Member
                    • Mar 2010
                    • 3

                    #10
                    fail to open file?

                    Hi Etal,

                    I wanted to test CNVkit on a small set of targeted sequencing data, but it is giving me trouble. With both batch and coverage (I haven't tried the other functions yet), I get the error:

                    "Processing reads in test.bam
                    [E::hts_open] fail to open file '-Q'
                    Segmentation fault: 11"

                    The indexing right before this worked fine. The chromosome names in my bam-files, fasta and target files are all the same (no "chr" prefix). The tests of cnvkit were all ok, no errors or warnings, and my pysam and pyvcf are up to date.
                    Any idea what I can do to fix this?

                    Thanks in advance,
                    Marlous


                    • etal
                      Member
                      • Oct 2013
                      • 23

                      #11
                      Hi Marlous,

                      Thanks for reporting this. Can you show me the commands you used to trigger this error, including any special options? Are your BAM files definitely in BAM format, and does e.g. "samtools idxstats" work on them?

                      I think this error indicates that command line options for "samtools bedcov" on your system are different than those in the versions I tested, missing the "-Q" option to set a minimum mapping quality score when counting reads. This would break CNVkit's coverage command internally. The source code of pysam's bundled samtools (in both pysam 0.8.0 and the current trunk) shows that the bedcov command does still accept the -Q option (using getopt), so either you have a version of pysam/samtools/htslib different from what I looked at, or I'm misunderstanding the error. The bedcov command is barely documented in samtools 1.1 and I can't see why this option might be missing in some versions. Maybe a samtools expert here can help?

                      In the meantime, you can avoid the call to bedcov with "cnvkit.py coverage --count", which returns similar results to the default strategy but filters out unmapped reads directly using the Pysam API, rather than by calling bedcov with the -Q option.

                      Best,
                      Eric
                      Last edited by etal; 02-25-2015, 11:28 PM. Reason: Inspected the current pysam/samtools source code.
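As a rough illustration of the read-counting fallback described in this post, the sketch below counts reads per bin in plain Python, skipping unmapped reads and optionally applying a mapping-quality threshold. It does not call pysam or reproduce CNVkit's code: the `Read` record, the `count_reads` function, and the `min_mapq` parameter are hypothetical stand-ins for what a BAM reader would provide.

```python
from collections import namedtuple

# Hypothetical stand-in for an aligned read; real code would obtain the
# equivalent fields from a BAM reader such as pysam.
Read = namedtuple("Read", ["chrom", "start", "is_unmapped", "mapq"])


def count_reads(reads, chrom, start, end, min_mapq=0):
    """Count mapped reads that start in [start, end) on chrom.

    Reads flagged as unmapped are always skipped; min_mapq optionally
    drops low-confidence alignments as well.
    """
    return sum(
        1
        for read in reads
        if not read.is_unmapped
        and read.chrom == chrom
        and start <= read.start < end
        and read.mapq >= min_mapq
    )


# Made-up reads for illustration only.
reads = [
    Read("1", 100, False, 60),
    Read("1", 150, False, 0),
    Read("1", 180, True, 0),   # unmapped, never counted
    Read("1", 500, False, 60),  # outside the bin
    Read("2", 120, False, 60),  # different chromosome
]
in_bin = count_reads(reads, "1", 100, 200)
confident = count_reads(reads, "1", 100, 200, min_mapq=30)
print(in_bin, confident)
```

Counting read starts rather than summing per-base depth is a simplification made here for brevity, so the numbers would not match bedcov output directly.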


                      • Mulos
                        Junior Member
                        • Mar 2010
                        • 3

                        #12
                        Hey Eric,

                        Thanks for the quick reply! My BAM files are indeed BAM files, and idxstats works fine. I tried samtools bedcov from the command line (version 1.2): without the '-Q' option it works fine, but if I add this option I get the same error message, so it does seem to be a samtools-related error.
                        I am mostly interested in CNVkit's batch command. Is it also possible to run it while circumventing -Q?

                        Just to be sure, the command I used for the coverage analysis:
                        python cnvkit.py coverage ~/Documents/CNVkit/testT_dedup.realigned.bam ~/Documents/genomes/design_test4.bed -o ~/Documents/CNVkit/test_dedup.realigned.cnn
                        Output:
                        Processing reads in testT_dedup.realigned.bam
                        [E::hts_open] fail to open file '-Q'
                        Segmentation fault: 11
                        For the batch analysis:
                        python cnvkit.py batch ~/Documents/CNVkit/*T_dedup.realigned.bam --normal ~/Documents/CNVkit/*R*bam --fasta ~/Documents/genomes/GRCh37_gatk.fasta --access ~/bin/cnvkit/data/access-10kb.hg19_noChr.bed --output-reference ~/Documents/CNVkit/test.cnn --output-dir ~/Documents/test/ -t ~/Documents/genomes/design_test4.bed
                        Output:
                        Detected file format: BED
                        Detected file format: BED
                        Detected file format: BED
                        Wrote /Users/.../Documents/CNVkit/test/design_test4.antitarget.bed with 1678 background intervals
                        Building a copy number reference from normal samples...
                        Processing reads in test1R_dedup.realigned.bam
                        [E::hts_open] fail to open file '-Q'
                        Segmentation fault: 11
                        Thanks again, Marlous


                        • etal
                          Member
                          • Oct 2013
                          • 23

                          #13
                          I've added the "-c/--count" option from the "coverage" command to the "batch" options as well. If you installed CNVkit from the GitHub repo, you can get the latest by either pulling the new commits or downloading the latest source code Zip file from the master branch and installing that.

                          Please let me know if that works for you.


                          • Mulos
                            Junior Member
                            • Mar 2010
                            • 3

                            #14
                            Works like a charm, thank you so much.

                            Also, very pleased with the results!
                            Last edited by Mulos; 02-27-2015, 07:25 AM.


                            • inijman
                              Junior Member
                              • May 2009
                              • 8

                              #15
                              Hi Etal,

                              Very nice work on a promising tool.

                              I'm giving it a go as well, and what strikes me is that I don't always get the PDF files from the diagram and scatter options in a batch run. Is there a way to get verbose (error) logging?

                              When I generate them with the individual scatter/diagram commands, the files are there, but the diagram is very difficult to read because all the labels overlap each other. Is there a way to influence this, or to split the chromosomes onto separate pages?

                              If I'm on an X11 terminal I can generate the plots, but when we run analyses on the cluster there is no display available and we need the PDF files.

                              Additionally, it would be great if CNVkit could accept either sample.bam.bai or sample.bai as the index file.

                              Best, Ies
                              ps: and Hi to all the other posters!
                              Last edited by inijman; 03-02-2015, 03:42 AM.
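The index-naming request above could be handled with a small lookup like the following. This is only a sketch of one possible approach, not something CNVkit is confirmed to do: the helper name `find_bam_index` and the order of preference (sample.bam.bai first, then sample.bai) are assumptions.

```python
import os
import tempfile


def find_bam_index(bam_path):
    """Return the first existing index file for bam_path, or None.

    Checks "sample.bam.bai" first, then "sample.bai" (an assumed order).
    """
    stem, _ext = os.path.splitext(bam_path)
    for candidate in (bam_path + ".bai", stem + ".bai"):
        if os.path.isfile(candidate):
            return candidate
    return None


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    bam = os.path.join(tmp, "sample.bam")
    open(bam, "w").close()
    missing = find_bam_index(bam)  # no index present yet
    open(os.path.join(tmp, "sample.bai"), "w").close()
    found = os.path.basename(find_bam_index(bam))

print(missing, found)
```

Returning None instead of raising leaves it to the caller to decide whether to build a fresh index or stop with an error.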
