Seqanswers Leaderboard Ad



No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • issues on running ExomeCNV


    I am trying to use ExomeCNV to do copy number analysis on exome sequencing data. I have bam files sorted already. I tried to use shell scripts for generating coverage files, one error message came out:

    making pileup
    [main] The `pileup' command has been removed. Please use `mpileup' instead.

    I checked on all of the scripts provided by author, I saw 'pileup' command there, I don't know what that's supposed to mean. Also the coverage files generated from this run are "0" in the last four columns.

    command Line I used:
    sh sorted.bam exome.bed hg19.fasta script-directory


  • #2
    Hi Emily,

    It think your samtools version is causing all the trouble. ExomeCNV wiki page is recommending "v0.1.16 as newer version may not support pileup function/files". If your samtools version is newer you'll run into trouble, cause the pileup command has been replaced with mpileup. What you can do is the following:
    1.) Temporily replace samtools with version 0.1.16 OR
    2.) Manually edit the relevant scripts (those containing the line samtools pileup and replace it with mpileup OR
    3.) I recommend the following: Install GATK and run it as explained in their wiki:



    • #3
      Thanks Spoonman! it is version issue.

      Now I sort of have another problem. I got it run through, but the results don't look right. only chromosome1 has deletion/amplifications, on other chromosomes, there is really nothing happened. please see the attached output file.
      Do you happen to know what's wrong with it?

      Thanks again.
      Attached Files


      • #4
        Hi Emily,
        It' kind of hard to troubleshoot based on that picture. I personally never ran into that problem. I'll try to give you some hints:

        1.) Check your input file. Did the coverage calculation for your cell lines work properly? Do you have coverage information for all chromosomes?
        The output file should look similar to this:

        2.) In R you have to define the chromosomes you want to analyse. Are you sure you included all in your list? [chr.list = c("chr1","chr2","chr3",......)]

        3.) A more general recommendation: If you run ExomeCNV, I would run each command individually and check what the output variable looks like. Makes it easier to pinpoint where something started going wrong.



        • #5
          Thanks spoonman for the good tips!

          I used the author shell script to do the coverage, from the overview of coverage, the chromosome1 looks the best coverage compared to other chromosomes, I don't know why, it looks strange.
          If I want to run through R script, I have to add chrmosome 1 in the first option of chromosome list, otherwise it won't work, the error will be generated.



          • #6
            Hi Spoonman,

            I tried to use GATK to get the coverage file, from the user guide, there is one file unclear to me, which is "-L <exome.interval_list>". how did you get exome.interval_list using GATK?

            My output file only has 9 columns (no last column), but the sample file has 10 columns. What's yours?

            Last edited by emilyjia2000; 05-14-2012, 12:03 PM.


            • #7
              Hi Emily,
              -L <exome.interval_list> is the list of your targeted exons. I don't know what exons/genes you targeted (bait design). Each experiment has its own setup. You should probably ask the person that runs the NGS machine, I'm sure he/she knows what you need. But to be clear, GATK is not needed for "getting" the exon.interval_list. The exon.interval_list is information GATK needs so it can calculate read depth within the specified chromosomal range - listed in your exome.interval_list file. Be carful about the format of that file, as explained in their wiki: the file needs the following format:
              I think my file had all 10 columns, but I wouldn't be worried about that, as long as the first 3 column are there, once you read in the file in R the rest of the file information get's lost anyway.
              Read this:
              "Q: I noticed that when running function "read.coverage.gatk(file)" to reformat GATK output, these following two fields "sequenced.base" and "bease.with..10.coveratge" are all with "NA". Do you think "NA" in these two fields will affect the detection results?
              A: No, the fields "sequenced.base" and "bases.with..10.coverage" are degenerate and are not important for ExomeCNV to function. They are there for a historical reason."

              Good luck!


              • #8
                One single chromosome


                I have a bam file that containg only information for one chromosome (chr6).
                When I run

                % sh ~/programs/CNA/ bamFile exons.bed genome.fa ~/programs/CNA/

                I got the coverage files for all the chromosomes with 0 values, while all the wig files refere to chr1

                probe1 chr1 67000041 67000051 11 0 0 0 0
                probe35 chr1 48999844 48999965 122 0 0 0 0
                probe36 chr1 49000561 49000588 28 0 0 0 0
                probe37 chr1 49005313 49005410 98 0 0 0 0
                probe38 chr1 49052675 49052838 164 0 0 0 0

                So I run it with bed/fasta files including only data from chr6, but I still get 0 values in the coverage files, and all the wig files had info from chr6.

                Any ideas on how to solve this?



                • #9
                  Hi I'm running ExomeCNV for the my data.
                  I managed to get the coverage output though I get error while running the R script that they provide.

                  Error in if (lim.logR[1] == -Inf) lim.logR[1] = min(all.ecnv$logR[all.ecnv$logR != :
                  missing value where TRUE/FALSE needed
                  Calls: do.plot.eCNV
                  Execution halted

                  has any of you got this error and could suggest a solution?



                  • #10
                    Hi All,

                    I have a bam file that containg only information for chr21.
                    When I run sh

                    command :
                    sh /path for bam file/chr21.sortedpicard.rmdup.realigned.recal.bam /path for chr21.bed/chr21.CapturedRegion.bed /path for reference/chr21.fa

                    Output :
                    probe chr probe_start probe_end targeted base sequenced base coverage average coverage base with >10 coverage
                    probe1 chr21 9907173 9907501 329 0 0 0 0

                    I am running this bam2coverage script under samtools-0.1.16 which supports

                    Any suggestion and response would be highly accepted.



                    • #11

                      I would like to know how to make the files read if I take the output of GATK depthofcoverage, since it contains one file that contains all the chromosome summary together. How do I use the GATK files as input for ExomeCNV, I could only see that the command read.gatk.coverage but then what are the parameters?



                      • #12
                        am using the GATK DepthofCoverage output for exomeCNV, but am getting some error in dataframe.
                        My commands for normal and tumor for 3 chromosomes
                        chr.list = c("chr19","chr20","chr21")
                        normal = read.coverage.gatk("~/Desktop/Whole Genome Sequencing/CNV_detection_2015/ExomeCNV/GATK_DepthofCov_out/summary_coverage/S_313_N_S8980.coverage.sample_interval_summary")

                        tumor = read.coverage.gatk("/Users/vdas/Desktop/Whole Genome Sequencing/CNV_detection_2015/ExomeCNV/GATK_DepthofCov_out/summary_coverage/S_313_T_S7998.coverage.sample_interval_summary")

                        demo.logR = calculate.logR(normal, tumor)

                        demo.eCNV = c()
                        for (i in 1:length(chr.list)) {
                        idx = (normal$chr == chr.list[i])
                        ecnv = classify.eCNV(normal=normal[idx,], tumor=tumor[idx,], logR=demo.logR[idx], min.spec=0.9999, min.sens=0.9999, option="spec", c=0.5, l=70)
                        demo.eCNV = rbind(demo.eCNV, ecnv)

                        When am executing demo.eCNV runs with an error
                        Error in `$<`(`*tmp*`, "copy.number", value = NA) :
                        replacement has 1 row, data has 0

                        Also in the website it is written last column can be NA but I am having 2 columns as NA if I read the coverage file of gatk output. Once is the column of sequenced.base and the other is the last one. Also my chr column is in order or chr# but one calling the read.coverage.gatk it seems it expects chr1 should be like just 1 and chr string will be appended to it so the typical file uploaded is shown like below

                        probe chr probe_start probe_end targeted.base sequenced.base coverage average.coverage base.with..10.coverage
                        1 chr1:762098-762270 chrchr1 762098 762270 173 NA 141772 819.49 NA
                        2 chr1:861282-861490 chrchr1 861282 861490 209 NA 21078 100.85 NA
                        3 chr1:865592-865791 chrchr1 865592 865791 200 NA 26005 130.03 NA
                        4 chr1:866326-866498 chrchr1 866326 866498 173 NA 9629 55.66 NA
                        5 chr1:871060-871244 chrchr1 871060 871244 185 NA 52937 286.15 NA
                        6 chr1:874365-874774 chrchr1 874365 874774 410 NA 66160 161.37 NA
                        So there is a discrepancy in the dataframe. Can anyone suggest me how to get rid of this problem


                        Latest Articles


                        • seqadmin
                          Exploring the Dynamics of the Tumor Microenvironment
                          by seqadmin

                          The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
                          07-08-2024, 03:19 PM
                        • seqadmin
                          Exploring Human Diversity Through Large-Scale Omics
                          by seqadmin

                          In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
                          06-25-2024, 06:43 AM





                        Topics Statistics Last Post
                        Started by seqadmin, Today, 11:09 AM
                        0 responses
                        Last Post seqadmin  
                        Started by seqadmin, 07-19-2024, 07:20 AM
                        0 responses
                        Last Post seqadmin  
                        Started by seqadmin, 07-16-2024, 05:49 AM
                        0 responses
                        Last Post seqadmin  
                        Started by seqadmin, 07-15-2024, 06:53 AM
                        0 responses
                        Last Post seqadmin