Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • issues on running ExomeCNV

    Hello,

    I am trying to use ExomeCNV to do copy number analysis on exome sequencing data. I have bam files sorted already. I tried to use shell scripts for generating coverage files, one error message came out:

    making pileup
    [main] The `pileup' command has been removed. Please use `mpileup' instead.

    I checked on all of the scripts provided by author, I saw 'pileup' command there, I don't know what that's supposed to mean. Also the coverage files generated from this run are "0" in the last four columns.

    command Line I used:
    sh bam2coverage.sh sorted.bam exome.bed hg19.fasta script-directory

    Thanks,

  • #2
    Hi Emily,

    It think your samtools version is causing all the trouble. ExomeCNV wiki page is recommending "v0.1.16 as newer version may not support pileup function/files". If your samtools version is newer you'll run into trouble, cause the pileup command has been replaced with mpileup. What you can do is the following:
    1.) Temporily replace samtools with version 0.1.16 OR
    2.) Manually edit the relevant scripts (those containing the line samtools pileup and replace it with mpileup OR
    3.) I recommend the following: Install GATK and run it as explained in their wiki: https://secure.genome.ucla.edu/index...CNV_User_Guide

    Best,
    spoonman

    Comment


    • #3
      Thanks Spoonman! it is version issue.

      Now I sort of have another problem. I got it run through, but the results don't look right. only chromosome1 has deletion/amplifications, on other chromosomes, there is really nothing happened. please see the attached output file.
      Do you happen to know what's wrong with it?

      Thanks again.
      Attached Files

      Comment


      • #4
        Hi Emily,
        It' kind of hard to troubleshoot based on that picture. I personally never ran into that problem. I'll try to give you some hints:

        1.) Check your input file. Did the coverage calculation for your cell lines work properly? Do you have coverage information for all chromosomes?
        The output file should look similar to this: http://genome.ucla.edu/~fah/ExomeCNV...erage.gatk.txt

        2.) In R you have to define the chromosomes you want to analyse. Are you sure you included all in your list? [chr.list = c("chr1","chr2","chr3",......)]

        3.) A more general recommendation: If you run ExomeCNV, I would run each command individually and check what the output variable looks like. Makes it easier to pinpoint where something started going wrong.

        Best,
        spoonman

        Comment


        • #5
          Thanks spoonman for the good tips!

          I used the author shell script to do the coverage, from the overview of coverage, the chromosome1 looks the best coverage compared to other chromosomes, I don't know why, it looks strange.
          If I want to run through R script, I have to add chrmosome 1 in the first option of chromosome list, otherwise it won't work, the error will be generated.

          Thanks,

          Comment


          • #6
            Hi Spoonman,

            I tried to use GATK to get the coverage file, from the user guide, there is one file unclear to me, which is "-L <exome.interval_list>". how did you get exome.interval_list using GATK?

            My output file only has 9 columns (no last column), but the sample file has 10 columns. What's yours?

            Thanks,
            Last edited by emilyjia2000; 05-14-2012, 12:03 PM.

            Comment


            • #7
              Hi Emily,
              -L <exome.interval_list> is the list of your targeted exons. I don't know what exons/genes you targeted (bait design). Each experiment has its own setup. You should probably ask the person that runs the NGS machine, I'm sure he/she knows what you need. But to be clear, GATK is not needed for "getting" the exon.interval_list. The exon.interval_list is information GATK needs so it can calculate read depth within the specified chromosomal range - listed in your exome.interval_list file. Be carful about the format of that file, as explained in their wiki: https://secure.genome.ucla.edu/index...CNV_User_Guide the file needs the following format:
              chr#:start-end
              I think my file had all 10 columns, but I wouldn't be worried about that, as long as the first 3 column are there, once you read in the file in R the rest of the file information get's lost anyway.
              Read this:
              "Q: I noticed that when running function "read.coverage.gatk(file)" to reformat GATK output, these following two fields "sequenced.base" and "bease.with..10.coveratge" are all with "NA". Do you think "NA" in these two fields will affect the detection results?
              A: No, the fields "sequenced.base" and "bases.with..10.coverage" are degenerate and are not important for ExomeCNV to function. They are there for a historical reason."

              Good luck!
              spoonman

              Comment


              • #8
                One single chromosome

                Hi!

                I have a bam file that containg only information for one chromosome (chr6).
                When I run

                % sh ~/programs/CNA/bam2coverage.sh bamFile exons.bed genome.fa ~/programs/CNA/

                I got the coverage files for all the chromosomes with 0 values, while all the wig files refere to chr1


                probe1 chr1 67000041 67000051 11 0 0 0 0
                probe35 chr1 48999844 48999965 122 0 0 0 0
                probe36 chr1 49000561 49000588 28 0 0 0 0
                probe37 chr1 49005313 49005410 98 0 0 0 0
                probe38 chr1 49052675 49052838 164 0 0 0 0


                So I run it with bed/fasta files including only data from chr6, but I still get 0 values in the coverage files, and all the wig files had info from chr6.

                Any ideas on how to solve this?

                Thanks!

                Comment


                • #9
                  Hi I'm running ExomeCNV for the my data.
                  I managed to get the coverage output though I get error while running the R script that they provide.

                  ========================================
                  Error in if (lim.logR[1] == -Inf) lim.logR[1] = min(all.ecnv$logR[all.ecnv$logR != :
                  missing value where TRUE/FALSE needed
                  Calls: do.plot.eCNV
                  Execution halted
                  ========================================

                  has any of you got this error and could suggest a solution?

                  thanks
                  shruti

                  Comment


                  • #10
                    Hi All,

                    I have a bam file that containg only information for chr21.
                    When I run sh bam2coverage.sh

                    command :
                    sh bam2coverage.sh /path for bam file/chr21.sortedpicard.rmdup.realigned.recal.bam /path for chr21.bed/chr21.CapturedRegion.bed /path for reference/chr21.fa

                    Output :
                    probe chr probe_start probe_end targeted base sequenced base coverage average coverage base with >10 coverage
                    probe1 chr21 9907173 9907501 329 0 0 0 0

                    I am running this bam2coverage script under samtools-0.1.16 which supports bam2coverage.sh

                    Any suggestion and response would be highly accepted.

                    Thanks,
                    Rocky

                    Comment


                    • #11
                      @spoonman

                      I would like to know how to make the files read if I take the output of GATK depthofcoverage, since it contains one file that contains all the chromosome summary together. How do I use the GATK files as input for ExomeCNV, I could only see that the command read.gatk.coverage but then what are the parameters?

                      Thanks

                      Comment


                      • #12
                        am using the GATK DepthofCoverage output for exomeCNV, but am getting some error in dataframe.
                        My commands for normal and tumor for 3 chromosomes
                        chr.list = c("chr19","chr20","chr21")
                        normal = read.coverage.gatk("~/Desktop/Whole Genome Sequencing/CNV_detection_2015/ExomeCNV/GATK_DepthofCov_out/summary_coverage/S_313_N_S8980.coverage.sample_interval_summary")

                        tumor = read.coverage.gatk("/Users/vdas/Desktop/Whole Genome Sequencing/CNV_detection_2015/ExomeCNV/GATK_DepthofCov_out/summary_coverage/S_313_T_S7998.coverage.sample_interval_summary")



                        demo.logR = calculate.logR(normal, tumor)

                        demo.eCNV = c()
                        for (i in 1:length(chr.list)) {
                        idx = (normal$chr == chr.list[i])
                        ecnv = classify.eCNV(normal=normal[idx,], tumor=tumor[idx,], logR=demo.logR[idx], min.spec=0.9999, min.sens=0.9999, option="spec", c=0.5, l=70)
                        demo.eCNV = rbind(demo.eCNV, ecnv)
                        }

                        When am executing demo.eCNV runs with an error
                        Error in `$<-.data.frame`(`*tmp*`, "copy.number", value = NA) :
                        replacement has 1 row, data has 0

                        Also in the website it is written last column can be NA but I am having 2 columns as NA if I read the coverage file of gatk output. Once is the column of sequenced.base and the other is the last one. Also my chr column is in order or chr# but one calling the read.coverage.gatk it seems it expects chr1 should be like just 1 and chr string will be appended to it so the typical file uploaded is shown like below

                        probe chr probe_start probe_end targeted.base sequenced.base coverage average.coverage base.with..10.coverage
                        1 chr1:762098-762270 chrchr1 762098 762270 173 NA 141772 819.49 NA
                        2 chr1:861282-861490 chrchr1 861282 861490 209 NA 21078 100.85 NA
                        3 chr1:865592-865791 chrchr1 865592 865791 200 NA 26005 130.03 NA
                        4 chr1:866326-866498 chrchr1 866326 866498 173 NA 9629 55.66 NA
                        5 chr1:871060-871244 chrchr1 871060 871244 185 NA 52937 286.15 NA
                        6 chr1:874365-874774 chrchr1 874365 874774 410 NA 66160 161.37 NA
                        So there is a discrepancy in the dataframe. Can anyone suggest me how to get rid of this problem

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Strategies for Sequencing Challenging Samples
                          by seqadmin


                          Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                          03-22-2024, 06:39 AM
                        • seqadmin
                          Techniques and Challenges in Conservation Genomics
                          by seqadmin



                          The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                          Avian Conservation
                          Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                          03-08-2024, 10:41 AM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, 03-27-2024, 06:37 PM
                        0 responses
                        13 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-27-2024, 06:07 PM
                        0 responses
                        12 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-22-2024, 10:03 AM
                        0 responses
                        53 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 03-21-2024, 07:32 AM
                        0 responses
                        69 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X