  • CNV-Seq output

    Dear all,
    I used CNV-Seq to call my CNVs, but I don't understand the output file with the following columns:
    chromosome, start, end, test, ref, position, log2, p.value, cnv, cnv.size, cnv.log2, cnv.p.value

    What is the real CNV position (start and end)? Is the start at "position" and the end at "position + cnv.size"?
    What is the difference between cnv.p.value and p.value?
    What is the meaning of cnv.log2 and log2?
    What is the meaning of the values for "ref" and "test"?

    Can somebody help me or recommend documentation that is easy to understand? I would be very grateful for any help.

    Best regards
    Robby

  • #2
    The values for "test" and "ref" are the numbers of reads starting in the region given by columns 2 and 3. Is that correct? So column 2 is just the start of the window and column 3 the end of the window, right?

    For one CNV I have to look for the same ID in column 9, so CNV-Seq writes multiple lines for one CNV, correct?

    But what are the exact start and end positions of the CNV in the example below? And I still don't understand the difference between log2 and cnv.log2, or between p.value and cnv.p.value. Can somebody help me or comment on that?


    "chr1" 2250641 2250864 159 100 2250752 0.722575091988798 8.10063503680068e-07 77 448 0.964667045392917 3.93033619539229e-35
    "chr1" 2250753 2250976 171 87 2250864 1.02845734551634 3.42115799964058e-11 77 448 0.964667045392917 3.93033619539229e-35
    "chr1" 2250865 2251088 161 71 2250976 1.2347180850891 2.18142864565488e-14 77 448 0.964667045392917 3.93033619539229e-35
    "chr1" 2250977 2251200 107 60 2251088 0.888124717271795 4.18272787914529e-09 77 448 0.964667045392917 3.93033619539229e-35


    • #3
      I still have trouble understanding the output. Does really nobody understand the output or know of good documentation?


      • #4
        Robby,

        Were you able to find an answer to your question? I am also confused about the output format. If you worked it out, please post what you learned here.


        • #5
          Hello,
          CNV-seq is based on an overlapping sliding-window method. Because the windows overlap, the output gives not only the "start" and "end" of each window but also its midpoint ("position"). It is up to you whether to take the start and end points of a CNV from the "start" and "end" columns or from the "position" column.
          cnv.log2 is the mean of all log2 values for the called CNV.
          How the p-values for the window itself and for the called CNV are computed is described in the CNV-seq paper (equations 4 and 6, respectively; PMID: 19267900).
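
A minimal sketch in Python of how the rows of one call could be grouped and summarised, using the four rows quoted in post #2. It is not part of CNV-seq. The function name `summarize_calls` is hypothetical, and the sketch assumes that rows sharing a value in the "cnv" column belong to one call and that an ID of 0 marks a window outside any call.

```python
from collections import defaultdict

# The four rows quoted in post #2. Column order, as listed in post #1:
# chromosome start end test ref position log2 p.value cnv cnv.size cnv.log2 cnv.p.value
ROWS = """\
"chr1" 2250641 2250864 159 100 2250752 0.722575091988798 8.10063503680068e-07 77 448 0.964667045392917 3.93033619539229e-35
"chr1" 2250753 2250976 171 87 2250864 1.02845734551634 3.42115799964058e-11 77 448 0.964667045392917 3.93033619539229e-35
"chr1" 2250865 2251088 161 71 2250976 1.2347180850891 2.18142864565488e-14 77 448 0.964667045392917 3.93033619539229e-35
"chr1" 2250977 2251200 107 60 2251088 0.888124717271795 4.18272787914529e-09 77 448 0.964667045392917 3.93033619539229e-35
"""


def summarize_calls(text):
    """Group windows by (chromosome, cnv id) and report each group's bounds."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Assumption: id 0 in the "cnv" column marks a window outside any call.
        if not fields or fields[8] == "0":
            continue
        groups[(fields[0].strip('"'), int(fields[8]))].append(fields)

    summary = {}
    for key, rows in groups.items():
        log2_values = [float(r[6]) for r in rows]
        midpoints = [int(float(r[5])) for r in rows]  # float() copes with "1e+05"
        summary[key] = {
            "windows": len(rows),
            "start": min(int(r[1]) for r in rows),
            "end": max(int(r[2]) for r in rows),
            "first_midpoint": min(midpoints),
            "last_midpoint": max(midpoints),
            "mean_log2": sum(log2_values) / len(log2_values),
        }
    return summary


summary = summarize_calls(ROWS)
for (chrom, cnv_id), call in summary.items():
    print(chrom, cnv_id, call)
```

For the four rows shown, the window bounds run from 2250641 to 2251200 and the midpoints from 2250752 to 2251088. The mean of these four log2 values need not equal the reported cnv.log2 exactly if the excerpt omits other windows of the same call.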

          If you have any further questions don't hesitate to ask.
          VanAxel


          • #6
            Is there any way to report a bug in CNV-Seq? I tried searching for one but couldn't find it.

            For some strange reason, the software is not calling CNVs even when all conditions are met. I used a log2 threshold of 0.8 and a window size of only 2 so that I can look at the results by eye and judge which ones to pick.

            "CHROMOSOME_II" 1450045 1451171 404 416 1450608 0.881401332001257 2.29121717076986e-13 0 NA NA NA
            "CHROMOSOME_II" 1450609 1451735 570 464 1451172 1.22046668131509 1.31845290945157e-22 0 NA NA NA
            "CHROMOSOME_II" 1451173 1452299 629 550 1451736 1.11725796585782 1.18360892978282e-19 0 NA NA NA
            "CHROMOSOME_II" 1451737 1452863 671 657 1452300 0.954048963268409 3.24101505961822e-15 0 NA NA NA
            "CHROMOSOME_II" 1452301 1453427 602 577 1452864 0.984821735504775 5.02877099034994e-16 0 NA NA NA
            "CHROMOSOME_II" 1452865 1453991 513 516 1453428 0.915217327574355 3.2389370038797e-14 0 NA NA NA
            "CHROMOSOME_II" 1453429 1454555 590 516 1453992 1.1169734562165 1.20564598087743e-19 0 NA NA NA
            "CHROMOSOME_II" 1453993 1455119 551 465 1454556 1.16845116996632 4.16186035989176e-21 0 NA NA NA
            "CHROMOSOME_II" 1454557 1455683 374 369 1455120 0.943047021217795 6.25787394726544e-15 0 NA NA NA
            "CHROMOSOME_II" 1455121 1456247 415 422 1455684 0.899497904917657 8.08876111589643e-14 0 NA NA NA
            "CHROMOSOME_II" 1455685 1456811 615 573 1456248 1.02568083886025 4.03369375071261e-17 0 NA NA NA
            "CHROMOSOME_II" 1456249 1457375 666 575 1456812 1.13558978863008 3.59196486144447e-20 0 NA NA NA
            "CHROMOSOME_II" 1456813 1457939 663 565 1457376 1.15438757020059 1.04963378485956e-20 0 NA NA NA
            "CHROMOSOME_II" 1457377 1458503 609 524 1457940 1.14050498375944 2.60576418573692e-20 0 NA NA NA
            "CHROMOSOME_II" 1457941 1459067 410 408 1458504 0.930684324924505 1.30371091397085e-14 0 NA NA NA

            This entire >8 kb region meets all the conditions set for a CNV to be called, yet it is still not called, which makes me a little skeptical about the software itself. Can anyone suggest some reliable alternatives?
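
The by-eye check described above can be reproduced with a short Python sketch that looks for runs of consecutive windows whose log2 ratio reaches a threshold. This is only an independent sanity check, not CNV-seq's calling algorithm, which also uses p-values. The function `runs_above` and its `min_windows` parameter are hypothetical, and the log2 values are rounded to four decimals from the rows above.

```python
# (start, end, log2) for the 15 windows quoted above; log2 rounded to 4 decimals.
WINDOWS = [
    (1450045, 1451171, 0.8814),
    (1450609, 1451735, 1.2205),
    (1451173, 1452299, 1.1173),
    (1451737, 1452863, 0.9540),
    (1452301, 1453427, 0.9848),
    (1452865, 1453991, 0.9152),
    (1453429, 1454555, 1.1170),
    (1453993, 1455119, 1.1685),
    (1454557, 1455683, 0.9430),
    (1455121, 1456247, 0.8995),
    (1455685, 1456811, 1.0257),
    (1456249, 1457375, 1.1356),
    (1456813, 1457939, 1.1544),
    (1457377, 1458503, 1.1405),
    (1457941, 1459067, 0.9307),
]


def runs_above(windows, threshold, min_windows):
    """Return (start, end, n_windows) for each run of consecutive windows
    whose log2 ratio is at least `threshold`."""
    runs, current = [], []
    for window in windows:
        if window[2] >= threshold:
            current.append(window)
            continue
        # The run is broken; keep it only if it is long enough.
        if len(current) >= min_windows:
            runs.append((current[0][0], current[-1][1], len(current)))
        current = []
    if len(current) >= min_windows:
        runs.append((current[0][0], current[-1][1], len(current)))
    return runs


runs = runs_above(WINDOWS, threshold=0.8, min_windows=2)
for start, end, n in runs:
    print(f"{start}-{end}: {n} windows, {end - start + 1} bp")
```

With the threshold of 0.8 mentioned in the post, all 15 windows fall into a single run from 1450045 to 1459067, which is 9023 bp.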


            • #7
              Error in cnv output (cnv-seq)

              Hi, when I take the .cnv output from cnv-seq.pl and try to plot it with library(cnv) from cnv-seq in R, I get the error below. I am upset because I generated 7 files in the same way, and some of them work fine while others do not. Here is the R output:

              > library(cnv)
              > data <- read.delim("my_data.cnv")
              > cnv.print(data)
              cnv chromosome start end size log2 p.value
              CNVR_1 chr Inf -Inf -Inf
              CNVR_0 chrY chrX chrMT chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8
              chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 398
              315321616 315321219 NA NA
              Warning messages:
              1: In min(sub$start) : no non-missing arguments to min; returning Inf
              2: In min(sub$position) : no non-missing arguments to min; returning Inf
              3: In max(sub$end) : no non-missing arguments to max; returning -Inf
              4: In max(sub$position) : no non-missing arguments to max; returning -Inf


              Thank you very much in advance!


              • #8
                Hi Marevilla,
                I think your data set is faulty, because the normal output should look like this:
                CNV chromosome start end size log2 p.value type
                CNVR_1 chr7 1006251 1018750 12500 0.675152608894017 0.000319481853259238 Gain
                CNVR_2 chr7 2181251 2193750 12500 0.670496246462795 0.000349065546251931 Gain
                CNVR_3 chr7 2356251 2368750 12500 0.607797341092252 0.00110856198941488 Gain
                CNVR_4 chr7 3318751 3331250 12500 0.61291741275106 0.00101142355485097 Gain
                CNVR_5 chr7 5231251 5243750 12500 0.600241366643895 0.00126807991100568 Gain
                CNVR_6 chr7 6056251 6068750 12500 0.619870173392827 0.000892305035685526 Gain
                CNVR_7 chr7 6643751 6656250 12500 0.752320469416477 6.99749167890688e-05 Gain
                CNVR_8 chr7 7493751 7518750 25000 0.648214799086789 9.58225619127747e-07 Gain
                CNVR_9 chr7 7893751 7918750 25000 0.622685188999881 2.37316926738251e-06 Gain
                You may need to check the output of the earlier steps of your pipeline, as it may be faulty, too.

                The output one step before the one above should look like this:
                chromosome start end test ref position log2 p.value cnv cnv.size cnv.log2 cnv.p.value
                chr1 1 25000 91 139 12500 -0.406313758410827 0.0149832381084390 0 NA NA NA
                chr1 12501 37500 142 228 25000 -0.478310220546076 0.00561464244688826 0 NA NA NA
                chr1 25001 50000 149 202 37500 -0.234210288175650 0.102083476954782 0 NA NA NA
                chr1 37501 62500 135 161 50000 -0.0492686069498024 0.393692513368816 0 NA NA NA
                chr1 50001 75000 111 131 62500 -0.0341744610733607 0.425762996389164 0 NA NA NA
                chr1 62501 87500 83 94 75000 0.0252832537832712 0.444849658758006 0 NA NA NA
                chr1 75001 100000 87 88 87500 0.188344551325415 0.150648136271992 0 NA NA NA
                chr1 87501 112500 117 122 1e+05 0.144460056134502 0.213854859246385 0 NA NA NA
                chr1 100001 125000 121 118 112500 0.241052862026737 0.0931423468736603 0 NA NA NA
                regards
                VanAxel


                • #9
                  Hello, VanAxel

                  I have one additional question. I got exactly the same output file as the one you show:

                  chromosome start end test ref position log2 p.value cnv cnv.size cnv.log2 cnv.p.value
                  chr1 1 25000 91 139 12500 -0.406313758410827 0.0149832381084390 0 NA NA NA
                  chr1 12501 37500 142 228 25000 -0.478310220546076 0.00561464244688826 0 NA NA NA
                  chr1 25001 50000 149 202 37500 -0.234210288175650 0.102083476954782 0 NA NA NA
                  chr1 37501 62500 135 161 50000 -0.0492686069498024 0.393692513368816 0 NA NA NA
                  chr1 50001 75000 111 131 62500 -0.0341744610733607 0.425762996389164 0 NA NA NA
                  chr1 62501 87500 83 94 75000 0.0252832537832712 0.444849658758006 0 NA NA NA
                  chr1 75001 100000 87 88 87500 0.188344551325415 0.150648136271992 0 NA NA NA
                  chr1 87501 112500 117 122 1e+05 0.144460056134502 0.213854859246385 0 NA NA NA
                  chr1 100001 125000 121 118 112500 0.241052862026737 0.0931423468736603 0 NA NA NA


                  But how can I get from this file to the final file you showed?

                  CNV chromosome start end size log2 p.value type
                  CNVR_1 chr7 1006251 1018750 12500 0.675152608894017 0.000319481853259238 Gain
                  CNVR_2 chr7 2181251 2193750 12500 0.670496246462795 0.000349065546251931 Gain
                  CNVR_3 chr7 2356251 2368750 12500 0.607797341092252 0.00110856198941488 Gain
                  CNVR_4 chr7 3318751 3331250 12500 0.61291741275106 0.00101142355485097 Gain
                  CNVR_5 chr7 5231251 5243750 12500 0.600241366643895 0.00126807991100568 Gain
                  CNVR_6 chr7 6056251 6068750 12500 0.619870173392827 0.000892305035685526 Gain
                  CNVR_7 chr7 6643751 6656250 12500 0.752320469416477 6.99749167890688e-05 Gain
                  CNVR_8 chr7 7493751 7518750 25000 0.648214799086789 9.58225619127747e-07 Gain
                  CNVR_9 chr7 7893751 7918750 25000 0.622685188999881 2.37316926738251e-06 Gain

                  Could you please tell me how to get the final file?


                  • #10
                    Never mind, I found the way.

                    cnv.print(data, file="xxx") saves the result as you suggested, though it does not include the "Gain" column.
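
If the "type" column is missing from the saved file, it can be derived afterwards. The Python sketch below assumes the label simply follows the sign of the log2 ratio. The thread only shows "Gain" for positive values, so the "Loss" and "None" labels and the function name `cnv_type` are assumptions, not something the cnv package is known to output.

```python
def cnv_type(log2_ratio):
    """Assumed convention: positive log2 ratio = gain, negative = loss."""
    if log2_ratio > 0:
        return "Gain"
    if log2_ratio < 0:
        return "Loss"
    return "None"


# (CNV id, log2) taken from the first three rows of the table in post #8.
CALLS = [
    ("CNVR_1", 0.675152608894017),
    ("CNVR_2", 0.670496246462795),
    ("CNVR_3", 0.607797341092252),
]

labels = [(cnv_id, cnv_type(log2)) for cnv_id, log2 in CALLS]
for cnv_id, label in labels:
    print(cnv_id, label)
```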


                    • #11
                      CNV-seq output error

                      Hello, VanAxel

                      Sorry for the repeated posts. I am still playing with CNV-seq.
                      In the cnv file, most of my lines have NA for cnv.size, cnv.log2, and cnv.p.value, as in your post, but some lines have non-NA values in these columns.

                      In this case, I have no problem running cnv.print(data), even though the output is a little awkward.

                      BUT, for some analyses, I got exactly the same error as Marevilla. When I looked at the cnv file, I found that the cnv.size, cnv.log2, and cnv.p.value columns are NA on every line of the file. Those columns contain nothing but "NA". In this case, cnv.summary(data) returns

                      CNV percentage in genome: 0%
                      CNV nucleotide content: 0
                      CNV count: 0
                      Mean size: NaN
                      Median size: NA
                      Max Size: -Inf
                      Min Size: Inf
                      Warning messages:
                      1: In max(true$cnv.size) : no non-missing arguments to max; returning -Inf
                      2: In min(true$cnv.size) : no non-missing arguments to min; returning Inf


                      ... It is really weird, because I did exactly the same thing for both datasets!

                      Could somebody please help with this?


                      • #12
                        The last 3 columns are referring to CNV_Region, so if a sliding window is not part of a CNV region, the values will be NA.
                        If all values for the 3 columns are NA, it means no CNV region was found passing the criteria.
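
A quick way to check for this case before calling cnv.print or cnv.summary is to count the windows whose cnv.size is not NA. The Python sketch below assumes the .cnv file is tab-delimited with the header shown in post #8 (post #7 reads it with read.delim, whose default separator is a tab). The function name `count_cnv_windows` is hypothetical, and the sample holds the first three data rows from post #8.

```python
import csv
import io

# Header plus the first three data rows from post #8, tab-delimited.
SAMPLE = (
    "chromosome\tstart\tend\ttest\tref\tposition\tlog2\tp.value\t"
    "cnv\tcnv.size\tcnv.log2\tcnv.p.value\n"
    "chr1\t1\t25000\t91\t139\t12500\t-0.406313758410827\t0.0149832381084390\t0\tNA\tNA\tNA\n"
    "chr1\t12501\t37500\t142\t228\t25000\t-0.478310220546076\t0.00561464244688826\t0\tNA\tNA\tNA\n"
    "chr1\t25001\t50000\t149\t202\t37500\t-0.234210288175650\t0.102083476954782\t0\tNA\tNA\tNA\n"
)


def count_cnv_windows(handle):
    """Count the windows that belong to a called CNV region (cnv.size not NA)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return sum(1 for row in reader if row["cnv.size"] != "NA")


n_called = count_cnv_windows(io.StringIO(SAMPLE))
if n_called == 0:
    print("No window belongs to a CNV region; there is nothing to print or summarise.")
else:
    print(n_called, "windows belong to called CNV regions.")
```

For a real file, pass `open("my_data.cnv", newline="")` instead of the `io.StringIO` sample.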


                        • #13
                          Hi all,

                          I am not sure if Marevilla and Ayush_Saxena are still interested, or if the developers are around.
                          I was having the same problem as some of you:
                          I ran CNVseq on multiple BAM files; some returned no significant regions while others did.
                          After some hours with the R package, I found a bug in how the window size is estimated before the internal cnv.ANNO function is called. To fix it:
                          - Open the file "02.1.cnv.R"
                          - Find the following line (line 59 in my version): step <- window.size/2;
                          - Add this line below it: step <- ceiling(step)
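
To illustrate why rounding matters, the Python sketch below shows that an odd window size gives a fractional half-window step, while rounding up gives whole-number window starts. It mirrors only the arithmetic of the two R lines above, not how cnv.ANNO uses the step. The window size is taken from the first row quoted in post #6, and the function name `window_starts` is hypothetical.

```python
import math


def window_starts(first_start, window_size, n, round_up):
    """Start coordinates of n overlapping windows spaced half a window apart."""
    step = window_size / 2          # R: step <- window.size/2
    if round_up:
        step = math.ceil(step)      # R: step <- ceiling(step)
    return [first_start + i * step for i in range(n)]


# First row of post #6: start 1450045, end 1451171.
window_size = 1451171 - 1450045 + 1   # 1127, an odd number

fractional = window_starts(1450045, window_size, 3, round_up=False)
rounded = window_starts(1450045, window_size, 3, round_up=True)
print(fractional)
print(rounded)
```

With the step rounded up to 564, the starts are 1450045, 1450609, and 1451173, which match the first three rows of post #6. Without rounding, the step is 563.5 and the second start falls on 1450608.5.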


                          • #14
                            Originally posted by Pepe (post #13 above)
                            Thank you very much for the bugfix. The files are updated.
