
  • #91
    You can use the BAM file directly as input. There are some optimization steps for BWA, but Bowtie and Novoalign output should work too.

    kai

    Comment


    • #92
      -A is the mapping quality of the mate.
      -x sets Max_D_Size.

      Originally posted by dGho View Post
      -A/--anchor_quality
      the minimal mapping quality of the reads Pindel uses as anchor
      (default 20)

      I am sorry, I also have another question. Is the anchor quality (above), the threshold alignment score for the mapped read? I just want to make sure I understand correctly.


      and in the case of:

      -x/--max_range_index
      the maximum size of structural variations to be detected; the higher this number, the greater the number of SVs reported, but the computational cost and memory requirements increase, as does the rate of false positives. 1=128, 2=512, 3=2,048, 4=8,092, 5=32,368, 6=129,472, 7=517,888, 8=2,071,552, 9=8,286,208 (maximum 9, default 5)

      Is this the same as the user-defined Maximum Deletion Size parameter (Max_D_Size) that is referred to in the original Pindel paper?
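For reference, the -x table quoted above can be written as a small lookup. This is only a sketch in Python, assuming the sizes are exactly as printed in the help text; the helper name `smallest_index_for` is made up for illustration and is not part of Pindel.

```python
# Sizes copied verbatim from the -x/--max_range_index help text quoted above.
MAX_RANGE_BY_INDEX = {
    1: 128, 2: 512, 3: 2_048, 4: 8_092, 5: 32_368,
    6: 129_472, 7: 517_888, 8: 2_071_552, 9: 8_286_208,
}


def smallest_index_for(sv_size: int) -> int:
    """Return the smallest -x value whose listed size covers sv_size.

    Hypothetical helper for illustration only; Pindel itself just takes -x.
    """
    for index in sorted(MAX_RANGE_BY_INDEX):
        if MAX_RANGE_BY_INDEX[index] >= sv_size:
            return index
    raise ValueError(f"{sv_size} bp exceeds the largest listed range")
```

With the default of 5, the listed ceiling is 32,368 bp, so an event of about 10 kb would already be in range, while one of about 100 kb would need -x 6.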

      Comment


      • #93
        Originally posted by shuiwudao View Post
        Hi, Kai ye
        I can't solve this problem:


        Do you have any idea?

        Thank you in advance for your help
        It looks like you are using the very first version of Pindel. Please get the latest version at github.com/genome/pindel

        Comment


        • #94
          Hi KaiYe,

          Thanks for the previous clarification regarding BAM file usage.

          Now I have two more queries regarding pindel2vcf output:

          1) I converted the Pindel output to VCF using the pindel2vcf utility and got SVTYPE=RPL, which was not in any of the individual variation files. What is SVTYPE=RPL? Which variation type (DEL, INV, INS, etc.) is classified as RPL, and on what basis do you call a variation RPL?


          2) Can Pindel detect translocations?

          Comment


          • #95
            RPL = replacement. a sequence is replaced by another one (probably shorter). this happens when breakpoints are linked and repaired. we have seen this in cancer data and validated it.

            the latest version does call translocations: github.com/genome/pindel

            Comment


            • #96
              Thanks a lot KaiYe..

              Comment


              • #97
                Hi Kai,

                Thanks for developing this wonderful tool. I just have one little question about filtering based on AD. I need to know the relative depth of reads supporting reference and alternate alleles to evaluate the likelihood of the alternate. I remember in one of the posts you mentioned that:

                "As Pindel ignores reads supporting the reference allele to speed up, we only report the number of variant supporting reads per sample with strand and uniqueness information."

                Since our lab has loads of computer power, do you have a version that gives the information of reads supporting the reference? If not, what is your recommendation of getting this info. Thanks very much!

                Comment


                • #98
                  Originally posted by AlisonF View Post
                  Hi Kai,

                  Thanks for developing this wonderful tool. I just have one little question about filtering based on AD. I need to know the relative depth of reads supporting reference and alternate alleles to evaluate the likelihood of the alternate. I remember in one of the posts you mentioned that:

                  "As Pindel ignores reads supporting the reference allele to speed up, we only report the number of variant supporting reads per sample with strand and uniqueness information."

                  Since our lab has loads of computer power, do you have a version that gives the information of reads supporting the reference? If not, what is your recommendation of getting this info. Thanks very much!
                  In the latest version, the number of reads supporting the reference allele is also provided for each variant per sample. Please check out the latest version at github.com/genome/pindel

                  let me know if you have any questions.

                  Comment


                  • #99
                    Hi Kai,

                    Thanks for your quick response!
                    I re-ran my data using the latest version and it does give me the RD. Thank you very much.

                    One more question about the GT:RD:AD. It seems like the newest version is able to tell homozygous from heterozygous, although when using the pindel2vcf tool this message still shows up:

                    " Since pindel cannot yet genotype events (distinguish between 0/1 and 1/1) all events are called as 0/0 (not found) or 0/1, even while some may very well be homozygous alternative (1/1). "


                    In the new vcf, I see samples with:

                    .:0:0 => both Ref & Alt weren't detected
                    0/0:378:1 => homozygous Ref, even though AD=1, we can ignore it
                    0/1:2:1 => heterozygous
                    1/1:0:7 => homozygous Alt

                    Am I interpreting them correctly?
                    What about 0/1:0:5, which looks to me should be 1/1:0:5 (homo Alt)?

                    Thanks for your help!

                    Comment


                    • Originally posted by AlisonF View Post
                      Hi Kai,

                      Thanks for your quick response!
                      I re-ran my data using the latest version and it does give me the RD. Thank you very much.

                      One more question about the GT:RD:AD. It seems like the newest version is able to tell homozygous from heterozygous, although when using the pindel2vcf tool this message still shows up:

                      " Since pindel cannot yet genotype events (distinguish between 0/1 and 1/1) all events are called as 0/0 (not found) or 0/1, even while some may very well be homozygous alternative (1/1). "


                      In the new vcf, I see samples with:

                      .:0:0 => both Ref & Alt weren't detected
                      0/0:378:1 => homozygous Ref, even though AD=1, we can ignore it
                      0/1:2:1 => heterozygous
                      1/1:0:7 => homozygous Alt

                      Am I interpreting them correctly?
                      What about 0/1:0:5, which looks to me should be 1/1:0:5 (homo Alt)?

                      Thanks for your help!
                      Thanks for the hint, I am going to remove the message in pindel2vcf. It should be "1/1:0:5". Can you provide me the result for that particular variant so that I can look into this?

                      Comment


                      • Hi Kai,

                        The result for this particular variant is:

                        1 11906968 . GC G . PASS END=11906969;HOMLEN=9;HOMSEQ=CCCCCCCCC;SVLEN=-1;SVTYPE=DEL GT:RD:AD 0/1:0:5 1/1:0:6 0/1:0:1 0/1:4:4 .:0:0 0/0:2:0 1/1:0:7 0/1:6:3 0/0:6:0 0/0:8:0

                        Another variant:

                        1 17085265 . TAA CAG . PASS END=17085266;HOMLEN=0;SVLEN=-3;SVTYPE=RPL;NTLEN=3 GT:RD:AD 0/0:32:0 0/0:20:0 0/0:48:0 0/1:12:1 0/0:24:3 0/1:0:1 0/0:4:0 0/0:6:0 0/0:52:0 0/0:18:0

                        I don't quite understand why 0/1:12:1 is called het whereas 0/0:24:3 is called hom ref, since the latter has a greater AD/RD ratio. Thanks!

                        Comment


                        • Originally posted by AlisonF View Post
                          Hi Kai,

                          The result for this particular variant is:

                          1 11906968 . GC G . PASS END=11906969;HOMLEN=9;HOMSEQ=CCCCCCCCC;SVLEN=-1;SVTYPE=DEL GT:RD:AD 0/1:0:5 1/1:0:6 0/1:0:1 0/1:4:4 .:0:0 0/0:2:0 1/1:0:7 0/1:6:3 0/0:6:0 0/0:8:0

                          Another variant:

                          1 17085265 . TAA CAG . PASS END=17085266;HOMLEN=0;SVLEN=-3;SVTYPE=RPL;NTLEN=3 GT:RD:AD 0/0:32:0 0/0:20:0 0/0:48:0 0/1:12:1 0/0:24:3 0/1:0:1 0/0:4:0 0/0:6:0 0/0:52:0 0/0:18:0

                          I don't quite understand why 0/1:12:1 is called het whereas 0/0:24:3 is called hom ref, since the latter has a greater AD/RD ratio. Thanks!
                          Can you provide the Pindel input lines for those two calls?

                          You can grep for ChrID in the Pindel output and then pick out the lines.

                          Comment


                          • Hi Kai,

                            I guess you mean this: (below is for the first variant)

                            341 D 1 NT 0 "" ChrID 1 BP 11906968 11906970 BP_range 11906968 11906979 Supports 26 24 + 0 0 - 26 24 S1 27 SUM_MS 1560 10 NumSupSamples 6 6 30079.101 0 0 0 0 5 4 30079.102 0 0 0 0 6 6 30079.201 0 0 0 0 1 1 802.101 2 2 0 0 4 4 802.202 0 0 0 0 0 0 840.101 1 1 0 0 0 0 840.201 0 0 0 0 7 6 AC531.101 3 3 0 0 3 3 AC531.102 3 3 0 0 0 0 AC531.201 4 4 0 0 0 0


                            I re-ran the new pindel and the second time, it did not detect the second variant that I posted (not sure why, parameters were the same). But if I see those type of variants, I'll let you know. Thanks!
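The per-sample part of the header line quoted above can be pulled apart with a few lines of Python. This is a rough sketch, not Pindel's own parser. It assumes every sample occupies seven whitespace-separated tokens after `NumSupSamples N M` (name plus six counts), and the column names in the comment are inferred from this one line (the last two columns sum to the 26 and 24 shown after `Supports`), so check them against the Pindel documentation before relying on them.

```python
from typing import Dict, Tuple

# Header line copied from the post above.
LINE = (
    '341 D 1 NT 0 "" ChrID 1 BP 11906968 11906970 BP_range 11906968 11906979 '
    "Supports 26 24 + 0 0 - 26 24 S1 27 SUM_MS 1560 10 NumSupSamples 6 6 "
    "30079.101 0 0 0 0 5 4 30079.102 0 0 0 0 6 6 30079.201 0 0 0 0 1 1 "
    "802.101 2 2 0 0 4 4 802.202 0 0 0 0 0 0 840.101 1 1 0 0 0 0 "
    "840.201 0 0 0 0 7 6 AC531.101 3 3 0 0 3 3 AC531.102 3 3 0 0 0 0 "
    "AC531.201 4 4 0 0 0 0"
)


def per_sample_counts(header_line: str) -> Dict[str, Tuple[int, ...]]:
    """Map sample name -> six integer counts from one Pindel header line.

    Assumed column order (inferred, not confirmed by the thread):
    ref_left, ref_right, plus, plus_unique, minus, minus_unique.
    """
    tokens = header_line.split()
    # Skip the keyword and the two sample counts that follow it.
    start = tokens.index("NumSupSamples") + 3
    rest = tokens[start:]
    if len(rest) % 7 != 0:
        raise ValueError("expected 7 tokens per sample after NumSupSamples")
    return {
        rest[i]: tuple(int(v) for v in rest[i + 1 : i + 7])
        for i in range(0, len(rest), 7)
    }


if __name__ == "__main__":
    for sample, counts in per_sample_counts(LINE).items():
        print(sample, counts)
```

Older Pindel versions may write fewer columns per sample, in which case the group size of seven would need adjusting.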

                            Comment


                            • Originally posted by AlisonF View Post
                              Hi Kai,

                              I guess you mean this: (below is for the first variant)

                              341 D 1 NT 0 "" ChrID 1 BP 11906968 11906970 BP_range 11906968 11906979 Supports 26 24 + 0 0 - 26 24 S1 27 SUM_MS 1560 10 NumSupSamples 6 6 30079.101 0 0 0 0 5 4 30079.102 0 0 0 0 6 6 30079.201 0 0 0 0 1 1 802.101 2 2 0 0 4 4 802.202 0 0 0 0 0 0 840.101 1 1 0 0 0 0 840.201 0 0 0 0 7 6 AC531.101 3 3 0 0 3 3 AC531.102 3 3 0 0 0 0 AC531.201 4 4 0 0 0 0




                              I re-ran the new pindel and the second time, it did not detect the second variant that I posted (not sure why, parameters were the same). But if I see those type of variants, I'll let you know. Thanks!

                              Here is what I get with the latest version of pindel2vcf:
                              CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 30079.101 30079.102 30079.201 802.101 802.202 840.101 840.201 AC531.101 AC531.102 AC531.201
                              chr1 11906968 . GC G . PASS END=11906969;HOMLEN=9;HOMSEQ=CCCCCCCCC;SVLEN=-1;SVTYPE=DEL GT:AD 0/0:0,5 0/0:0,6 0/0:0,1 0/0:2,4 0/0:0,0 0/0:1,0 0/0:0,7 0/0:3,3 0/0:3,0 0/0:4,0

                              If ref+alt<10, then give 0/0
                              else if (vaf between 0.2 and 0.8), 0/1
                              else if (vaf > 0.8) 1/1

                              You can change these settings when running pindel2vcf.
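The three-line rule above can be sketched in Python as follows. This is an illustration of the rule as written in this post, not the pindel2vcf source. The function and parameter names are invented, treating "between 0.2 and 0.8" as inclusive is an assumption, and so is returning 0/0 when the VAF falls below 0.2, since the post does not cover that case.

```python
def call_genotype(
    ref_reads: int,
    alt_reads: int,
    min_depth: int = 10,    # "if ref+alt<10, then give 0/0"
    het_low: float = 0.2,   # lower VAF bound for 0/1
    hom_low: float = 0.8,   # VAF above this gives 1/1
) -> str:
    """Genotype one sample from its reference and alternate read counts."""
    depth = ref_reads + alt_reads
    if depth < min_depth or depth == 0:
        return "0/0"
    vaf = alt_reads / depth  # variant allele fraction
    if vaf > hom_low:
        return "1/1"
    if vaf >= het_low:
        return "0/1"
    # Assumption: the post does not say what happens below 0.2.
    return "0/0"


if __name__ == "__main__":
    # AD pairs (ref, alt) from the record above.
    record = [(0, 5), (0, 6), (0, 1), (2, 4), (0, 0),
              (1, 0), (0, 7), (3, 3), (3, 0), (4, 0)]
    print([call_genotype(ref, alt) for ref, alt in record])
```

No sample in the record above has 10 or more reads in total, which under this rule is why all ten come out as 0/0.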

                              Comment


                              • Hi Kai,

                                I am just writing to confirm with you that the latest Pindel version is 0.2.5, June 4 2013, and it is this version of pindel2vcf that you are using to get the output above. Thanks!

                                Comment
