  • Pindel: improved version for indels and structural variants

    hi all

    I have just put an improved version of Pindel on my website, https://trac.nbic.nl/pindel/, along with a wiki, a mailing list, and a user manual.

    Instructions for using it starting from BWA mapping are provided. You can use it to detect indels and SVs at single-base resolution from SLX paired-end short reads.

    Currently, deletions of 1 bp to 1 Mbp and insertions of 1 bp to (read length - 20) bp can be detected. You can also find deletions that contain non-template insertions.
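
To make the stated size limits concrete, here is a minimal sketch of the arithmetic. It only encodes the numbers quoted above; the constant and function names are illustrative assumptions and are not part of Pindel.

```python
# Size limits as quoted in the post. All names here are hypothetical helpers,
# not Pindel options or functions.
MIN_EVENT_BP = 1
MAX_DELETION_BP = 1_000_000
READ_MARGIN_BP = 20  # the "20" in "read length - 20"


def max_insertion_bp(read_length: int) -> int:
    """Largest insertion size (bp) inside the stated range for this read length."""
    return max(read_length - READ_MARGIN_BP, 0)


def is_in_stated_range(kind: str, size_bp: int, read_length: int) -> bool:
    """Check an event size against the ranges quoted in the post."""
    if size_bp < MIN_EVENT_BP:
        return False
    if kind == "deletion":
        return size_bp <= MAX_DELETION_BP
    if kind == "insertion":
        return size_bp <= max_insertion_bp(read_length)
    raise ValueError(f"unknown event kind: {kind!r}")
```

For example, with 36 bp reads `max_insertion_bp(36)` is 16, so a 17 bp insertion falls outside the stated range while a 1 Mbp deletion is still inside it.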

    I am working on inversions and large insertions, as well as on using Pindel for RNA-Seq data.
    Please comment on Pindel and suggest additional functions.

    Kai
    [email protected]
    Last edited by KaiYe; 10-28-2011, 07:26 AM.

  • #2
    Hi Kai,

    Just trying out pindel for the first time...one question on the bam2pindel step: in the user manual, the input for this script is described as "aln.NameSorted.MateFixed.bam". Does this mean I have to do something additional to the bam generated by samtools? If yes, what?

    Thanks,
    Sophia


    • #3
      Hi Sophia,

      If you generate bam from sam directly after mapping with BWA, you don't have to do anything else.

      Kai
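
The SAM-to-BAM conversion mentioned in this reply can be sketched as below. This is only an illustration under assumptions: it assumes `samtools` is installed and on the PATH, the file names are placeholders, and the helper functions are hypothetical wrappers rather than anything shipped with Pindel.

```python
import shutil
import subprocess
from typing import List


def sam_to_bam_command(sam_path: str, bam_path: str) -> List[str]:
    """Build a `samtools view` command line that converts SAM to BAM.

    -b writes BAM output, -S marks the input as SAM (required by old
    samtools releases, accepted for compatibility by newer ones), and
    -o names the output file.
    """
    return ["samtools", "view", "-b", "-S", "-o", bam_path, sam_path]


def sam_to_bam(sam_path: str, bam_path: str) -> None:
    """Run the conversion; raises if samtools is missing or exits non-zero."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    subprocess.run(sam_to_bam_command(sam_path, bam_path), check=True)
```

Calling `sam_to_bam("aln.sam", "aln.bam")` would then produce the BAM file that, according to the reply above, needs no further processing when it comes straight from BWA output.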


      • #4
        Originally posted by KaiYe View Post
        Hi Sophia,

        If you generate bam from sam directly after mapping with BWA, you don't have to do anything else.

        Kai

        cool, thanks.


        • #5
          I must be missing something.. it isn't producing any output for me, but it also isn't giving an error. I'm trying to convert my BAM file to the pindel format:

          [heather@frankie (Mon Jul 26 13:33:04)]% bam2pindel.pl -i aln.sort.pindel.bam -o out.pindel -s retina -om -pi 150
          [heather@frankie (Mon Jul 26 13:41:02)]% ls
          . aln.sort.fix.bam.bai horse_genome_v2_all.fa.ann horse_genome_v2_all.fa.rsa
          .. aln.sort.pindel.bam horse_genome_v2_all.fa.bwt horse_genome_v2_all.fa.sa
          align.sort.bam aln_read1.sai horse_genome_v2_all.fa.fai tag_trim_6_1.fq
          aln.bam aln_read2.sai horse_genome_v2_all.fa.pac tag_trim_6_2.fq
          aln.sam horse_genome_v2_all.fa horse_genome_v2_all.fa.rbwt
          aln.sort.fix.bam horse_genome_v2_all.fa.amb horse_genome_v2_all.fa.rpac


          • #6
            Hi Kai,

            Can Pindel call large structural variants (>1 Mbp) now?

            Thanks.

            Wu
            Last edited by wuhoucdc; 08-27-2010, 02:55 PM.


            • #7
              Dear Kai Ye,
              I have been using Pindel recently. I heard that you have released a new version of the software; could you please send me the link?

              Best,
              Cong chen
              Wenzhou Medical College


              • #8
                Originally posted by tinacai View Post
                Dear Kai Ye,
                I have been using Pindel recently. I heard that you have released a new version of the software; could you please send me the link?

                Best,
                Cong chen
                Wenzhou Medical College

                Hi Cong Chen,

                I believe I have already sent you my latest version of Pindel for testing. Have you had any problems using it?

                Kai


                • #9
                  Originally posted by wuhoucdc View Post
                  Hi Kai,

                  Can Pindel call large structural variants (>1 Mbp) now?

                  Thanks.

                  Wu

                  Pindel can detect variants of any size as long as they are not inter-chromosomal events. The only thing I worry about is speed: the runtime is linear in the maximum SV size.

                  I am currently testing a new version with the following additional functions:
                  1. Allowing sequencing errors/SNPs in the same reads that contain indels/SVs
                  2. Non-template sequence in deletions
                  3. Inversions
                  4. Tandem duplications
                  5. Breakpoints of large insertions

                  Please send me an email to ask for it if you want to test it.

                  Cheers,

                  Kai


                  • #10
                    Originally posted by raela View Post
                    I must be missing something.. it isn't producing any output for me, but it also isn't giving an error. I'm trying to convert my BAM file to the pindel format:

                    [heather@frankie (Mon Jul 26 13:33:04)]% bam2pindel.pl -i aln.sort.pindel.bam -o out.pindel -s retina -om -pi 150
                    [heather@frankie (Mon Jul 26 13:41:02)]% ls
                    . aln.sort.fix.bam.bai horse_genome_v2_all.fa.ann horse_genome_v2_all.fa.rsa
                    .. aln.sort.pindel.bam horse_genome_v2_all.fa.bwt horse_genome_v2_all.fa.sa
                    align.sort.bam aln_read1.sai horse_genome_v2_all.fa.fai tag_trim_6_1.fq
                    aln.bam aln_read2.sai horse_genome_v2_all.fa.pac tag_trim_6_2.fq
                    aln.sam horse_genome_v2_all.fa horse_genome_v2_all.fa.rbwt
                    aln.sort.fix.bam horse_genome_v2_all.fa.amb horse_genome_v2_all.fa.rpac

                    Could you please tell me your email address? I have C++ code to extract reads from SAM files for Pindel.

                    Thanks.


                    • #11
                      help

                      Hi KaiYe

                      I'm having problems running Pindel. Here's what I've done:
                      1) Downloaded all files from http://www.ebi.ac.uk/~kye/pindel/v_0.2.0/
                      2) Ran bam2pindel.pl on one paired-end sample (aligned using BWA). My BAM file is sorted but does not have the header expected by your program, so I used -om to force the script to run.
                      A number of files were generated, e.g. myprefix.1.txt (chr1).
                      3) Then I tried running pindel_x86_64, but got this error message: ./pindel_x86_64: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./pindel_x86_64)
                      4) I tried upgrading some packages on my Red Hat Linux system, but the error remained.
                      5) I then downloaded your source code from SourceForge (with svn) and compiled Pindel from scratch. It seems to work.
                      6) I find the "-i" parameter confusing, as the script says "-i, --config-file: the bam file later to be a config file;" but your PowerPoint manual says "Input: the unmapped reads in a modified fastq format".
                      7) I assumed -i refers to the files generated by bam2pindel.pl, so I tested the command on some chromosomes, e.g.:
                      pindel_64 -f hg19.fasta -i myprefix.1.txt -o otherprefix -c 1 -b empty.txt
                      8) But whichever chromosome I try, I always get "There are no reads for this chromosome":

                      Processing chromosome 1
                      Processing chromosome 2
                      Skip the rest of chromosomes.
                      1 249250621 269250621
                      26926 10000
                      BreakDancer events: 0
                      There are no reads for this chromosome.


                      What have i done wrong?

                      my email is jason.li @ petermac.org

                      Thanks
                      Jason
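
Regarding the `GLIBCXX_3.4.9' not found message in step 3: it indicates that the prebuilt binary wants a libstdc++ symbol version that the installed library does not export, which is consistent with the rebuild from source in step 5 working. A rough way to see which versions a library file advertises is to scan it for `GLIBCXX_x.y.z` strings. The sketch below does that; the function names are hypothetical, and scanning raw bytes is a simplification rather than a proper ELF parse.

```python
import re
from typing import List, Tuple

# Matches version tags such as GLIBCXX_3.4 or GLIBCXX_3.4.9 in raw bytes.
_GLIBCXX = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(blob: bytes) -> List[Tuple[int, ...]]:
    """Return the sorted, de-duplicated GLIBCXX versions found in the bytes."""
    found = {
        tuple(int(part) for part in match.group(1).split(b"."))
        for match in _GLIBCXX.finditer(blob)
    }
    return sorted(found)


def provides(blob: bytes, wanted: str) -> bool:
    """True if the given version string (e.g. "3.4.9") appears in the bytes."""
    target = tuple(int(part) for part in wanted.split("."))
    return target in glibcxx_versions(blob)


def library_provides(path: str, wanted: str) -> bool:
    """Same check, reading the shared library from disk."""
    with open(path, "rb") as handle:
        return provides(handle.read(), wanted)
```

On the machine described above, `library_provides("/usr/lib64/libstdc++.so.6", "3.4.9")` would be expected to return False, matching the loader error.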


                      • #12
                        Hi Kaiye,

                        Interesting tool you've got there. Have you published the method? I am curious about one thing: say you have one read. You grow the pattern until you can no longer get a match, then look for the rest of the read within the next 1 bp to 1 Mbp. What if there are several matches in that region? Which one do you use, and what criteria do you use to choose it?


                        • #13
                          Originally posted by rwenang View Post
                          Hi Kaiye,

                          Interesting tool you've got there. Have you published the method? I am curious about one thing: say you have one read. You grow the pattern until you can no longer get a match, then look for the rest of the read within the next 1 bp to 1 Mbp. What if there are several matches in that region? Which one do you use, and what criteria do you use to choose it?

                          Yes, Pindel has been published (http://www.ncbi.nlm.nih.gov/pubmed/19561018), and it was awarded best paper at the ISMB 2009 Special Interest Group on Short Read Sequencing.

                          Only unique hits are considered here.
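
The "unique hits only" rule can be illustrated with a toy split-read search. This is a deliberately simplified sketch and not Pindel's actual algorithm (see the linked paper for that): it uses exact string matching on a forward-strand read, and the function names, the `min_part` threshold, and the search window parameters are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def find_all(text: str, pattern: str, start: int, end: int) -> List[int]:
    """All start positions of pattern lying wholly inside text[start:end]."""
    hits = []
    pos = text.find(pattern, start, end)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1, end)
    return hits


def unique_split_deletion(
    reference: str,
    read: str,
    window_start: int,
    window_end: int,
    max_deletion: int,
    min_part: int = 4,
) -> Optional[Tuple[int, int]]:
    """Return (deletion_start, deletion_length) or None.

    The read is split into a left and a right part. A split is accepted
    only if the left part has exactly one hit in the search window and the
    right part has exactly one hit within max_deletion bases downstream.
    Ambiguous (multi-hit) placements are skipped, never guessed.
    """
    for k in range(len(read) - min_part, min_part - 1, -1):
        left, right = read[:k], read[k:]
        left_hits = find_all(reference, left, window_start, window_end)
        if len(left_hits) != 1:
            continue
        left_end = left_hits[0] + k  # first reference base after the left part
        right_hits = find_all(
            reference, right, left_end + 1, left_end + max_deletion + len(right)
        )
        if len(right_hits) != 1:
            continue
        return left_end, right_hits[0] - left_end
    return None


# Toy data: the read spans a 7 bp deletion of "CTTAGGC".
reference = "ACGTTGCAAG" + "CTTAGGC" + "TAACCGGTTA"
read = "TGCAAG" + "TAACCG"
result = unique_split_deletion(reference, read, 0, len(reference), max_deletion=20)
# result == (10, 7): the deletion starts at 0-based position 10 and is 7 bp long.
```

If the right part of the read occurred twice within the downstream region, every candidate split would be rejected and the function would return None, which is the behaviour the reply describes for non-unique matches.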


                          • #14
                            Originally posted by jtjli View Post
                            Hi KaiYe

                            I'm having problems running Pindel. Here's what I've done:
                            1) Downloaded all files from http://www.ebi.ac.uk/~kye/pindel/v_0.2.0/
                            2) Ran bam2pindel.pl on one paired-end sample (aligned using BWA). My BAM file is sorted but does not have the header expected by your program, so I used -om to force the script to run.
                            A number of files were generated, e.g. myprefix.1.txt (chr1).
                            3) Then I tried running pindel_x86_64, but got this error message: ./pindel_x86_64: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by ./pindel_x86_64)
                            4) I tried upgrading some packages on my Red Hat Linux system, but the error remained.
                            5) I then downloaded your source code from SourceForge (with svn) and compiled Pindel from scratch. It seems to work.
                            6) I find the "-i" parameter confusing, as the script says "-i, --config-file: the bam file later to be a config file;" but your PowerPoint manual says "Input: the unmapped reads in a modified fastq format".
                            7) I assumed -i refers to the files generated by bam2pindel.pl, so I tested the command on some chromosomes, e.g.:
                            pindel_64 -f hg19.fasta -i myprefix.1.txt -o otherprefix -c 1 -b empty.txt
                            8) But whichever chromosome I try, I always get "There are no reads for this chromosome":

                            Processing chromosome 1
                            Processing chromosome 2
                            Skip the rest of chromosomes.
                            1 249250621 269250621
                            26926 10000
                            BreakDancer events: 0
                            There are no reads for this chromosome.


                            What have i done wrong?

                            my email is jason.li @ petermac.org

                            Thanks
                            Jason

                            I will send you my source code via email.


                            • #15
                              Hi KaiYe,

                              I'm working with SOLiD data and would like to use Pindel, but couldn't find anything about SOLiD support. Is Pindel only for Illumina data?
                              Thanks in advance for your reply.
                              Fabrice
