
  • Pindel- empty output?

    Hello sequencing gurus,

    I am trying to set up a pipeline to analyze NGS data. I have been having trouble with two programs in particular (breakdancer_max and pindel). With pindel, I have a correctly sorted and indexed BAM file as input, and the output looks like this (for every chromosome):

    *** Calling SV using Split-Read Analysis: /home/usr/apps/pindel024s/src ***
    >> Running Pindel on ALL: /home/usr/apps/pindel024s/src/pindel -f /srv/gs1/projects/lab/usr/data/hg19/ucsc.hg19.fasta -i /srv/gs1/projects/lab/usr/dir.2/JS2.sra.cfg -o /srv/gs1/projects/lab/usr/dir.2/JS2.sra -c ALL
    Pindel version 0.2.4s, June 18 2012.
    Looping over all chromosomes.
    Processing chromosome: chrM
    Chromosome Size: 16571
    NumBoxes: 60020 BoxSize: 667

    Looking at chromosome chrM bases 0 to 10000000.
    getReads chrM 20016571
    Insertsize in bamreads: 195
    Number of reads in current window: 0, + 0 - 0
    Number of reads where the close end could be mapped: 0, + 0 - 0
    Percentage of reads which could be mapped: + 0.00% - 0.00%

    No currentState.Reads for chrM found in /srv/gs1/projects/lab/usr/dir.2/JS2_L1_1_pf_aa.sorted.recal.bam
    BAM file index 0 0
    There are no reads for this bin.

    And the configuration file looks like this:

    /srv/gs1/projects/lab/usr/dir.2/JS2_L1_1_pf_aa.sorted.recal.bam 195 JS2

    I can't figure out what's wrong. I originally used an older version and then reinstalled the latest version of pindel (0.2.4s), and I get the same problem: "BAM file index 0 0" and "There are no reads for this bin." for every bin of every chromosome.

    Has anyone encountered a problem like this and been able to solve it? Any ideas would be really helpful.
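A minimal sketch for sanity-checking a config file like the one above before launching a run. It assumes each non-blank line holds three whitespace-separated fields (BAM path, insert size, sample label), as in the posted example; the names `PindelConfigEntry` and `parse_pindel_config` are made up for illustration and are not part of Pindel.

```python
from typing import List, NamedTuple


class PindelConfigEntry(NamedTuple):
    bam_path: str      # path to the BAM file
    insert_size: int   # expected insert size for that library
    sample_label: str  # label used for the sample


def parse_pindel_config(text: str) -> List[PindelConfigEntry]:
    """Parse 'bam_path insert_size label' lines (assumed layout)."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        bam_path, insert_size, label = fields
        if not insert_size.isdigit() or int(insert_size) <= 0:
            raise ValueError(f"line {line_no}: insert size must be a positive integer")
        entries.append(PindelConfigEntry(bam_path, int(insert_size), label))
    return entries
```

Paths containing spaces would break this simple split, so treat it as a quick check rather than a full validator.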


  • #2
    Pindel "There are no reads for this bin"


    I am also having trouble with pindel.

    I run this command on a BAM file:

    /opt/pindel/pindel -f /home/genetics/canfam2/canfam2.fasta -i SHY_01_input.txt -c chr18 -o SHY_01_out

    and I get this:

    Processing chromosome: chr18
    Chromosome Size: 58872314
    7888 10000
    Looking at chromosome chr18 bases 0 to 10000000.
    There are no reads for this bin.
    Looking at chromosome chr18 bases 10000000 to 20000000.
    There are no reads for this bin.
    Looking at chromosome chr18 bases 20000000 to 30000000.
    There are no reads for this bin.
    Looking at chromosome chr18 bases 30000000 to 40000000.
    There are no reads for this bin.
    Looking at chromosome chr18 bases 40000000 to 50000000.
    There are no reads for this bin.
    Looking at chromosome chr18 bases 50000000 to 60000000.
    There are no reads for this bin.
    Looking at chromosome chr18 bases 60000000 to 70000000.
    There are no reads for this bin.
    Looking at chromosome chr18 bases 70000000 to 80000000.
    There are no reads for this bin.

    Any ideas?


    • #3
      Hi Seq_student, I am having the same problem, which I find vexing because I have run pindel on many older BAM files without it. Did you ever figure out what the issue was?


      • #4
        I think it was the reference sequence in my case. After replacing the file and re-doing the indexes, it seemed to work OK.
        Last edited by mboursnell; 08-02-2013, 07:50 AM.


        • #5
          Check whether you gave Pindel the same reference sequence that was used for mapping.
          "chr1" is considered different from "1".
          Check whether the BAM header (samtools view -H) and the reference index (.fai) match.
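The comparison suggested above can be sketched in a few lines of Python. This is only an illustration: it works on text you have already captured (the output of `samtools view -H` and the contents of the `.fai` file), it assumes the `.fai` is tab-separated with the sequence name in column 1 and the length in column 2, and the function names are hypothetical.

```python
from typing import Dict, List


def sq_lengths_from_header(header_text: str) -> Dict[str, int]:
    """Collect SN -> LN from the @SQ lines of a BAM/SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(f.split(":", 1) for f in line.split()[1:] if ":" in f)
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def lengths_from_fai(fai_text: str) -> Dict[str, int]:
    """Collect name -> length from the first two tab-separated .fai columns."""
    lengths = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        lengths[cols[0]] = int(cols[1])
    return lengths


def compare_references(bam: Dict[str, int], fai: Dict[str, int]) -> List[str]:
    """Report BAM sequences that are missing from, or differ in, the index."""
    problems = []
    for name, length in bam.items():
        if name not in fai:
            # Look for the common 'chr1' versus '1' naming mismatch.
            alt = name[3:] if name.startswith("chr") else "chr" + name
            hint = f" (reference has '{alt}')" if alt in fai else ""
            problems.append(f"{name}: missing from reference index{hint}")
        elif fai[name] != length:
            problems.append(f"{name}: length {length} in BAM vs {fai[name]} in index")
    return problems
```

An empty list from `compare_references` means every sequence named in the BAM header has an entry of the same name and length in the index.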


          • #6
            first 10 bam headers:
            @SQ SN:1 LN:249250621
            @SQ SN:2 LN:243199373
            @SQ SN:3 LN:198022430
            @SQ SN:4 LN:191154276
            @SQ SN:5 LN:180915260
            @SQ SN:6 LN:171115067
            @SQ SN:7 LN:159138663
            @SQ SN:8 LN:146364022
            @SQ SN:9 LN:141213431
            @SQ SN:10 LN:135534747
            first 10 lines of ref.fasta.fai:

            1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 249250621 52 80 81
            2 dna:chromosome chromosome:GRCh37:2:1:243199373:1 243199373 252366358 80 81
            3 dna:chromosome chromosome:GRCh37:3:1:198022430:1 198022430 498605776 80 81
            4 dna:chromosome chromosome:GRCh37:4:1:191154276:1 191154276 699103539 80 81
            5 dna:chromosome chromosome:GRCh37:5:1:180915260:1 180915260 892647296 80 81
            6 dna:chromosome chromosome:GRCh37:6:1:171115067:1 171115067 1075824049 80 81
            7 dna:chromosome chromosome:GRCh37:7:1:159138663:1 159138663 1249078107 80 81
            8 dna:chromosome chromosome:GRCh37:8:1:146364022:1 146364022 1410206056 80 81
            9 dna:chromosome chromosome:GRCh37:9:1:141213431:1 141213431 1558399681 80 81
            10 dna:chromosome chromosome:GRCh37:10:1:135534747:1 135534747 1701378334 80 81
            ...They seem to match. I just can't figure out why pindel works for our old files and not the new ones. The only difference between the two sets was the cleanup process that occurs before bwa. Could this be causing the problem?


            • #7
              What kind of cleanup?


              • #8
                We switched to a new sequencing center, and the preliminary cleanup involves removal of adapters and QC filtration of reads, followed by removal of singletons and syncing of files. I'm sure the first sequencing center also performed similar cleanup, but the actual sequencing and cleanup may have changed slightly. Otherwise, the rest of the workflow (aligning, etc.) was performed by me in the same manner for both sets of samples.

                head of the raw fastq file (new set):

                @HISEQ:40239YACXX:5:1101:1629:2201 1:N:0:ATCACG
                @HISEQ:40239YACXX:5:1101:1657:2204 1:N:0:ATCACG
                @HISEQ:40239YACXX:5:1101:1727:2211 1:N:0:ATCACG
                head of fastq file after qc filtration and adapter trimming:

                So the actual fastq files look different. For example, the whole 1:Y:0:ATCACG part (sequence index, etc.) is missing from the cleaned fastq. I don't know if this is the problem, but is the information that has been removed something that pindel uses?
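For reference, here is a small sketch of how such a header line splits into the read name and the trailing comment. It assumes the Illumina convention in which the part after the space is `read:filter_flag:control_number:index`; the function name is hypothetical. It can be used to compare headers from the raw and cleaned files.

```python
from typing import Dict, Optional


def parse_fastq_header(line: str) -> Dict[str, Optional[str]]:
    """Split a FASTQ header into the read name and optional comment fields."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    # Everything before the first space is the read name; the rest is a comment.
    read_id, _, comment = line[1:].strip().partition(" ")
    info: Dict[str, Optional[str]] = {
        "read_id": read_id,
        "read_number": None,
        "filtered": None,
        "control_number": None,
        "index": None,
    }
    parts = comment.split(":")
    if comment and len(parts) == 4:
        (info["read_number"], info["filtered"],
         info["control_number"], info["index"]) = parts
    return info
```

A header with the comment stripped parses to the same `read_id`, with the remaining fields left as `None`.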


                • #9
                  Removing the sequence index is necessary, and I do not see how this step would affect Pindel. What is "removal of singletons"? Can you send me your processing steps so that I can figure it out?


                  • #10
                    Hello Kai,

                    Thank you so much. You are always very prompt and helpful when it comes to questions about your software.

                    We sequence paired-end human DNA. The genome center removes the adaptor sequence using SeqClean, then uses the FASTX-Toolkit for quality-based end trimming (removing bases with quality scores below 13 from the end of each sequence). They also sync the FASTQs (so they are in the same order) and filter out reads (along with their pairs) that do not pass a specific quality threshold. By removal of singletons, I meant that sometimes one member of a pair does not pass the QC filters while the other member does. In this case the member that does pass QC (the singleton) is also removed.
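To make the singleton-removal and syncing step concrete, here is a rough Python sketch of the idea. It is not the genome center's script: it assumes plain four-line FASTQ records, treats the first whitespace-delimited token of the header (minus any trailing /1 or /2) as the shared read name, and holds everything in memory, so it is only suitable for small test files.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group FASTQ lines into 4-line records (assumes no wrapped sequences)."""
    buf: List[str] = []
    for line in lines:
        if not buf and not line.strip():
            continue  # tolerate blank lines between records
        buf.append(line.rstrip("\n"))
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []
    if buf:
        raise ValueError("truncated FASTQ: record has fewer than 4 lines")


def pair_key(header: str) -> str:
    """Read name shared by both mates: drop '@', any comment, and /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def drop_singletons(r1: List[Record], r2: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Keep only reads whose mate is present, in the order of the first file."""
    by_key_2 = {pair_key(rec[0]): rec for rec in r2}
    kept1: List[Record] = []
    kept2: List[Record] = []
    for rec in r1:
        mate = by_key_2.get(pair_key(rec[0]))
        if mate is not None:
            kept1.append(rec)
            kept2.append(mate)
    return kept1, kept2
```

After this step both outputs have the same number of records, and record i of one file is the mate of record i of the other, which is what paired-end aligners generally expect.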

                    After the initial cleanup performed by the genome sequencing center (above), I align to hg19 with bwa, convert SAM to BAM, and index using samtools. I then use Picard to CleanSam and MarkDuplicates. GATK then does IndelRealignment (realigns the area around indels) and BaseQualityScoreRecalibration.

                    At this point the bams are ready for variant calling. I have tried running Pindel on the new samples before Picard, and also before GATK to see if it would make a difference; however, I still keep getting the blank output.

                    I use the same hg19.fa ref file for the whole pipeline. I am sorry if this is too detailed or not detailed enough. Any clue about what the issue may be would be great.


                    • #11
                      hi dGho,

                      I have questions about the QC filters. How do you treat reads with poor mapping quality, clipped reads, unmapped reads, and so on? Pindel examines all reads without perfect mapping. If for some reason they are removed, Pindel will fail to capture variants.

                      I checked your post on Aug 1:
                      "Looking at chromosome chrM bases 0 to 10000000."

                      There is "chr" in your reference file, but not in your BAM file, based on your post on Aug 7.

                      Insert size might also be an issue here. Can you put 500 there and give it another try?

                      A small dataset along with your reference file, so that I can reproduce the error, would be helpful. Please also update your Pindel version, which is already more than one year old. We have had multiple major updates.
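On the insert-size point, one way to get a rough number from the data itself is to take the median template length (TLEN, column 9) of properly paired reads in SAM text, for example the output of `samtools view` on a small region. The sketch below is only that, a sketch: the function name is made up, and whether TLEN matches the insert-size definition Pindel expects in its config file is an assumption.

```python
from statistics import median
from typing import Iterable, Optional


def estimate_insert_size(sam_lines: Iterable[str]) -> Optional[float]:
    """Median TLEN over properly paired reads, or None if there are none."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        cols = line.rstrip("\n").split("\t")
        flag, tlen = int(cols[1]), int(cols[8])
        # 0x2 marks a properly paired read; tlen > 0 keeps only the
        # leftmost mate, so each pair is counted once.
        if flag & 0x2 and tlen > 0:
            sizes.append(tlen)
    return median(sizes) if sizes else None
```

The median is used rather than the mean because a few chimeric or mis-mapped pairs can have very large template lengths.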



                      • #12
                        Sorry for my late response. My post from Aug 1 was quoting another person who posted. I don't have chr in my reference or bams, just the numbers.

                        We recently received a third set of exomes from the same new sequencing center. This new set runs through pindel with no problems, as our old ones did. The difference between these two sets was the following:

                        1. In the set that does not run on pindel: after removing reads from the raw fastqs that did not pass a QC threshold, the orphaned singletons (reads that no longer have a pair because one mate did not pass QC but the other did) were removed using the following script that I found here:


                        In the newest set of exomes, the sequencing center removed the singletons for me. Before and after this step, the pipeline was identical, so I must have messed something up when I ran that script. Other than pindel, I have not had any problems downstream with that script so far.

                        So this solved my specific problem.

                        Would you still like me to send you part of one of the fastqs that did not work to see why pindel cannot handle the fastqs after using this script? I don't think it is a widely used script.
                        Last edited by dGho; 08-15-2013, 12:28 PM.


                        • #13
                          Can you provide a small region of the BAM (10 kb)? If the BAM file is smaller than 10 MB, you can send it to my email [email protected]. Otherwise, Dropbox or other means would be better. I will take a look.

