All ChIP Seq Reads failing to Align to reference genome

  • All ChIP Seq Reads failing to Align to reference genome

    Hello NGS community,
    I'm new to NGS analysis. I have some ChIP-seq data for a transcription factor, which the sequencing facility provided as CRAM files. I converted them to BAM and tried the downstream analysis, but I kept running into problems after every conversion. So I decided to go from CRAM to FASTQ, redo the alignment, and then do the analysis. I tried the following command line; however, all my reads fail to align. Is it because the parameters are too stringent, or is there something I am missing?

    [email protected]:/media/fazal/backup/BCL11A/FastQ_Files$ bowtie -m 1 -S --chunkmbs 10000 /media/fazal/backup/BCL11A/Bowtie_Indices/human_g1k_v37.fasta -1 18418_2#1_1.fastq -2 18418_2#1_2.fastq > /media/fazal/backup/BCL11A/Sam_from_FastQ/18418_2#1.sam
    # reads processed: 40095362
    # reads with at least one reported alignment: 736 (0.00%)
    # reads that failed to align: 40094535 (100.00%)
    # reads with alignments suppressed due to -m: 91 (0.00%)
    Reported 736 paired-end alignments to 1 output stream(s)


    Any help will be highly appreciated.

    Thank you very much!
    Fazal

  • #2
    Could be a number of issues. Did you run your data through FastQC or another QC program? Did you remove adapters/quality trim?

    PS: You probably shouldn't have #s in your file names...


    • #3
      Hi Fanli,
      Thanks for the reply.
      I did run FastQC. It doesn't look like there's any adapter content. The only modules FastQC flags are per sequence GC content, marked with a warning (!), and Kmer content, marked with a failure (x).


      • #4
        Hmm, that's a bit odd. Maybe you can try using BLAT or BLAST with a random subsample of your reads to see if they hit anything?

        Can you post your FastQC output here?
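
A minimal sketch of the random-subsample step suggested above, written in Python. It assumes plain 4-line FASTQ records; the function names (read_fastq, reservoir_sample, to_fasta) and the tiny in-memory demo file are made up for illustration and are not from the thread. The FASTA it produces could then be pasted into BLAT or BLAST.

```python
import io
import random
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        name, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield name[1:], seq, qual  # drop the leading '@'


def reservoir_sample(records, k, seed=0):
    """Pick k records uniformly at random in a single pass (reservoir sampling)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(records):
    """Format (name, sequence, quality) records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _qual in records)


# Demo on a small synthetic FASTQ held in memory (hypothetical data).
demo = io.StringIO("".join(f"@read{i}\nACGTACGT\n+\nFFFFFFFF\n" for i in range(50)))
subset = reservoir_sample(read_fastq(demo), k=5, seed=42)
print(to_fasta(subset))
```

For a real run, the io.StringIO demo would be replaced by an open file handle on one of the FASTQ files. Reservoir sampling is used here so a 40-million-read file never has to be held in memory.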


        • #5
          Hi Fanli,
          I have attached the FastQC output for one of the paired-end files. I ran BLAT on the first sequences from the paired-end files (1/1, 1/2), and they hit different chromosomes. I don't know if that's normal.
          Attached Files
          Last edited by fh331; 04-05-2016, 08:05 AM.


          • #6
            Try with --trim3 50 in case the 3' ends of your reads are of low quality or contain Ns. Bowtie1 seems like an odd choice for PE75 reads, but it should give you some alignments...


            • #7
              Originally posted by fh331
              Hi Fanli,
              I ran BLAT on the first sequences from the paired-end files (1/1, 1/2), and they hit different chromosomes. I don't know if that's normal.
              But they hit the human reference?

              Also, your quality scores look...remarkably even. Can anyone else chime in? Is that an encoding error or have you guys seen sequencing like that before?
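
One way to look into the encoding question is to decode a quality string and see which scores actually occur. The sketch below assumes Phred+33 encoding (an assumption, not something confirmed in the thread) and uses the quality line of the first read posted later in this thread; the helper name phred33 is made up for illustration.

```python
from collections import Counter


def phred33(quality_string):
    """Convert a FASTQ quality string to integer scores, assuming Phred+33."""
    return [ord(ch) - 33 for ch in quality_string]


# Quality line of the first read in file 1/1, copied from the thread.
qual = "BBBBBBF/FFFFFFFFFFFFFFFFFFFFFFFFFFFBBFFFB</<FFFF//FFFFFFFBBF<FFFFFFBF/B<FF/"

scores = phred33(qual)
counts = Counter(scores)

# Print each distinct score with how often it occurs in this read.
for score in sorted(counts):
    print(score, counts[score])
```

If only a handful of distinct values show up, that would be consistent with quality scores that were binned before the files were written rather than with a broken encoding, but that reading is a guess and would need confirming with the sequencing facility.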


              • #8
                Hi Fanli,
                The sequences do hit the human reference genome, GRCh37, which is what I am trying to align to. I don't know what's going on.


                • #9
                  Originally posted by Chipper
                  Try with --trim3 50 in case the 3' ends of your reads are of low quality or contain Ns. Bowtie1 seems like an odd choice for PE75 reads, but it should give you some alignments...
                  Hi Chipper,
                  Thanks for the reply. Would you recommend using bowtie2 instead of bowtie? In that case, if I am not wrong, I would need to index the reference genome with bowtie2, right?


                  • #10
                    I don't remember exactly how many mismatches bowtie1 tolerates. It should work, but you may have to change some settings if you have lots of mismatches at the end of the reads, hence my suggestion to try mapping with only part of each sequence (for 75 bp reads, -3 50 keeps the first 25 bases, -3 25 -5 25 keeps the middle 25, etc.).

                    Are these standard ChIP-seq samples, or could there be some kind of inline barcodes that make them unmappable? Maybe you could post a few reads.
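
The trimming arithmetic above can be checked with a short Python sketch. It only mimics what trimming bases from the 5' and 3' ends leaves behind; the trim function and the synthetic 75-base read are made up for illustration and are not part of bowtie.

```python
def trim(read, trim5=0, trim3=0):
    """Return what is left after removing trim5 bases from the 5' end
    and trim3 bases from the 3' end of a read."""
    end = len(read) - trim3
    return read[trim5:end] if end > trim5 else ""


# A synthetic 75-base read standing in for a PE75 read (hypothetical data).
read = "ACGTA" * 15

first_25 = trim(read, trim3=50)             # corresponds to -3 50
middle_25 = trim(read, trim5=25, trim3=25)  # corresponds to -5 25 -3 25

print(len(read), len(first_25), len(middle_25))
```

Both trimmed pieces are 25 bases long: the first keeps positions 1 to 25 of the read and the second keeps positions 26 to 50.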


                    • #11
                      Are you sure the reads in file 1 and file 2 are in the same order? Just print the first 5 reads of each file...

                      You can also try to map only one file (not considered as paired-end then) to see the percentage of mapped reads...
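
The read-order check can also be scripted instead of eyeballed. This is a rough Python sketch that assumes read names end in /1 and /2, as in the headers posted later in this thread; the helper names (fastq_headers, mate_key, first_mismatch) are made up for illustration.

```python
from itertools import islice


def fastq_headers(handle, n=5):
    """Return the header lines of the first n records of a FASTQ handle."""
    return [line.rstrip("\n") for line in islice(handle, 0, 4 * n, 4)]


def mate_key(header):
    """Strip the leading '@' and a trailing /1 or /2 so mates compare equal."""
    name = header.strip().lstrip("@").split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(headers1, headers2):
    """Return the index of the first pair whose names differ, or None."""
    for i, (h1, h2) in enumerate(zip(headers1, headers2)):
        if mate_key(h1) != mate_key(h2):
            return i
    return None


# First three headers of each file, copied from the reads posted in this thread.
file1 = [
    "@HS32_18418:2:2307:11553:47098#1/1",
    "@HS32_18418:2:1201:20716:93279#1/1",
    "@HS32_18418:2:1102:8324:84406#1/1",
]
file2 = [
    "@HS32_18418:2:2307:11553:47098#1/2",
    "@HS32_18418:2:1201:20716:93279#1/2",
    "@HS32_18418:2:1102:8324:84406#1/2",
]

print(first_mismatch(file1, file2))
```

For real files, fastq_headers would be called on two open file handles and the resulting lists passed to first_mismatch. A result of None means the compared headers pair up in the same order.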


                      • #12
                        I got CRAM files from the sequencing facility, and I thought that CRAM contains aligned data, so I assume they had removed the barcodes. I went back and converted the CRAM to FASTQ because I had some trouble analysing the data that way. Below are the first 10 reads from the two PE files:

                        file 1/1:

                        @HS32_18418:2:2307:11553:47098#1/1
                        CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
                        +
                        BBBBBBF/FFFFFFFFFFFFFFFFFFFFFFFFFFFBBFFFB</<FFFF//FFFFFFFBBF<FFFFFFBF/B<FF/
                        @HS32_18418:2:1201:20716:93279#1/1
                        AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAAC
                        +
                        /<FFF</<FBF<</FFFBF<FFFFBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
                        @HS32_18418:2:1102:8324:84406#1/1
                        CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCCTA
                        +
                        BBBB</<<FFFFFBBF<<BFFBB/F/<<<<F/FFFFFBBB<F//B/F/<FBFFB//</BFFBB/</////</7/<
                        @HS32_18418:2:2304:9612:31489#1/1
                        AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
                        +
                        /<FFF/<<FFF<//FFFB<<FFFFB/FFFBB<FFFFF<FFFFBFFFFFFFFFFFFFFFFFFFFFFFFBFBBBBBB
                        @HS32_18418:2:2109:18196:73431#1/1
                        CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
                        +
                        BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBB/FFF<F<BFFFF/BFF/B
                        @HS32_18418:2:2304:19412:56725#1/1
                        CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAGCCCTAGCCCTAGCCCTAGCCCTA
                        +
                        FFFFFFFFFFFFFF/<FFFBF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
                        @HS32_18418:2:1109:17555:23909#1/1
                        CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
                        +
                        BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBFFF/F/FBB/
                        @HS32_18418:2:1312:9262:20826#1/1
                        TAAAACCCTAACCCTAAAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCC
                        +
                        </</<FFF/FFFFBF</BFFFFB<<FFFB<FFFFFFFFFFF</FFFFFFFFB<<FFFFF<FFFFFFFFFFBBBBB
                        @HS32_18418:2:1312:15929:23212#1/1
                        AACCCTAACCCTAACCCTAACCCTAACCCTTACCCTAACCCTAACCCAACCCTAACCCTAACCCTCACCCTCACC
                        +
                        </FFFFFBFFFFB<FFFFBBFFFBF<FFBFB/FFFFFBFFFFF/FFF<FFFFFFFFFFFFFFFFFFFFFFBBBBB
                        @HS32_18418:2:2309:4779:65172#1/1
                        CCTAACCCTCACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTCTAACCCTAACC
                        +
                        BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFF<F<//F/F/FFF<//<BF<F///FB


                        file 1/2:

                        @HS32_18418:2:2307:11553:47098#1/2
                        TAACCCTACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTTACCCTAACCCTAACCCAAC
                        +
                        </FBB<///F<BF/B/F<FB<<FB<B<FFF/BFFBFBBB<FFB<FFFFBFFFFFF<FFFF/FFBFFF<FFBBBBB
                        @HS32_18418:2:1201:20716:93279#1/2
                        ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
                        +
                        BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFF//FFB
                        @HS32_18418:2:1102:8324:84406#1/2
                        ACCCTAATCCTATCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCC
                        +
                        //FB<///<<///FB//<<</<//F<FF/F<<//<</B/<FF</<FFFFFFFF</F</F</<F<BFF/FFB<<</
                        @HS32_18418:2:2304:9612:31489#1/2
                        ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAAACCTAACCC
                        +
                        BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFBF<FFF///<<//<//<
                        @HS32_18418:2:2109:18196:73431#1/2
                        CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTACCCTAACCCAACCCTAACCCTAACCCTAACCCTAA
                        +
                        F<B<<FF</</FFFF</FFFF<<FBFFB<FBB/B<FFFB<FFFFFBFFFFFFFFFFFFFFFFFFFFFFFFBBBBB
                        @HS32_18418:2:2304:19412:56725#1/2
                        CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAA
                        +
                        BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FF//</FF/<FFFFF/FFFFFFF
                        @HS32_18418:2:1109:17555:23909#1/2
                        TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTTACCCTAACCCTACCCTAACCCTAACCCTAACCCTAAC
                        +
                        /</FFFBF/FFBB<<FFFFF<FFFFF/FFFBF/FFF<</FFBFF<FFB<FFFFFFFFFFFFFFFFFFFFFBBBBB
                        @HS32_18418:2:1312:9262:20826#1/2
                        CCTAACCCTAACCCTAACCCTAACCCTAACCCTTAACCCTCACCCTCACCCTCCCCCTCACCCTAACCCTAACCC
                        +
                        BBBBBFBFFFFFFFFFFFFFFFFFFFFFBBF/<<F/</</BBF/<<BFFBF/</<FBFFFBFFFF<FFBF//BFF
                        @HS32_18418:2:1312:15929:23212#1/2
                        CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
                        +
                        BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFF/FFFFFFF<FFFBFFF<F/
                        @HS32_18418:2:2309:4779:65172#1/2
                        CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAAACCCTAAACC
                        +
                        FFB<</BFF<//FFF/</FFFFB<FFBFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBBB


                        It looks dodgy to me, as all the reads seem to be repetitive sequences.


                          • #14
                            Looking at your reads, it seems something went wrong when you extracted them from the CRAM files. They look almost identical, just shifted by 3-4 bp...


                            • #15
                              These sequences (CCCTAA) are from the telomere repeat. CRAM files are usually coordinate-sorted, so, depending on the reference used for alignment, it's not unusual to see all of these reads at the beginning of the file.

                              What IS unusual is that both reads are from the same strand, when one should be the reverse complement of the other (e.g., TTAGGG). Not sure how that happened, but it's likely to be the reason why realignment failed.
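
The strand observation can be checked directly on the posted reads. The Python sketch below counts the CCCTAA motif and its reverse complement in both mates of the first read pair from this thread; the helper names (reverse_complement, motif_counts) are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_counts(seq, motif="CCCTAA"):
    """Count non-overlapping hits of a motif and of its reverse complement."""
    return seq.count(motif), seq.count(reverse_complement(motif))


# Both mates of the first read pair, copied from the reads posted in this thread.
mate1 = "CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT"
mate2 = "TAACCCTACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTTACCCTAACCCTAACCCAAC"

for label, seq in (("mate 1", mate1), ("mate 2", mate2)):
    forward, reverse = motif_counts(seq)
    print(label, "CCCTAA:", forward, reverse_complement("CCCTAA") + ":", reverse)
```

Both mates contain CCCTAA and neither contains TTAGGG, which matches the point above that the two reads of a pair come from the same strand. An aligner that expects mates on opposite strands would then find essentially no valid paired alignments.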
