ABI-SOLID data with Bowtie-0.12.7 and TopHat-1.1.2

  • ABI-SOLID data with Bowtie-0.12.7 and TopHat-1.1.2

    Hi guys.
    I successfully ran SOLiD data (50 bp reads) through the current Bowtie and TopHat versions, but surprisingly only 21% of the reads aligned (stats below). Does anyone have any idea why the alignment failed so badly?

    The command I used:

    Code:
    tophat -o RHE014_Tophat_Output -C --library-type fr-secondstrand --segment-length 50 /home/choijk/Software/bowtie-0.12.7/indexes/hg18_c SL001_R00089_RHE014_01pgx2_F3_3_1
    This is what the TopHat log file reports:

    Code:
    $ cat filezqaMDd.log 
    # reads processed: 97340093
    # reads with at least one reported alignment: 20468407 (21.03%)
    # reads that failed to align: 76790363 (78.89%)
    # reads with alignments suppressed due to -m: 81323 (0.08%)
    Reported 34579945 alignments to 1 output stream(s)
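
For anyone who wants to check the percentages in a log like this, here is a minimal Python sketch. It assumes the summary lines look exactly like the ones pasted above ("# label: count"); the function name `parse_bowtie_summary` is made up for this example and is not part of Bowtie or TopHat.

```python
import re

# The summary lines exactly as pasted in the post above.
LOG = """\
# reads processed: 97340093
# reads with at least one reported alignment: 20468407 (21.03%)
# reads that failed to align: 76790363 (78.89%)
# reads with alignments suppressed due to -m: 81323 (0.08%)
Reported 34579945 alignments to 1 output stream(s)
"""


def parse_bowtie_summary(text):
    """Return a dict mapping each '# label: count' line to its integer count."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"#\s*(.+?):\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_bowtie_summary(LOG)
total = counts["reads processed"]
aligned = counts["reads with at least one reported alignment"]
failed = counts["reads that failed to align"]
suppressed = counts["reads with alignments suppressed due to -m"]

# The three categories should account for every processed read.
print("categories sum to total:", aligned + failed + suppressed == total)
print(f"aligned: {100 * aligned / total:.2f}% of all reads processed")
```

The point of the check is only that the percentages are relative to all reads processed, so the 21% figure is not an artifact of the reporting.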

  • #2
    If you can, try using Bioscope/mapreads for the mapping; it improves the odds.
    About 50% is expected for whole transcriptome, so you are getting lower than average.
    http://kevin-gattaca.blogspot.com/

    • #3
      Hmm, I already have results from Bioscope, but I want to run these reads through TopHat without supplying any known RefSeq annotation, in order to find new transcripts.

      Any suggestions on TopHat/Bowtie?

      • #4
        I see. How's the mapping % with bioscope then?
        http://kevin-gattaca.blogspot.com/

        • #5
          It is around 80%.

          • #6
            Can you try using fr-firststrand?

            • #7
              But fr-secondstrand is meant for SOLiD data, right?

              • #8
                In general yes, but it depends on the protocol you used.

                • #9
                  If you're using 50-base reads, try trimming them to 35 bases in colorspace (i.e., use only the first 35 bases). Most of the -1's (no-calls) and sequencing errors occur between bases 36 and 50.

                  We've DOUBLED our mapping using Bioscope this way (from 25 million reads to 50 million reads).

                  There is a separate thread on this topic, but I'd be happy to hear back from you if you try this.

                  Let me know if you'd like to peek at our source code for trimming in colorspace.

                  • #10
                    I'm curious to know whether people are quoting the mapping percent as given in the alignment.txt report, or calculating it as total mapped reads as a percent of total reads. The Bioscope alignment report shows mapped reads as 100% and bases all other figures on that.

                    I've been reporting total mapped reads, uniquely aligned reads, ribosomal and unmapped reads as a percent of total reads. Total mapped is usually 60-70%.

                    As for trimming to 35 bp: is it necessary, given that Bioscope does a seed-and-extend on the reads? I find my average aligned read length to be ~40 bp, with a bimodal size frequency plot peaking at 25 bp and 50 bp.

                    • #11
                      @bacdirector: Yes, I would be happy to do that. Could you please provide me with the code? My SOLiD .csfasta data looks like this:
                      # Wed Mar 10 00:25:29 2010 /share/apps/corona/bin/filter_fasta.pl --output=/data/results/sl001/SL001_R00089/RHE012_01pgx2/results.F1B1/primary.20100310065316729 --name=SL001_R00089_RHE012_01pgx2 --tag=F3 --minlength=50 --mincalls=25 --prefix=T /data/results/sl001/SL001_R00089/RHE012_01pgx2/jobs/postPrimerSetPrimary.2745/rawseq
                      # Cwd: /home/pipeline
                      # Title: SL001_R00089_RHE012_01pgx2
                      >853_10_97_F3
                      T.....023..2.10..120.3.2010.031...2.1.30.22001..00.
                      >853_10_111_F3
                      T.....113....33..003.0.010..100...2.0..2.03002..02.
                      >853_10_157_F3
                      T.....230..2.00..330.2.1313.231...3.1.10.02031..10.
                      >853_10_194_F3
                      T.....031....32..323.0.322..100...3.0..1.30313..23.
                      >853_10_221_F3
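
For reference, trimming in colorspace on a file like this can be sketched in a few lines of Python. This is not bacdirector's code, just a minimal illustration that assumes the layout shown above: "#" comment lines, ">" header lines, and one read per line made of a primer base followed by the color calls. The function name `trim_csfasta_lines` is made up for this example.

```python
def trim_csfasta_lines(lines, n_colors=35):
    """Yield csfasta lines with each read cut to its primer base plus n_colors calls."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", ">")):
            # Comments, headers and blank lines pass through unchanged.
            yield line
        else:
            # Keep 1 primer base (e.g. "T") plus the first n_colors color calls.
            yield line[: 1 + n_colors]


# First record from the data posted above.
example = [
    ">853_10_97_F3",
    "T.....023..2.10..120.3.2010.031...2.1.30.22001..00.",
]
trimmed = list(trim_csfasta_lines(example))
print(trimmed[0])
print(trimmed[1], len(trimmed[1]))
```

If a matching quality file is passed to the aligner, it would need to be trimmed to the same number of calls, otherwise the two files no longer agree.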

                      • #12
                        Problem with TopHat and ABI SOLiD

                        I don't know if you are aware, but the current version of TopHat uses a different algorithm from the one described in the 2009 TopHat paper. The current algorithm is described in the supplement to the Cufflinks paper (Trapnell 2010).

                        The most important change is that each read is split into segments, and splice junctions are discovered based on where these segments align. The new TopHat is optimised for >=75 bp reads; in that case each read is divided into 3 segments of 25 bp each.

                        You ran TopHat with the segment length set to 50 (--segment-length 50), which means there is just ONE segment per read, so this setting cannot discover any splice junctions.

                        And here is the problem with the current TopHat: it seems NOT to be designed for ABI SOLiD reads. You have two options for 50 bp reads:
                        - use the default settings, and be aware that not all splice junctions will be discovered
                        - set --segment-length 16, but then the 16 bp segments will align everywhere and in many cases will be discarded, so again many splice junctions will be missed and many false positives will be found

                        For 36 bp reads the situation is even worse.
                        Old versions of TopHat supported such short reads but didn't support colorspace reads.
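
To make the segment arithmetic concrete, here is a rough Python sketch. It assumes the number of segments is simply the read length divided by the segment length, rounded down, which matches the 75 bp / 3 segments example above; how TopHat actually handles leftover bases is not covered here. The function name `segment_count` is made up for this example, and 25 is used as the default --segment-length.

```python
def segment_count(read_length, segment_length):
    """Whole segments of segment_length that fit in a read (never fewer than 1)."""
    return max(1, read_length // segment_length)


# (read length, --segment-length) pairs discussed in this thread.
cases = [(75, 25), (50, 50), (50, 25), (50, 16), (36, 25)]
for read_length, segment_length in cases:
    n = segment_count(read_length, segment_length)
    print(f"{read_length} bp read, --segment-length {segment_length}: {n} segment(s)")
```

Under that assumption a 50 bp read gets one segment with --segment-length 50, two with the default and three with 16, while a 36 bp read gets only one segment with the default.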

                        In my opinion there is, so far, no proper software for discovering new transcripts, or even for properly assembling existing ones, from ABI SOLiD data. If you know of any, please let me know.
                        Pawel Labaj

                        • #13
                          @plabaj:
                          Very true!! My experience with TopHat has been a disaster so far. While Bowtie gives far more alignments, TopHat on single-end reads reports only 24% alignment. My reads are 50 bases long. The junctions.bed file produced is extremely unreliable, since it reports only 851 junctions in < 10% of the scaffolds. The rest are unreported! On top of that, most of the junctions overlap. Is there a likely reason why it gets things wrong?

                          • #14
                            @tsucheta

                            As I wrote before, the new algorithm for finding splice junctions implemented in TopHat is responsible for that.
                            It starts working properly at a read length of 75 bp.

                            If your reads are not ABI SOLiD, try installing an older version of TopHat (you will have to work out which one by going through the version changes).
                            Maybe that will help.

                            Or, if you are interested only in transcript expression, use Bowtie against the transcript sequences (not the genomic sequence, as for TopHat).
                            Pawel Labaj
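
If reads are aligned to transcript sequences with Bowtie, a first rough look at expression is just the number of alignments per transcript. A minimal Python sketch follows; it assumes Bowtie's default tab-delimited output, where the third column is the name of the reference sequence. The records and transcript names below are invented for illustration, and `count_reads_per_reference` is a made-up helper name.

```python
from collections import Counter


def count_reads_per_reference(lines):
    """Count alignments per reference name (third tab-separated column)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Hypothetical alignment records (not real data):
# read name, strand, reference name, offset, sequence, qualities, other hits, mismatches
example = [
    "read_1\t+\ttranscript_A\t10\tACGT\tIIII\t0\t",
    "read_2\t-\ttranscript_A\t52\tACGT\tIIII\t0\t",
    "read_3\t+\ttranscript_B\t7\tACGT\tIIII\t0\t",
]
counts = count_reads_per_reference(example)
for name, n in counts.most_common():
    print(name, n)
```

These are raw counts only; they are not normalized for transcript length or library size, and reads that map to several transcripts would need separate handling.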

                            • #15
                              Originally posted by tsucheta
                              @plabaj:
                              Very true!! My experience with TopHat has been a disaster so far. While Bowtie gives far more alignments, TopHat on single-end reads reports only 24% alignment. My reads are 50 bases long. The junctions.bed file produced is extremely unreliable, since it reports only 851 junctions in < 10% of the scaffolds. The rest are unreported! On top of that, most of the junctions overlap. Is there a likely reason why it gets things wrong?
                              What's your segment size?
