
  • ABI-SOLID data with Bowtie-0.12.7 and TopHat-1.1.2

    Hi guys.
    I successfully ran SOLiD data (50 bp reads) through the current Bowtie and TopHat versions, but surprisingly only 21% of the reads aligned, as the stats below show. Does anyone have any idea why the alignment failed so badly?

    The command I used:

    Code:
    tophat -o RHE014_Tophat_Output -C --library-type fr-secondstrand --segment-length 50 /home/choijk/Software/bowtie-0.12.7/indexes/hg18_c SL001_R00089_RHE014_01pgx2_F3_3_1
    The TopHat log file follows:

    Code:
    $ cat filezqaMDd.log 
    # reads processed: 97340093
    # reads with at least one reported alignment: 20468407 (21.03%)
    # reads that failed to align: 76790363 (78.89%)
    # reads with alignments suppressed due to -m: 81323 (0.08%)
    Reported 34579945 alignments to 1 output stream(s)
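
As a quick sanity check on the figures in that log, here is a minimal Python sketch that recomputes each category as a percentage of all processed reads. The log text is copied from the lines above; the function names are made up for illustration and are not part of Bowtie or TopHat.

```python
import re

# Bowtie summary lines copied from the log above.
LOG = """\
# reads processed: 97340093
# reads with at least one reported alignment: 20468407 (21.03%)
# reads that failed to align: 76790363 (78.89%)
# reads with alignments suppressed due to -m: 81323 (0.08%)
"""


def parse_bowtie_summary(text):
    """Return {label: read count} for each '# label: count' line."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"#\s*(.+?):\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def percent_of_total(counts, total_label="reads processed"):
    """Express every category as a percentage of the total read count."""
    total = counts[total_label]
    return {
        label: 100.0 * n / total
        for label, n in counts.items()
        if label != total_label
    }


counts = parse_bowtie_summary(LOG)
percentages = percent_of_total(counts)
for label, pct in percentages.items():
    print(f"{label}: {pct:.2f}%")
```

The three categories add up to the number of reads processed, so the 21% figure here is a fraction of all input reads, not of mapped reads.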

  • #2
    If you can, maybe try using Bioscope/mapreads for the mapping; it improves the odds.
    About 50% is expected for whole transcriptome, so you are getting lower than average.
    http://kevin-gattaca.blogspot.com/


    • #3
      Hmm, I already have results from Bioscope, but I want to run these reads through TopHat without supplying any known RefSeq annotation, in order to find new transcripts.

      Does anyone have suggestions for TopHat/Bowtie?


      • #4
        I see. What's the mapping % with Bioscope, then?
        http://kevin-gattaca.blogspot.com/


        • #5
          It is around 80%.


          • #6
            Can you try using fr-firststrand?


            • #7
              But fr-secondstrand is meant for SOLiD data, right?


              • #8
                In general yes, but it depends on the protocol you used.


                • #9
                  If you're using 50b reads, try trimming them to 35b in colorspace (i.e., use only the first 35 bases). Most of the -1's (no-calls) and sequencing errors occur between bases 36 and 50.

                  We've DOUBLED our mapping using Bioscope this way (from 25 million reads to 50 million reads).

                  There is a separate thread on this topic, but I'd be happy to hear back from you if you try this.

                  Let me know if you'd like to peek at our source code for trimming in colorspace.
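
For anyone who wants to try the trimming right away, here is a minimal Python sketch of the idea. It is not the poster's source code. It assumes a standard .csfasta layout: `#` comment lines, `>` header lines, and sequence lines made of one primer base followed by one color call per position. The function names are hypothetical.

```python
def trim_csfasta_record(sequence, n_colors=35):
    """Keep the leading primer base plus the first n_colors color calls."""
    primer, colors = sequence[0], sequence[1:]
    return primer + colors[:n_colors]


def trim_csfasta(lines, n_colors=35):
    """Yield csfasta lines with every sequence line trimmed to n_colors.

    Comment lines ('#'), header lines ('>') and blank lines pass through
    unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#") or line.startswith(">"):
            yield line
        else:
            yield trim_csfasta_record(line, n_colors)


# Example input in the .csfasta layout described above.
example = [
    "# Title: SL001_R00089_RHE012_01pgx2",
    ">853_10_97_F3",
    "T.....023..2.10..120.3.2010.031...2.1.30.22001..00.",
]
trimmed = list(trim_csfasta(example))
for line in trimmed:
    print(line)
```

If a matching quality file is used alongside the reads, it would presumably need the same trimming so that both files stay the same length per read.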


                  • #10
                    I'm curious to know whether people are quoting the mapping percentage as reported in the alignment.txt report, or calculating it as total mapped reads as a percentage of total reads. The Bioscope alignment report shows mapped reads as 100% and bases all other figures on that.

                    I've been reporting total mapped reads, unique aligned reads, ribosomal and unmapped reads as a percent of total reads. Total mapped is usually 60-70%.

                    Is trimming to 35 bp necessary, given that Bioscope does a seed-and-extend on the reads? My average aligned read length is ~40 bp, and the size frequency plot is bimodal at 25 bp and 50 bp.


                    • #11
                      @bacdirector: Yes, I would be happy to do that. Could you please provide me with the code? My SOLiD .csfasta data looks like this:
                      # Wed Mar 10 00:25:29 2010 /share/apps/corona/bin/filter_fasta.pl --output=/data/results/sl001/SL001_R00089/RHE012_01pgx2/results.F1B1/primary.20100310065316729 --name=SL001_R00089_RHE012_01pgx2 --tag=F3 --minlength=50 --mincalls=25 --prefix=T /data/results/sl001/SL001_R00089/RHE012_01pgx2/jobs/postPrimerSetPrimary.2745/rawseq
                      # Cwd: /home/pipeline
                      # Title: SL001_R00089_RHE012_01pgx2
                      >853_10_97_F3
                      T.....023..2.10..120.3.2010.031...2.1.30.22001..00.
                      >853_10_111_F3
                      T.....113....33..003.0.010..100...2.0..2.03002..02.
                      >853_10_157_F3
                      T.....230..2.00..330.2.1313.231...3.1.10.02031..10.
                      >853_10_194_F3
                      T.....031....32..323.0.322..100...3.0..1.30313..23.
                      >853_10_221_F3


                      • #12
                        Problem with TopHat and ABI SOLiD

                        I don't know if you are aware, but the current version of TopHat uses a different algorithm from the one described in the 2009 TopHat paper. The current algorithm is described in the supplement to the Cufflinks paper (Trapnell 2010).

                        The most important change is that each read is split into segments, and splice junction discovery is based on where these segments align. The new TopHat is optimised for reads of >=75 bp, in which case each read is divided into 3 segments of 25 bp each.

                        You ran TopHat with the segment length set to 50 (--segment-length 50), which means there is just ONE segment per read, so this setting cannot discover any splice junctions.

                        And here is the problem with the current TopHat: it seems NOT to be designed for ABI SOLiD reads. You have two options for 50 bp reads:
                        - use the default settings, and be aware that not all splice junctions will be discovered
                        - set --segment-length 16, but then the 16 bp segments will align everywhere and in many cases will be discarded, so again many splice junctions will be missed and many false positives will be found

                        For 36 bp reads the situation is even worse.
                        Old versions of TopHat supported such short reads but didn't support color space reads.
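
To make the segment arithmetic above concrete, here is a deliberately simplified Python sketch. It is an illustration only, not TopHat's actual code: it assumes a read is cut into as many full-length segments as fit, with any leftover bases joined to the last segment. The function name is hypothetical.

```python
def split_into_segments(read_length, segment_length):
    """Return (start, end) bounds for each segment of one read.

    Simplified model: floor(read_length / segment_length) segments,
    with any remainder attached to the last one. A read shorter than
    the segment length is kept as a single segment.
    """
    n_segments = max(1, read_length // segment_length)
    bounds = []
    for i in range(n_segments):
        start = i * segment_length
        is_last = i == n_segments - 1
        end = read_length if is_last else start + segment_length
        bounds.append((start, end))
    return bounds


for read_len, seg_len in [(75, 25), (50, 50), (50, 25), (50, 16), (36, 25)]:
    segments = split_into_segments(read_len, seg_len)
    print(f"read={read_len} segment={seg_len}: {len(segments)} segment(s) {segments}")
```

Under this model a 50 bp read with --segment-length 50 gives one segment, and a 36 bp read with 25 bp segments also gives only one, which matches the cases described above.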

                        In my opinion there is, so far, no proper software for discovering new transcripts, or even for properly assembling existing ones, from ABI SOLiD data. If you know of any, please let me know.
                        Pawel Labaj


                        • #13
                          @plabaj:
                          Very true!! My experience with TopHat has been a disaster so far. While Bowtie gives far more alignments, TopHat reports only 24% alignment for single-end reads. My read size is 50 bases. The junctions.bed file it produces is extremely unreliable, since it reports only 851 junctions in < 10% of the scaffolds. The rest go unreported! On top of that, most of the junctions overlap. Is there a likely reason why it gets things wrong?


                          • #14
                            @tsucheta

                            As I wrote before, the new algorithm for finding splice junctions implemented in TopHat is responsible for that.
                            It only starts working properly for reads of 75 bp or longer.

                            If your reads are not ABI SOLiD, try installing an older version of TopHat (you will have to work out which one by going through the version changes).
                            Maybe it will help.

                            Or, if you are interested only in transcript expression, use Bowtie against transcript sequences (not the genomic sequence, as for TopHat).
                            Pawel Labaj


                            • #15
                              Originally posted by tsucheta
                              @plabaj:
                              Very true!! My experience with TopHat has been a disaster so far. While Bowtie gives far more alignments, TopHat reports only 24% alignment for single-end reads. My read size is 50 bases. The junctions.bed file it produces is extremely unreliable, since it reports only 851 junctions in < 10% of the scaffolds. The rest go unreported! On top of that, most of the junctions overlap. Is there a likely reason why it gets things wrong?
                              What's your segment size?
