Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • DerSeb
    Member
    • Oct 2009
    • 44

    Tophat SOLID PE data not mapped in pairs

    Hello,

    this is a problem I have been dealing with now for a long time. I try to map SOLID PE reads using TopHat v 1.1.4 using this command:

    Code:
    tophat --color --quals -r 155 -p 4 -o /data/genetics/ -G /home/schaefer/tophat/Rattus_norvegicus.RGSC3.4.56.tophat.gtf rn4_c /data/genetics/1sort_1.csfasta,/data/genetics/1sort_2.csfasta /data/genetics/analyses/1sort_1.qual,/data/genetics/1sort_2.qual
    All 4 input files are sorted and the reads are named like this:

    Code:
    1sort_1.csfasta
    >1000_1000_1070/1
    T20010011233323011122110001120331100111211132111111
    >1000_1000_1179/1
    T31033102020100001322200230302001331230303321323200
    >1000_1000_1947/1
    T03313223130010310121222000210123010111002233323320
    1sort_2.csfasta
    >1000_1000_1070/2
    G1011001131233111011133111
    >1000_1000_1179/2
    G2222132322022032303223131
    >1000_1000_1947/2
    G0100011002000120101320013
    1sort_1.qual
    >1000_1000_1070/1
    32 24 32 29 22 6 11 29 10 25 14 20 31 20 14 33 32 26 6 5 6 6 32 22 8 33 10 28 21 14 8 8 23 25 14 22 19 26 10 8 14 17 23 23 32 4 32 24 8 32
    >1000_1000_1179/1
    31 33 32 31 33 27 33 33 31 33 33 27 32 32 33 33 32 33 31 32 32 31 33 28 32 33 27 30 31 32 33 22 31 32 29 26 31 22 29 24 32 33 20 16 33 25 33 12 31 29
    >1000_1000_1947/1
    33 33 33 30 33 33 28 33 31 33 26 32 26 24 33 29 33 31 20 8 33 33 30 33 29 31 32 33 29 31 29 32 24 14 17 33 30 28 27 27 33 31 28 33 33 30 31 8 24 12
    1sort_2.qual
    >1000_1000_1070/2
    32 25 33 33 8 31 11 10 6 32 33 33 6 8 8 32 28 22 30 20 32 8 12 10 14
    >1000_1000_1179/2
    30 29 30 33 25 16 29 33 33 26 31 30 23 29 30 25 32 31 25 10 27 30 24 31 29
    >1000_1000_1947/2
    33 33 32 27 32 33 33 23 24 32 33 33 19 16 30 31 27 22 14 31 19 17 20 6 29
    >1000_1000_254/2
    15 16 18 25 31 30 8 31 8 4 10 20 28 32 31 30 4 31 31 12 10 4 17 19 9
    >1000_1000_358/2
    5 32 12 11 7 8 18 9 7 8 4 27 4 6 4 4 4 4 5 6 4 21 4 4 4
    I have also tried using identical names for the reads (without /1 and /2).

    Everything seems to work ok:

    Code:
    [Mon Dec 13 11:52:28 2010] Beginning TopHat run (v1.1.4)
    -----------------------------------------------
    [Mon Dec 13 11:52:29 2010] Preparing output location /data/genetics//
    [Mon Dec 13 11:52:29 2010] Checking for Bowtie index files
    [Mon Dec 13 11:52:29 2010] Checking for reference FASTA file
    [Mon Dec 13 11:52:29 2010] Checking for Bowtie
    	Bowtie version:			 0.12.3.0
    [Mon Dec 13 11:52:29 2010] Checking for Samtools
    	Samtools version:		 0.1.8.0
    [Mon Dec 13 11:52:36 2010] Checking reads
    	min read length: 25bp, max read length: 50bp
    	format:		 fasta
    [Mon Dec 13 11:58:39 2010] Reading known junctions from GTF file
    [Mon Dec 13 12:13:59 2010] Mapping reads against rn4_c with Bowtie
    [Mon Dec 13 15:08:47 2010] Joining segment hits
    [Mon Dec 13 15:35:40 2010] Mapping reads against rn4_c with Bowtie(1/2)
    [Mon Dec 13 17:16:18 2010] Mapping reads against rn4_c with Bowtie(2/2)
    [Mon Dec 13 18:29:19 2010] Searching for junctions via segment mapping
    [Mon Dec 13 20:05:16 2010] Retrieving sequences for splices
    [Mon Dec 13 20:07:55 2010] Indexing splices
    Warning: Encountered reference sequence with only gaps
    Warning: Encountered reference sequence with only gaps
    [Mon Dec 13 20:23:43 2010] Mapping reads against segment_juncs with Bowtie
    [Mon Dec 13 21:20:19 2010] Mapping reads against segment_juncs with Bowtie
    [Mon Dec 13 22:10:15 2010] Joining segment hits
    [Mon Dec 13 22:19:17 2010] Reporting output tracks
    -----------------------------------------------
    Run complete [10:55:26 elapsed]
    The problem arises when I look at the output: The reads are not mapped in pairs, but in single reads of 50 and 25 bp each.

    samtools view reveals:

    Code:
    33_119_59	16	chr1	226	1	25M	*	0	0	GATTTGAGTGCAGAGAGCTGTGTGA	!!#+))))!!)))))))))****FF	NM:i:0	NH:i:3	CC:Z:=	CP:i:70416
    478_1485_267	16	chr1	3060	0	25M	*	0	0	AGACATGGAGAAAGAACACTCCTCC	//6.(BI:<;07A<51!.JNJ::::	NM:i:0	NH:i:5	CC:Z:=	CP:i:73804
    980_603_1244	16	chr1	3884	0	50M	*	0	0	TGTATATGTATATATATATGCATATATATATATATATATATACATACGTA	!!!+/2&!!)-/+++))*"&&!)))**),41**)!!))!!++))))))))	NM:i:0	NH:i:16	CC:Z:=	CP:i:61895914
    All flags are either 16 or 0, which cannot be true for a paired read. Do you have any idea what is going on?

    All the best,
    Seb
  • fennan
    Member
    • Apr 2010
    • 19

    #2
    You are not saying to tophat that you want to use paired end data. You should remove the "," between the files to do so:

    Code:
    tophat --color --quals -r 155 -p 4 -o /data/genetics/ -G /home/schaefer/tophat/Rattus_norvegicus.RGSC3.4.56.tophat.gtf rn4_c /data/genetics/1sort_1.csfasta /data/genetics/1sort_2.csfasta /data/genetics/analyses/1sort_1.qual /data/genetics/1sort_2.qual

    Comment

    • DerSeb
      Member
      • Oct 2009
      • 44

      #3
      great, I tried it and it works!

      thx!

      Comment

      • hlwright
        Member
        • Feb 2011
        • 30

        #4
        TopHat PE data not mapped in pairs - new version of TopHat

        Could you please tell me if this command is still the same using the new version of TopHat (v1.3.1)?

        I ask because we are trying to map SOLiD paired-end RNA-seq reads to the human genome. We are getting 36% - 40% reads mapping and are looking for ways to improve our level of mapping. We also suspect TopHat is not treating our datasets as paired, as in the SAM output column 8 (PNEXT) the value is always 0.

        An example command is:
        Code:
        tophat -p 8 --color --quals  -r 140 --mate-std-dev 30 --library-type fr-secondstrand -G hg19mRNA.gtf /Path/to/Bowtie-0.12.7/index/hg19_c Pair_F3.csfasta,Pair_F5.csfasta Pair_F3.qual,Pair_F5.qual
        Any advice would be much appreciated.

        Thanks
        Helen

        Comment

        • jcgrenier
          Member
          • May 2011
          • 12

          #5
          Just look at the previous post. You have to remove the commas between the file names.

          Comment

          Latest Articles

          Collapse

          • SEQadmin2
            Nine Things a Sample Prep Scientist Thinks About Before Sequencing
            by SEQadmin2


            I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

            Here are nine questions we think about, in roughly the order they matter, before...
            06-18-2026, 07:11 AM
          • SEQadmin2
            From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
            by SEQadmin2


            Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


            The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
            ...
            06-02-2026, 10:05 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by SEQadmin2, 06-26-2026, 11:10 AM
          0 responses
          12 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-17-2026, 06:09 AM
          0 responses
          48 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-09-2026, 11:58 AM
          0 responses
          106 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-05-2026, 10:09 AM
          0 responses
          125 views
          0 reactions
          Last Post SEQadmin2  
          Working...