  • cyendrek
    Junior Member
    • Mar 2013
    • 1

    Read counts without a GFF3 file

    I am working with a species that does not have a sequenced genome. There is a partial unigene/EST database that I have used to align some Illumina RNA-Seq reads with TopHat2/Bowtie2. My problem is obtaining read counts from the .bam files. Normally I use HTSeq, but in this case I don't have a GFF3 file. Our core bioinformatics director suggested custom Perl scripts, but that is beyond my expertise. Can anyone help?
    Thanks,
    Craig
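    Because each unigene/EST is its own reference sequence in the alignment, a per-gene count does not strictly need a GFF3: it is the number of reads aligned to each reference name (the RNAME column of SAM). Below is a minimal Python sketch, assuming the BAM is first converted to text with `samtools view`; the script name `count_per_ref.py` and the file names are hypothetical. It counts primary alignments only (skipping unmapped, secondary and supplementary records via the SAM FLAG bits), so with paired-end data each mate is counted separately, and ESTs with no reads are simply absent from the output.

```python
import sys
from collections import Counter
from typing import Iterable

# FLAG bits from the SAM format specification
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800
SKIP_FLAGS = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY


def count_reads_per_reference(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per reference sequence (RNAME column)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & SKIP_FLAGS:  # unmapped, secondary or supplementary
            continue
        counts[fields[2]] += 1  # RNAME = the unigene/EST the read aligned to
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file names):
    #   samtools view accepted_hits.bam > hits.sam
    #   python count_per_ref.py hits.sam > counts.tsv
    with open(sys.argv[1]) as sam:
        for ref, n in sorted(count_reads_per_reference(sam).items()):
            print(f"{ref}\t{n}")
```
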
  • chadn737
    Senior Member
    • Jan 2009
    • 392

    #2
    Why not do a de novo assembly?


    • jgibbons1
      Senior Member
      • Oct 2009
      • 135

      #3
      @cyendrek I've had the same issue before. Rather than using Bowtie2 (which I typically use too), try seqmap and rseq. Both programs are in the rseq package: http://www-personal.umich.edu/~jianghui/rseq/

      You first map the reads to the reference (in your case, the unigene/EST database) using seqmap; then you can generate read counts and RPKM values per gene/EST using rseq. I've used this pipeline quite a bit in the past, so let me know if you have any problems.

      Here's an example command line of seqmap allowing 2 mismatches:

      [user]$ seqmap 2 ReadFile.fasta Reference.fasta Output.seqmap /eland:3

      Here's an example command line of rseq assuming read length is 50 bp:

      [user]$ rseq comp_exp -r 50 Reference.fasta Output.seqmap

      This will create a file with the "comp_exp" extension that has the number of mapped reads, number of uniquely mapped reads and rpkm values (among other stats).
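      RPKM normalizes a gene's read count by its length in kilobases and by the library's total mapped reads in millions. The sketch below shows that arithmetic, assuming the usual RPKM definition; rseq may differ in details (for example, which reads count toward the total, or whether it uses an effective transcript length), so treat it as illustrative rather than as rseq's exact computation. The function name `rpkm` is hypothetical.

```python
def rpkm(reads_on_gene: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = (reads_on_gene / (gene_length_bp / 1e3)) / (total_mapped_reads / 1e6)
         = 1e9 * reads_on_gene / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return 1e9 * reads_on_gene / (gene_length_bp * total_mapped_reads)


# Example: 200 reads on a 2,000 bp EST, 10 million mapped reads in the library
print(rpkm(200, 2000, 10_000_000))  # 10.0
```

      In words: 10^9 × 200 / (2,000 × 10^7) = 10, so doubling either the EST length or the library size halves the RPKM for the same raw count.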

      A few words of wisdom: your reads must be in FASTA format, so convert FASTQ to FASTA first (I use the FASTX-Toolkit for this). Also, seqmap uses a lot of memory, so I usually break my read file into batches of 5-10 million reads. I then map these batches independently against the reference, merge the output files, and run rseq on the merged output.
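      The conversion and batching steps can also be sketched in Python, as an alternative to the FASTX-Toolkit. This is a minimal sketch assuming standard 4-line FASTQ records (no wrapped sequence or quality lines); the function names and the `<prefix>_batchN.fasta` output naming are hypothetical. It streams the input, holding one record at a time, and starts a new FASTA file every `batch_size` reads so each batch can be given to seqmap separately.

```python
import itertools
from typing import Iterable, Iterator, List


def fastq_to_fasta_records(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Turn 4-line FASTQ records into 2-line FASTA records (qualities dropped)."""
    it = iter(fastq_lines)
    while True:
        record = list(itertools.islice(it, 4))
        if not record:
            return  # clean end of input
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        header = record[0].rstrip("\r\n")[1:]  # drop the leading '@'
        seq = record[1].rstrip("\r\n")
        yield f">{header}\n{seq}\n"


def split_fastq_to_fasta(fastq_path: str, prefix: str,
                         batch_size: int = 5_000_000) -> List[str]:
    """Write FASTA batches of at most batch_size reads; return the file names."""
    written: List[str] = []
    out = None
    try:
        with open(fastq_path) as fq:
            for n, rec in enumerate(fastq_to_fasta_records(fq)):
                if n % batch_size == 0:  # start a new batch file
                    if out is not None:
                        out.close()
                    name = f"{prefix}_batch{len(written) + 1}.fasta"
                    out = open(name, "w")
                    written.append(name)
                out.write(rec)
    finally:
        if out is not None:
            out.close()
    return written
```

      For example, `split_fastq_to_fasta("reads.fastq", "reads")` would write `reads_batch1.fasta`, `reads_batch2.fasta`, and so on, each holding up to 5 million reads.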

      Good luck!
