  • DRAT
    replied
    Hi hathiram2,

    I recently had the same problem after downloading the mm10 sequence/annotation files from the TopHat website. Check the content of the genome.fa file: for me it was empty, and I had to download it separately from the NCBI website. After replacing the genome.fa file, things worked fine.
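A quick way to run that check from the shell, as a minimal sketch. The helper name `check_fasta` is hypothetical and not part of TopHat; it only tests that the file is non-empty and counts the `>` header lines.

```shell
# Report whether a FASTA file is non-empty and how many sequence records it holds.
# Hypothetical helper for illustration; usage: check_fasta genome.fa
check_fasta() {
    fasta="$1"
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$fasta" ]; then
        echo "EMPTY_OR_MISSING: $fasta"
        return 1
    fi
    # Each FASTA record starts with a '>' header line
    n=$(grep -c '^>' "$fasta" || true)
    echo "OK: $fasta has $n sequence record(s)"
}
```

Calling `check_fasta genome.fa` in the index directory should show at a glance whether the download left an empty file behind.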

  • jernest1
    replied
    Is the gtf_to_fasta executable in your PATH?
    Run "echo $PATH" and make sure that the gtf_to_fasta program (in whatever tophat folder was created when you unpacked the tar.gz file) is in one of those directories. $HOME/bin is a convenient directory for this.

    Something else I have to do on my computer system (a Sun Grid Engine HPC cluster) is use qsub to submit jobs. The system has trouble finding all the software, so I submit jobs using "qsub -v PATH job.sge".

    Maybe that will help someone.
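Rather than reading through the `echo $PATH` output by eye, the shell can do the lookup itself. This is a minimal sketch; `check_on_path` is a hypothetical helper name, and `command -v` is the standard POSIX way to ask where a program would be found.

```shell
# Print where a program is found on PATH, or say so if it is not found.
# Hypothetical helper for illustration; usage: check_on_path gtf_to_fasta
check_on_path() {
    prog="$1"
    if location=$(command -v "$prog"); then
        echo "found: $location"
    else
        echo "not found on PATH: $prog"
        return 1
    fi
}
```

Running `check_on_path gtf_to_fasta` inside the job script as well as in the login shell would show whether the two environments see the same PATH.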

  • GenoMax
    replied
    Originally posted by varshacp
    Hi

    I checked the log files, and besides the run.log which I posted earlier, I get the following error in the g2f.log file:

    terminate called after throwing an instance of 'std::out_of_range'
    what(): basic_string::substr


    Help me to understand this

    Thank you
    Varsha
    The problem is most likely the GFF file, since all the index files appear to be there (and, I assume, are non-zero in size).

    Can you try to verify your GFF file using one of these: http://genometools.org/cgi-bin/gff3validator.cgi or http://modencode.oicr.on.ca/cgi-bin/...te_gff3_online
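Besides an online validator, one thing that can be checked locally is whether every sequence name in column 1 of the annotation also appears as a header in the genome FASTA. Whether such a mismatch is what triggers this particular gtf_to_fasta error is an assumption, not something confirmed in this thread. The helper name `missing_seqids` is hypothetical.

```shell
# List sequence IDs used in a GFF/GTF file (column 1) that have no matching
# header in the FASTA file. Hypothetical helper, not part of TopHat.
# Usage: missing_seqids annotation.gff genome.fa
missing_seqids() {
    gff="$1"
    fasta="$2"
    tmpdir=$(mktemp -d)
    # FASTA IDs: text after '>' up to the first whitespace
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$tmpdir/fa_ids"
    # GFF IDs: first tab-separated column of 9-column feature lines
    grep -v '^#' "$gff" | awk -F '\t' 'NF >= 9 {print $1}' | sort -u > "$tmpdir/gff_ids"
    # comm -23 prints lines that appear only in the first (GFF) list
    comm -23 "$tmpdir/gff_ids" "$tmpdir/fa_ids"
    rm -rf "$tmpdir"
}
```

An empty result means every annotated sequence name has a FASTA record; any name printed is referenced by the annotation but absent from the genome file.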

  • varshacp
    replied
    Hi

    I tried to run tophat using a different gff file and genome sequence file and am getting the same error:

    tophat -p 2 -G ca.gff3 -o cp05_thout1 caref_chr_pltd_unplaced1 cp05_ctl1.fastq

    [2014-05-05 10:39:33] Beginning TopHat run (v2.0.9)
    -----------------------------------------------
    [2014-05-05 10:39:33] Checking for Bowtie
    Bowtie version: 2.1.0.0
    [2014-05-05 10:39:33] Checking for Samtools
    Samtools version: 0.1.19.0
    [2014-05-05 10:39:33] Checking for Bowtie index files (genome)..
    [2014-05-05 10:39:33] Checking for reference FASTA file
    [2014-05-05 10:39:33] Generating SAM header for caref_chr_pltd_unplaced1
    format: fastq
    quality scale: phred33 (default)
    [2014-05-05 10:39:34] Reading known junctions from GTF file
    [2014-05-05 10:39:39] Preparing reads
    left reads: min. length=12, max. length=347, 18834597 kept reads (58333 discarded)
    Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
    [2014-05-05 10:45:43] Building transcriptome data files..
    [FAILED]
    Error: gtf_to_fasta returned an error.

    Thanks for your help

    Varsha

  • varshacp
    replied
    Hi

    I checked the log files, and besides the run.log which I posted earlier, I get the following error in the g2f.log file:

    terminate called after throwing an instance of 'std::out_of_range'
    what(): basic_string::substr


    Help me to understand this

    Thank you
    Varsha

  • varshacp
    replied
    Hi GenoMax

    The following is the list of files in the directory from which I am running the tophat command:

    caref_ncbiall.fa (genome sequence file)
    caref_ncbiall.1.bt2 (bowtie index files)
    caref_ncbiall.2.bt2
    caref_ncbiall.3.bt2
    caref_ncbiall.4.bt2
    caref_ncbiall.rev.1.bt2
    caref_ncbiall.rev.2.bt2
    cp04.fastq (reads files)
    caref_seq.gff (genome annotation file)

    Thank you

    Kind regards


    Varsha

  • GenoMax
    replied
    Varsha: Without seeing a listing of the files (related to this error, e.g. caref_ncbiall) in the directory you are running this from there is not much further help I can offer.

  • varshacp
    replied
    Hi GenoMax

    I am still getting the same error.


    Thank you

  • GenoMax
    replied
    Are things working now? Or are you still seeing an error?

  • varshacp
    replied
    Hi Genomax

    The index was also built using the same genome sequence file, in the same directory.

  • GenoMax
    replied
    Were you able to get tophat working?

  • varshacp
    replied
    Originally posted by GenoMax
    TopHat is picky about the order of options on the command line. Can you try the following:

    Code:
    $ tophat -o cp04_thout5 -p 2 -G caref_seq.gff caref_ncbiall cp04.fastq
    Let me also verify that the basename for your genome index files is "caref_ncbiall", that is, there are several files (comprising the index) that have that prefix?

    Hi

    The basename is caref_ncbiall for the index files

  • GenoMax
    replied
    Can you post a listing of the files in this directory?

    Also see my previous post about the order of options. If the genome index is correctly created then give that command line a try.

  • varshacp
    replied
    Hi GenoMax

    The genome index was created using the same fasta file and is in the same directory

    Thanks

    Varsha

  • GenoMax
    replied
    Originally posted by varshacp
    Hi

    I forgot to mention that the fastq file is also in the same directory. The gff file and genome sequences were downloaded from NCBI (NCBI has a separate fasta file for each chromosome, and all the unplaced scaffolds are in one fasta file). I concatenated these files to make the genome file and renamed it as per the gff file. The same gff file works with another genome sequence file which does not have the unplaced sequences.


    Thank you

    Varsha
    Does that mean you have not created the index files for this combined FASTA reference file? You will need to index the reference in order to use tophat. You can build the reference index using the directions here: http://tophat.cbcb.umd.edu/tutorial.shtml#ref
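To see whether a complete Bowtie 2 index is already present for a given basename, something like the following could be used. It is a minimal sketch: `check_bt2_index` is a hypothetical helper name, and it only looks for the six small-index `.bt2` files (the same set listed elsewhere in this thread), not the `.bt2l` files that newer Bowtie 2 versions write for very large genomes.

```shell
# Check that all six Bowtie 2 index files exist and are non-empty for a basename.
# Hypothetical helper for illustration; usage: check_bt2_index caref_ncbiall
check_bt2_index() {
    base="$1"
    rc=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        f="$base.$suffix.bt2"
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            rc=1
        fi
    done
    return $rc
}

# Example (not run here): build the index only if a file is missing.
# check_bt2_index caref_ncbiall || bowtie2-build caref_ncbiall.fa caref_ncbiall
```

The basename passed to `bowtie2-build` is the same name that goes on the tophat command line, so the two must match exactly.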
