Hi hathiram2,
I recently had the same problem after downloading the mm10 sequence/annotation files from the tophat website. Check the content of the genome.fa file: for me it was empty, and I had to download it separately from the NCBI website. After I replaced the genome.fa file, things worked fine.
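A quick way to run that check from the shell, as a minimal sketch. fasta_summary is a hypothetical helper name (not part of TopHat), and it assumes a plain, uncompressed FASTA file:

```shell
# Hypothetical helper: report whether a FASTA file is empty and, if not,
# how many sequences (header lines starting with '>') it contains.
fasta_summary() {
    fasta="$1"
    if [ ! -s "$fasta" ]; then
        # -s is false for a missing file and for a zero-byte file
        echo "EMPTY: $fasta"
        return 1
    fi
    # grep -c exits non-zero when the count is 0, so guard it with || true
    n=$(grep -c '^>' "$fasta" || true)
    echo "OK: $fasta has $n sequence(s)"
}

# Example call (path is illustrative):
#   fasta_summary genome.fa
```

If this reports EMPTY, or a sequence count of 0, the file needs to be downloaded again before TopHat can use it.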
Is the gtf_to_fasta executable in your PATH?
Enter "echo $PATH" and make sure that the gtf_to_fasta program (in whatever tophat folder was created when you unpacked the tar.gz file) is in one of those directories. $HOME/bin is a convenient directory for this.
Something else I have to do on my computer system (sun grid engine HPC cluster) is use qsub to submit jobs. The system has trouble finding all the software, so I submit jobs using "qsub -v PATH job.sge".
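Instead of reading through the PATH by eye, the shell can do the lookup. A minimal sketch; check_tool is a hypothetical helper name, and command -v is the standard POSIX way to ask where a program would be found:

```shell
# Hypothetical helper: report where the shell finds a program, or that it cannot.
check_tool() {
    if path=$(command -v "$1"); then
        echo "found: $1 -> $path"
    else
        echo "missing: $1 is not in any PATH directory"
        return 1
    fi
}

# Example calls:
#   check_tool gtf_to_fasta
#   check_tool bowtie2
```

Running the same check inside the job script (job.sge in my case) shows whether the compute node sees the same PATH as the login shell.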
Maybe that will help someone.
Leave a comment:
Originally posted by varshacp:
Hi
I checked the log file and besides the run.log which I posted earlier I get the following error in the g2f.log file
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Help me to understand this
Thank you
Can you try to verify your GFF file using one of these: http://genometools.org/cgi-bin/gff3validator.cgi or http://modencode.oicr.on.ca/cgi-bin/...te_gff3_online
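Besides the online validators, a quick local check is whether every sequence ID in column 1 of the annotation also appears as a header in the genome FASTA; a name mismatch between the two files is a common source of trouble. A minimal sketch, assuming a tab-separated GFF/GTF with nine columns and FASTA IDs taken as the first word of each header line. missing_seqids is a hypothetical helper name:

```shell
# Hypothetical helper: print each annotation sequence ID (column 1)
# that has no matching header in the FASTA file.
missing_seqids() {
    gff="$1"; fasta="$2"
    awk -F'\t' '
        FILENAME == ARGV[1] {            # first file: the FASTA
            if (/^>/) {
                id = substr($0, 2)
                sub(/[ \t].*/, "", id)   # keep only the first word of the header
                seen[id] = 1
            }
            next
        }
        /^#/ || NF < 9 { next }          # skip comments and non-feature lines
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$fasta" "$gff"
}

# Example call (file names are illustrative):
#   missing_seqids annotation.gff genome.fa
```

No output means every annotated sequence name was found in the FASTA file.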
Leave a comment:
I tried running tophat with a different gff file and genome sequence file, and I am getting the same error:
tophat -p 2 -G ca.gff3 -o cp05_thout1 caref_chr_pltd_unplaced1 cp05_ctl1.fastq
[2014-05-05 10:39:33] Beginning TopHat run (v2.0.9)
[2014-05-05 10:39:33] Checking for Bowtie
Bowtie version:
[2014-05-05 10:39:33] Checking for Samtools
Samtools version:
[2014-05-05 10:39:33] Checking for Bowtie index files (genome)..
[2014-05-05 10:39:33] Checking for reference FASTA file
[2014-05-05 10:39:33] Generating SAM header for caref_chr_pltd_unplaced1
format: fastq
quality scale: phred33 (default)
[2014-05-05 10:39:34] Reading known junctions from GTF file
[2014-05-05 10:39:39] Preparing reads
left reads: min. length=12, max. length=347, 18834597 kept reads (58333 discarded)
Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2014-05-05 10:45:43] Building transcriptome data files..
Error: gtf_to_fasta returned an error.
Thanks for your help
Leave a comment:
I checked the log file and besides the run.log which I posted earlier I get the following error in the g2f.log file
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Help me to understand this
Thank you
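For context, basic_string::substr throws std::out_of_range when it is asked to start past the end of a string. One possible cause here (an assumption, not confirmed against the gtf_to_fasta source) is an annotation feature whose coordinates run past the end of the named sequence in the FASTA file. A minimal sketch to look for such features, assuming a tab-separated GFF/GTF with start and end in columns 4 and 5 and a FASTA file with Unix line endings. features_past_end is a hypothetical helper name:

```shell
# Hypothetical helper: print features whose end coordinate exceeds the
# length of the sequence they refer to.
# Output columns: seqid, start, end, sequence length.
features_past_end() {
    gff="$1"; fasta="$2"
    awk -F'\t' '
        FILENAME == ARGV[1] {            # first file: the FASTA
            if (/^>/) {
                id = substr($0, 2)
                sub(/[ \t].*/, "", id)   # first word of the header is the ID
                len[id] = 0
            } else if (id != "") {
                len[id] += length($0)    # add up the sequence line lengths
            }
            next
        }
        /^#/ || NF < 9 { next }          # skip comments and non-feature lines
        ($1 in len) && ($5 + 0 > len[$1]) { print $1 "\t" $4 "\t" $5 "\t" len[$1] }
    ' "$fasta" "$gff"
}

# Example call (file names are illustrative):
#   features_past_end annotation.gff genome.fa
```

No output means no feature extends past the end of its sequence; features on sequences missing from the FASTA file are not reported by this check.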
Leave a comment:
Hi GenoMax
The following is the list of files in the directory from which I am running the tophat command
caref_ncbiall.fa (genome sequence file)
caref_ncbiall.1.bt2 (bowtie index files)
cp04.fastq (reads files)
caref_seq.gff (genome annotation file)
Thank you
Kind regards
Leave a comment:
Varsha: Without seeing a listing of the files (related to this error, e.g. caref_ncbiall) in the directory you are running this from there is not much further help I can offer.
Leave a comment:
Hi Genomax
The index was also built using the same genome sequence file in the same directory
Leave a comment:
Originally posted by GenoMax:
TopHat is picky about the order of options on the command line. Can you try the following:
Code:
$ tophat -o cp04_thout5 -p 2 -G caref_seq.gff caref_ncbiall cp04.fastq
The basename is caref_ncbiall for the index files
Leave a comment:
Can you post a listing of the files in this directory?
Also see my previous post about the order of options. If the genome index is correctly created then give that command line a try.
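To check the index from the command line: a Bowtie 2 index is six files sharing one basename (.1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, .rev.2.bt2, or .bt2l for large genomes), and TopHat also looks for basename.fa next to them. A minimal sketch that reports any of these that are missing or empty; check_bt2_index is a hypothetical helper name:

```shell
# Hypothetical helper: verify that all six Bowtie 2 index files and the
# matching FASTA file exist and are non-empty for a given basename.
check_bt2_index() {
    base="$1"
    status=0
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        # Large indexes use the .bt2l extension instead of .bt2
        if [ ! -s "$base.$suffix" ] && [ ! -s "$base.${suffix}l" ]; then
            echo "missing: $base.$suffix"
            status=1
        fi
    done
    if [ ! -s "$base.fa" ]; then
        echo "missing: $base.fa"
        status=1
    fi
    return $status
}

# Example call, using the basename from this thread:
#   check_bt2_index caref_ncbiall
```

No output means all seven files are present for that basename.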
Leave a comment:
Hi GenoMax
The genome index was created using the same fasta file and is in the same directory
Leave a comment:
Originally posted by varshacp:
Hi
I forgot to mention that the fastq file is also in the same directory. The gff file and genome sequences were downloaded from NCBI (NCBI has a separate fasta file for each chromosome, and all the unplaced scaffolds are in one fasta file). I concatenated these files to make the genome file and renamed it as per the gff file. The same gff file works with another genome sequence file that does not have the unplaced sequences.
Thank you
Leave a comment: