
**KatsenPlatz** · 08-19-2013, 09:15 AM

My guess is that there is a problem somewhere in the FASTQ file, even though the first 4 lines look good. A few checks you can do:

1. there are 4 lines for each sequence read in the file, i.e. total count of lines in the fastq file is 4x the total number of reads
2. every first line of each record starts with @ and every third line starts with +
3. the length of the quality sequence is the same as the length of the sequence read for every record
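The three checks above can be scripted. Here is a minimal Python sketch, assuming plain 4-line records (sequence and quality not wrapped over several lines); the function name `check_fastq` is just made up for illustration:

```python
def check_fastq(lines):
    """Return a list of (line_number, message) problems found in FASTQ lines.

    Assumes each record is exactly 4 lines (no wrapped sequence/quality).
    """
    lines = [ln.rstrip("\r\n") for ln in lines]
    problems = []
    # Check 1: total line count must be 4x the number of reads.
    if len(lines) % 4 != 0:
        problems.append((len(lines), "line count is not a multiple of 4"))
    # Walk over the complete 4-line records only.
    for i in range(0, len(lines) - len(lines) % 4, 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Check 2: first line starts with @, third line starts with +.
        if not header.startswith("@"):
            problems.append((i + 1, "header does not start with @"))
        if not plus.startswith("+"):
            problems.append((i + 3, "separator does not start with +"))
        # Check 3: quality string is as long as the sequence.
        if len(seq) != len(qual):
            problems.append((i + 4, "quality length differs from sequence length"))
    return problems
```

You could run it with something like `with open("myfile.fastq") as fh: print(check_fastq(fh))`. An empty list means none of the three checks failed; it does not prove the file is otherwise valid.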

**GenoMax** · 08-19-2013, 09:21 AM

Looking at the sequence identifiers, I wonder if this is old data from a GAII machine. If so, it is likely in the older Illumina (1.3) FASTQ format, and you may need to add the relevant TopHat options to take that into account.

From the TopHat manual:

--solexa-quals Use the Solexa scale for quality values in FASTQ files.
--solexa1.3-quals As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later.
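Before picking one of these options, it may help to look at which characters actually occur in the quality lines. A rough Python sketch of that heuristic follows; the function name `guess_quality_offset` is hypothetical, and the result is only a guess based on the lowest ASCII code seen, not a definitive detection:

```python
def guess_quality_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from characters in quality strings.

    Returns 33, 64, or None when the data is empty or ambiguous.
    """
    codes = [ord(c) for line in quality_lines for c in line.strip()]
    if not codes:
        return None
    lowest = min(codes)
    if lowest < 59:
        # Characters below ';' (ASCII 59) cannot occur in the +64 encodings,
        # so this looks like Sanger / Phred+33.
        return 33
    if lowest >= 64:
        # Consistent with Phred+64 (Illumina 1.3+). A Phred+33 file with only
        # very high qualities would also land here, so treat it as a guess.
        return 64
    # 59..63: could be Solexa+64 (negative scores) or high-quality Phred+33.
    return None
```

For the quality line quoted in this thread, which begins with `!` (ASCII 33), this sketch returns 33. That would point to Phred+33 data rather than the 1.3 encoding, so it is worth checking a larger sample of reads before adding either option.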

**manvendra7** · 09-30-2013, 03:50 PM

Thanks guys,
My problem is solved. There was a problem with my FASTQ file.

**arkilis** · 09-30-2013, 03:55 PM

Originally posted by manvendra7

Dear Folks,
I am new to this, an early-stage researcher.

I am using TopHat2 to map the reads. As far as I can tell, I am fulfilling all the requirements. My command is:

/usr/local/bin/tophat2 -p 8 -G ~/path/to/Homo_sapiens.GRCh37.72.gtf \
-o ~/path/to/Human_mapping_iPS_s7_rep1 \
--splice-mismatches 1 --max-multihits 30 --microexon-search --fusion-search \
~/path/to/bowtie2_index/hg19 \
~/path/to/myfile.fastq

I am submitting the job to a Grid Engine cluster with qsub -l h_vmem=50G [above_script], and it fails with this error:
"TopHat requires all reads be either FASTQ or FASTA. Mixing formats is not supported"

I am a bit frustrated because my FASTQ file looks fine to me, as the first record shows:

@SOLEXA-GA05_00009_SRi_AD_MS_BN_VW:7:1:2364:933#ATGAGCA
NGGCCTTCCCACATTCTTTACACTCATAGGTTTTCTCACCAGTGTGAGTTCTCTTGTGCACAATAAGGTAAGAGCC
+SOLEXA-GA05_00009_SRi_AD_MS_BN_VW:7:1:2364:933#ATGAGCA
!454478347;09977778<655476;69;8588380745<75;57495945158::=677976:7674:64763-

Please help???????

As far as I know, there are different versions of the FASTQ format. You had better check which one the software requires.

FASTQ has three main Illumina versions: 1.3+, 1.5+ and 1.8+.

where is real problem, tophat2 code or my fastq files

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News