It would help us if you showed that the file is indeed not empty. How about an 'ls -l' on the file, or an 'od -c yourfile.sff | head --lines 4', or the actual command you sent to SeqIO.convert, so that we can be sure you really did pass your file to it?
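If you would rather do the same sanity check from Python, here is a minimal sketch. The helper name `looks_like_sff` is made up for illustration; the only facts it relies on are that an empty file has size 0 and that an SFF file starts with the four bytes ".sff".

```python
import os

SFF_MAGIC = b".sff"  # the first four bytes of an SFF file


def looks_like_sff(path):
    """Return (size_in_bytes, starts_with_sff_magic) for the file at `path`.

    Hypothetical helper, not part of Biopython.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        magic = handle.read(4)
    return size, magic == SFF_MAGIC

# Example call (substitute your own file name):
#     print(looks_like_sff("yourfile.sff"))
```

A size of 0, or a False in the second position, would suggest the problem is with the file rather than with SeqIO.convert.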
FASTQ sequence converter
-
Hi,
I was actually able to get it to run today. Not sure what the problem was yesterday. But I got some funny results anyhow: some of the nucleotides are uppercase and some are lowercase. This caused problems for some of the Galaxy fastx tools that summarize quality data.
Any thoughts?
@HH42GP401CAJLD
gactagactcgacgtGTACTCAGGCTCGCACCGTGGCATGTCGCACTGTACTCAAGGCTCGCACCGTGGCATGTCGCACTGTACTTAAGGCTCACACCGTGGCATGTCGCACTGTACTCAAGGCACACAGGGGntaggnn
+
IIIIIIIIIIIIIIIIIIIGD666IIIIIIIIGDDDIIIIIIIIIIIIIIIGB;;;;IIIGGGGGCC>>>CIHID@@@C==:99==GGIIIIHIIIIIIIGGGCCCHIDDDC@777@C>1111AA@>;84445!;:44!!
@HH42GP401B4BC5
gactagactcgacgtGCAGTAGCTGCAATGGCGCAGAAGGCGTGCTTCtctctcncacgcacacacgagagagagngnnn
+
FFFFFFFFFFFFFFFIIIIIIIIIFFFFDDAAAB?<4444<>>9422323663/!//5///59=///2222////!2!!!
Here is the code that I ran (117,221 is the right number of reads for this file):
>>> SeqIO.convert("454Reads.JA11255_155_RL13.sff", "sff", "untrimmed.fastq", "fastq")
117221
Comment
-
Originally posted by lplough81:
Hi,
I was actually able to get it to run today. Not sure what the problem was yesterday. But I got some funny results anyhow: some of the nucleotides are uppercase and some are lowercase.
Originally posted by lplough81:
This caused problems for some of the Galaxy fastx tools that summarize quality data.
Any thoughts?
Or, what you probably want to do is ask for the trimmed sequences (which will be all upper case):
Code:
SeqIO.convert("454Reads.JA11255_155_RL13.sff", "sff-trim", "trimmed.fastq", "fastq")
Comment
-
Originally posted by lplough81:
Got it. Fairly new work for me, so I appreciate the patient replies. Can I specify the quality cutoff for trimming? Or what is the default that the biopython fastq trimmer uses?
You may need to further trim off PCR primers or other library-specific adapters if the Roche software wasn't told about them.
You may decide to further apply some quality cutoff trimming as well. This may be a good idea for some downstream analysis, not for others.
It is possible to do this kind of trimming in Biopython, but not in one line. There are some examples in the tutorial. I've written some SFF trimming tools using Biopython available within the Galaxy Tool Shed (if your institute runs its own Galaxy instance that may be interesting).
There are also other tools which will do it for you - especially if you want to work with the FASTQ file (or FASTA+QUAL) instead of the SFF file.
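To give a rough idea of what a quality cutoff trim looks like, here is a minimal sketch in plain Python that works on the sequence and quality lines of one FASTQ record. The function name, the cutoff of 20 and the "trim from the 3' end only" rule are all assumptions chosen for illustration, not what any particular tool does; the offset of 33 assumes Sanger-style FASTQ, which is what the "fastq" format in Biopython writes.

```python
def trim_low_quality_tail(seq, qual, cutoff=20, offset=33):
    """Drop trailing bases whose Phred score is below `cutoff`.

    `seq` and `qual` are the sequence and quality strings of one FASTQ
    record. `cutoff=20` is an arbitrary illustrative threshold and
    `offset=33` assumes Sanger-style quality encoding.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk back from the 3' end while the last kept base is below the cutoff.
    while end > 0 and ord(qual[end - 1]) - offset < cutoff:
        end -= 1
    return seq[:end], qual[:end]


# Toy record made up for illustration: 'I' is Phred 40, '!' is Phred 0.
print(trim_low_quality_tail("ACGTNN", "IIII!!"))  # ('ACGT', 'IIII')
```

With Biopython you would apply the same idea to each record's letter_annotations["phred_quality"] list and slice the record, since slicing a SeqRecord keeps the per-letter qualities in step with the sequence.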
Comment
-
how to trim FASTA name
Hi,
Is there a simple way to reduce the FASTA name, e.g. from
"> HH42GP401CAJLD length=118 xy=0823_0287 region=1 run=R_2012_01_27_13_59_03_ "
to ">HH42GP401CAJLD"?
Similar to trimming an SFF file to FASTA with Biopython's SeqIO.convert(), but taking a FASTA file as the input and then outputting another FASTA file?
Thanks,
Louis
Comment
-
Try something like this, untested:
Code:
from Bio import SeqIO

in_file = "example.fasta"
out_file = "new.fasta"
file_format = "fasta"

def remove_descr(record):
    record.description = ""
    return record

# This is a generator expression - not all in memory at once!
wanted = (remove_descr(r) for r in SeqIO.parse(in_file, file_format))
count = SeqIO.write(wanted, out_file, file_format)
Comment
-
Originally posted by lplough81:
Hi,
Is there a simple way to reduce the fasta name (e.g.
"> HH42GP401CAJLD length=118 xy=0823_0287 region=1 run=R_2012_01_27_13_59_03_ "
to ">HH42GP401CAJLD")?
Code:
sed 's/\s.*//' 454reads.fas > 454reads_trimmedheader.fas
Comment
-
Originally posted by kmcarr:
Nice catch drio, thanks. One of those really subtle things you don't catch until you work with a different set of files.
Eugeni, sorry I didn't get back to you on this; got really crushed at work. I have uploaded a modified version of the script incorporating drio's fix.
I would really appreciate it.
Comment