FASTQ sequence converter


  • Sunil Bhavsar
    replied
    I am a beginner with Perl. When I try to use the FastaQual2fastq.pl script to make my FASTQ file, I get this error: readline() on closed filehandle.
    Please help me find the right way to supply the FASTA sequence file.

  • coolFlame
    replied
    Originally posted by kmcarr View Post
    Nice catch drio, thanks. One of those really subtle things you don't catch until you work with a different set of files.

    Eugeni, sorry I didn't get back to you on this; got really crushed at work. I have uploaded a modified version of the script incorporating drio's fix.
    @kmcarr: I found your script very useful. I am currently an MSc Bioinformatics student working on an assignment that involves developing a web interface to a small mapping pipeline. This is purely for educational purposes. Would I be allowed to use your script to prepare the FASTQ file for the pipeline?
    I would really appreciate it.

  • SES
    replied
    Originally posted by lplough81 View Post
    Hi,
    Is there a simple way to reduce the fasta name, e.g.
    "> HH42GP401CAJLD length=118 xy=0823_0287 region=1 run=R_2012_01_27_13_59_03_ "

    to ">HH42GP401CAJLD"?
    I don't think you need a script for that. If your file is "454reads.fas" then just do:
    Code:
    sed 's/\s.*//' 454reads.fas > 454reads_trimmedheader.fas
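A rough Python equivalent of the sed one-liner, as a sketch only. The sed expression cuts each line at its first whitespace, so if the header really does contain a space straight after ">" (as typed in the question), sed would leave just ">". The version below drops that optional space first and keeps the first word. The function name `trim_fasta_headers` and the example data are made up for illustration.

```python
import io

def trim_fasta_headers(lines):
    """Yield FASTA lines with each header cut down to its first word."""
    for line in lines:
        if line.startswith(">"):
            # Drop the ">" and any space after it, then keep the first token.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            yield ">" + name + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line

# Small in-memory example; for a real file, pass an open file handle instead.
example = io.StringIO(
    "> HH42GP401CAJLD length=118 xy=0823_0287 region=1\n"
    "GTACTCAGGCTCGCACC\n"
)
trimmed = list(trim_fasta_headers(example))
print("".join(trimmed), end="")
```

Because it works line by line, this does not load the whole file into memory.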

  • maubp
    replied
    Try something like this, untested:
    Code:
    from Bio import SeqIO
    
    in_file = "example.fasta"
    out_file = "new.fasta"
    file_format = "fasta"
    
    def remove_descr(record):
        record.description=""
        return record
    
    #This is a generator expression - not all in memory at once!
    wanted = (remove_descr(r) for r in SeqIO.parse(in_file, file_format))
    count = SeqIO.write(wanted, out_file, file_format)
    print("Saved %i records" % count)

  • lplough81
    replied
    how to trim FASTA name

    Hi,
    Is there a simple way to reduce the fasta name, e.g.
    "> HH42GP401CAJLD length=118 xy=0823_0287 region=1 run=R_2012_01_27_13_59_03_ "

    to ">HH42GP401CAJLD"?

    Similar to trimming an SFF file to FASTA with Biopython's SeqIO.convert(), but taking a FASTA file as the input and outputting another FASTA file?

    Thanks,

    Louis

  • maubp
    replied
    Originally posted by lplough81 View Post
    Got it. Fairly new work for me, so I appreciate the patient replies. Can I specify the quality cutoff for trimming? Or what is the default that the biopython fastq trimmer uses?
    There are two things to consider - getting rid of the adapter sequences and quality trimming. Roche does a good job of this as part of the base calling and production of the SFF file. When reading SFF files, Biopython (and other tools like sff_extract and Roche's own tools) will just apply the trimming information recorded in the SFF file. Using the Roche trimming is usually fine.

    You may need to further trim off PCR primers or other library specific adapters if the Roche software wasn't told about them.

    You may decide to further apply some quality cutoff trimming as well. This may be a good idea for some downstream analysis, not for others.

    It is possible to do this kind of trimming in Biopython, but not in one line. There are some examples in the tutorial. I've written some SFF trimming tools using Biopython available within the Galaxy Tool Shed (if your institute runs its own Galaxy instance that may be interesting).

    There are also other tools which will do it for you - especially if you want to work with the FASTQ file (or FASTA+QUAL) instead of the SFF file.
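To make the idea of a quality cutoff concrete, here is a minimal sketch of 3' end trimming on a single FASTQ record in plain Python. It is not the Roche trimming and not a Biopython function: the name `trim_low_quality_tail`, the cutoff of 20, and the Sanger-style offset of 33 are all assumptions chosen for the example.

```python
def trim_low_quality_tail(seq, qual, cutoff=20, offset=33):
    """Trim the 3' end back to the last base with Phred score >= cutoff.

    seq and qual are the sequence and quality strings of one FASTQ record.
    offset=33 assumes Sanger-style quality encoding.
    """
    end = len(seq)
    # Walk in from the 3' end while the base quality is below the cutoff.
    while end > 0 and ord(qual[end - 1]) - offset < cutoff:
        end -= 1
    return seq[:end], qual[:end]

# Toy record: the last two bases have Phred scores 19 ('4') and 0 ('!').
seq = "GCAGTAGCTGCAATGG"
qual = "IIIIIIIIIIFFDD4!"
trimmed_seq, trimmed_qual = trim_low_quality_tail(seq, qual, cutoff=20)
print(trimmed_seq)
print(trimmed_qual)
```

This only looks at the tail of the read. Sliding-window or both-end trimming are other common choices, and which one suits depends on the downstream analysis.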

  • lplough81
    replied
    OK!

    Got it. Fairly new work for me, so I appreciate the patient replies. Can I specify the quality cutoff for trimming? Or what is the default that the biopython fastq trimmer uses?

    Thanks again.

    LP

  • maubp
    replied
    Originally posted by lplough81 View Post
    Hi,
    I was actually able to get it to run today. Not sure what the problem was yesterday. But I got some funny results anyhow. Some of the nt's are uppercase and some are lowercase.
    You'll see the same from Roche's own tools. The lower case are the bits which would be trimmed off as adapters or low quality bases.

    Originally posted by lplough81 View Post
    This caused problems for some of the Galaxy fastx tools that summarize quality data.

    Any thoughts?
    That could be an oversight in fastx - ask them about it.

    Or, what you probably want to do is ask for the trimmed sequences (which will be all upper case):

    Code:
    SeqIO.convert("454Reads.JA11255_155_RL13.sff", "sff-trim", "trimmed.fastq", "fastq")
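If only the mixed-case FASTQ is at hand and the SFF file is not, the same effect can be approximated by cutting off the lower-case flanks yourself. This is a sketch under the assumption that the kept region is one contiguous upper-case stretch, as in the reads shown in this thread; `strip_lowercase_flanks` is a made-up helper, not part of Biopython. Re-running the conversion with "sff-trim" is the better route when the SFF file is available.

```python
def strip_lowercase_flanks(seq, qual):
    """Keep the span from the first to the last upper-case base.

    Returns the matching slice of the quality string as well, so the
    sequence and quality lines stay the same length.
    """
    upper_positions = [i for i, base in enumerate(seq) if base.isupper()]
    if not upper_positions:
        # Nothing would survive trimming.
        return "", ""
    start, end = upper_positions[0], upper_positions[-1] + 1
    return seq[start:end], qual[start:end]

# Toy record: 15 lower-case bases, 16 upper-case bases, 7 lower-case bases.
seq = "gactagactcgacgtGCAGTAGCTGCAATGGntaggnn"
qual = "F" * 15 + "I" * 16 + "4" * 7
kept_seq, kept_qual = strip_lowercase_flanks(seq, qual)
print(kept_seq)
print(kept_qual)
```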

  • lplough81
    replied
    Hi,
    I was actually able to get it to run today. Not sure what the problem was yesterday. But I got some funny results anyhow. Some of the nt's are uppercase and some are lowercase. This caused problems for some of the Galaxy fastx tools that summarize quality data.

    Any thoughts?

    @HH42GP401CAJLD
    gactagactcgacgtGTACTCAGGCTCGCACCGTGGCATGTCGCACTGTACTCAAGGCTCGCACCGTGGCATGTCGCACTGTACTTAAGGCTCACACCGTGGCATGTCGCACTGTACTCAAGGCACACAGGGGntaggnn
    +
    IIIIIIIIIIIIIIIIIIIGD666IIIIIIIIGDDDIIIIIIIIIIIIIIIGB;;;;IIIGGGGGCC>>>[email protected]@@C==:[email protected]@C>[email protected]>;84445!;:44!!
    @HH42GP401B4BC5
    gactagactcgacgtGCAGTAGCTGCAATGGCGCAGAAGGCGTGCTTCtctctcncacgcacacacgagagagagngnnn
    +
    FFFFFFFFFFFFFFFIIIIIIIIIFFFFDDAAAB?<4444<>>9422323663/!//5///59=///2222////!2!!!

    The code that I ran is here, (117,221 is the right number of reads for this file)
    >>> SeqIO.convert("454Reads.JA11255_155_RL13.sff", "sff", "untrimmed.fastq", "fastq")
    117221

  • westerman
    replied
    Might help us if you demonstrated that the file is indeed not empty. How about a 'ls -l' on the file. Or an 'od -c yourfile.sff | head --lines 4' or the actual command you sent to SeqIO.convert so that we can be sure that you did send your file to it.
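The same sanity check can be done from Python before calling SeqIO.convert. A minimal sketch: it reports the file size and checks that the first four bytes are the SFF magic number ".sff". The helper name `describe_sff` and the temporary demo files are made up for illustration; point it at your own path instead.

```python
import os
import tempfile

SFF_MAGIC = b".sff"  # first four bytes of an SFF file (0x2E736666)

def describe_sff(path):
    """Return a short description of whether path looks like an SFF file."""
    size = os.path.getsize(path)
    if size == 0:
        return "empty file"
    with open(path, "rb") as handle:
        magic = handle.read(4)
    if magic != SFF_MAGIC:
        return "not an SFF file (starts with %r)" % magic
    return "looks like SFF, %i bytes" % size

# Demo on two throwaway files: one empty, one with just the magic number
# followed by padding (not a valid SFF file, only enough to pass this check).
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty.sff")
    open(empty, "wb").close()
    fake = os.path.join(tmp, "fake.sff")
    with open(fake, "wb") as handle:
        handle.write(SFF_MAGIC + b"\x00" * 27)
    results = [describe_sff(empty), describe_sff(fake)]
print(results)
```

A zero-byte result usually points to a wrong path or a failed download or copy rather than a problem inside the SFF file itself.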

  • lplough81
    replied
    Error on Fastq convert

    Hi,
    I tried the fastq convert module in Biopython;

    from Bio import SeqIO
    SeqIO.convert("example.sff", "sff", "untrimmed.fastq", "fastq")

    (I used my sff file though)

    and I received this error:

    File "/usr/lib/pymodules/python2.7/Bio/SeqIO/SffIO.py", line 258, in _sff_file_header
    raise ValueError("Empty file.")
    ValueError: Empty file.

    Does this mean that there is an open line in the sff file? Any thoughts?

    Thanks,
    Louis

  • maasha
    replied
    Using Biopieces you can do:

    Code:
    read_sff -i data.sff | write_fastq -o data.fq -x
    or

    Code:
    read_sff -i data.sff | write_454 -o data.fna -q data.fna.qual -x
    or both in one go:

    Code:
    read_sff -i data.sff | write_fastq -o data.fq | write_454 -o data.fna -q data.fna.qual -x

  • prisnirath
    replied
    Thanks... the script worked for me with a few minor alterations.

  • ketil
    replied
    Thanks for the benchmarks! What machine was used for this? I've written a program (flower - http://blog.malde.org/index.php/flower) to extract various information from SFF files, including Fasta and (Illumina or Sanger style) FastQ. It takes about 20 seconds to convert a 2.1G SFF to FastQ, but this is on a beefy server (Xeon 3.4GHz), so it's probably not directly comparable. Nice to see that we're in the same league, at least.

  • nt2010
    replied
    Thanks BaCh and idas for your answers. All clear.

    I'm not sure if I should continue here or start another thread. My question is this: some of the trimmed reads output by the converter(s) can still be very long, with low quality at the end (Phred ~ 10). Should I trim them further, or is it acceptable to keep them, since 454 works differently from Illumina?
