  • Himalaya
    Member
    • Jun 2010
    • 38

    #16
    Originally posted by Jose Blanca
    Sorry, I have not explained myself well enough.
    clean_reads uses two different algorithms for quality trimming: one for long reads (lucy) and a different one for short reads. If you're cleaning long reads, the applicable parameters are the lucy parameters: lucy_error, lucy_window and lucy_bracket. These are the parameters you should tweak to modify the cleaning behaviour when dealing with 454 and Sanger reads.

    For Illumina and SOLiD we didn't manage to use lucy, so we implemented a sliding-window trimming function. Its parameters are qual_window, qual_threshold and only_3_end. That's why these parameters can only be used with short reads.
    Hi Jose
    Following your explanation, I issued the following command:
    clean_reads -i Pair01_fastq_format.fastq -o ./clean_reads/output_q20_len_50 -p 454 -f fastq -g fastq -min_length 20 --lucy_error 0.025,0.02 --lucy_bracket 10,0.02 --lucy_window 1,0.02

    It seems to run fine, but there is no output. clean_reads.error says that the input file "Pair01_fastq_format.fastq" is not found ("no such file or directory"), even though the file is in the same directory from which I issued the command.
    Is something wrong with the command itself?
    I really appreciate your help. Thanks


    • Jose Blanca
      Member
      • Aug 2009
      • 70

      #17
      Are you sure the file is there? Could it be a problem with letter case? In Unix, case matters: Pair01_fastq_format.fastq and pair01_fastq_format.fastq would be different files.
      Does the following command run OK?
      head Pair01_fastq_format.fastq
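Jose's case-sensitivity check can also be done programmatically. The hypothetical helpers below (a sketch, not part of clean_reads) report whether the input file exists and, if it does not, list files in the same directory whose names differ only in letter case. Note that on a case-insensitive filesystem, such as the macOS default, the exact-case check succeeds anyway.

```python
import os
import sys


def find_case_insensitive(path):
    """Return names in the same directory that match `path` ignoring case."""
    directory, name = os.path.split(path)
    directory = directory or "."
    if not os.path.isdir(directory):
        return []
    return sorted(
        entry for entry in os.listdir(directory)
        if entry.lower() == name.lower()
    )


def check_input(path):
    """Print a diagnosis for `path` and return True if it is an existing file."""
    if os.path.isfile(path):
        print("found:", path)
        return True
    matches = find_case_insensitive(path)
    if matches:
        print("not found:", path, "- did you mean:", ", ".join(matches))
    else:
        print("not found:", path, "(no case-insensitive match either)")
    return False


if __name__ == "__main__":
    # Hypothetical default: the file name from the thread.
    for arg in sys.argv[1:] or ["Pair01_fastq_format.fastq"]:
        check_input(arg)
```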


      • Himalaya
        Member
        • Jun 2010
        • 38

        #18
        Hi Jose
        It's fine now; the command is working. But I have a question: if I want to clean using a minimum quality score threshold, how can I do that? Since --qual_threshold does not work for 454, how does clean_reads trim without quality threshold information?
        Thanks


        • Jose Blanca
          Member
          • Aug 2009
          • 70

          #19
          Take a look at the lucy documentation, because you have to use their parameters.
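Because clean_reads hands 454 and Sanger trimming to lucy, there is no Phred cutoff to set directly. Assuming lucy's thresholds are per-base error probabilities (worth confirming in the lucy documentation Jose points to), the standard Phred relation Q = -10 log10(p) converts between the two, which helps when picking lucy values that correspond to a desired minimum quality. The function names are illustrative; the probabilities are the ones used in the command in post #16.

```python
import math


def error_to_phred(p):
    """Phred quality equivalent to a per-base error probability p."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


def phred_to_error(q):
    """Per-base error probability equivalent to Phred quality q."""
    return 10 ** (-q / 10)


# Error probabilities taken from the clean_reads/lucy command in post #16.
for p in (0.025, 0.02):
    print("p = {} -> Q ~ {:.1f}".format(p, error_to_phred(p)))

# The reverse direction: the error rate that a Q20 threshold corresponds to.
print("Q20 -> p = {:.3f}".format(phred_to_error(20)))
```

By this conversion, 0.025 and 0.02 correspond to roughly Q16 and Q17, and a Q20 cutoff corresponds to an error probability of 0.01.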


          • Pedro
            Junior Member
            • Dec 2008
            • 6

            #20
            Hi Jose,

            Another question for you. I've been testing clean_reads and it works quite nicely. However, when I try to use multiple threads I get errors which I believe are related to psubprocess. Could you help me with this (since it would speed up the work)? The error output is as follows:

            "clean_reads -i mp1_M1.fastq -o mp1_M1cr1.fastq -p illumina -t 4 -a adaptors.fasta
            The command was:
            /usr/local/bin/clean_reads -i mp1_M1.fastq -o mp1_M1cr1.fastq -p illumina -t 4 -a adaptors.fasta
            /usr/local/bin/clean_reads version: 0.2.1
            Running pipeline illumina with the following parameters:
            --platform: illumina
            --seq_in: mp1_M1.fastq
            --seq_out: mp1_M1cr1.fastq
            --adaptors_file: adaptors.fasta
            --threads: 4
            --disable_quality_trimming: False
            --qual_threshold: 20
            --qual_window: 1
            --only_3_end: False
            --filter_identity: 95.0
            --filter_length_percentage: 75.0
            --error_log: clean_reads.error
            An unexpected error happened.
            The clean_reads developers would appreciate your feedback
            Please send them the error log and take a look at it: clean_reads.error

            [Errno 2] No such file or directory: '/tmp/tmpTYEr3e/tmpKIFhW2/tmpGHXvok'
            /usr/local/lib/python2.6/dist-packages/franklin/utils/cgitb.py:245: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
              value = pydoc.text.repr(getattr(evalue, name))
            Traceback (most recent call last):
              File "/usr/local/bin/clean_reads", line 857, in <module>
                main(stdout, stderr)
              File "/usr/local/bin/clean_reads", line 840, in main
                processes=threads)
              File "/usr/local/lib/python2.6/dist-packages/franklin/pipelines/pipelines.py", line 339, in seq_pipeline_runner
                processes)
              File "/usr/local/lib/python2.6/dist-packages/franklin/pipelines/pipelines.py", line 287, in _parallel_process_sequences
                retcode = process.wait()
              File "/usr/local/lib/python2.6/dist-packages/psubprocess/prunner.py", line 374, in wait
                self._collect_output_streams()
              File "/usr/local/lib/python2.6/dist-packages/psubprocess/prunner.py", line 407, in _collect_output_streams
                joiner(out_file, part_out_fnames)
              File "/usr/local/lib/python2.6/dist-packages/psubprocess/prunner.py", line 490, in default_cat_joiner
                in_fhand = open(in_file_, 'r')
            IOError: [Errno 2] No such file or directory: '/tmp/tmpTYEr3e/tmpKIFhW2/tmpGHXvok' "

            Thanks,
            P
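The traceback shows the parallel run failing inside psubprocess's `default_cat_joiner`, which opens each part's output file in turn to concatenate them. That suggests one of the temporary part files was never written, for example because a worker failed. The split/run/join pattern's final step can be sketched with a joiner that checks every part up front and names the missing ones. `cat_joiner` is a hypothetical helper for illustration, not psubprocess's implementation.

```python
import os
import shutil


def cat_joiner(out_path, part_paths):
    """Concatenate per-part output files into out_path, in order.

    Hypothetical sketch of the 'join' step of a split/run/join pipeline.
    Every part is checked before anything is written, so a failed worker
    is reported clearly instead of surfacing as a bare
    'No such file or directory' halfway through the join.
    """
    missing = [p for p in part_paths if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(
            "%d of %d part outputs missing (did a worker fail?): %s"
            % (len(missing), len(part_paths), ", ".join(missing))
        )
    with open(out_path, "wb") as out_fh:
        for part in part_paths:
            with open(part, "rb") as in_fh:
                # Stream each part so large FASTQ chunks are not held in memory.
                shutil.copyfileobj(in_fh, out_fh)
```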


            • Jose Blanca
              Member
              • Aug 2009
              • 70

              #21
              I think I have fixed the problem. The fix is included in the psubprocess version I have just released; could you try reinstalling psubprocess with this version?


              • grassgirl
                Member
                • Mar 2011
                • 51

                #22
                How does GS Assembler determine qual cutoff?

                Originally posted by Himalaya
                Hi Jose
                I am trying to quality-trim and filter 454 reads. The adaptor, primer and barcode sequences have already been removed. I am not allowed to specify a minimum quality threshold to clean bad-quality reads, and I don't understand why. How does it do quality trimming?
                Thanks
                I am wondering this myself. We have a GS Junior with v2.5p1 software, and it appears (from scanning many quality scores) that the lower cutoff is 10, although I see a few zeros in there (is this a glitch?). I looked through the filtering part of the GS Run section of the manual and could find neither a place to set the quality score threshold nor an explanation of what the cutoff is. I'd hate to assume it is 10 based on a visual scan of some quality scores. Does anybody have an idea?


                • Jose Blanca
                  Member
                  • Aug 2009
                  • 70

                  #23
                  I would recommend doing a quality boxplot to see how each run went. There is an example of a boxplot here:



                  It is the third chart.
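A per-position quality boxplot of the kind Jose recommends needs a five-number summary of the quality scores at each read position. The sketch below computes it with only the Python standard library, assuming Phred+33 FASTQ with unwrapped four-line records; the function names are hypothetical and the quartiles use linear interpolation. The n column matters for 454 data because read lengths vary, so later positions are supported by fewer reads, and the min column shows directly whether scores below 10, like the zeros mentioned above, occur at a given position. The rows can be fed to any plotting tool.

```python
import sys
from collections import defaultdict


def fastq_qualities(handle, offset=33):
    """Yield a list of Phred scores per read from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        handle.readline()  # sequence line
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\r\n")
        yield [ord(c) - offset for c in qual]


def percentile(sorted_vals, frac):
    """Linearly interpolated percentile of an already sorted, non-empty list."""
    pos = frac * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def per_position_boxplot(reads):
    """Return (position, min, q1, median, q3, max, n) for each 1-based position."""
    by_pos = defaultdict(list)
    for quals in reads:
        for i, q in enumerate(quals):
            by_pos[i].append(q)
    rows = []
    for i in sorted(by_pos):
        vals = sorted(by_pos[i])
        rows.append((i + 1, vals[0], percentile(vals, 0.25),
                     percentile(vals, 0.5), percentile(vals, 0.75),
                     vals[-1], len(vals)))
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for row in per_position_boxplot(fastq_qualities(fh)):
            print("\t".join(str(x) for x in row))
```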


                  • grassgirl
                    Member
                    • Mar 2011
                    • 51

                    #24
                    Thank you, Jose. After looking at the link you sent, I found that our bioinformatics center has a program to do just that. I am completely new to sequencing so I appreciate your help.


                    • Himalaya
                      Member
                      • Jun 2010
                      • 38

                      #25
                      Originally posted by essvee
                      I suggest trying SeqTrim.
                      You can set minimum quality based on a defined window size, minimum length, etc.
                      You can also run it command line, or online.
                      www.scbi.uma.es/seqtrim/
                      Hi essvee
                      Do you know how SeqTrim trims low-quality bases? I mean the actual steps or methodology. My sequences are already clean of primers and adaptors; I just need to remove low-quality bases. I couldn't find this in the published papers (2007 and 2010) or in the downloaded tar file. Please let me know where I can find how the low-quality trimming works.
                      Thanks


                      • Himalaya
                        Member
                        • Jun 2010
                        • 38

                        #26
                        Originally posted by robs
                        I like PRINSEQ (http://prinseq.sourceforge.net/). It comes as web and standalone version and does all the QC and data pre-processing that you need.

                        The application note also contains a short comparison with similar tools (http://bioinformatics.oxfordjournals.../27/6/863.long).
                        Hi Rob
                        Can you show me where I can find how PRINSEQ works? I would like to know how PRINSEQ trims off low-quality bases.
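Neither post says how SeqTrim or PRINSEQ trim internally, and the sketch below should not be read as either tool's algorithm. It only illustrates the kind of window-based 3' trimming whose parameters (window size, quality threshold, minimum length) essvee mentions: a window is placed at the 3' end, `step` bases are dropped each time the window's statistic falls below the threshold, and the read is discarded if what remains is shorter than a minimum length. All names and defaults are hypothetical.

```python
from statistics import mean


def trim_3prime_by_window(quals, threshold=20, window=5, step=1, stat=mean):
    """Return how many bases to keep after window-based 3' trimming.

    Hypothetical sketch: while the statistic (mean, min, ...) of the last
    `window` kept bases is below `threshold`, drop `step` bases from the
    3' end and test again.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be >= 1")
    end = len(quals)
    while end > 0:
        start = max(0, end - window)
        if stat(quals[start:end]) >= threshold:
            break
        end -= step
    return max(end, 0)


def trim_and_filter(seq, quals, min_length=50, **trim_options):
    """Trim the 3' end, then drop the read (return None) if it is too short."""
    keep = trim_3prime_by_window(quals, **trim_options)
    if keep < min_length:
        return None
    return seq[:keep], quals[:keep]
```

Choosing `min` instead of `mean` as the window statistic makes trimming stricter: for ten Q30 bases followed by five Q10 bases, a 5-base mean window keeps 12 bases while a min window keeps 10.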


                        • Himalaya
                          Member
                          • Jun 2010
                          • 38

                          #27
                          In the end I used QTrim to quality-trim the 454 reads. It also outputs plots showing the quality trend of the reads before and after trimming. QTrim is available here:
                          hiv.sanbi.ac.za/software/qtrim


                          • maryluzyl
                            Junior Member
                            • Apr 2013
                            • 2

                            #28
                            Hi
                            Does anyone know how to change SeqTrim parameters from the command line?
                            Thanks so much
                            Mary Luz


                            • JackieBadger
                              Senior Member
                              • Mar 2009
                              • 385

                              #29
                              Any 454 data should always be re-called using PyroBayes. It is a much more accurate base caller and significantly improves data quality.
