  • Himalaya
    Member
    • Jun 2010
    • 38

    454 Data cleaning

    Has anyone tried any software for cleaning 454 data, i.e. removing the poor-quality reads? And has anyone managed to install hyphy and 454HIV without any problems? I need help.
  • proteasome
    Member
    • Jul 2009
    • 22

    #2
    If you're generally looking to clean up 454 data, I would suggest using Galaxy (http://main.g2.bx.psu.edu/) to convert your sff files to fastq, and then using the fastq filters and tools to remove short or low-quality reads. You can also mask low-quality bases (such as those in homopolymers) to Ns without losing reads.
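As a rough offline illustration of the filtering and masking steps described above (this is not Galaxy's own implementation), a minimal pure-Python sketch might look like the following. It assumes standard 4-line FASTQ records with Phred+33 qualities; the function names and default thresholds are made up for the example.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (name, sequence, quality_string) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def mask_low_quality(seq, qual, min_base_q=10, offset=33):
    """Replace bases below min_base_q with N; the read length is unchanged."""
    scores = phred_scores(qual, offset)
    return "".join(b if q >= min_base_q else "N" for b, q in zip(seq, scores))


def keep_read(seq, qual, min_len=50, min_mean_q=20, offset=33):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if not seq or len(seq) < min_len:
        return False
    return mean(phred_scores(qual, offset)) >= min_mean_q


def clean(lines, **filter_kwargs):
    """Filter reads, then mask low-quality bases in the reads that survive."""
    for name, seq, qual in parse_fastq(lines):
        if keep_read(seq, qual, **filter_kwargs):
            yield name, mask_low_quality(seq, qual), qual
```

Masking keeps every read at its original length, which is why it does not cost you reads the way filtering does.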


    • Himalaya
      Member
      • Jun 2010
      • 38

      #3
      Thanks, proteasome, for the reply. Galaxy is a great online tool. The problem is uploading very large files.


      • essvee
        Member
        • Apr 2011
        • 11

        #4
        I suggest trying SeqTrim.
        You can set minimum quality based on a defined window size, minimum length, etc.
        You can run it from the command line or online.


        • DZhang
          Senior Member
          • Jun 2010
          • 177

          #5
          Hi,

          Check out the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and SolexaQA (http://solexaqa.sourceforge.net/). Both have simple but neat scripts for read trimming.

          Douglas
          Last edited by DZhang; 05-28-2011, 06:41 PM. Reason: correction


          • Jose Blanca
            Member
            • Aug 2009
            • 70

            #6
            We have written our own read cleaning pipeline. It works for us, so we have made it available in case it is of use to other people. It is called clean_reads.


            • robs
              Senior Member
              • May 2010
              • 116

              #7
              I like PRINSEQ (http://prinseq.sourceforge.net/). It comes in web and standalone versions and does all the QC and data pre-processing that you need.

              The application note also contains a short comparison with similar tools (http://bioinformatics.oxfordjournals.../27/6/863.long).


              • Himalaya
                Member
                • Jun 2010
                • 38

                #8
                Originally posted by Jose Blanca:
                We have written our own read cleaning pipeline. It works for us, so we have made it available in case it is of use to other people. It is called clean_reads.
                Hi Jose Blanca. I installed clean_reads with Biopython and psubprocess preinstalled, as required, but running it resulted in a segmentation fault. Have you run the program yourself? If it runs cleanly for you, please advise me about the fault. Thank you.


                • Jose Blanca
                  Member
                  • Aug 2009
                  • 70

                  #9
                  I would need more information. A segmentation fault is quite a strange error in a Python program. Could you send me the output?


                  • Himalaya
                    Member
                    • Jun 2010
                    • 38

                    #10
                    Originally posted by Jose Blanca:
                    I would need more information. A segmentation fault is quite a strange error in a Python program. Could you send me the output?
                    Hi Jose,
                    I am using Mac OS X Snow Leopard. My command line is: clean_reads -i Pair01.fastq -o ./clean_reads/output_q20_len50_only3end -p 454 -f fastq -g fastq -qual_threshold 20 -only_3_end True -min_len 50. It gave only a one-line error, 'segmentation fault', and a separate window reported that Python quit unexpectedly, with a long error report. The last part of the error report is below:
                    0x7fff8507b000 - 0x7fff85131fff libobjc.A.dylib 227.0.0 (compatibility 1.0.0) <1960E662-D35C-5D98-EB16-D43166AE6A22> /usr/lib/libobjc.A.dylib
                    0x7fff85288000 - 0x7fff85446fff libicucore.A.dylib 40.0.0 (compatibility 1.0.0) <3D9313BF-97A4-6B65-E583-F6173E64C3C2> /usr/lib/libicucore.A.dylib
                    0x7fff8643f000 - 0x7fff86461ff7 libexpat.1.dylib 7.2.0 (compatibility 7.0.0) <7D173736-CBDF-F02F-2D07-B38F565D5ED4> /usr/lib/libexpat.1.dylib
                    0x7fff86462000 - 0x7fff864aaff7 libvDSP.dylib 268.0.1 (compatibility 1.0.0) <98FC4457-F405-0262-00F7-56119CA107B6> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libvDSP.dylib
                    0x7fff87df1000 - 0x7fff87df1ff7 com.apple.Accelerate 1.6 (Accelerate 1.6) <15DF8B4A-96B2-CB4E-368D-DEC7DF6B62BB> /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate
                    0x7fff8846a000 - 0x7fff88544fff com.apple.vImage 4.0 (4.0) <B5A8B93B-D302-BC30-5A18-922645DB2F56> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vImage.framework/Versions/A/vImage
                    0x7fff88545000 - 0x7fff88d4ffe7 libBLAS.dylib 219.0.0 (compatibility 1.0.0) <2F26CDC7-DAE9-9ABE-6806-93BBBDA20DA0> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
                    0x7fffffe00000 - 0x7fffffe01fff libSystem.B.dylib ??? (???) <40DA878D-6D69-FEA3-398B-BBD80C9BFF46> /usr/lib/libSystem.B.dylib

                    Then I tried to run clean_reads on Ubuntu with the same command, and it gave me this error:
                    IOError: [Errno 2] No such file or directory: 'ual_threshold'
                    I did not specify any file named 'ual_threshold'. The option I specified was -qual_threshold.
                    Any advice, please?


                    • Jose Blanca
                      Member
                      • Aug 2009
                      • 70

                      #11
                      On Mac it won't work, because the binaries shipped inside clean_reads are only for Linux.
                      Regarding the Linux problem, it's a malformed command line. It should be:
                      --qual_threshold
                      instead of:
                      -qual_threshold
                      Regards.
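The 'ual_threshold' in the error is consistent with how many option parsers treat a single dash: `-qual_threshold` is read as the short option `-q` with the attached value `ual_threshold`. The sketch below reproduces that behaviour with Python's standard argparse module. It is only an illustration: how clean_reads actually defines its options is not shown in this thread, and the `-q`/`--qual_file` option and the `demo_cleaner` program name are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="demo_cleaner")
    # Hypothetical short option that expects a file path.
    parser.add_argument("-q", "--qual_file")
    # Long option, which must be written with two dashes.
    parser.add_argument("--qual_threshold", type=int)
    return parser


parser = build_parser()

# Single dash: parsed as -q with the attached value "ual_threshold".
wrong, leftover = parser.parse_known_args(["-qual_threshold", "20"])
print(wrong.qual_file, wrong.qual_threshold, leftover)

# Double dash: parsed as the intended long option.
right, _ = parser.parse_known_args(["--qual_threshold", "20"])
print(right.qual_file, right.qual_threshold)
```

A program that then tries to open the value of `qual_file` would fail with a "No such file or directory" error much like the one reported on Ubuntu.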


                      • Himalaya
                        Member
                        • Jun 2010
                        • 38

                        #12
                        Originally posted by Jose Blanca:
                        On Mac it won't work, because the binaries shipped inside clean_reads are only for Linux.
                        Regarding the Linux problem, it's a malformed command line. It should be:
                        --qual_threshold
                        instead of:
                        -qual_threshold
                        Regards.
                        Hi Jose, thanks a lot. On Linux it seems to work now. For the same command, however, it now gives me the output "parameter qual_threshold is incompatible with platform long_with_quality". I tested --qual_threshold values from 10 to 100 and got the same output every time.
                        Any advice on this? Thanks for helping me get the program running.


                        • Jose Blanca
                          Member
                          • Aug 2009
                          • 70

                          #13
                          You cannot use the --qual_threshold parameter for long reads (sanger or 454). I need to explain that better in the documentation. For long reads we trim the bad-quality regions using lucy, so the parameter to change would be lucy_error. qual_threshold is used by the short-read quality trimmer.
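To make the rule concrete, here is a minimal sketch of the kind of check that could produce the "incompatible with platform" message. It is not the actual clean_reads code: the `long_with_quality` name comes from the error message quoted in this thread, while the `short` label, the two dictionaries and the `check_params` function are assumptions made up for the illustration.

```python
# Platform -> kind of read. "long_with_quality" appears in the quoted error;
# "short" is a made-up label for this sketch.
PLATFORM_KIND = {
    "454": "long_with_quality",
    "sanger": "long_with_quality",
    "illumina": "short",
    "solid": "short",
}

# Which kind of read each trimming parameter applies to (illustrative subset).
PARAM_KIND = {
    "lucy_error": "long_with_quality",
    "qual_threshold": "short",
}


def check_params(platform, params):
    """Raise ValueError if a parameter does not apply to the platform's read kind."""
    kind = PLATFORM_KIND[platform]
    for name in params:
        # Parameters not listed in PARAM_KIND are treated as platform-independent.
        if PARAM_KIND.get(name, kind) != kind:
            raise ValueError(
                f"parameter {name} is incompatible with platform {kind}"
            )


check_params("454", {"lucy_error": 0.1})  # accepted
try:
    check_params("454", {"qual_threshold": 20})
except ValueError as err:
    message = str(err)
    print(message)
```

Under these assumptions the check depends only on the parameter name, so changing the value passed to qual_threshold would never change the outcome.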


                          • Himalaya
                            Member
                            • Jun 2010
                            • 38

                            #14
                            Originally posted by Jose Blanca:
                            You cannot use the --qual_threshold parameter for long reads (sanger or 454). I need to explain that better in the documentation. For long reads we trim the bad-quality regions using lucy, so the parameter to change would be lucy_error. qual_threshold is used by the short-read quality trimmer.
                            Hi Jose,
                            I am trying to quality-trim and filter 454 reads. The adaptor, primer and barcode sequences have already been removed. I am not allowed to specify a minimum quality threshold to remove bad-quality reads, and I don't understand why. How does it do quality trimming? Sorry, I could not find the documentation for clean_reads. Also, when I specified the option -only_3_end True, it gave me an error saying it is not compatible with the platform. Does that mean it trims from both the 5' and 3' ends?

                            Thanks


                            • Jose Blanca
                              Member
                              • Aug 2009
                              • 70

                              #15
                              Sorry, I have not explained myself well enough.
                              clean_reads uses two different algorithms for quality trimming: one for long reads (lucy) and a different one for short reads. If you're cleaning long reads, the applicable parameters are the lucy parameters: lucy_error, lucy_window and lucy_bracket. These are the parameters you should tweak to modify the cleaning behaviour when dealing with 454 and sanger reads.

                              For illumina and solid we didn't manage to use lucy, so we implemented a sliding-window trimming function. Its parameters are qual_window, qual_threshold, and only_3_end. That's why these parameters can only be used with short reads.
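For readers unfamiliar with sliding-window trimming, the sketch below shows one common way such a trimmer can work. It is an assumption-laden illustration, not the clean_reads implementation: the thread only names the parameters, so the exact rule used here (keep the region spanned by the first and last windows whose mean quality reaches the threshold) and the default values are guesses.

```python
def sliding_window_trim(quals, qual_window=10, qual_threshold=20, only_3_end=False):
    """Return (start, end) slice bounds of the region to keep.

    quals is a list of integer Phred scores. A window is "good" when its mean
    quality is at least qual_threshold. The kept region runs from the first
    good window to the end of the last good window. With only_3_end=True the
    5' end is left untouched and only the 3' end is trimmed.
    """
    n = len(quals)
    if n < qual_window:
        return (0, 0)  # too short to evaluate a single window: drop the read
    good = [
        i
        for i in range(n - qual_window + 1)
        if sum(quals[i:i + qual_window]) / qual_window >= qual_threshold
    ]
    if not good:
        return (0, 0)  # no window passes: drop the read
    start = 0 if only_3_end else good[0]
    end = good[-1] + qual_window
    return (start, end)


quals = [5, 5, 30, 30, 30, 30, 5, 5]
both_ends = sliding_window_trim(quals, qual_window=2, qual_threshold=20)
three_prime = sliding_window_trim(quals, qual_window=2, qual_threshold=20, only_3_end=True)
print(both_ends, three_prime)
```

In this sketch, trimming both ends is the default and only_3_end simply pins the start of the kept region to the first base.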
