  • patouch74
    Member
    • May 2014
    • 16

    Need advice for pre process and assembly

    Hi,

    I have two paired-end files and I'm trying to merge them before assembling them. I have two strategies in mind, but I don't know which one to use and I'm not sure whether either is correct.

    1) First method

    - Join them with fastq-join
    The output is 3 files: the joined reads from the original files (JOIN), the unjoined reads from file 1 (PE_1), and the unjoined reads from file 2 (PE_2).
    - Assembly with velveth :
    velveth Dir 31 -shortPaired -separate -fastq PE_1.fq PE_2.fq -short -fastq JOIN.fq
    - velvetg etc.


    2) Second method
    - Join them with velvet-shuffleSequences_fastq.pl
    The output is one file containing both reads in interleaved format (OUT)
    - Assembly with velveth :
    velveth Dir 31 -shortPaired -fastq OUT.fq
    - velvetg etc.

    Has anyone done this before?
    Thanks
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    If you want to merge them, I suggest using BBMerge, which has a very low false positive join rate. False joins cause assembly errors.

    bbmerge.sh in1=PE_1.fq in2=PE_2.fq out=merged.fq outu=unmerged.fq

    The "outu" file will contain unmerged reads interleaved. But, I encourage you to try assembling twice, once with the original reads and once with the merged + unmerged reads, because merging is not guaranteed to improve assembly; sometimes it will make it worse.

    And as for "velvet-shuffleSequences_fastq.pl", not sure what the point is of that. Interleaving paired reads won't affect your assembly.
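
    For reference, "interleaved" only means that each read 1 record is immediately followed by its mate in a single file. A minimal sketch in plain shell and awk, assuming strict 4-line FASTQ records and the same read order in both inputs; `interleave_fastq` is a hypothetical helper name, not part of Velvet or BBTools:

    ```shell
    # Hypothetical helper: interleave two paired FASTQ files.
    # Assumes 4 lines per record and identical read order in both files.
    interleave_fastq() {
        awk -v mates="$2" '{
            print                          # current line of the read 1 file
            if (NR % 4 == 0) {             # a read 1 record just ended
                for (i = 0; i < 4; i++) {  # emit the matching read 2 record
                    if ((getline line < mates) > 0) print line
                }
            }
        }' "$1"
    }
    ```

    Usage would look like `interleave_fastq PE_1.fq PE_2.fq > OUT.fq`. The pairing information is the same either way, which is why interleaving alone should not change the assembly.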

    • patouch74
      Member
      • May 2014
      • 16

      #3
      Thanks for your answer,
      I will try your software. To assemble the merged + unmerged reads, the correct command is:
      velveth Dir 31 -shortPaired -fastq merged.fq -short -fastq unmerged.fq

      Is that right?

      and for velvet-shuffleSequences_fastq.pl, I've read

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        I'm guessing that information is obsolete, as Velvet can handle paired reads in two files just fine now.

        But the command would be:

        velveth Dir 31 -shortPaired -fastq unmerged.fq -short -fastq merged.fq

        ...since "unmerged" contains paired reads while "merged" contains the single reads. Also, you will probably get a better assembly with a higher K than 31.
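
        One way to act on the higher-K suggestion is to sweep a few odd k-mer sizes and compare the resulting assemblies. The sketch below is a dry run that only prints the velveth commands; the helper name, the `Dir_k<k>` directory naming, the step of 10, and the 31 to 61 range are all assumptions for illustration. Note that a default Velvet build caps k at 31, so larger values need Velvet compiled with a higher MAXKMERLENGTH.

        ```shell
        # Hypothetical dry-run helper: print one velveth command per k,
        # starting at an odd k and stepping by 10 so every k stays odd.
        print_velveth_sweep() {
            k=$1
            max=$2
            # velveth expects an odd k; bump an even start value up by one
            if [ $((k % 2)) -eq 0 ]; then
                k=$((k + 1))
            fi
            while [ "$k" -le "$max" ]; do
                echo "velveth Dir_k$k $k -shortPaired -fastq unmerged.fq -short -fastq merged.fq"
                k=$((k + 10))
            done
        }

        print_velveth_sweep 31 61
        ```

        Printing the commands first makes it easy to check them before piping the output to `sh` or pasting the lines into a job script.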

        • patouch74
          Member
          • May 2014
          • 16

          #5
          Brian, do you suggest trimming low-quality sequences before or after joining?

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            I suggest this order:

            1) removal of phiX, and other artifact/contaminant reads
            2) adapter trimming
            3) normalization and/or error-correction and/or subsampling (all optional, depends on whether you have too much coverage)
            4) merging (optional)
            5) quality-trimming of unmerged reads only, to around Q10
            6) assembly

            All of this, aside from assembly, can be done with BBTools. Steps 1, 2, and 5 can be done with bbduk.sh and step 3 can be done with bbnorm.sh (error-correction or normalization) or reformat.sh (subsampling).

            1)
            bbduk.sh -Xmx1g in1=read1.fq in2=read2.fq out=clean.fq ref=phix174_ill.ref.fa.gz hdist=1 k=31

            2)
            bbduk.sh -Xmx1g in=clean.fq out=trimmed.fq ref=truseq.fq.gz ktrim=r mink=11 hdist=1 k=25

            3) (optional)
            ecc.sh -Xmx29g in=trimmed.fq out=corrected.fq

            4) (optional)
            bbmerge.sh in=corrected.fq out=merged.fq outu=unmerged.fq

            5)
            bbduk.sh -Xmx1g in=unmerged.fq out=qtrimmed.fq qtrim=rl trimq=10 minlength=50

            6)
            velveth Dir 31 -shortPaired -fastq qtrimmed.fq -short -fastq merged.fq

            The reference files phix174_ill.ref.fa.gz and truseq.fq.gz are both included with BBTools, in the "resources" directory. If you do error-correction (which will improve the rate of merging), the "-Xmx29g" flag is just an example; rather than 29g, it should be set to around 85% of the computer's physical memory.
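
            The 85% rule of thumb for the memory flag can be computed rather than guessed. A rough sketch, assuming a Linux machine where /proc/meminfo reports MemTotal in kB; `xmx_flag_from_kb` is a hypothetical helper, and it emits megabytes so that small machines do not round down to zero:

            ```shell
            # Hypothetical helper: convert a memory size in kB into a Java
            # -Xmx flag set to 85% of that size, expressed in megabytes.
            xmx_flag_from_kb() {
                echo "-Xmx$(( $1 * 85 / 100 / 1024 ))m"
            }

            # Assumption: Linux, where MemTotal is the physical memory in kB.
            if [ -r /proc/meminfo ]; then
                mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
                # Dry run: print the error-correction command with the flag filled in
                echo "ecc.sh $(xmx_flag_from_kb "$mem_kb") in=trimmed.fq out=corrected.fq"
            fi
            ```

            On shared or cluster nodes the value should probably be based on the memory actually allocated to the job rather than on the machine total.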
