  • filtered_subreads.fastq contains multi-pass reads

    Hello, I'm new to PacBio. I started with SMRT Portal to pre-process my raw h5 files, so I had to use the RS_Subreads.1 protocol to filter reads by quality, length, etc. However, I noticed that the output filtered_subreads.fastq contains both full multi-pass reads (CCS) and partial ones.
    I know that I have to extract the CCS reads for error correction, but my question is: do multi-pass reads affect the assembly results, since they are duplicated, and is it worth removing them from filtered_subreads.fastq before the assembly step?
    Many thanks.
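One way to see how many subreads each ZMW contributed is to group the read names by their movie/hole prefix. A minimal sketch, assuming the usual RS subread naming `movie/holeNumber/start_end` and a plain four-line FASTQ; the function names are made up for illustration and are not part of SMRT Analysis:

```python
from collections import Counter


def zmw_id(read_name):
    """Return the 'movie/holeNumber' prefix of a read name (assumed layout)."""
    fields = read_name.split()[0].split("/")
    return "/".join(fields[:2])


def subreads_per_zmw(fastq_lines):
    """Count subreads per ZMW from an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0 and line.strip():  # header line of each record
            counts[zmw_id(line.strip()[1:])] += 1
    return counts


def multipass_zmws(counts):
    """ZMWs that produced more than one subread, i.e. candidate multi-pass reads."""
    return {zmw for zmw, n in counts.items() if n > 1}
```

Passing an open file handle for filtered_subreads.fastq to `subreads_per_zmw` should give a per-ZMW count, which can help decide whether the multi-pass fraction is large enough to matter.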

  • #2
    If you want CCS reads then you should use "RS_ReadsOfInsert" protocol. Not a very logical name but that is what we have now.


    • #3
      Yes, that's what I expect to do. But is it worth removing CCS reads and partial multi-pass reads from filtered_subreads.fastq?


      • #4
        Are you going to assemble outside of SMRTportal or are you planning to use HGAP within SMRTportal?

        "CCS-like" reads (probably the best term to use, per PacBio) would give you the longest and best representation of that particular fragment.


        • #5
          Yes, I plan to use HGAP within SMRTportal and maybe also the Celera Assembler, but of course only after error-correcting the PacBio reads using CCS and Illumina reads.


          • #6
            Yes, I plan to use HGAP within SMRTportal and maybe also the Celera Assembler, but of course only after error-correcting the PacBio reads using CCS and Illumina reads.
            But what about the multi-pass (full and partial) reads in the filtered_subreads.fastq file generated by the RS_Subreads.1 protocol in SMRT Portal? Do you suggest removing these fragments from the file, as they are generally short, duplicated, and of poor quality?


            • #7
              If you are going to do the assembly outside SMRTportal it may be best to filter out those reads (or use ReadsOfInsert output instead). A nice summary here: https://github.com/PacificBioscience...Bio-Long-Reads

              Within SMRTPortal, the HGAP_Assembly.2 protocol already does the following at step 1:

              Filtering Parameters (PreAssembler Filter v1)

              Minimum Subread Length: Subreads shorter than this value (in base pairs) are filtered out and excluded from analysis.
              Minimum Polymerase Read Quality: Polymerase reads with lower quality than this value are filtered out and excluded from analysis.
              Minimum Polymerase Read Length: Polymerase reads shorter than this value (in base pairs) are filtered out and excluded from analysis.
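Outside SMRTPortal, the subread-length part of that filter can be approximated directly on the FASTQ. A rough sketch only, assuming well-formed four-line FASTQ records; the two polymerase-read filters are not reproduced here because, as far as I know, that information is not carried in filtered_subreads.fastq. The function names and the threshold are illustrative:

```python
def fastq_records(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield header[1:], seq, qual


def filter_min_length(records, min_len):
    """Keep records whose sequence is at least min_len bases long."""
    for name, seq, qual in records:
        if len(seq) >= min_len:
            yield name, seq, qual
```

For example, `filter_min_length(fastq_records(fh), 500)` would drop subreads shorter than 500 bp, where 500 is an arbitrary value and not a recommended setting.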


              • #8
                Thank you for this nice summary. I have 50X PacBio reads and 50X Illumina, so according to the summary it is best for me to use Celera or ECTools for the assembly.
                I think the HGAP PreAssembler Filter v1 step can also be done by the RS_Subreads.1 protocol, since it uses the same parameters, and it does not completely filter out the "duplicated" (multi-pass) reads. So I plan to write an in-house script to remove these reads, which should share the beginning of their ID with the CCS reads, and then replace them.
                Thank you again for your response
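A sketch of what such an in-house script might look like, reading "replace them" as substituting the CCS read for all subreads of the same ZMW. It assumes subreads are named `movie/holeNumber/start_end` and CCS reads `movie/holeNumber/ccs`, and that records have already been parsed into (name, sequence, quality) tuples; `replace_subreads_with_ccs` is a hypothetical helper, not a SMRT Analysis function:

```python
def zmw_id(read_name):
    """Return the 'movie/holeNumber' prefix shared by a ZMW's subreads and CCS read."""
    return "/".join(read_name.split()[0].split("/")[:2])


def replace_subreads_with_ccs(subreads, ccs_reads):
    """Yield every CCS record, then only the subreads from ZMWs without a CCS read.

    Both inputs are iterables of (name, sequence, quality) tuples.
    """
    ccs_reads = list(ccs_reads)
    ccs_zmws = {zmw_id(name) for name, _, _ in ccs_reads}
    yield from ccs_reads
    for record in subreads:
        if zmw_id(record[0]) not in ccs_zmws:
            yield record
```

Matching on the ZMW prefix rather than the full read name is the key design choice here: every subread from a hole that yielded a CCS read is dropped, whether it was a full or a partial pass.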


                • #9
                  With 50X PacBio you can simply run HGAP without worrying about the Illumina data. If you have SMRT Analysis installed, run the HGAP.3 protocol with a reasonable estimate of genome size: http://programs.pacificbiosciences.c...3-07-15/2t6ztt
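Whether a dataset really reaches a given fold coverage can be checked by dividing the total number of subread bases by the estimated genome size. A minimal sketch; the function name is illustrative and the genome size is whatever estimate you trust:

```python
def estimated_coverage(read_lengths, genome_size):
    """Total sequenced bases divided by the (estimated) genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size
```

Since the result scales inversely with the genome size estimate, an error in that estimate carries straight through to the coverage figure.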


                  • #10
                    Thank you Rhall, but sorry, I misestimated the PacBio read coverage: I have only about 14x, which is why I would perform the hybrid assembly. But before that, I have some steps to do:
                    Quality filtering,
                    CCS extraction,
                    Multi-pass subread removal,
                    Error correction,
                    and chimera removal.


                    • #11
                      To go into ECTools hybrid assembly all that needs to be done is run the basic filter protocol to generate a filtered_subreads.fasta file (RS_subreads.1 protocol in SMRT Analysis)


                      • #12
                        I ran the RS_subreads.1 protocol, but I noticed that multi-pass reads are still present in filtered_subreads.fasta, so I need a specific script to remove them.


                        • #13
                          Why do you need to remove them? You will only be using the longest reads for error correction, and those will likely have few passes, so it shouldn't be a problem for hybrid assembly.


                          • #14
                            In fact, my genome is really complicated to assemble as it is ultra-repetitive, so I thought that removing as many artifact reads as possible would benefit the assembly statistics. I expect to use all the reads for assembly (the longest ones and the short ones). So if I understand you correctly, you think that multi-pass reads shouldn't be a problem for the assembly?


                            • #15
                              I wouldn't worry about it until after you have an initial assembly. The only problem I foresee is if the library wasn't good and the reads are all short, but in that case, no amount of filtering is going to improve the assembly.
