
  • #16
    Can you explain why there is no benefit to paired-end sequencing if the software requires you to cat the R1/R2 FASTQ files?

    I think my questions are:

    If I don't want to assemble, I can use the FASTQ files for analysis directly, right? I don't need to cat R1 and R2? Can I treat R1 and R2 as duplicates?
    Last edited by SDPA_Pet; 03-30-2016, 12:10 PM.



    • #17
      @GenoMax: yes, I should have made that explicitly clear. If assembly is even a remote possibility, PE is the way to go.

      @SDPA_Pet: If you simply cat your R1/R2 files together, then it is identical to sequencing SE at 2x the depth.



      • #18
        Originally posted by fanli View Post
        @GenoMax: yes, I should have made that explicitly clear. If assembly is even a remote possibility, PE is the way to go.

        @SDPA_Pet: If you simply cat your R1/R2 files together, then it is identical to sequencing SE at 2x the depth.
        Also, if I want to cat my R1/R2 and use the concatenated FASTQ file for the next analysis, should I discard all the sequences that can't be joined? I thought the whole point of doing PE is that it gives me 2x depth. I shouldn't trust the sequences that can't be concatenated, right?



        • #19
          Originally posted by fanli View Post
          @SDPA_Pet: If you simply cat your R1/R2 files together, then it is identical to sequencing SE at 2x the depth.
          Not identical really. Let's say you have, for example, 1 million paired-end reads, meaning 1 million R1 and 1 million R2 ==> 2 million total reads. You have collected information from only 1 million unique pieces of DNA. With 2 million single-end reads you have exactly the same number of reads and bp as in the former case, but you have collected information from 2 million unique fragments of DNA (assuming no PCR duplication). Advantage goes to single-end reads.

          However it is cheaper to get 2 million reads using paired ends (1 million + 1 million), than 2 million single end reads. Advantage paired end reads.
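The read-versus-fragment counting above can be sketched in a few lines. This is only an illustration of the arithmetic; the function name `sequencing_summary` and the 250 bp read length are assumptions made for the example, not something stated in the post.

```python
def sequencing_summary(n_pairs: int = 0, n_single: int = 0, read_len: int = 250) -> dict:
    """Count reads, distinct fragments sampled, and bases for a run.

    Assumes no PCR duplication, so each pair or single read comes from
    its own fragment. read_len is a hypothetical value for illustration.
    """
    reads = 2 * n_pairs + n_single      # each pair contributes R1 and R2
    fragments = n_pairs + n_single      # a pair samples only one fragment
    return {"reads": reads, "fragments": fragments, "bases": reads * read_len}


paired = sequencing_summary(n_pairs=1_000_000)
single = sequencing_summary(n_single=2_000_000)
print(paired)
print(single)
```

Both runs report the same number of reads and bases; only the number of distinct fragments sampled differs.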



          • #20
            Originally posted by kmcarr View Post
            Not identical really. Let's say you have, for example, 1 million paired-end reads, meaning 1 million R1 and 1 million R2 ==> 2 million total reads. You have collected information from only 1 million unique pieces of DNA. With 2 million single-end reads you have exactly the same number of reads and bp as in the former case, but you have collected information from 2 million unique fragments of DNA (assuming no PCR duplication). Advantage goes to single-end reads.

            However it is cheaper to get 2 million reads using paired ends (1 million + 1 million), than 2 million single end reads. Advantage paired end reads.
            Sorry, I didn't explain it clearly. I mean PE sequencing. Should R1 and R2 be almost identical for WGS?

            For example:

            1> For HiSeq 2x250 bp WGS sequencing (not amplicon sequencing), if they shear the gDNA to less than 250 bp, I will basically get 1 million x 2 reads, with R1 and R2 almost identical. It doesn't make sense to join them together; I can use either one for analysis.

            2> If they shear the gDNA to between 250 bp and 500 bp, there will be an overlap region. I can join them and analyze.

            3> If they shear the gDNA to >500 bp, there won't be an overlap region. I can't join them? Most R1 and R2 reads are not identical? Almost 2 million unique reads.

            I don't know whether I understand this correctly. That's why I asked what the normal sheared gDNA size is.

            I have only done amplicon sequencing before; when I give them my amplicons, the size has already been decided. For amplicon sequencing we use method 2>: we join the reads together to get long reads (my amplicon size). If I expect 450 bp PCR amplicons, they can only sequence ~250 bp from one end. If both ends are sequenced, I get my full amplicon sequence (about 450 bp) after joining.
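The three cases above come down to comparing the insert length with the read length. A minimal sketch, assuming insert lengths that exclude adapters and a hypothetical helper name `classify_pair`: the expected overlap is 2 x read length minus insert length. Note that in the first case R1 and R2 cover the same bases but as reverse complements of each other, and both read through into adapter sequence.

```python
def classify_pair(insert_bp: int, read_len: int) -> str:
    """Describe how R1 and R2 relate for one fragment.

    insert_bp is the sheared DNA length WITHOUT adapters (an assumption
    of this sketch); read_len is the length of each of the two reads.
    """
    overlap = 2 * read_len - insert_bp
    if insert_bp <= read_len:
        # Each read spans the whole insert (as reverse complements)
        # and continues into adapter sequence.
        return "full overlap"
    if overlap > 0:
        return f"partial overlap ({overlap} bp)"
    return "no overlap"


for insert_bp in (200, 400, 600):
    print(insert_bp, classify_pair(insert_bp, read_len=250))
```

In practice shearing gives a distribution of insert sizes, so a single library will usually contain a mix of these cases.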
            Last edited by SDPA_Pet; 03-30-2016, 02:07 PM.



            • #21
              You are forgetting to add ~120 (or is it 140?) bp for the two Illumina adapters that are added to each fragment. That increases the effective size of the total fragment after library prep.

              In case of amplicons the fragment size is predetermined. In case of WGS the shearing is going to produce a distribution of fragment sizes with a median size. It is not cost effective to do stringent size selection (and not required either).

              I suggest that you consider the suggestion from SPAdes developers. They have seen multiple datasets and the recommendation stems from that experience. I am going to quote them:

              We suggest using 350-500 bp fragments with 2x150 reads and 550-700 bp fragments with 2x250 reads.
              Assuming this includes the adapters you can estimate the fragment sizes from there.
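As a rough sketch of that estimate: subtract the combined adapter length from the quoted library sizes to get the insert range. The 120 bp figure is only the approximate value mentioned above (it may be closer to 140), whether the quoted sizes really include adapters is itself an assumption, and `insert_range` is a hypothetical helper name.

```python
ADAPTER_TOTAL_BP = 120  # assumption: ~120 bp for both adapters, possibly 140


def insert_range(library_range: tuple, adapter_total: int = ADAPTER_TOTAL_BP) -> tuple:
    """Convert a (min, max) library size including adapters to insert sizes."""
    low, high = library_range
    return (low - adapter_total, high - adapter_total)


print("2x150:", insert_range((350, 500)))
print("2x250:", insert_range((550, 700)))
```

Passing `adapter_total=140` shows how sensitive the estimate is to the adapter length you assume.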



              • #22
                Hi GenoMax,

                Yes, I know about the adapters. I was just giving a simplified example.

                SPAdes is good. However, I will probably use the MG-RAST online server workflow to do it. The latest MG-RAST version has the option to join PE reads (see here, http://blog.metagenomics.anl.gov/mg-rast-v3-2-faq/)

                However, I need to make sure my PE reads can be joined. That's why I am eager to know how to control the sheared DNA size, because this is my first time doing WGS, and it is at a new sequencing center.

                You said, "It is not cost effective to do stringent size selection (and not required either)". Should I ask the sequencing center staff to control the size? I don't want them to give me results that can't be used with MG-RAST.

                See the last post here (http://seqanswers.com/forums/showthread.php?t=6791). The poster has the same problem, but there seems to be no solution.

                If I don't tell them to control the size of the sheared DNA and they shear it into long fragments, I can't join the reads. R1 and R2 will be very different. I would have to deposit them separately on MG-RAST and treat them as two samples, not one. I am trying to avoid this.

                Environmental metagenomics is quite different from working on model organisms.
                Last edited by SDPA_Pet; 03-30-2016, 02:46 PM. Reason: typo



                • #23
                  You have been releasing additional bits of information over this entire day/discussion. Hope this is the last bit :-)

                  If MG-RAST (automated analysis platform for metagenomes providing quantitative insights into microbial populations based on sequence data) is your primary interest then be sure to let the sequencing center know that you want the two reads to overlap. They should be able to choose the right shearing conditions and you will probably want 2x250 reads to ensure overlap in the middle.

                  You could try to assemble the data later but the results may or may not be great.
                  Last edited by GenoMax; 03-30-2016, 03:04 PM.



                  • #24
                    Thanks GenoMax. Yes, I think so.



                    • #25
                      You can ask the centre to size-select the library with Pippin or on a gel to get a narrow fragment size distribution. For 2x250 sequencing, size selection at 400 bp (±10%), and for 2x150 at 230 bp, on a Pippin instrument should give enough padding in the 3' ends for merging the majority of reads. They would shear the DNA to the intended peak size to minimise DNA loss and prepare an optimal library.
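To see what those targets imply for merging, here is a small sketch of the expected overlap across the size-selection window. It assumes the stated sizes are insert lengths (set `adapter_total` if they include adapters) and applies the same 10% tolerance to the 230 bp target, which the post does not state explicitly. `overlap_window` is a hypothetical helper name.

```python
def overlap_window(target_bp: int, read_len: int,
                   tolerance_pct: int = 10, adapter_total: int = 0) -> tuple:
    """Return (min_overlap, max_overlap) in bp for a size-selected library.

    Integer arithmetic is used so the window edges are exact.
    A negative value would mean a gap between R1 and R2.
    """
    delta = target_bp * tolerance_pct // 100
    shortest = target_bp - delta - adapter_total
    longest = target_bp + delta - adapter_total
    # Longest inserts overlap least, shortest inserts overlap most.
    return (2 * read_len - longest, 2 * read_len - shortest)


print("2x250 at 400 bp:", overlap_window(400, 250))
print("2x150 at 230 bp:", overlap_window(230, 150))
```

Read mergers typically need a minimum overlap to join a pair confidently, so the lower bound is the number to compare against your merger's setting.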



                      • #26
                        Is Pippin software? Another thing I am not sure about: I don't have a lot of samples, so I would guess the sequencing center will put my samples on the same flow cell as samples from other customers.

                        So, do the samples on the same flow cell have to be sheared to the same length, or can each be prepared to different requirements? I don't know whose samples will be with mine or what their requirements are.
                        Last edited by SDPA_Pet; 03-30-2016, 04:19 PM. Reason: typo



                        • #27
                          Pippin is an automated, high-accuracy DNA size-selection instrument; most centres would have one.

                          HiSeq flow cells have 8 lanes, and the centre can multiplex as many libraries as it needs on each lane. Your libraries would probably be sequenced in one or two lanes, depending on the amount of data required. Libraries in different lanes, or even in the same lane, can be different sizes.
