  • Questions about Illumina paired-end metagenomics

    Hello guys,

    I am going to send some samples (soil microbial communities) to a sequencing center for metagenomics (Illumina HiSeq). I have done Illumina MiSeq 16S rRNA amplicon sequencing (paired-end, 2x300 bp) before. For 16S rRNA sequencing, we sometimes call this joined paired ends. We use primers 939F-1492R and sequence from both directions. Then we join the two paired reads and get long reads (500-600 bp), because there is some overlap at the ends of the reads. This is for 16S rRNA amplicon sequencing.

    I don't know how this works for Illumina metagenomics.

    1> For HiSeq 2x250 bp, I suppose I will get two ~250 bp sequences back. Can I join them into one longer fragment (500 bp)? Or is the same fragment just sequenced twice, so that the two reads are reverse complements of each other? In that case I would basically just get a backup read for verification, and the real fragment length would not be doubled but still be 250 bp?

    a> If they can be joined, will the sequencing center join them for me?

    b> If they are duplicate reads, I'm not sure what I should do. Should I remove the duplicates or find consensus reads?

    2> Also, they offer 2x150 bp and 2x250 bp (http://www.illumina.com/systems/hise...ific_data.html). For environmental data, should I choose the latter, longer reads? Is longer always better? The official Illumina site says 150 bp is the slow but accurate method. There is no special comment on 250 bp. I suppose it (250 bp) is faster but less accurate?

    Thank you,
    Ben

  • #2
    Sounds like you get your 2x300 reads to overlap (how much is the overlap?). If you start doing 2x250 reads on HiSeq you will need to be careful, since the reads may no longer overlap. If you are able to get the reads to overlap (even a small number of bases may be enough) you will get a longer representation of the fragment (<500 bp).

    @Brian has a merging method if the reads don't overlap. You can see that in this thread.

    The sequencing center will not join these reads for you; I expect that you will need to do it yourself. BBMerge and FLASH are candidate tools for that.

    Longer reads (2x250) would have lower Q-scores at the ends of the reads (though perhaps not necessarily a greater error rate). Since you have used 2x300 MiSeq reads before, you are aware of that possibility. I don't expect 2x250 reads to be measurably less accurate than 2x150 reads.

    You may want to do a test run before jumping in fully.
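
    For illustration, here is a minimal Python sketch of what overlap-based merging does in principle. It is not how BBMerge or FLASH are implemented (real mergers tolerate mismatches and use quality scores); the function names and the `min_overlap` default are assumptions made up for this example, and it only accepts an exact overlap:

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge R1 with the reverse complement of R2 on an exact overlap.

    Returns the merged sequence, or None if no exact overlap of at
    least min_overlap bases is found.
    """
    r2_rc = reverse_complement(r2)
    # Try the longest candidate overlap first, then shorter ones.
    for olen in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-olen:] == r2_rc[:olen]:
            return r1 + r2_rc[olen:]
    return None


# Toy 30 bp fragment read from both ends with 20 bp reads (10 bp overlap).
fragment = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])
print(merge_pair(read1, read2, min_overlap=5))
```

    If the fragment is longer than the two reads combined, there is no overlap and the function returns None, which is the usual situation for standard WGS libraries.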

    • #3
      Hi GenoMax,

      I think I might not have explained clearly.

      1> In my previous experience, yes, I used 2x300 bp MiSeq for 16S rRNA amplicon sequencing. Do the reads overlap because of the primer design?

      2> For my new project, I will be doing WGS with HiSeq (2x250 or 2x150 bp), which means no PCR amplicons or primers are involved.

      In this case, what should I do? Here is what I understood from what you said:

      I don't think WGS will have joined pairs like what I did with MiSeq.
      You said the sequencing center won't join them for me? Do you guys normally do pair joining for WGS? I don't see any reason to join WGS paired reads, because there is no way to tell whether they overlap or not, right? Or should I join them first to get 500 bp, and then choose whether to assemble or not? I am quite confused here.

      If I don't want to assemble, can I just use the unjoined paired-end reads for analysis directly?

      • #4
        Originally posted by SDPA_Pet
        "Hi GenoMax,
        I think I might not have explained clearly.
        1> In my previous experience, yes, I used 2x300 bp MiSeq for 16S rRNA amplicon sequencing. Do the reads overlap because of the primer design?"
        Yes. But since you have now clarified that the new data is WGS, that changes things.

        "I don't think WGS will have joined pairs like what I did with MiSeq.
        You said the sequencing center won't join them for me? Do you guys normally do pair joining for WGS? I don't see any reason to join WGS paired reads, because there is no way to tell whether they overlap or not, right? Or should I join them first to get 500 bp, and then choose whether to assemble or not? I am quite confused here."
        Joining is not applicable if you are going to make standard WGS libraries.

        "If I don't want to assemble, can I just use the unjoined paired-end reads for analysis directly?"
        Absolutely. You will still need to scan/trim for adapter contamination as needed.

        Are you going to try assembling the data (I assume the constituents are unknown)? That may influence the choice of read length.

        • #5
          Hi GenoMax,

          Thanks. Now it is clear.

          1> I haven't decided whether to assemble it or not at this time. If I want to assemble it, should I choose the longer 2x250 bp reads? Yes, it is from the environment and we don't know what microbes are there.

          2> Also, since WGS paired reads can't be joined to 500 bp (in the case of 2x250 bp), what is the advantage of paired-end vs. single-end? What I understand is that each sheared genomic DNA fragment is sequenced twice with paired-end sequencing. So it gives us some kind of proofreading?

          • #6
            You are probably going to use SPAdes or MetaVelvet for these assemblies. See this note from SPAdes about read lengths and the libraries you would need to make (http://spades.bioinf.spbau.ru/releas...al.html#sec3.4).

            Paired end reads provide spatial information but no proof-reading (unless the reads overlap).

            • #7
              When you say spatial information, can you explain it? Does this mean the reads' location on the genome?

              PS, regarding your previous reply, "Are you going to try assembling the data (I assume the constituents are unknown)? That may influence the choice of read length.": if I eventually decide that I am gonna assemble, should I choose 2x250 bp instead of 2x150 bp? I know the software I am going to use to assemble, but I need to decide whether to do 2x150 or 2x250 bp first.

              Thank you.

              • #8
                Originally posted by SDPA_Pet
                "When you say spatial information, can you explain it? Does this mean the reads' location on the genome?"
                Since you know the average size of the fragments in your library you would roughly know that R1/R2 would be a certain distance apart (since they represent the two ends of the fragment).

                "PS, regarding your previous reply, "Are you going to try assembling the data (I assume the constituents are unknown)? That may influence the choice of read length.": if I eventually decide that I am gonna assemble, should I choose 2x250 bp instead of 2x150 bp? I know the software I am going to use to assemble, but I need to decide whether to do 2x150 or 2x250 bp first."
                Does the software you are planning to use provide any recommendation? You saw the recommendation from SPAdes developers in the link above.
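
                To make the "certain distance apart" point concrete, here is a small arithmetic sketch in Python. The fragment lengths used below are hypothetical numbers chosen for illustration, not values from any particular library prep, and `pair_geometry` is a name made up for this example:

```python
def pair_geometry(fragment_len: int, read_len: int) -> dict:
    """Relate fragment (insert) length and read length for one read pair.

    inner_distance is the unsequenced gap between R1 and R2; a negative
    value means the two reads overlap by that many bases.
    """
    inner_distance = fragment_len - 2 * read_len
    # The overlap can never be longer than the fragment itself.
    overlap = min(fragment_len, max(0, -inner_distance))
    # If the fragment is shorter than one read, the read runs into adapter.
    adapter_readthrough = max(0, read_len - fragment_len)
    return {
        "reads_overlap": inner_distance < 0,
        "inner_distance": inner_distance,
        "overlap": overlap,
        "adapter_readthrough": adapter_readthrough,
    }


# Hypothetical fragment lengths paired with the two read lengths discussed.
for frag, rlen in [(500, 150), (400, 250), (200, 250)]:
    print(frag, rlen, pair_geometry(frag, rlen))
```

                With a 500 bp fragment and 2x150 reads, the reads sit about 200 bp apart and do not overlap; that known spacing is the spatial information an assembler can use. The last case shows why adapter trimming still matters for short fragments.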

                • #9
                  GenoMaX,

                  1> The first question: "average size of the fragments". How would I know it? The sequencing center will shear the gDNA. Does this mean that for 2x250 bp they will shear it to an average size of 500 bp, and for 2x150 bp to 300 bp? I am still confused about how exactly Illumina paired-end WGS works. Do you have a link to a website with details on how paired-end WGS works?

                  2> For the second question: I have never used either software. I attended a bioinformatics workshop a long time ago where they taught Velvet. This is the first time I have heard of MetaVelvet. It seems SPAdes can do both 2x150 bp and 2x250 bp. If I remember correctly, Velvet can assemble reads as short as 50 bp. Does this mean MetaVelvet can only assemble 2x150 bp? I have read some papers, and they normally use Velvet to assemble metagenomic reads. From what I remember, Velvet can assemble both (2x250 bp and 2x150 bp)?

                  • #10
                    The size, quantity and quality of the fragments/library can be determined by running it on an Agilent Bioanalyzer/TapeStation (http://www.agilent.com/cs/library/sl...Sequencing.pdf).

                    This is an older document but it should illustrate WGS principles: http://www.illumina.com/documents/pr...c_sequence.pdf

                    • #11
                      Yes, I know the Bioanalyzer can do that, but I don't think the sequencing center will tell me about it. This is a third-party sequencing center. We used to do 454 sequencing at our university and they always told us the fragment size. I would guess this third-party sequencing center will just send the FASTQ files back to us. I think my question is about the "general rules" when they shear the DNA for WGS. Do they shear to ~300 bp for 2x150 bp and ~500 bp for 2x250 bp?

                      Also, why do you think the choice of assembly software will help decide whether I use 2x150 bp or 2x250 bp?

                      • #12
                        Most sequencing cores I know are generally happy to send you the Bioanalyzer traces. Fragment size depends on what prep is used, not on what the sequencing read length will be.

                        Generally longer reads are better for assembly as you will be better able to span repetitive sequence. That being said, PacBio or those TSLR libraries would probably be ideal for a really good assembly...

                        • #13
                          Thanks. I doubt I will use PacBio, since these are environmental samples and not from a pure bacterial culture. I haven't decided whether I am going to assemble it or not. Because you don't know what is in the sample, there are a lot of pros and cons to discuss about assembling environmental samples.

                          PS, fanli, besides the fragment size, do they normally provide pair separation distances?

                          • #14
                            Originally posted by SDPA_Pet
                            "PS, fanli, besides the fragment size, do they normally provide pair separation distances?"
                            Aren't those the same thing?

                            also, just fyi, some of the metagenomics software out there (e.g. MetaPhlAn, kraken) suggest that you essentially cat your R1/R2 fastq files as input, so in that sense there is no benefit of doing paired-end sequencing.
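
                            As a rough sketch of what "cat your R1/R2 files" amounts to, here is a byte-wise concatenation in Python. The function name and file names are hypothetical, and whether a given tool actually wants concatenated input should be checked against that tool's own documentation:

```python
import shutil
from typing import Iterable


def concat_fastq(inputs: Iterable[str], output: str) -> None:
    """Append the input files to `output` in order, byte for byte.

    This is the equivalent of `cat R1.fastq R2.fastq > combined.fastq`.
    Pairing information is not preserved: all R1 reads are simply
    followed by all R2 reads.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

                            A call would look like `concat_fastq(["sample_R1.fastq", "sample_R2.fastq"], "sample_combined.fastq")`. Because the copy is byte-wise, it should also work on gzip-compressed files, since concatenated gzip streams remain a valid gzip file.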

                            • #15
                              @fanli: The packages you mention do phylogenetic analysis, so that input expectation is specific to them. If @SDPA_Pet ever wants to assemble the data, doing PE sequencing upfront would be better.
