**GenoMax** · 03-30-2016, 04:07 AM

Sounds like you got your 2x300 reads to overlap (how much was the overlap?). If you start doing 2x250 reads on the HiSeq, you will need to be careful, since the reads may no longer overlap. If you are able to get the reads to overlap (even a small number of bases may be enough), you will get a longer merged read (<500 bp).
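The overlap arithmetic behind this can be sketched as follows. This is a minimal illustration, assuming both mates are read to full length from opposite ends of the same fragment; the function names and the `min_overlap` threshold of 10 bases are hypothetical, not taken from BBMerge or FLASH.

```python
from typing import Optional


def expected_overlap(read_len: int, fragment_len: int) -> int:
    """Number of bases R1 and R2 share for a fragment of the given length.

    A negative value means the mates stop short of each other (a gap).
    """
    return 2 * read_len - fragment_len


def merged_length(read_len: int, fragment_len: int,
                  min_overlap: int = 10) -> Optional[int]:
    """Length of the merged read, or None if the overlap is too short to detect.

    min_overlap is a hypothetical threshold; real merging tools use their own.
    """
    if expected_overlap(read_len, fragment_len) < min_overlap:
        return None
    # A merged pair reconstructs the whole fragment, so it is never longer than
    # the fragment itself (for 2x250 reads it is always under 500 bp).
    return fragment_len
```

For example, with a hypothetical 450 bp fragment, 2x300 reads overlap by 150 bases and 2x250 reads by only 50, while a 550 bp fragment read as 2x250 leaves a 50-base gap (`expected_overlap(250, 550) == -50`) and cannot be merged by overlap.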

@Brian has a merging method if the reads don't overlap. You can see that in this thread.

The sequencing center will not join these reads for you; I expect that you will need to do it yourself. BBMerge and FLASH are candidate tools for that.
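A simplified sketch of overlap-based merging, loosely in the spirit of what such tools do but not the actual BBMerge or FLASH algorithm: reverse-complement R2, try each possible overlap with the 3' end of R1, and keep the overlap with the lowest mismatch fraction. The thresholds are hypothetical defaults, sequences are assumed to be uppercase ACGTN, and reads that run past the end of the fragment into adapter are not handled.

```python
from typing import Optional

# Complement table for uppercase DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGTN sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge R1 with reverse-complemented R2 when R1's 3' end overlaps it.

    Tries overlap lengths from longest to shortest and keeps the one with the
    lowest mismatch fraction (ties go to the longer overlap). Returns None if
    no overlap of at least min_overlap bases passes max_mismatch_frac.
    """
    r2rc = reverse_complement(r2)
    best_frac: Optional[float] = None
    best_len = 0
    shortest = max(min_overlap, 1)  # avoid dividing by a zero-length overlap
    for olap in range(min(len(r1), len(r2rc)), shortest - 1, -1):
        tail = r1[len(r1) - olap:]  # last olap bases of R1
        head = r2rc[:olap]          # first olap bases of reverse-complemented R2
        frac = sum(a != b for a, b in zip(tail, head)) / olap
        if frac <= max_mismatch_frac and (best_frac is None or frac < best_frac):
            best_frac, best_len = frac, olap
    if best_frac is None:
        return None
    # Keep R1 in full and append the part of R2 that extends past the overlap.
    # Mismatches inside the overlap simply keep R1's base in this sketch.
    return r1 + r2rc[best_len:]
```

Real mergers also use base qualities to decide mismatching positions inside the overlap and guard against spurious short overlaps; this sketch only shows the overlap search.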

Longer reads (2x250) will have lower Q-scores toward the ends of the reads (though not necessarily more errors). Since you have used 2x300 MiSeq reads before, you are aware of that possibility. I don't expect 2x250 reads to be measurably less accurate than 2x150 reads.

You may want to do a test run before jumping in fully.

**SDPA_Pet** · 03-30-2016, 05:26 AM

Hi GenoMax,

I think I might not have explained clearly.

1> Regarding my previous experience: yes, I used 2x300 bp MiSeq for 16S rRNA amplicon sequencing. Do the reads overlap because of the primer design?

2> For my new project, I will be doing WGS on the HiSeq (2x250 or 2x150 bp), which means no PCR amplicons or primers are involved.

In this case, what should I do? Here is what I understood from what you said:

I don't think the WGS reads will be paired-joined the way my MiSeq reads were.
You said the sequencing center won't join them for me? Do you normally join pairs for WGS? I don't see any reason to join WGS paired sequences, because there is no way to tell whether they overlap or not, right? Or should I join them first to get 500 bp, and then choose whether to assemble them or not? I am quite confused here.

If I don't want to assemble the data, can I just use the unjoined paired reads for analysis directly?

**GenoMax** · 03-30-2016, 05:54 AM

Originally posted by SDPA_Pet:

> Hi GenoMax,
>
> I think I might not have explained clearly.
>
> 1> Regarding my previous experience: yes, I used 2x300 bp MiSeq for 16S rRNA amplicon sequencing. Do the reads overlap because of the primer design?

Yes. But now that you have clarified that the new data is WGS, that changes things.

> I don't think the WGS reads will be paired-joined the way my MiSeq reads were.
>
> You said the sequencing center won't join them for me? Do you normally join pairs for WGS? I don't see any reason to join WGS paired sequences, because there is no way to tell whether they overlap or not, right? Or should I join them first to get 500 bp, and then choose whether to assemble them or not? I am quite confused here.

Joining is not applicable if you are going to make standard WGS libraries.

> If I don't want to assemble the data, can I just use the unjoined paired reads for analysis directly?

Absolutely. You should still scan/trim for adapter contamination as needed.
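A minimal sketch of 3' adapter trimming with exact matching only; dedicated trimmers such as cutadapt or Trimmomatic also tolerate sequencing errors and use quality information. `AGATCGGAAGAGC` is the commonly used prefix of the Illumina TruSeq adapter; treat it as an assumption and check which adapter your library prep actually used.

```python
# Assumed adapter prefix (Illumina TruSeq); verify against your library prep.
ADAPTER = "AGATCGGAAGAGC"


def trim_3prime_adapter(seq: str, adapter: str = ADAPTER, min_match: int = 3) -> str:
    """Cut the read at the leftmost exact match of the adapter.

    Near the 3' end, where the read runs out before the full adapter fits,
    a match to the adapter's prefix (at least min_match bases) also counts.
    """
    for start in range(len(seq)):
        n = min(len(seq) - start, len(adapter))
        if n < min_match:
            break  # too few bases left to call an adapter match
        if seq[start:start + n] == adapter[:n]:
            return seq[:start]
    return seq
```

The `min_match` value is a trade-off: a small value removes short adapter remnants at the read end but will occasionally trim a few genuine bases that happen to match the adapter's first letters.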

Are you going to try assembling the data (I assume the constituents are unknown)? That may influence the choice of read length.

**SDPA_Pet** · 03-30-2016, 06:04 AM

Hi GenoMax,

Thanks. Now it is clear.

1> I haven't decided yet whether to assemble it. If I do want to assemble it, should I choose the longer 2x250 bp reads? Yes, it is from the environment, and we don't know what microbes are there.

2> Also, since WGS paired reads can't be joined into 500 bp reads (in the case of 2x250 bp), what is the advantage of paired-end vs. single-end? What I understand is that for each sheared genomic DNA fragment, we sequence it twice using paired-end sequencing. So, does that give us some kind of proofreading?

**GenoMax** · 03-30-2016, 06:14 AM

You are probably going to use SPAdes or MetaVelvet for these assemblies. See this note from SPAdes about read lengths and the libraries you would need to make (http://spades.bioinf.spbau.ru/releas...al.html#sec3.4).

Paired end reads provide spatial information but no proof-reading (unless the reads overlap).
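To illustrate where that "proofreading" comes from when reads do overlap: once R2 has been reverse-complemented and aligned to R1 (with its quality scores reversed to match), each overlapping position is observed twice, and disagreements can be resolved by quality. This is a simplified sketch; the add-on-agreement/subtract-on-disagreement rule and the cap of 41 are assumptions, and actual merging tools use their own quality models.

```python
from typing import List, Tuple


def consensus_overlap(bases1: str, quals1: List[int],
                      bases2: str, quals2: List[int],
                      max_q: int = 41) -> Tuple[str, List[int]]:
    """Combine two aligned observations of the same overlap region.

    Qualities are integer Phred scores; bases2/quals2 are assumed to be
    already reverse-complemented and aligned to bases1/quals1.
    """
    if not (len(bases1) == len(quals1) == len(bases2) == len(quals2)):
        raise ValueError("overlap sequences and qualities must have equal length")
    out_bases: List[str] = []
    out_quals: List[int] = []
    for b1, q1, b2, q2 in zip(bases1, quals1, bases2, quals2):
        if b1 == b2:
            # Two reads agree: confidence goes up (capped at max_q).
            base, q = b1, min(q1 + q2, max_q)
        elif q1 > q2:
            # Disagreement: keep the higher-quality call, with reduced confidence.
            base, q = b1, q1 - q2
        elif q2 > q1:
            base, q = b2, q2 - q1
        else:
            # Conflicting bases of equal quality: no way to decide.
            base, q = "N", 0
        out_bases.append(base)
        out_quals.append(q)
    return "".join(out_bases), out_quals
```

Without an overlap, each base is seen only once, which is why non-overlapping pairs give positional information but no second opinion on individual bases.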

**SDPA_Pet** · 03-30-2016, 06:28 AM

When you say spatial information, can you explain it? Does this mean the reads' location on the genome?

P.S. About your previous reply ("Are you going to try assembling the data (I assume the constituents are unknown)? That may influence the choice of read length."): if I eventually decide to assemble, should I choose 2x250 bp instead of 2x150 bp? I know which software I am going to use for assembly, but I need to decide between 2x150 and 2x250 bp first.

Thank you.

**GenoMax** · 03-30-2016, 06:41 AM

Originally posted by SDPA_Pet:

> When you say spatial information, can you explain it? Does this mean the reads' location on the genome?

Since you know the average size of the fragments in your library, you would roughly know that R1 and R2 are a certain distance apart (since they represent the two ends of the fragment).
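A small sketch of how that spatial information can be used, assuming you know (or have estimated) the mean and standard deviation of the fragment length. The 3-standard-deviation window is an illustrative choice, and coordinates are 0-based with an exclusive end; neither convention comes from any particular tool.

```python
def inner_distance(fragment_len: int, read_len: int) -> int:
    """Unsequenced gap between R1 and R2 (negative when the mates overlap)."""
    return fragment_len - 2 * read_len


def is_concordant(r1_start: int, r2_end: int, mean_frag: float,
                  sd_frag: float, n_sd: float = 3.0) -> bool:
    """Check whether a forward R1 and a reverse R2 sit a plausible distance apart.

    r1_start is where R1 begins and r2_end is where R2 ends on the same
    contig or reference (0-based, end exclusive), so their difference is the
    observed fragment length.
    """
    observed = r2_end - r1_start
    return abs(observed - mean_frag) <= n_sd * sd_frag
```

An assembler can use the same idea in reverse: if R1 lands near the end of one contig and R2 near the start of another, the expected fragment length suggests how the two contigs are ordered and roughly how far apart they are.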

> P.S. About your previous reply ("Are you going to try assembling the data (I assume the constituents are unknown)? That may influence the choice of read length."): if I eventually decide to assemble, should I choose 2x250 bp instead of 2x150 bp? I know which software I am going to use for assembly, but I need to decide between 2x150 and 2x250 bp first.

Does the software you are planning to use provide any recommendation? You saw the recommendation from the SPAdes developers in the link above.

**SDPA_Pet** · 03-30-2016, 07:00 AM

GenoMax,

1> The first question: "average size of the fragments". How would I know it? The sequencing center will shear the gDNA. Does this mean that for 2x250 bp they will shear it to an average size of 500 bp, and for 2x150 bp to 300 bp? I am still confused about how exactly Illumina paired-end sequencing works for WGS. Do you have a link to a website with details about how WGS paired-end sequencing works?

2> For the second question: I have never used either program. I attended a bioinformatics workshop a long time ago, where they taught Velvet; this is the first time I have heard of MetaVelvet. It seems SPAdes can handle both 2x150 bp and 2x250 bp. If I remember correctly, Velvet can assemble reads as short as 50 bp. Does this mean MetaVelvet can only assemble 2x150 bp? I have read some papers that normally use Velvet to assemble metagenomic reads. As I remember, Velvet can assemble both (2x250 bp and 2x150 bp)?

**GenoMax** · 03-30-2016, 07:13 AM

The size, quantity, and quality of the fragments/library can be determined by running the library on an Agilent Bioanalyzer/TapeStation (http://www.agilent.com/cs/library/sl...Sequencing.pdf).

This is an older document, but it should illustrate WGS principles: http://www.illumina.com/documents/pr...c_sequence.pdf

**SDPA_Pet** · 03-30-2016, 07:19 AM

Yes, I know a Bioanalyzer can do that, but I don't think the sequencing center will tell me. This is a third-party sequencing center. We used to do 454 sequencing at our university, and they always told us the fragment size. I would guess this third-party sequencing center will just send the FASTQ files back to us. I think my question is about the "general rules" for shearing DNA for WGS: do they shear to ~300 bp for 2x150 bp and ~500 bp for 2x250 bp, right?

Also, why do you think the choice of assembly software will help decide whether I use 2x150 bp or 2x250 bp?

**fanli** · 03-30-2016, 11:43 AM

Most sequencing cores I know are generally happy to send you the Bioanalyzer traces. Fragment size depends on which prep is used, not on what the sequencing length will be.
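If the traces aren't available, the fragment size can also be estimated from the data once reads have been mapped back to a reference or to assembled contigs. A minimal sketch, assuming SAM text input: it uses the FLAG (column 2) and TLEN (column 9) fields defined in the SAM specification, keeps properly paired primary alignments, and counts each pair once via its positive TLEN. TLEN measures the mapped span, so it approximates the insert size rather than the Bioanalyzer library size, which also includes adapter sequence.

```python
import statistics
from typing import Iterable, List

PROPER_PAIR = 0x2
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def insert_sizes(sam_lines: Iterable[str]) -> List[int]:
    """Collect one template length per properly paired, primary alignment pair."""
    sizes: List[int] = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        # The leftmost mate carries a positive TLEN, so each pair is counted once.
        if flag & PROPER_PAIR and not flag & SECONDARY_OR_SUPPLEMENTARY and tlen > 0:
            sizes.append(tlen)
    return sizes


def median_insert_size(sam_lines: Iterable[str]) -> float:
    """Median template length; the median is robust to a few chimeric pairs."""
    sizes = insert_sizes(sam_lines)
    if not sizes:
        raise ValueError("no properly paired alignments found")
    return statistics.median(sizes)
```

For instance, `median_insert_size(open("aligned.sam"))` would report the median for a hypothetical SAM file produced by an aligner such as BWA or Bowtie2.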

Generally longer reads are better for assembly as you will be better able to span repetitive sequence. That being said, PacBio or those TSLR libraries would probably be ideal for a really good assembly...
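One way to see why read length matters for repeats is a back-of-the-envelope estimate. With uniform coverage C and read length L, read start positions occur at a rate of C/L per base, and a single read resolves a repeat of length R only if it also covers k unique anchor bases on each side. That gives an expected number of spanning reads of C(L - R - 2k + 1)/L, or zero when that quantity is not positive. The anchor length k = 21 below is a hypothetical k-mer size, and the estimate ignores read pairs, which can also bridge repeats during scaffolding.

```python
def expected_spanning_reads(coverage: float, read_len: int,
                            repeat_len: int, anchor: int = 21) -> float:
    """Expected number of single reads covering a repeat plus `anchor` unique bases on each side.

    Assumes uniformly distributed read starts (a Lander-Waterman-style model).
    """
    # Number of start positions from which one read covers anchor + repeat + anchor.
    valid_starts = read_len - repeat_len - 2 * anchor + 1
    if valid_starts <= 0:
        return 0.0
    # Reads start at a rate of coverage / read_len per base.
    return coverage * valid_starts / read_len
```

Under these assumptions, at a hypothetical 30x coverage a 200 bp repeat is never spanned by a single 150 bp read, while 250 bp reads give about one spanning read per repeat copy (`expected_spanning_reads(30, 250, 200)` is 1.08).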

WGS of ~200 bacteria: recommended sequencing type and size - SEQanswers

http://seqanswers.com/forums/showthread.php?p=163184


**SDPA_Pet** · 03-30-2016, 11:51 AM

Thanks. I doubt I will use PacBio, since these are environmental samples and not from a pure bacterial culture. I haven't decided yet whether I am going to assemble it. Because we don't know what is in the sample, there are a lot of pros and cons to discuss about assembling environmental samples.

P.S. fanli, besides the fragment size, do they normally provide pair separation distances?

**fanli** · 03-30-2016, 11:57 AM

Originally posted by SDPA_Pet:

> P.S. fanli, besides the fragment size, do they normally provide pair separation distances?

aren't those the same thing?

also, just FYI, some of the metagenomics software out there (e.g. MetaPhlAn, Kraken) suggests that you essentially cat your R1/R2 FASTQ files together as input, so in that sense there is no benefit to doing paired-end sequencing.

**GenoMax** · 03-30-2016, 12:01 PM

@fanli: The packages you mention do phylogenetic analysis, so that input expectation is specific to them. If @SDPA_Pet ever wants to assemble the data, doing PE sequencing up front would be better.

Questions about Illumina paired-end metagenomics
