
**john6015** · 12-13-2010, 03:30 AM

anyone?
I have read some articles about short-read assembly, and I noticed that after the first assembly step they all use mate-pair or paired-end reads to build scaffolds or supercontigs. Does anyone know if it's possible without paired-end reads?

**boetsie** · 12-13-2010, 05:53 AM

Hi John,

It is impossible to make scaffolds from contigs using single-read sequences. You need paired-end or mate-pair data for this.

See these threads:

scaffold and contig - SEQanswers

http://seqanswers.com/forums/showthread.php?t=3405&highlight=scaffold

and

what is a paired-end read? - SEQanswers

http://seqanswers.com/forums/showthread.php?t=503&highlight=scaffold

Hope this helps,
Boetsie

**john6015** · 12-13-2010, 06:50 AM

Thanks for the info.

I'm a software engineering student, and for my final project I need to assemble a whole genome from SOLiD and 454 reads. After trying a few assemblers like Velvet and Newbler, my biggest contig was 160 kb.

No matter what other software I try (and I have tried a lot), I can't get any bigger contigs.

So you're saying it's impossible to assemble a whole genome from the data I have (which is not paired-end)?
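To double-check a figure like "my biggest contig was 160 kb", contig-length statistics can be computed straight from the assembler's output. This is a minimal sketch, assuming the contigs were written as plain FASTA; `contig_lengths` and `summarize` are hypothetical helper names, not part of Velvet, Newbler or any other tool.

```python
def contig_lengths(fasta_text):
    """Map each contig ID (first word of the '>' header) to its sequence length."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            # A sequence may be wrapped over several lines, so sum them.
            lengths[name] += len(line)
    return lengths


def summarize(lengths):
    """Number of contigs, total assembled bases, and largest contig length."""
    sizes = sorted(lengths.values(), reverse=True)
    return {
        "contigs": len(sizes),
        "total_bp": sum(sizes),
        "largest_bp": sizes[0] if sizes else 0,
    }


if __name__ == "__main__":
    demo = ">ctg1 cov=12.5\nACGTA\nCG\n>ctg2\nACG\n"
    print(summarize(contig_lengths(demo)))
    # {'contigs': 2, 'total_bp': 10, 'largest_bp': 7}
```

In practice you would pass in the text of the assembler's contig file instead of the toy `demo` string.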

**colindaven** · 12-13-2010, 07:23 AM

Strictly speaking it's not assembly, but if you have a closely related reference genome you can try mapping your contigs to it (with BLAST, for example) to gain extra information. The GenomeGraphs package in R can be useful for visualising mapped contigs.

160 kb isn't too bad for many genomes, especially repeat-rich ones. I hope your supervisors aren't expecting a finished genome, as that's not realistic.
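colindaven's suggestion can be sketched as a reference-guided ordering step (again, not assembly). This assumes the contigs were BLASTed against the related genome with tabular output (`-outfmt 6` in BLAST+, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Keeping only each contig's single best hit is a simplification, and `order_contigs` is a hypothetical helper.

```python
def order_contigs(blast_tab):
    """Order contigs along the reference using each contig's best BLAST hit.

    Returns a list of (contig, reference_id, reference_start, strand),
    sorted by reference ID and then by position on that reference.
    """
    best = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        qseqid, sseqid = f[0], f[1]
        sstart, send = int(f[8]), int(f[9])
        bitscore = float(f[11])
        # Keep only the highest-scoring hit per contig.
        if qseqid not in best or bitscore > best[qseqid][0]:
            best[qseqid] = (bitscore, sseqid, sstart, send)

    placed = []
    for contig, (_, ref, s, e) in best.items():
        # BLAST reports sstart > send when the contig aligns to the minus strand.
        strand = "+" if s <= e else "-"
        placed.append((ref, min(s, e), contig, strand))
    placed.sort()
    return [(contig, ref, pos, strand) for ref, pos, contig, strand in placed]


if __name__ == "__main__":
    tab = (
        "ctg2\tchr\t99.0\t1000\t0\t0\t1\t1000\t5001\t6000\t0.0\t1800\n"
        "ctg1\tchr\t98.0\t1000\t0\t0\t1\t1000\t2000\t1001\t0.0\t1700\n"
        "ctg1\tchr\t80.0\t200\t0\t0\t1\t200\t9000\t9199\t1e-10\t150\n"
    )
    print(order_contigs(tab))
    # [('ctg1', 'chr', 1001, '-'), ('ctg2', 'chr', 5001, '+')]
```

The gaps between consecutive placed contigs are where the reference suggests missing sequence, which is the "extra information" mentioned above.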

**boetsie** · 12-13-2010, 07:54 AM

The problem with single-read sequences is that they can't resolve repeats, since it is impossible to know where each copy of the repeat should be placed.

Say you have a repeat R that occurs twice in the genome. I call the neighbouring sequences of the first occurrence A and B, and the neighbours of the second occurrence C and D:

A->R->B
C->R->D

With de novo assembly, it is impossible to tell whether the sequence should be A>R>B or A>R>D, unless the repeat is shorter than your longest (454) read.
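To make the ambiguity concrete, here is a toy sketch of the A/R/B/C/D example. Each edge stands for an overlap that single reads can establish (a read overlapping the end of one sequence and the start of the next); because no read spans R completely, nothing ties a left flank to a particular right flank. The function name is hypothetical.

```python
from itertools import product


def paths_through_repeat(edges, repeat):
    """Every left>repeat>right layout consistent with the overlap edges.

    An edge (x, y) means some read overlaps the end of x and the start of y.
    """
    left = sorted(src for src, dst in edges if dst == repeat)
    right = sorted(dst for src, dst in edges if src == repeat)
    return [f"{lf}>{repeat}>{rt}" for lf, rt in product(left, right)]


if __name__ == "__main__":
    # True genome: A->R->B and C->R->D, but single reads only reveal these overlaps.
    edges = {("A", "R"), ("R", "B"), ("C", "R"), ("R", "D")}
    print(paths_through_repeat(edges, "R"))
    # ['A>R>B', 'A>R>D', 'C>R>B', 'C>R>D']
```

Four layouts are equally consistent with the data, but only two are real, which is why the assembler stops the contigs at the repeat.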

With paired data, you can tell that A and B belong together if one read of a pair falls on contig A and its mate falls on contig B.
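The idea of joining contigs with read pairs can be sketched as counting, for every pair of contigs, how many read pairs have one mate on each. This is a simplified illustration, not how any particular scaffolder works: it assumes the mates have already been mapped to contigs, ignores orientation and insert size, and uses a hypothetical `min_links` threshold to discard weakly supported joins.

```python
from collections import Counter


def contig_links(mate1_hits, mate2_hits, min_links=2):
    """Count read pairs whose two mates map to different contigs.

    mate1_hits / mate2_hits: dict of pair name -> contig that mate mapped to.
    Returns {(contig_x, contig_y): supporting_pairs} for links with at least
    min_links supporting pairs. Orientation and insert size are ignored.
    """
    links = Counter()
    for name, c1 in mate1_hits.items():
        c2 = mate2_hits.get(name)
        if c2 is None or c2 == c1:
            continue  # mate unmapped, or both mates on the same contig
        links[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}


if __name__ == "__main__":
    m1 = {"p1": "A", "p2": "A", "p3": "C", "p4": "A"}
    m2 = {"p1": "B", "p2": "B", "p3": "D", "p4": "A"}
    print(contig_links(m1, m2))
    # {('A', 'B'): 2}
```

With single-end reads there is no `mate2_hits` at all, so no links can be formed, which is the point made above.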

For more information, see this website:

http://www.cbcb.umd.edu/research/assembly_primer.shtml

As colindaven says, 160 kb is quite good. I don't think you should try to improve the assembly much further.

Boetsie

**john6015** · 12-14-2010, 05:45 PM

Colindaven and boetsie, thanks for the help.

Colindaven, do you mean I should try to find a similar genome online with BLAST? Or should I try to align the data I have against that other genome?

**boetsie** · 12-15-2010, 03:28 AM

Hi John,

No problem, that's what this forum is for.

About your question: if you have a reference genome (for example, if you have E. coli reads, your reference will be E. coli), do a reference assembly.

If you don't have a reference genome it is a bit harder... You can try to BLAST your contigs to see whether you find a closely related genome, and use that genome for a reference assembly. However, I'm not very familiar with this.

To do a reference assembly, take a look at the software packages at:

Software packages for next gen sequence analysis - SEQanswers

http://seqanswers.com/forums/showthread.php?t=43

or

SEQanswers

http://seqanswers.com/wiki/Software/list

Also, if you have a reference genome, try mapping the contigs with this tool:

http://nbc11.biologie.uni-kl.de/

Go to Assembly -> Contig Aligner and see how they map.

Good luck.
Boetsie

**pari_89** · 07-12-2013, 05:36 AM

Hi everyone, I am probably asking a very basic question. I have a FASTQ file from an Ion Torrent PGM sequencer and do not know whether the data is single-end or paired-end. How can I check? I want to know whether I can produce scaffolds with this data.

Thank you

Kind Regards

**Mark** · 07-12-2013, 06:04 AM

If you have a related genome to serve as a reference you might consider using the bambus2 scaffolder.

**pari_89** · 07-12-2013, 10:15 AM

Hi, but can I use the FASTQ file to see whether the reads are single-end or paired-end? How do I tell?

**Mark** · 07-12-2013, 10:40 AM

I was responding to the question about scaffolding single-end data. As for your question: I haven't used Ion Torrent, but if its output is similar to Illumina's, paired-end data comes in two files, whereas single-end data comes in a single file. The reads in both files are listed in the same order, and each pair has the same name up to the part that designates read 1 or read 2. It's possible, however, that paired-end reads come shuffled (interleaved) in a single file, in which case I would expect the first and second FASTQ records, the third and fourth, and so on, to be pairs sharing the same base ID.
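The two-file case Mark describes can be checked by stripping the read-1/read-2 designation from every header and comparing the names in order. A minimal sketch, assuming unwrapped four-line FASTQ records and that the mate designation is either a trailing `/1`/`/2` or a comment after the first whitespace (as in newer Illumina headers); the helper names are hypothetical.

```python
import re


def base_name(header):
    """Read name without '@', without any comment after whitespace, without /1 or /2."""
    name = header.lstrip("@").split()[0]
    return re.sub(r"/[12]$", "", name)


def headers(fastq_text):
    """Header line of every record, assuming unwrapped 4-line FASTQ records."""
    return fastq_text.splitlines()[0::4]


def files_look_paired(fastq1_text, fastq2_text):
    """True if both files hold the same base read names in the same order."""
    names1 = [base_name(h) for h in headers(fastq1_text)]
    names2 = [base_name(h) for h in headers(fastq2_text)]
    return len(names1) > 0 and names1 == names2


if __name__ == "__main__":
    r1 = "@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n"
    r2 = "@r1/2\nTTTT\n+\nIIII\n@r2/2\nCCAA\n+\nIIII\n"
    print(files_look_paired(r1, r2))
    # True
```

For real data you would read the first few thousand records of each file rather than whole toy strings.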

**pari_89** · 07-12-2013, 12:44 PM

Hi, thanks. I have a single FASTQ file, and the first record looks like this:

@VW27N:4:11
ATGAAACGCCGATTATCTTTAGCAATAACATTGTTGGCCGAACCGGAATTAATCATATTAGATGAACCAACTGTAGGCATTGACCTAAATTGCGCCAACAAATATGGCAACAGTTCAAGCAAATGACCAAAGACGGAAAGAGTGTCGTCATCACAACACATGTTATGGATGAGGCGGAACGTTGTGATAAAGTTGGACTTATTGTCGA
+
CCCCC?CDE@EEE?CC@@@;@AEE?DD>@@C?CD>C>C>CC@E@E@C@C?C?CCCCCE@E686;<5;;C=CCCD==8?A=9;2.(.5:,<49;?C:B;ABE9CCCD=AA@CDDD5;;5;;;?6=BCC=CD<CCCC=CC9>>>>>=DDA<;;BBCD=CC666DD8=>D<A@D==4<>/606@=9?@===CC7@C;C266D=CC=DD?

Could anyone give me a clue? Thanks again

**Mark** · 07-13-2013, 06:34 AM

Given that the header line
@VW27N:4:11
contains nothing that could be construed as a "1" or "2" designation, it seems very likely that you have single-end data. Just to be sure, check that the second record in this file is not also named @VW27N:4:11.
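Mark's "just to be sure" check generalises to the whole file: in an interleaved paired-end file, records 1 and 2, 3 and 4, and so on share a base name. A minimal sketch, assuming unwrapped four-line FASTQ records; the function names are hypothetical, and the second header in the example (`@VW27N:4:12`) is made up for illustration.

```python
def record_names(fastq_text):
    """Base read name of each record, assuming unwrapped 4-line FASTQ records."""
    names = []
    for header in fastq_text.splitlines()[0::4]:
        name = header.lstrip("@").split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # drop the mate designation
        names.append(name)
    return names


def looks_interleaved(fastq_text):
    """True if records 1&2, 3&4, ... share a base name (interleaved paired-end)."""
    names = record_names(fastq_text)
    if len(names) < 2 or len(names) % 2:
        return False
    return all(names[i] == names[i + 1] for i in range(0, len(names), 2))


if __name__ == "__main__":
    single = "@VW27N:4:11\nACGT\n+\nCCCC\n@VW27N:4:12\nTTGA\n+\nCCCC\n"
    print(looks_interleaved(single))
    # False
```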

**krobison** · 07-13-2013, 06:49 PM

Paired-end data for Ion Torrent is rare, so it is unlikely you have it. Also, I believe that most of the time the insert size is still close to the typical read length, so in many cases you can get higher-quality fused reads, but they won't be much help for scaffolding.

Short-insert libraries don't tend to give you much scaffolding information anyway.
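The "fused reads" krobison mentions come from mates whose insert is shorter than the two reads combined, so the reads overlap and can be merged into one longer read. A minimal sketch, assuming the insert is longer than each read but shorter than their sum, and that the overlap is error-free; real read mergers tolerate mismatches and use base qualities. The helper names and the `min_overlap` default are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N)."""
    return seq[::-1].translate(str.maketrans("ACGTN", "TGCAN"))


def merge_pair(read1, read2, min_overlap=10):
    """Fuse two overlapping mates into a single read, or return None.

    read2 is sequenced from the opposite strand, so it is reverse-complemented
    first; the longest exact overlap between the end of read1 and the start of
    the reverse-complemented read2 (at least min_overlap bases) is used.
    """
    min_overlap = max(min_overlap, 1)
    rc2 = reverse_complement(read2)
    for olen in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        if read1[-olen:] == rc2[:olen]:
            return read1 + rc2[olen:]
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTACCGAT"          # 20 bp toy insert
    read1 = fragment[:14]                       # forward mate
    read2 = reverse_complement(fragment[6:])    # reverse mate
    print(merge_pair(read1, read2, min_overlap=5))
    # ACGTTGCAAGGCTTACCGAT
```

The merged read only reconstructs the short insert itself, which is why such pairs add little long-range information for scaffolding.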

scaffolds without paired end?
