  • krobison
    replied
    Paired-end data for Ion Torrent is rare, so it is unlikely you have it. Also, I believe that most of the time the insert size is still close to the typical read length, so in many cases you can get higher-quality fused reads, but they won't be much help for scaffolding.

    Short-insert libraries don't tend to give you much scaffolding information anyway.



  • Mark
    replied
    Given that the header line
    @VW27N:4:11
    contains nothing that can be construed as a "1" or "2" designation, it seems very likely that you have single-end data. Just to be sure, check that the second record in this file is not also named @VW27N:4:11.
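
The check described here can be sketched in Python. This is only a rough sketch under stated assumptions, not a definitive test: the function names (`read_names`, `base_id`, `guess_layout`) are made up for illustration, the input is assumed to be plain four-line FASTQ records, and mates are assumed to be marked either by the older "/1" and "/2" suffix or by consecutive records sharing the same name. It only inspects read names, so treat the result as a hint.

```python
import re
from itertools import islice


def read_names(lines, limit=1000):
    """Yield the read name of each 4-line FASTQ record, without the leading '@'.

    Only the first `limit` records are inspected; anything after the first
    whitespace on the header line (the comment field) is dropped.
    """
    for i, line in enumerate(islice(lines, 4 * limit)):
        if i % 4 == 0:
            yield line.strip()[1:].split()[0]


def base_id(name):
    """Strip a trailing /1 or /2 mate designation, if present."""
    return re.sub(r"/[12]$", "", name)


def guess_layout(names):
    """Guess the library layout from read names alone (a heuristic)."""
    names = list(names)
    has_mate_tag = any(re.search(r"/[12]$", n) for n in names)
    bases = [base_id(n) for n in names]
    # Records 1+2, 3+4, ... share a base ID in an interleaved paired file.
    pairs = list(zip(bases[0::2], bases[1::2]))
    if pairs and all(a == b for a, b in pairs):
        return "looks interleaved paired-end"
    if has_mate_tag:
        return "mate tags present; look for a second file"
    return "looks single-end"
```

For a file on disk, something like `with open("reads.fastq") as fh: print(guess_layout(read_names(fh)))` would apply, where `reads.fastq` is a placeholder name.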



  • pari_89
    replied
    Originally posted by Mark
    I was responding to the question on scaffolding single-end data. For your question, I've not used Ion Torrent, but if it is similar to Illumina output, paired-end data comes in two files whereas single-end data is in a single file. The pairs in each file are listed in the same order and have the same name up to the point where it designates read 1 or read 2. It's possible, however, that PE reads might come shuffled in a single file, in which case I would expect the first and second FASTQ records, the third and fourth, etc., to be pairs sharing the same base ID.
    Hi, thanks. I have a single FASTQ file and the first record looks like this:

    @VW27N:4:11
    ATGAAACGCCGATTATCTTTAGCAATAACATTGTTGGCCGAACCGGAATTAATCATATTAGATGAACCAACTGTAGGCATTGACCTAAATTGCGCCAACAAATATGGCAACAGTTCAAGCAAATGACCAAAGACGGAAAGAGTGTCGTCATCACAACACATGTTATGGATGAGGCGGAACGTTGTGATAAAGTTGGACTTATTGTCGA
    +
    CCCCC?CDE@EEE?CC@@@;@AEE?DD>@@C?CD>C>C>CC@E@E@C@C?C?CCCCCE@E686;<5;;C=CCCD==8?A=9;2.(.5:,<49;?C:B;ABE9CCCD=AA@CDDD5;;5;;;?6=BCC=CD<CCCC=CC9>>>>>=DDA<;;BBCD=CC666DD8=>D<A@D==4<>/606@=9?@===CC7@C;C266D=CC=DD?

    Could anyone give me a clue? Thanks again



  • Mark
    replied
    I was responding to the question on scaffolding single-end data. For your question, I've not used Ion Torrent, but if it is similar to Illumina output, paired-end data comes in two files whereas single-end data is in a single file. The pairs in each file are listed in the same order and have the same name up to the point where it designates read 1 or read 2. It's possible, however, that PE reads might come shuffled in a single file, in which case I would expect the first and second FASTQ records, the third and fourth, etc., to be pairs sharing the same base ID.



  • pari_89
    replied
    Originally posted by Mark
    If you have a related genome to serve as a reference you might consider using the bambus2 scaffolder.
    Hi, but can I use the FASTQ file to see whether these are single-end or paired-end reads? How can I tell?



  • Mark
    replied
    If you have a related genome to serve as a reference you might consider using the bambus2 scaffolder.



  • pari_89
    replied
    Hi everyone, I am probably asking a very basic question. I have a FASTQ file from an Ion Torrent PGM sequencer and do not know whether the data is single-read or paired-end. How can I check that? I want to know whether I can produce scaffolds with this data.

    Thank you

    Kind Regards



  • boetsie
    replied
    Hi John,

    no problem, that's what this forum is for.

    About your question: if you have a reference genome (for example, if you have E. coli reads, your reference will be E. coli), do a reference assembly.

    If you don't have a reference genome it is a bit harder... You can try to BLAST your contigs to see whether you find a closely related genome, and use that genome for the reference assembly. However, I'm not very familiar with this.

    To do a reference assembly, take a look at the software packages at:

    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

    or


    Also, if you have a reference genome, try to map the contigs using this tool:



    Go to Assembly -> Contig Aligner and see how they map.

    Good luck.
    Boetsie

    Originally posted by john6015
    Colindaven and boetsie, thanks for the help.

    Colindaven, do you mean I should try to find a similar genome on the internet with BLAST? Or should I try to align the first data set I have against the other?



  • john6015
    replied
    Colindaven and boetsie, thanks for the help.

    Colindaven, do you mean I should try to find a similar genome on the internet with BLAST? Or should I try to align the first data set I have against the other?



  • boetsie
    replied
    The problem with single-read sequences is that they can't resolve repeats, since it is impossible to know where each copy of the repeat should be placed.

    Say you have a repeat R, which is present twice in the genome. The neighbouring sequences of the first occurrence I call A and B, and the neighbours of the second occurrence I call C and D:

    A->R->B
    C->R->D

    With de novo assembly, it is impossible to predict whether the sequence should be A->R->B or A->R->D, unless the repeat is shorter than your longest (454) read, so that a single read can span it.

    With paired data, you can infer that A and B belong together if one read of a pair falls on contig A and the other read of the pair falls on contig B.
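
To make the A/B/C/D example concrete, here is a minimal Python sketch of the idea of counting read pairs that link two contigs. It is not how any particular scaffolder works; the function names (`count_links`, `best_partner`) and the `min_links` threshold are assumptions made up for this illustration, and it presumes the pairs have already been mapped to contigs and that the insert size is large enough to span the repeat R.

```python
from collections import Counter


def count_links(pair_hits):
    """Count read pairs whose two mates map to different contigs.

    pair_hits: iterable of (contig_of_read1, contig_of_read2), one per pair.
    Returns a Counter keyed by the unordered contig pair.
    """
    links = Counter()
    for c1, c2 in pair_hits:
        if c1 != c2:  # pairs inside one contig say nothing about scaffolding
            links[frozenset((c1, c2))] += 1
    return links


def best_partner(links, contig, candidates, min_links=2):
    """Return the candidate contig best linked to `contig`, or None if the
    evidence is too weak or tied (min_links is an arbitrary threshold)."""
    scored = sorted(
        ((links[frozenset((contig, c))], c) for c in candidates),
        reverse=True,
    )
    if not scored or scored[0][0] < min_links:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two candidates equally supported
    return scored[0][1]


# Hypothetical mapped pairs: most pairs starting in A end in B, and C pairs with D.
pair_hits = [("A", "B"), ("A", "B"), ("B", "A"),
             ("C", "D"), ("C", "D"), ("A", "A"), ("A", "D")]
links = count_links(pair_hits)
after_a = best_partner(links, "A", ["B", "D"])  # "B"
after_c = best_partner(links, "C", ["B", "D"])  # "D"
# With single reads there are no links at all, so nothing can be decided.
no_pairs = best_partner(Counter(), "A", ["B", "D"])  # None
```

With single-end data the `links` table stays empty, which is the situation described above: there is no evidence to choose between A->R->B and A->R->D.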

    For more information, see this website;



    As colindaven says, 160 kb is quite good. I don't think you should try to improve the assembly further.

    Boetsie



  • colindaven
    replied
    Strictly speaking it's not assembly, but if you have a closely related reference genome you can try to map your contigs to it (BLAST?) to gain extra information. The GenomeGraphs package in R can be useful for visualising mapped contigs.

    160 kb isn't too bad for many (especially repeat-rich) genomes. I hope your supervisors aren't expecting a finished genome, as that's not realistic.



  • john6015
    replied
    Thanks for the info.

    I'm a software engineering student, and for my final project I need to assemble a whole genome from SOLiD and 454 reads. After using a few assemblers such as Velvet and Newbler, my biggest contig is 160 kb.

    No matter what other software I try, and I have tried a lot, I can't get any bigger contigs.

    So you are saying it's impossible to assemble a whole genome from the data I have (which is not paired-end)?



  • boetsie
    replied
    Hi John,

    it is impossible to make scaffolds from contigs using single-read sequences. You need paired-end or mate-pair data for this.

    See these threads:
    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

    and
    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    Hope this helps,
    Boetsie



  • john6015
    replied
    Anyone?
    I have read some articles about short-read assembly, and I noticed that after the first assembly step they all use mate-pair or paired-end reads to make scaffolds or supercontigs. Does anyone know if it's possible without paired-end reads?



  • john6015
    started a topic scaffolds without paired end?

    scaffolds without paired end?

    Hello all,
    I have data from SOLiD and from 454; unfortunately both are single-end.
    Is it possible to create scaffolds from single-end contigs?
