**ECO** · 03-14-2010, 12:38 PM

Compiled all that you found, as well as a couple more, and couldn't resist making it pretty.

Also put it on SEQwiki: The_Greatest_Papers_in_the_World.

**steven** · 05-17-2010, 05:51 AM

Two more:

Heng Li and Nils Homer
A survey of sequence alignment algorithms for next-generation sequencing
Briefings in Bioinformatics Advance Access published on May 11, 2010.
Pubmed

Mikael Huss
Introduction into the analysis of high-throughput-sequencing based epigenome data
Briefings in Bioinformatics Advance Access published on May 10, 2010.
Pubmed

**shine88** · 09-16-2010, 04:27 PM

Good post for beginners.

**NeuroGenXtics** · 10-25-2010, 03:12 PM

One more

Please excuse the shameless self promotion.
Corbett M, Gecz J.
Great expectations: using massively parallel sequencing to solve inherited disorders.
Expert Rev Mol Diagn. 2010 Oct;10(7):833-6.
Pubmed

**steven** · 02-15-2011, 09:49 AM

More:

Huttenhower C, Hofmann O (2010)
A Quick Guide to Large-Scale Genomic Data Mining.
PLoS Comput Biol 6(5): e1000779. doi:10.1371/journal.pcbi.1000779

Zheng Z, Advani A, Melefors O, Glavas S, Nordström H, Ye W, Engstrand L, Andersson AF.
Titration-free massively parallel pyrosequencing using trace amounts of starting material.
Nucleic Acids Res. 2010 Jul;38(13):e137. Epub 2010 Apr 30. PubMed PMID: 20435675

(added to the wiki)

**steven** · 08-01-2011, 10:41 AM

Wow... according to Google Scholar, there are 17 or more papers that are not yet listed in the wiki table!

I don't have time to update the wiki yet, but here are the links:

http://www.springerlink.com/content/m946150214752254/#section=826553&page=3&locus=59

ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence - BMC Genomics

http://www.biomedcentral.com/1471-2164/12/285/abstract

Background The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. Results The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. Conclusions ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin .

http://onlinelibrary.wiley.com/doi/10.1002/9780470889909.ch22/summary

http://www.nature.com/nmeth/journal/v8/n7/box/nmeth.1631_BX1.html

Bioinformatics for Next Generation Sequencing Data

http://www.mdpi.com/2073-4425/1/2/294/

The emergence of next-generation sequencing (NGS) platforms imposes increasing demands on statistical methods and bioinformatic tools for the analysis and the management of the huge amounts of data generated by these technologies. Even at the early stages of their commercial availability, a large number of softwares already exist for analyzing NGS data. These tools can be fit into many general categories including alignment of sequence reads to a reference, base-calling and/or polymorphism detection, de novo assembly from paired or unpaired reads, structural variant detection and genome browsing. This manuscript aims to guide readers in the choice of the available computational tools that can be used to face the several steps of the data analysis workflow.

http://www.genominfo.org/html/UploadFile/article8_201006.pdf

http://onlinelibrary.wiley.com/doi/10.1111/j.1438-8677.2010.00373.x/full

Large Scale Loss of Data in Low-Diversity Illumina Sequencing Libraries Can Be Recovered by Deferred Cluster Calling

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016607

Massively parallel DNA sequencing is capable of sequencing tens of millions of DNA fragments at the same time. However, sequence bias in the initial cycles, which are used to determine the coordinates of individual clusters, causes a loss of fidelity in cluster identification on Illumina Genome Analysers. This can result in a significant reduction in the numbers of clusters that can be analysed. Such low sample diversity is an intrinsic problem of sequencing libraries that are generated by restriction enzyme digestion, such as e4C-seq or reduced-representation libraries. Similarly, this problem can also arise through the combined sequencing of barcoded, multiplexed libraries. We describe a procedure to defer the mapping of cluster coordinates until low-diversity sequences have been passed. This simple procedure can recover substantial amounts of next generation sequencing data that would otherwise be lost.

Addressing challenges in the production and analysis of illumina sequencing data - BMC Genomics

http://www.biomedcentral.com/1471-2164/12/382/abstract

Advances in DNA sequencing technologies have made it possible to generate large amounts of sequence data very rapidly and at substantially lower cost than capillary sequencing. These new technologies have specific characteristics and limitations that require either consideration during project design, or which must be addressed during data analysis. Specialist skills, both at the laboratory and the computational stages of project design and analysis, are crucial to the generation of high quality data from these new platforms. The Illumina sequencers (including the Genome Analyzers I/II/IIe/IIx and the new HiScan and HiSeq) represent a widely used platform providing parallel readout of several hundred million immobilized sequences using fluorescent-dye reversible-terminator chemistry. Sequencing library quality, sample handling, instrument settings and sequencing chemistry have a strong impact on sequencing run quality. The presence of adapter chimeras and adapter sequences at the end of short-insert molecules, as well as increased error rates and short read lengths complicate many computational analyses. We discuss here some of the factors that influence the frequency and severity of these problems and provide solutions for circumventing these. Further, we present a set of general principles for good analysis practice that enable problems with sequencing runs to be identified and dealt with.

http://genomebiology.com/2010/11/12/220/?mkt=260328&amp

Rapid Screening of Complex DNA Samples by Single-Molecule Amplification and Sequencing

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0019723

Microbial cloning makes Sanger sequencing of complex DNA samples possible but is labor intensive. We present a simple, rapid and robust method that enables laboratories without special equipment to perform single-molecule amplicon sequencing, although in a low-throughput manner, from sub-picogram quantities of DNA. The method can also be used for quick quality control of next-generation sequencing libraries, as was demonstrated for a metagenomic sample.

SeqGene: a comprehensive software solution for mining exome- and transcriptome- sequencing data - BMC Bioinformatics

http://www.biomedcentral.com/1471-2105/12/267/abstract/

Background The popularity of massively parallel exome and transcriptome sequencing projects demands new data mining tools with a comprehensive set of features to support a wide range of analysis tasks. Results SeqGene, a new data mining tool, supports mutation detection and annotation, dbSNP and 1000 Genome data integration, RNA-Seq expression quantification, mutation and coverage visualization, allele specific expression (ASE), differentially expressed genes (DEGs) identification, copy number variation (CNV) analysis, and gene expression quantitative trait loci (eQTLs) detection. We also developed novel methods for testing the association between SNP and expression and identifying genotype-controlled DEGs. We showed that the results generated from SeqGene compares favourably to other existing methods in our case studies. Conclusion SeqGene is designed as a general-purpose software package. It supports both paired-end reads and single reads generated on most sequencing platforms; it runs on all major types of computers; it supports arbitrary genome assemblies for arbitrary organisms; and it scales well to support both large and small scale sequencing projects. The software homepage is http://seqgene.sourceforge.net .

http://onlinelibrary.wiley.com/doi/10.1111/j.1755-0998.2011.03024.x/full

http://onlinelibrary.wiley.com/doi/10.1111/j.1440-1843.2010.01899.x/full

http://www.nature.com/hdy/journal/v107/n1/full/hdy2010152a.html

Conserved generation of short products at piRNA loci - BMC Genomics

http://www.biomedcentral.com/1471-2164/12/46/

Background The piRNA pathway operates in animal germ lines to ensure genome integrity through retrotransposon silencing. The Piwi protein-associated small RNAs (piRNAs) guide Piwi proteins to retrotransposon transcripts, which are degraded and thereby post-transcriptionally silenced through a ping-pong amplification process. Cleavage of the retrotransposon transcript defines at the same time the 5' end of a secondary piRNA that will in turn guide a Piwi protein to a primary piRNA precursor, thereby amplifying primary piRNAs. Although several studies provided evidence that this mechanism is conserved among metazoa, how the process is initiated and what enzymatic activities are responsible for generating the primary and secondary piRNAs are not entirely clear. Results Here we analyzed small RNAs from three mammalian species, seeking to gain further insight into the mechanisms responsible for the piRNA amplification loop. We found that in all these species piRNA-directed targeting is accompanied by the generation of short sequences that have a very precisely defined length, 19 nucleotides, and a specific spatial relationship with the guide piRNAs. Conclusions This suggests that the processing of the 5' product of piRNA-guided cleavage occurs while the piRNA target is engaged by the Piwi protein. Although they are not stabilized through methylation of their 3' ends, the 19-mers are abundant not only in testes lysates but also in immunoprecipitates of Miwi and Mili proteins. They will enable more accurate identification of piRNA loci in deep sequencing data sets.

I only looked for papers since 2010.

**steven** · 08-01-2011, 10:52 AM

By the way, Google Scholar features a nice Alert system.. what about setting up another Newsbot to get an automatic post each time a paper cites "SEQanswers"?

**marcowanger** · 08-01-2011, 11:04 AM

Some more

http://www.nature.com/nmeth/journal/v8/n7/full/nmeth.1631.html

http://www.springerlink.com/index/M946150214752254.pdf

Addressing challenges in the production and analysis of illumina sequencing data - BMC Genomics

http://www.biomedcentral.com/1471-2164/12/382/abstract

Conserved generation of short products at piRNA loci - BMC Genomics

http://www.biomedcentral.com/1471-2164/12/46/

http://www.akademiai.com/index/K76G173700718605.pdf

**marcowanger** · 08-01-2011, 11:07 AM

Originally posted by steven:

By the way, Google Scholar features a nice Alert system.. what about setting up another Newsbot to get an automatic post each time a paper cites "SEQanswers"?

It seems that it only works for individual authors:

Google Scholar Citations

http://scholar.google.com/citations?view_op=new_profile

Google Scholar Citations lets you track citations to your publications over time.

**ECO** · 08-01-2011, 12:23 PM

Newsbot runs on RSS feeds, which Google (lamely) doesn't provide for Scholar Alerts...but there are several workarounds I will try to get going...

Great list, guys. Thanks for finding all these!

Mentions of SEQanswers in the Literature