**maubp** · 03-15-2010, 09:34 AM

Originally posted by maasha

The advantage of the Biopieces is that a user can easily solve simple and complex tasks without having any programming experience.

Instead they need to know shell commands and piping?

On a serious note, you have a read_fastq for Sanger FASTQ files, and a read_solexa for Solexa FASTQ files, but no sign of a read_illumina for Illumina 1.3+ FASTQ files. See:

http://en.wikipedia.org/wiki/FASTQ_format

http://dx.doi.org/10.1093/nar/gkp1137

**maasha** · 03-15-2010, 11:17 AM

@maubp

A bit of UNIX knowledge can be acquired with a 20-minute primer - enough to use Biopieces.

Also, I shall add a read_illumina Biopiece (I'll do that tomorrow).

Thanks for the heads-up.

**ohofmann** · 03-15-2010, 12:21 PM

This looks terrific. Add in BEDTools and a few basic shell scripts and quite a lot of additional glue can be dropped from workflows.

**KevinLam** · 03-16-2010, 12:01 AM

Neat!
Curious question though.
How do you run bwa and bowtie without the binaries?

They are not listed under external tools here:
http://code.google.com/p/biopieces/wiki/Installation

**maasha** · 03-16-2010, 12:10 AM

@KevinLam

Several of the Biopieces are simple wrappers around the binaries, such as BWA and Bowtie. Biopieces that have prerequisites state them in their "usage" information. E.g. http://code.google.com/p/biopieces/wiki/bowtie_seq

**jmw86069** · 03-16-2010, 06:48 AM

I've been using Biopieces off and on for about a year now, and just wanted to say thank you to Martin, it's fantastic! The recent additions of wrappers around other tools have been great. I have been using it more and more, alongside things like samtools, BEDtools, MUMmer, and the Kent source tree. There are some extremely useful utilities in there that just need a little massaging of data format. (As always.)

I may need to re-read your FAQ about contributing, because I had a couple humble suggestions, if I can be so bold... :-)

A write_sam or write_bam script would be great. I wrote a little wrapper script, psl2bam.pl, which converts BLAT results into a sequence-containing BAM file for seeing the actual alignments. I suspect that read_fasta, read_psl, merge_records and write_tab could be used to do something close; then it just needs to call samtools to sort it and make it "Bam!"
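As a rough sketch of that last step only, the samtools calls could be wrapped as below. It assumes samtools 1.x is on the PATH (older releases used a different `sort` syntax) and that a SAM file with a proper header already exists; the function names are hypothetical and are not part of Biopieces or psl2bam.pl.

```python
import subprocess


def sorted_bam_commands(sam_path, bam_path):
    """Build the samtools command lines that turn a SAM file into a sorted, indexed BAM file."""
    return [
        # Coordinate-sort the alignments and write them out as BAM.
        ["samtools", "sort", "-o", bam_path, sam_path],
        # Index the sorted BAM so that viewers can jump to a region.
        ["samtools", "index", bam_path],
    ]


def make_sorted_bam(sam_path, bam_path):
    """Run the commands in order, stopping at the first failure."""
    for command in sorted_bam_commands(sam_path, bam_path):
        subprocess.run(command, check=True)
```

Calling `make_sorted_bam("hits.sam", "hits.bam")` would need the samtools binary installed; the file names here are placeholders.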

What would you think about something to interface with the "UCSC table browser" downloads, so we could download a GTF or BED file from the command line and manipulate it in Biopieces? E.g. feature intersects, merge annotations, sequence extraction, etc. I could contribute something here if you or others thought it'd be useful. (I saw your FAQ.)

I saw the BGB tools and a snapshot of it on Flickr. I'll trust your judgement that you need it. :-) But I liked your blog post about wanting more of these types of tools to talk freely with each other. Would you consider interfacing with GBrowse 2? Using Bamfiles, I find it very rapid to go from analysis to visualization (and back.) It seems that the biopieces framework could benefit those people using GBrowse quite a bit if there were a couple more hooks.

**maubp** · 03-16-2010, 07:09 AM

Originally posted by maasha

Also, I shall add a read_illumina Biopiece (I'll do that tomorrow).

I see you have just recently updated the documentation. It looks like read_solexa now expects Illumina 1.3+ FASTQ files (with PHRED scores), and you don't support old Solexa 1.0 to Illumina 1.2 FASTQ files (with Solexa scores). Maybe I'm confused... but I fear you've just complicated things more.

**maasha** · 03-16-2010, 07:31 AM

@jmw86069

Thanks for the kind words. I normally don't hear much from users, and therefore I simply develop Biopieces according to my own needs. I am willing to write new Biopieces if they will be of general use. And of course I am also open to suggestions for improving existing Biopieces. If anyone wants to contribute code, they are welcome to do so.

Now, a genome browser is a must for any genomic researcher! I have been working a fair bit with the UCSC genome browser, and a couple of Biopieces exist for uploading and downloading tracks, and for manipulating the configuration on a local UCSC installation. However, I am now working with prokaryotes, and for that the UCSC genome browser is a bit of overkill. So I guess I will not be writing Biopieces for the UCSC genome browser anytime soon, since I need a working system to test stuff on. The Biopieces Genome Browser (BGB) was meant as a temporary system until JBrowse matures. JBrowse is going to be awesome (!!!), but it is rather nasty to install new genomes and custom tracks while at the same time keeping track of permissions on genomes and tracks. The same goes for GBrowse 2.

Now, a request for write_sam/write_bam is a bit tricky. I must admit that I don't use any tools that take these formats as input (I am probably missing out on important stuff). Also, I am mildly annoyed by the SAM format. There is a rant here:

http://code.google.com/p/biopieces/wiki/KissFormat#Background

But perhaps with a bit of assistance I could get something up and running.

**maasha** · 03-16-2010, 07:45 AM

@maubp

I had a brief look at read_fastq and read_solexa along with the links you sent me, and I got confused - yet again - over this pesky matter of quality scores and Phred/Sanger and Solexa and Illumina :P. As far as I have understood, the scores stored as char strings have been calculated differently; however, converting a char score to a decimal is simply a matter of subtracting 33 or 64 from the character's integer value, for Phred/Sanger and Solexa/Illumina respectively. So read_solexa (especially using the -c switch) should work equally well with any version of the Illumina pipeline. A read_illumina Biopiece should then be a copy of read_solexa. I may indeed have complicated things more :)

**maubp** · 03-16-2010, 08:10 AM

Originally posted by maasha

@maubp

I had a brief look at read_fastq and read_solexa along with the links you sent me, and I got confused - yet again - over this pesky matter of quality scores and Phred/Sanger and Solexa and Illumina :P. As far as I have understood, the scores stored as char strings have been calculated differently; however, converting a char score to a decimal is simply a matter of subtracting 33 or 64 from the character's integer value, for Phred/Sanger and Solexa/Illumina respectively. So read_solexa (especially using the -c switch) should work equally well with any version of the Illumina pipeline. A read_illumina Biopiece should then be a copy of read_solexa. I may indeed have complicated things more :)

There are (at least) THREE different FASTQ formats (see http://en.wikipedia.org/wiki/FASTQ_format or for more detail http://dx.doi.org/10.1093/nar/gkp1137). In summary:

Sanger FASTQ - encodes PHRED scores (at most 0 to 93), offset 33
Solexa FASTQ (and early Illumina) - encodes Solexa scores (at most -5 to 62), offset 64
Illumina 1.3 (or later) FASTQ - encodes PHRED scores (at most 0 to 62), offset 64

It would be consistent with BioPerl, Biopython, EMBOSS, etc. to call these the Sanger, Solexa and Illumina (1.3+) variants of FASTQ.
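To make the difference concrete, here is a minimal Python sketch of decoding one quality line under each of the three variants. The table and function names are made up for illustration and are not Biopieces code. The Solexa-to-PHRED mapping is the standard conversion described in the NAR paper linked above; rounding to the nearest integer is a choice of this sketch.

```python
import math

# ASCII offset and score type for each FASTQ variant listed above.
FASTQ_VARIANTS = {
    "sanger": (33, "phred"),
    "solexa": (64, "solexa"),
    "illumina": (64, "phred"),  # Illumina 1.3 or later
}


def solexa_to_phred(q_solexa):
    """Map a Solexa score onto the PHRED scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def decode_quality(qual_string, variant):
    """Return integer PHRED scores for one FASTQ quality line."""
    offset, score_type = FASTQ_VARIANTS[variant]
    raw = [ord(ch) - offset for ch in qual_string]
    if score_type == "solexa":
        # Solexa scores are on a different scale, so subtracting the
        # offset alone is not enough to get PHRED scores.
        return [round(solexa_to_phred(q)) for q in raw]
    return raw


print(decode_quality("I", "sanger"))     # [40]
print(decode_quality("h", "illumina"))   # [40]
print(decode_quality(";@h", "solexa"))   # [1, 3, 40]
```

The last line shows why the Solexa and Illumina 1.3+ variants cannot share one reader: both use offset 64, but low Solexa scores (here -5 and 0) correspond to PHRED 1 and 3, while high scores are nearly identical on both scales.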

**maubp** · 03-16-2010, 08:14 AM

Originally posted by maasha

So read_solexa (especially using the -c switch) should work equally well with any version of the Illumina pipeline.

The documentation for the -c switch says "Convert octal scores to decimal scores". What are octal scores? Do you mean the ASCII-encoded representation (which is not base eight, i.e. not octal)?
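A tiny Python illustration of the distinction, using a hypothetical helper and assuming the offset of 64 discussed above: the quality character is an ASCII code, and the decimal score is that code minus the offset. Octal is just another way of writing the same ASCII code and plays no part in the conversion.

```python
def describe_quality_char(ch, offset=64):
    """Show one quality character as ASCII code (decimal and octal) and as a score."""
    code = ord(ch)
    return {
        "char": ch,
        "ascii_decimal": code,
        "ascii_octal": oct(code),  # the same number written in base eight
        "score": code - offset,    # what a decimal score conversion yields
    }


print(describe_quality_char("h"))
# {'char': 'h', 'ascii_decimal': 104, 'ascii_octal': '0o150', 'score': 40}
```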

**maasha** · 03-16-2010, 08:18 AM

You are right. I shall clean up the docs a bit.

**bioinfosm** · 03-16-2010, 11:05 AM

Originally posted by ohofmann

This looks terrific. Add in BEDTools and a few basic shell scripts and quite a lot of additional glue can be dropped from workflows.

what an ideal world! :P

**ohofmann** · 03-16-2010, 02:04 PM

Originally posted by bioinfosm

what an ideal world! :P

What can I say, after ten years of cobbling together parsers I'm not asking for much anymore.

Biopieces - bioinformatic Swiss army knife
