**kmcarr** · 08-04-2012, 11:45 AM

Heidi,

Looking at your read data, you can see that all of the reads start with the four bases "TCAG", known as the 'keytag'; this tells the software that the read came from a library fragment (as opposed to a control fragment; there are none in your data set). Following the keytag is the Multiplex ID (MID) sequence, which corresponds to one of your 96 adapter sequences. "Adapter specific SFF files" means parsing the reads in your input file, identifying each read's MID tag, and sorting the reads into new output SFF files according to their MID.
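The read layout described above (keytag first, then the MID) can be sketched in a few lines of Python. This is only a minimal illustration that assumes exact matching of the barcode immediately after the key. The barcode names and sequences in `MIDS` are made-up placeholders, not the real IonExpress sequences, and the function names are hypothetical.

```python
from collections import defaultdict

KEY = "TCAG"  # keytag that every library read starts with

# Placeholder barcodes for illustration only; these are NOT real MID sequences.
MIDS = {
    "MID_example_1": "ACGTACGTAC",
    "MID_example_2": "TGCATGCATG",
}

def assign_mid(read_seq, mids=MIDS, key=KEY):
    """Return the name of the MID that directly follows the key, or None."""
    if not read_seq.startswith(key):
        return None  # not a library read under this key
    after_key = read_seq[len(key):]
    for name, barcode in mids.items():
        if after_key.startswith(barcode):  # exact match only, no mismatches
            return name
    return None

def split_by_mid(reads, mids=MIDS, key=KEY):
    """Group (read_name, sequence) pairs by MID; unmatched reads go under None."""
    groups = defaultdict(list)
    for name, seq in reads:
        groups[assign_mid(seq, mids, key)].append(name)
    return dict(groups)

reads = [
    ("read1", "TCAGACGTACGTACGGGTTTAAA"),
    ("read2", "TCAGTGCATGCATGCCCAAATTT"),
    ("read3", "TCAGAAAAAAAAAACCCAAATTT"),
]
print(split_by_mid(reads))
```

Each list of read names would then correspond to one adapter-specific output file. Reads that match no barcode are collected under `None` so they can be counted rather than silently dropped.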

There are a handful of tools available to split SFF files by barcode, but I recommend getting the Roche/454 software. It is available for free, but you have to submit a request through their website. Specifically, the program you want is called 'sfffile'. This tool can (among other things) read an SFF file and a MID configuration file and output a set of MID (adapter) specific SFF files. Judging by the names of your MID tags (IonExpress_nnn), it would appear that this data was generated on a Life Technologies Ion Torrent instrument, not a Roche/454, but no matter: the SFF format is the same and Roche's software should be able to split the reads. However, since the MID tag set is not the default Roche/454 tag set, you will need to create a custom MIDConfig.parse file for use with the sfffile program. There are instructions for doing this in the documentation that accompanies the software, and you can use the default MIDConfig.parse file as a template.

Good luck.

**HeidiLee** · 08-04-2012, 03:16 PM

Thanks, kmcarr, for your explanation. I have a better understanding of the SFF file now.
I am more familiar with R/Bioconductor, so I am trying a Bioconductor package.
I was able to read the big SFF file into one big SFFContainer with the function readSFF in the Bioconductor package R453Plus1Toolbox. I also found the indexes needed to split the big SFFContainer. For example, the indexes (5,11,17,22,25,29,31,33,37) belong to the first small SFFContainer.
I couldn't figure out how to extract reads (5,11,17,22,25,29,31,33,37) from the big SFFContainer, construct a small SFFContainer from them, and save it as an SFF file. Could anyone please help me with this part of the work?
Thank you very much in advance.

Heidi
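For readers who are not tied to R, the "take reads 5, 11, 17, ... and keep them as a smaller set" step can be sketched in Python. This is not the R453Plus1Toolbox approach, just an illustration of the selection logic; `select_by_position` is a hypothetical helper and the read names are stand-ins for real records.

```python
def select_by_position(records, positions):
    """Yield the records found at the given 1-based positions, in input order."""
    wanted = set(positions)
    for i, record in enumerate(records, start=1):
        if i in wanted:
            yield record

# Stand-in for the reads of the big SFF file (names only, for illustration).
all_reads = [f"read{i}" for i in range(1, 41)]

first_group = [5, 11, 17, 22, 25, 29, 31, 33, 37]
subset = list(select_by_position(all_reads, first_group))
print(subset)
```

If Biopython is available, the same generator could in principle wrap the records from `Bio.SeqIO.parse(path, "sff")`, and the selected records could be written back out with `Bio.SeqIO.write(records, handle, "sff")` on a handle opened in binary mode. That pairing is an assumption about the workflow, not something tested against this data set.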

**HeidiLee** · 08-06-2012, 05:42 AM

Dear kmcarr,

I think I didn't pay much attention to your post last time because it seemed like I wouldn't need to write any programs. The project I am working on is a kind of exam to test my programming skills.
I used R to identify the reads that can be assigned to a specific adapter, so for each adapter I have a list of read names. I have 377894 reads in total, but for most adapters there are only tens of reads, sometimes even fewer.
Is this the usual case, or do you think I probably made a mistake?

Thank you very much.

Heidi
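One rough way to put those numbers in context is to compare the reads assigned to any adapter against the total. The sketch below uses invented per-adapter counts (30 reads each for 96 adapters) purely as a placeholder. The even-split figure is only a reference point, since real libraries are rarely pooled perfectly evenly, and `summarize_assignment` is a hypothetical helper.

```python
def summarize_assignment(counts, total_reads):
    """Summarize how many reads were assigned to an adapter versus left over."""
    assigned = sum(counts.values())
    return {
        "assigned": assigned,
        "unassigned": total_reads - assigned,
        "fraction_assigned": assigned / total_reads,
        # Reference value only: what each adapter would get under an even split.
        "expected_per_adapter_if_even": total_reads / len(counts),
    }

# Hypothetical counts: 96 adapters with 30 reads each (placeholder values).
counts = {f"adapter_{i:03d}": 30 for i in range(1, 97)}
summary = summarize_assignment(counts, total_reads=377894)
for key, value in summary.items():
    print(key, value)
```

If most reads end up unassigned, it may be worth checking the matching step itself, for example whether the keytag is skipped before the barcode is compared, before concluding that the library really is that sparse.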

Split SFF file by Adaptors
