**maubp** · 12-30-2010, 04:45 PM

I'd do this with the FASTQ files (easier than filtering the FASTA and QUAL files and keeping them synchronized).

I'm familiar with 5' MID tags, but I don't quite know what you mean by 3' MID tags.

Have you checked if your reads have had the Roche quality trimming applied or not?

I would take the MIDs, compute all variants with one (or maybe two) base changes, and look for sequences which start with them (i.e. start with a desired 5' MID). Personally I'd use Biopython for this.
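The approach described above can be sketched roughly as follows. This is only an illustration, not the poster's actual script: it uses the standard library rather than Biopython so that it stays self-contained, it assumes four-line FASTQ records, and it only allows single-base substitutions. The function names are hypothetical; the two MID sequences are the RL1 and RL2 5' tags quoted later in this thread. With Biopython installed, `SeqIO.parse(handle, "fastq")` could replace the hand-written reader.

```python
import io

# 5' MIDs taken from the RLMIDs listing in this thread (only two shown here).
MIDS = {"RL1": "ACACGACGACT", "RL2": "ACACGTAGTAT"}


def one_mismatch_variants(mid):
    """Return the MID itself plus every sequence with one substituted base."""
    variants = {mid}
    for i, base in enumerate(mid):
        for alt in "ACGT":
            if alt != base:
                variants.add(mid[:i] + alt + mid[i + 1:])
    return variants


def build_lookup(mids):
    """Map each accepted 5' prefix to a MID name, dropping ambiguous prefixes."""
    lookup, ambiguous = {}, set()
    for name, seq in mids.items():
        for variant in one_mismatch_variants(seq):
            if variant in lookup and lookup[variant] != name:
                ambiguous.add(variant)
            lookup[variant] = name
    for variant in ambiguous:
        del lookup[variant]
    return lookup


def read_fastq(handle):
    """Yield (title, sequence, quality); assumes four lines per record."""
    while True:
        title = handle.readline().rstrip()
        if not title:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip()
        yield title[1:], seq, qual


def assign(seq, lookup, mid_length):
    """Return the MID name whose variant matches the start of the read, or None."""
    return lookup.get(seq[:mid_length].upper())


demo = io.StringIO(
    "@read1\nACACGACGACTTTGGCA\n+\nIIIIIIIIIIIIIIIII\n"
    "@read2\nACACGTAGTTTCCGGAA\n+\nIIIIIIIIIIIIIIIII\n"
    "@read3\nGGGGGGGGGGGGGGGGG\n+\nIIIIIIIIIIIIIIIII\n"
)
lookup = build_lookup(MIDS)
for title, seq, qual in read_fastq(demo):
    print(title, assign(seq, lookup, 11))
```

All MIDs in one set are assumed to have the same length (11 bases here), which is what makes a simple prefix lookup possible.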

However, by far the easiest way would be to do this with the raw SFF file and the Roche off-instrument application tools, which include MID handling. Always ask for the SFF file if doing 454 analysis - it gives you far more options.

**ewilbanks** · 12-30-2010, 05:17 PM

Originally posted by maubp:

I'm familiar with 5' MID tags, but I don't quite know what you mean by 3' MID tags.

This is the info that I had on the MIDs used in our run... I assumed it meant there was one at 5' and another at 3'. Is that what these are?

Code:

RLMIDs
{
 mid = "RL1", "ACACGACGACT", 2, "AGTCGTGGTGT";
 mid = "RL2", "ACACGTAGTAT", 2, "ATACTAGGTGT";
 mid = "RL3", "ACACTACTCGT", 2, "ACGAGTGGTGT";
 mid = "RL4", "ACGACACGTAT", 2, "ATACGTGGCGT";
 mid = "RL5", "ACGAGTAGACT", 2, "AGTCTACGCGT";
 mid = "RL6", "ACGCGTCTAGT", 2, "ACTAGAGGCGT";
 mid = "RL7", "ACGTACACACT", 2, "AGTGTGTGCGT";
 mid = "RL8", "ACGTACTGTGT", 2, "ACACAGTGCGT";
 mid = "RL9", "ACGTAGATCGT", 2, "ACGATCTGCGT";
 mid = "RL10", "ACTACGTCTCT", 2, "AGAGACGGAGT";
 mid = "RL11", "ACTATACGAGT", 2, "ACTCGTAGAGT";
 mid = "RL12", "ACTCGCGTCGT", 2, "ACGACGGGAGT";
}

Originally posted by maubp:

However, by far the easiest way would be to do this with the raw SFF file and the Roche off-instrument application tools, which include MID handling.

I'd love to, but this sequencing was done a while ago and somehow that file has gone astray.

I also found this other thread where folks have been discussing this...
http://seqanswers.com/forums/showthr...highlight=mids

**maubp** · 12-30-2010, 06:28 PM

Originally posted by ewilbanks:

This is the info that I had on the MIDs used in our run... I assumed it meant there was one at 5' and another at 3'. Is that what these are?

I guess so, in which case RL is probably short for the rapid library preparation method. We've only ever used 5' MID tags, so I can't give you any first-hand advice, but the thread you mention looks useful.

Originally posted by ewilbanks:

I'd love to, but this sequencing was done a while ago and somehow that file has gone astray.

That's a shame.

Originally posted by ewilbanks:

I also found this other thread where folks have been discussing this...
http://seqanswers.com/forums/showthr...highlight=mids

Post #7 by kmcarr looks particularly helpful.

**ewilbanks** · 12-30-2010, 06:32 PM

Ah, RL = rapid library! Thanks!!

Yeah, I'm just sorting on the 5' MID (FASTX-Toolkit) and then I'll trim out any 3' MIDs hanging around. Thanks again for your help!

**prisnirath** · 05-11-2011, 08:24 AM

Help!!

Hi,
I am very new to NGS data analysis.
I am trying to sort my FASTQ files by MID tag using fastx_barcode_splitter, but it generates txt files without any content, and the unmatched folder gets all the FASTQ contents copied into it.
Earlier I was successful when I tried sorting only the FASTA files, but this time with FASTQ it is showing some issues.
Any suggestions on how to get it working?

**maubp** · 05-11-2011, 08:26 AM

If you have the SFF files, I'd use them with the Roche tools to split on the MID barcodes.

**prisnirath** · 05-11-2011, 08:49 AM

Hi there,
I finally managed to get my FASTQ files sorted by MID and MID-trimmed. I used fastx_barcode_splitter (for sorting) and fastx_trimmer (for removing the MID tags), but I had to do some prior manipulation of my FASTQ files first:
converting all lower-case bases to upper case and removing the 'tcag' primer from the start of the sequence lines, in front of the MID tags.
Thanks anyway!!
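For anyone hitting the same problem, the two manipulations described above might look something like the sketch below. It is not the poster's actual procedure (which was not shown). It assumes four-line FASTQ records and treats 'tcag' as a fixed four-base prefix (commonly the 454 key sequence) that is removed from both the sequence and the quality line. The function names are made up for illustration.

```python
import io


def clean_record(seq, qual, key="TCAG"):
    """Upper-case the read and drop a leading key, trimming quality in step."""
    seq = seq.upper()
    if seq.startswith(key):
        seq, qual = seq[len(key):], qual[len(key):]
    return seq, qual


def clean_fastq(in_handle, out_handle, key="TCAG"):
    """Rewrite a four-line-per-record FASTQ so each read starts at its MID."""
    while True:
        title = in_handle.readline()
        if not title:
            break
        seq = in_handle.readline().rstrip("\n")
        in_handle.readline()  # the '+' separator line
        qual = in_handle.readline().rstrip("\n")
        seq, qual = clean_record(seq, qual, key)
        out_handle.write(f"{title.rstrip()}\n{seq}\n+\n{qual}\n")


source = io.StringIO("@r1\ntcagACACGACGACTttgg\n+\nABCDEFGHIJKLMNOPQRS\n")
target = io.StringIO()
clean_fastq(source, target)
print(target.getvalue(), end="")
```

Keeping the quality string the same length as the sequence matters, because a length mismatch makes the record invalid for downstream tools.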

**prisnirath** · 05-26-2011, 03:37 AM

Help in understanding how to use the sfffile program on the command line

I am using the sfffile program on the command line to sort my SFF files by MID and remove the MIDs. I think I am going wrong somewhere.
sfffile -o roche454_new.sff -e mid.lst -nmft sff/roche454.sff

Also, I am a little confused about the options (-s) and (-i).
Can anyone please suggest how to do this?

**sklages** · 05-26-2011, 04:02 AM

Originally posted by prisnirath:

I am using the sfffile program on the command line to sort my SFF files by MID and remove the MIDs. I think I am going wrong somewhere.
sfffile -o roche454_new.sff -e mid.lst -nmft sff/roche454.sff

Also, I am a little confused about the options (-s) and (-i).
Can anyone please suggest how to do this?

If you just want to split your SFF files according to one of the standard MID sets listed in your system MID file (MIDConfig.parse), you just need:

Code:

sfffile -s RLMIDs MY_SFF_FILE.sff

If you have a custom MID file with a MID group named "SPC_MIDs":

Code:

sfffile -s SPC_MIDs -mcf MyMIDfile.parse MY_SFF_FILE.sff

The '-i'/'-e' options are just for including/excluding certain reads (by accession).

hth,
Sven

**prisnirath** · 05-26-2011, 04:10 AM

Thank you!!
I understand it now!
But still a little confused...
I have my MID file in CSV format, and I have converted it to txt (tab-delimited) and FASTA files.
Which format should I be using here?

**sklages** · 05-26-2011, 04:20 AM

Originally posted by prisnirath:

Thank you!!
I understand it now!
But still a little confused...
I have got my MID files in CSV format and I have converted this file into txt, tab delimited and fasta file.
Which format should I be using here?

Now, I am confused :-)

You should have your data in SFF files and your MIDs in the Roche "parse" format, e.g.

Code:

CUSTOM_MULTIPLEX
{
    mid = "MID4000", "ACACGT", 0;
    mid = "MID4001", "ACGTAC", 0;
    mid = "MID4002", "ACTGCA", 0;
    mid = "MID4003", "AGAGTC", 0;
}

where '0' is the number of mismatches allowed for a MID to still be considered valid.

If you use "Rapid Libraries" you might want to check the 3' ends as well:

Code:

RLMIDs
{
    mid = "RL1",   "ACACGACGACT", 1, "AGTCGTGGTGT";
    mid = "RL2",   "ACACGTAGTAT", 1, "ATACTAGGTGT";
    mid = "RL3",   "ACACTACTCGT", 1, "ACGAGTGGTGT";
    mid = "RL4",   "ACGACACGTAT", 1, "ATACGTGGCGT";
}

Again, the number is the allowed number of mismatches in MID recognition.
The second sequence in this format has no influence on splitting; it just gets trimmed (if found). Splitting is done exclusively on the MIDs present at the 5' end.

hth,
Sven
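To make the behaviour described above concrete, here is a rough Python sketch of the same logic: read `mid = ...;` entries, assign a read by its 5' MID within the allowed number of mismatches, and trim the 3' sequence only if it is found. It is an illustration only, not how sfffile is implemented. It counts substitutions only, requires an exact 3' match, and ignores group names. The function names are hypothetical.

```python
import re

# One entry: mid = "NAME", "FIVE_PRIME", MISMATCHES [, "THREE_PRIME"];
MID_LINE = re.compile(
    r'mid\s*=\s*"([^"]+)"\s*,\s*"([ACGTacgt]+)"\s*,\s*(\d+)\s*'
    r'(?:,\s*"([ACGTacgt]+)"\s*)?;'
)


def parse_mids(text):
    """Return (name, five_prime, max_mismatches, three_prime_or_None) tuples."""
    return [
        (name, five.upper(), int(mm), three.upper() or None)
        for name, five, mm, three in MID_LINE.findall(text)
    ]


def count_mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def split_read(seq, mids):
    """Assign by the 5' MID only; trim the 3' sequence if it is present."""
    seq = seq.upper()
    for name, five, max_mm, three in mids:
        prefix = seq[:len(five)]
        if len(prefix) == len(five) and count_mismatches(prefix, five) <= max_mm:
            insert = seq[len(five):]
            if three and insert.endswith(three):
                insert = insert[:-len(three)]
            return name, insert
    return None, seq


CONFIG = '''
RLMIDs
{
    mid = "RL1",   "ACACGACGACT", 1, "AGTCGTGGTGT";
    mid = "RL2",   "ACACGTAGTAT", 1, "ATACTAGGTGT";
}
'''
mids = parse_mids(CONFIG)
print(split_read("ACACGACGACT" + "GGGGCCCC" + "AGTCGTGGTGT", mids))
```

A read with no recognisable 5' MID comes back with the name `None` and its sequence unchanged, even if a 3' sequence is present, which mirrors the point that the 3' sequence plays no part in the splitting.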

**prisnirath** · 05-26-2011, 04:27 AM

I have got my SFF files, true!
I got a MID file in CSV format,
and I have converted it to a tab-delimited file.
My question is about this command:
sfffile -s SPC_MIDs -mcf MyMIDfile.parse MY_SFF_FILE.sff

MyMIDfile.parse is the MID file (right?)

What file format should I use to get it into an acceptable format?

I took suggestions from the thread http://seqanswers.com/forums/showthread.php?t=10825
and I am getting an error:
sfffile -s Y -mcf file2.txt -o reg1 GGDP4G001.sff >MIDyieldR1.txt
Error: Invalid file format 2: file2.txt

**prisnirath** · 05-26-2011, 04:28 AM

ACGAGTGCGTGTAGCGCGACGGCCAGT
ACGAGTGCGTCAGGGCGCAGCGATGAC
ACGCTCGACAGTAGCGCGACGGCCAGT
ACGCTCGACACAGGGCGCAGCGATGAC
AGACGCACTCGTAGCGCGACGGCCAGT
AGACGCACTCCAGGGCGCAGCGATGAC
AGCACTGTAGGTAGCGCGACGGCCAGT
AGCACTGTAGCAGGGCGCAGCGATGAC
;
;
;

This is the format of my txt MID file.

**sklages** · 05-26-2011, 04:30 AM

Originally posted by prisnirath:

I have got my SFF files, true!
I got a MID file in CSV format,
and I have converted it to a tab-delimited file.
My question is about this command:
sfffile -s SPC_MIDs -mcf MyMIDfile.parse MY_SFF_FILE.sff

MyMIDfile.parse is the MID file (right?)

What file format should I use to get it into an acceptable format?

I took suggestions from the thread http://seqanswers.com/forums/showthread.php?t=10825
and I am getting an error:
sfffile -s Y -mcf file2.txt -o reg1 GGDP4G001.sff >MIDyieldR1.txt
Error: Invalid file format 2: file2.txt

Have you read my post? I described the format you should use for sfffile to split SFFs according to their MIDs.

One more thing: the output of sfffile is a new SFF, so there is no need to redirect it to a text file.

hth,
Sven
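As a rough illustration of the conversion being asked for, a plain list of sequences (one per line) could be wrapped into the "parse" layout with a few lines of Python. Several things here are assumptions: the group name "Y" is taken from the `-s Y` in the failing command, the entry names MID1, MID2, ... are invented, the mismatch count is set to 0, and lines that are not pure A/C/G/T (such as the ';' lines) are skipped. Whether the 27-base sequences in the posted file are MIDs alone or MIDs plus primer is not clear from the thread, so check them before using the result.

```python
def to_parse_format(lines, group="Y", mismatches=0, prefix="MID"):
    """Wrap one-sequence-per-line text in a 'GROUP { mid = ...; }' block."""
    seqs = []
    for line in lines:
        seq = line.strip().upper()
        # Keep only non-empty lines made up purely of A, C, G and T.
        if seq and set(seq) <= set("ACGT"):
            seqs.append(seq)
    body = "\n".join(
        f'    mid = "{prefix}{i}", "{seq}", {mismatches};'
        for i, seq in enumerate(seqs, start=1)
    )
    return f"{group}\n{{\n{body}\n}}\n"


example = [
    "ACGAGTGCGTGTAGCGCGACGGCCAGT",
    "ACGAGTGCGTCAGGGCGCAGCGATGAC",
    ";",
]
print(to_parse_format(example), end="")
```

The resulting text would be saved to a file and passed with `-mcf`, with the group name given to `-s`, as in the commands shown earlier in the thread.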


Sorting and removing MIDs from fastq file (Roche 454)
