  • aligenie
    Member
    • Feb 2011
    • 13

    454 MIDs

    Hi everyone. I've searched everywhere but haven't quite found the solution to my problem, so bear with me if I'm asking a simple question.

    I just got back 454 data that we generated using the Fluidigm Access Array, and I'm having difficulty parsing out the MIDs. I've converted the .sff files to FASTA files and tried parsing out the MIDs with the FASTX tools, but for some reason the barcode splitter is not working. I'm sure my syntax is correct, so I think something is wrong with my FASTA file. I know the key tags and adapters are still on my sequences. Is this the problem? I'm not sure what I should do. We are not big fans of Roche's software, especially AVA, so we are trying to find a different solution. Also, has anyone been successful using Novoalign for mapping 454 data?
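One likely reason a barcode splitter fails on these reads: if the 4-base key (TCAG) is still in front of every read, a splitter that looks for the barcode at the very start of the read ends up comparing the key, not the MID. Below is a minimal Python sketch of matching the MID after the key instead. It assumes a read layout of key + 10-base MID + insert, and the function names are hypothetical, not FASTX options.

```python
from typing import Dict, Optional, Tuple

KEY = "TCAG"  # the standard 454 key; assumed to sit at the very start of each read


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_mid(read: str, mids: Dict[str, str], max_mismatches: int = 1,
               key: str = KEY) -> Optional[Tuple[str, str]]:
    """Return (mid_name, read_after_mid), or None if no single MID matches.

    The MID is compared right after the key instead of at position 0,
    because a splitter that only looks at the first bases of the read
    would be comparing the key, not the MID.
    """
    offset = len(key) if read.startswith(key) else 0
    hits = []
    for name, seq in mids.items():
        window = read[offset:offset + len(seq)]
        if len(window) == len(seq) and hamming(window, seq) <= max_mismatches:
            hits.append((name, offset + len(seq)))
    if len(hits) != 1:  # no match, or more than one MID within the limit
        return None
    name, end = hits[0]
    return name, read[end:]


mids = {"MID1": "ACGAGTGCGT", "MID2": "ACGCTCGACA"}
print(assign_mid("TCAGACGAGTGCGTGGATTACA", mids))  # ('MID1', 'GGATTACA')
```

With max_mismatches=1, a read whose MID carries one substitution is still assigned. A read that falls within the mismatch limit of more than one MID is left unassigned rather than guessed.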


    Any information is greatly appreciated!!!!! Thank you!

    Ali
  • JackieBadger
    Senior Member
    • Mar 2009
    • 385

    #2
    Hey,

    Yep, unless you have code to do this, it isn't easy (especially since you may have a number of sorting parameters you're interested in).

    Geneious offers a free trial, is easy to use, and is super cheap for students. You can do it there. http://www.geneious.com/


    I used jMHC to parse mine, as I found its parsing criteria particularly stringent (no Ns in primers or sequence, and a 1 bp difference = a new allele). http://code.google.com/p/jmhc/
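The stringent criteria mentioned above (no Ns, and every 1 bp difference counted as a new allele) amount to a simple filter-and-count. The following is a sketch of those criteria, not jMHC's implementation. The optional exact primer check and the function name are assumptions.

```python
from collections import Counter
from typing import Iterable


def stringent_variants(reads: Iterable[str], primer: str = "") -> Counter:
    """Count exact sequence variants after applying strict read filters."""
    counts: Counter = Counter()
    primer = primer.upper()
    for read in reads:
        read = read.upper()
        if "N" in read:
            continue  # an N anywhere (primer or insert) rejects the read
        if primer and not read.startswith(primer):
            continue  # require an exact primer match at the 5' end
        counts[read[len(primer):]] += 1  # each distinct sequence is its own variant
    return counts


print(stringent_variants(["ACGTTT", "ACGTTA", "ACGTTT", "ACNTTT"], primer="ACG"))
```

Because variants are keyed on the exact sequence, a single sequencing error also shows up as a new "allele". This is why such strict settings are usually combined with a minimum read count per variant.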

    I found that with jMHC, parsing could take hours, despite running on a VERY powerful desktop PC. When I ran it on my Mac laptop, it whizzed through in minutes!

    I attempted to get SESAME up and running, but it is a fiddly process and I got tired of troubleshooting... http://bioinformatics.oxfordjournals...2/277.abstract

    Good luck!

    J


    • sklages
      Senior Member
      • May 2008
      • 628

      #3
      Usually, when you get back data from a multiplexed run, the MIDs have already been removed from the SFF file (more precisely, the clip offsets have been shifted by the length of the MIDs used).
      So if you extract your reads with, e.g., 'sffinfo', you get the "clipped" sequence (unless you use the '-n' flag), and your FASTA files will no longer contain the MIDs.
      Keep this in mind when using other extraction tools as well.
      What tool have you been using for SFF-to-FASTA extraction?
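The difference between the "clipped" sequence and the untrimmed read ('-n') comes down to the clip offsets stored with each read in the SFF file. The sketch below assumes the SFF offset convention for the four clip fields (1-based positions, 0 meaning "not set"). It illustrates the usual interpretation of those fields and is not sffinfo's own code. The function name is hypothetical, and the offset values in the example (5 for key-only clipping, 15 after a 10-base MID) are assumed typical values.

```python
from typing import Tuple


def clipped_range(n_bases: int, clip_qual_left: int, clip_qual_right: int,
                  clip_adapter_left: int, clip_adapter_right: int) -> Tuple[int, int]:
    """Return the 0-based, half-open (start, end) slice of the clipped read.

    Inputs use the SFF convention: 1-based positions, 0 meaning "not set".
    """
    left = max(clip_qual_left, clip_adapter_left, 1)  # first kept base (1-based)
    rights = [r for r in (clip_qual_right, clip_adapter_right) if r > 0]
    right = min(rights) if rights else n_bases        # last kept base (1-based)
    start = left - 1
    end = max(start, min(right, n_bases))             # never return a negative span
    return start, end


# Untrimmed read: key + MID1 + insert (roughly what an untrimmed dump would show)
read = "TCAG" + "ACGAGTGCGT" + "GGATTACA"
start, end = clipped_range(len(read), 5, 0, 0, 0)   # only the key clipped
print(read[start:end])                               # ACGAGTGCGTGGATTACA
start, end = clipped_range(len(read), 15, 0, 0, 0)  # offset shifted past the MID
print(read[start:end])                               # GGATTACA
```

The bases stored in the file are identical in both cases. Only the left offset differs, which is why the same SFF can yield FASTA with or without MIDs depending on how it is extracted.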

      cheers,
      Sven


      • JackieBadger
        Senior Member
        • Mar 2009
        • 385

        #4
        If a sequencing company removed the MIDs that I had attached to identify individuals, I wouldn't pay them. The whole point of MIDs is so they can be used to sort sequences.
        I have never had MIDs removed from my data, only 454 adapter sequences.

        You may as well parse the data using your FASTA and QUAL files. These will have the MIDs. If your sequences do not contain MIDs, either you didn't ligate them properly or the sequencing company shouldn't be paid. I highly doubt they would remove MIDs.
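Working from FASTA and QUAL files means keeping the two in step: whatever is trimmed from the sequence (key, MID) must be trimmed from the quality scores too. The sketch below assumes both files use the same read IDs and whitespace-separated integer qualities. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def read_records(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the lines of a FASTA or QUAL file by read ID (first word after '>')."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return records


def paired_reads(fasta_lines: Iterable[str],
                 qual_lines: Iterable[str]) -> Dict[str, Tuple[str, List[int]]]:
    """Join sequences with their quality scores, checking that lengths agree."""
    seqs = read_records(fasta_lines)
    quals = read_records(qual_lines)
    paired = {}
    for rid, chunks in seqs.items():
        seq = "".join(chunks)
        qual = [int(q) for chunk in quals.get(rid, []) for q in chunk.split()]
        if len(qual) != len(seq):
            raise ValueError(f"read {rid}: {len(seq)} bases but {len(qual)} scores")
        paired[rid] = (seq, qual)
    return paired


def trim_left(seq: str, qual: List[int], n: int) -> Tuple[str, List[int]]:
    """Drop the first n positions (e.g. key + MID) from sequence and qualities together."""
    return seq[n:], qual[n:]
```

Open file handles can be passed directly as the two line iterables. The length check catches the most common mistake: pairing a clipped FASTA with an untrimmed QUAL, or the other way round.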


        • sklages
          Senior Member
          • May 2008
          • 628

          #5
          Well, when I "remove MIDs", I usually do this by splitting an SFF file from either region of the 454 plate into individual SFF files (according to their MID); in this step, the MID is removed (i.e., the offset in the SFF file is shifted). That's a normal process when working with multiplexed data.
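What Sven describes (split by MID, then shift the offset rather than cut bases) can be sketched as follows. The Read type and split_by_mid are hypothetical illustrations, not sfffile internals. MIDs are matched exactly here for brevity, whereas real splitters usually tolerate a few mismatches.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Read:
    """Hypothetical in-memory stand-in for one SFF read."""
    name: str
    bases: str      # untrimmed bases, key included
    clip_left: int  # 1-based position of the first base to keep


def split_by_mid(reads: List[Read], mids: Dict[str, str]) -> Dict[str, List[Read]]:
    """Group reads by the MID found at clip_left, moving clip_left past the MID.

    The bases themselves are never cut; only the offset changes, so the
    untrimmed read (MID included) can still be recovered later.
    """
    groups: Dict[str, List[Read]] = {name: [] for name in mids}
    unassigned: List[Read] = []
    for read in reads:
        start = read.clip_left - 1
        for name, seq in mids.items():
            if read.bases.startswith(seq, start):  # exact match only, for brevity
                groups[name].append(replace(read, clip_left=read.clip_left + len(seq)))
                break
        else:
            unassigned.append(read)  # no MID matched
    groups["unassigned"] = unassigned
    return groups


mids = {"MID1": "ACGAGTGCGT"}
groups = split_by_mid([Read("r1", "TCAGACGAGTGCGTGGATTACA", 5)], mids)
print(groups["MID1"][0].clip_left)  # 15
```

Keeping the bases intact and moving only the offset is what makes the "-n" (untrimmed) view possible after splitting.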

          No need to refuse paying ;-)

          cheers,
          Sven


          • JackieBadger
            Senior Member
            • Mar 2009
            • 385

            #6
            Right, but you remove the MIDs once you have sorted by them. I think the original question was "How can I sort by barcodes?"


            • sklages
              Senior Member
              • May 2008
              • 628

              #7
              Yes, that's what you usually do: split a run SFF file into individual SFF files according to their barcodes/MIDs. If you use the Roche tools for this task, the offsets get shifted. We usually send these SFF files to our customers with the MIDs removed and the files sorted.
              The OP didn't mention what kind of SFF files he received: individual ones, or whole-region SFFs?

              cheers,
              Sven


              • JackieBadger
                Senior Member
                • Mar 2009
                • 385

                #8
                Ahhh, so you preprocess the MIDs for the customer?
                How nice of you! Haha, I'm sure most providers I know would charge $ for this.

                Anyway, the programs I mentioned are a great way for a non-code based approach.

                Cheers

                J


                • aligenie
                  Member
                  • Feb 2011
                  • 13

                  #9
                  Originally posted by sklages
                  [...] We usually send these SFF files to our customers, MID removed, files sorted.
                  I wish our core did this!! LOL, they definitely don't. I just received several .sff files, but they are not parsed by MID or by anything else, for that matter. I guess I will look into the Roche tools for separating by MID and trimming. sfffile can do something like this, I think, although I find the syntax very confusing. Is this what you use?


                  • JackieBadger
                    Senior Member
                    • Mar 2009
                    • 385

                    #10
                    jMHC and Geneious (links above) are super easy to use, with graphical interfaces.
                    You designate your primer, adapter length, and Bob's your uncle!


                    • sklages
                      Senior Member
                      • May 2008
                      • 628

                      #11
                      Originally posted by aligenie
                      [...] I guess I will look into Roche tools for separating by MID and trimming. Sfffile can do something like this I think although I find the syntax very confusing. Is this what you use?
                      Yes, sfffile is very fast and reliable. I don't know the other tools mentioned, but the advantage of sfffile is that it works on the SFF file and generates new SFF files. Once you are done with MID clipping, you extract the FASTA sequences from the newly created SFF files (without MIDs, unless you use '-n' with sffinfo).

                      cheers,
                      Sven


                      • aligenie
                        Member
                        • Feb 2011
                        • 13

                        #12
                        Hi Sven, thanks for your help. Unfortunately, I still cannot get sfffile to parse by MID. I used sfffile -mcf barcode.txt -s read.sff and I get errors. My barcode file looks like this:
                        barcode
                        {
                        mid = "MID1", "ACGAGTGCGT", 2;
                        mid = "MID2", "ACGCTCGACA", 2;
                        mid = "MID3", "AGACGCACTC", 2;
                        mid = "MID5", "ATCAGACACG", 2;
                        mid = "MID6", "ATATCGCGAG", 2;
                        mid = "MID7", "CGTGTCTCTA", 2;
                        mid = "MID8", "CTCGCGTGTC", 2;
                        mid = "MID10", "TCTCTATGCG", 2;
                        mid = "MID11", "TGATACGTCT", 2;
                        mid = "MID13", "CATAGTAGTG", 2;
                        mid = "MID14", "CGAGAGATAC", 2;
                        mid = "MID15", "ATACGACGTA", 2;
                        mid = "MID16", "TCACGTACTA", 2;
                        mid = "MID17", "CGTCTAGTAC", 2;
                        mid = "MID18", "TCTACGTAGC", 2;
                        mid = "MID19", "TGTACTACTC", 2;
                        mid = "MID20", "ACGACTACAG", 2;
                        mid = "MID21", "CGTAGACTAG", 2;
                        mid = "MID22", "TACGAGTATG", 2;
                        mid = "MID23", "TACTCTCGTG", 2;
                        mid = "MID24", "TAGAGACGAG", 2;
                        mid = "MID25", "TCGTCGCTCG", 2;
                        mid = "MID26", "ACATACGCGT", 2;
                        mid = "MID27", "ACGCGAGTAT", 2;
                        mid = "MID28", "ACTACTATGT", 2;
                        mid = "MID68", "TCGCTGCGTA", 2;
                        mid = "MID30", "AGACTATACT", 2;
                        mid = "MID31", "AGCGTCGTCT", 2;
                        mid = "MID32", "AGTACGCTAT", 2;
                        mid = "MID33", "ATAGAGTACT", 2;
                        mid = "MID34", "CACGCTACGT", 2;
                        mid = "MID35", "CAGTAGACGT", 2;
                        mid = "MID36", "CGACGTGACT", 2;
                        mid = "MID37", "TACACACACT", 2;
                        mid = "MID38", "TACACGTGAT", 2;
                        mid = "MID39", "TACAGATCGT", 2;
                        mid = "MID40", "TACGCTGTCT", 2;
                        mid = "MID69", "TCTGACGTCA", 2;
                        mid = "MID42", "TCGATCACGT", 2;
                        mid = "MID43", "TCGCACTAGT", 2;
                        mid = "MID44", "TCTAGCGACT", 2;
                        mid = "MID45", "TCTATACTAT", 2;
                        mid = "MID46", "TGACGTATGT", 2;
                        mid = "MID47", "TGTGAGTAGT", 2;
                        mid = "MID48", "ACAGTATATA", 2;
                        mid = "MID49", "ACGCGATCGA", 2;
                        mid = "MID50", "ACTAGCAGTA", 2;
                        mid = "MID67", "TCGATAGTGA", 2;
                        }
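A MID file like this is worth checking to see whether the allowed error count still lets every read be assigned unambiguously. This assumes the trailing 2 on each line is the number of tolerated errors. With e errors tolerated, two MIDs need to differ at 2e + 1 or more positions. The sketch below uses Hamming distance only. Because 454's dominant errors are homopolymer indels, the metric sfffile actually applies may differ. The helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def check_mid_set(mids: Dict[str, str], allowed_errors: int) -> List[str]:
    """Return a list of problems found in a MID set (empty = none found)."""
    problems = []
    for name, seq in mids.items():
        if set(seq) - set("ACGT"):
            problems.append(f"{name}: non-ACGT characters in {seq}")
    lengths = {len(s) for s in mids.values()}
    if len(lengths) > 1:
        problems.append(f"MIDs have different lengths: {sorted(lengths)}")
    for (n1, s1), (n2, s2) in combinations(mids.items(), 2):
        if len(s1) == len(s2):
            d = hamming(s1, s2)
            if d < 2 * allowed_errors + 1:  # two error balls could overlap
                problems.append(f"{n1}/{n2}: distance {d} too small "
                                f"for {allowed_errors} allowed errors")
    return problems


print(check_mid_set({"MID1": "ACGAGTGCGT", "MID2": "ACGCTCGACA"}, allowed_errors=2))  # []
```

Running the full list from the file through this check is a quick way to see whether lowering the error allowance from 2 to 1 would be safer.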

                        Any idea why the -mcf option isn't working? Is my syntax wrong? Sorry for all the questions, but this is frustrating!!

                        I find Geneious to be really slow....

                        Cheers


                        • sklages
                          Senior Member
                          • May 2008
                          • 628

                          #13
                          What error do you get? The syntax looks ok.
                          How did you call 'sfffile' (command line)?

                          And Geneious, that's my impression too, very nice looking but too slow for NGS.

                          cheers,
                          Sven


                          • zhengz
                            Member
                            • Aug 2010
                            • 24

                            #14
                            Hi aligenie,

                            Since you named your set of barcodes 'barcode' (the line above the '{'), would the following command work?

                            sfffile -mcf barcode.txt -s barcode read.sff
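zhengz's point is that the word before '{' names the MID set. If that reading is right, the set name given to -s has to match it. A small parser for this file layout makes the name-to-set mapping explicit and also accepts the optional fourth field used in the Y-adapter file. It is a hypothetical helper written from the examples in this thread, not from Roche's specification.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class MidEntry(NamedTuple):
    name: str
    seq: str
    errors: int           # third field, assumed to be the tolerated error count
    seq3: Optional[str]   # optional fourth field, as in the Y-adapter example


SET_RE = re.compile(r"(\w+)\s*\{(.*?)\}", re.S)
MID_RE = re.compile(
    r'mid\s*=\s*"([^"]+)"\s*,\s*"([^"]*)"\s*,\s*(\d+)\s*(?:,\s*"([^"]*)"\s*)?;')


def parse_mid_config(text: str) -> Dict[str, List[MidEntry]]:
    """Map each named set (the word before '{') to its mid = ... entries."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)  # drop /* ... */ comments
    sets: Dict[str, List[MidEntry]] = {}
    for set_name, body in SET_RE.findall(text):
        sets[set_name] = [MidEntry(n, s, int(e), s3 or None)
                          for n, s, e, s3 in MID_RE.findall(body)]
    return sets


example = 'barcode\n{\nmid = "MID1", "ACGAGTGCGT", 2;\n}'
sets = parse_mid_config(example)
print(list(sets))  # ['barcode']
```

Looking up a set under any other name (for example, the input file name) fails here with a KeyError. This is one way to picture why passing the right set name matters.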


                            In my case, for my own customized adapters, I use a barcode file with an example in the comment lines (replace the x's with your barcodes):

                            /* User-defined MID sets for the 8 Y adapters...

                            An example:

                            sfffile -s Y -mcf Yscheme.txt -o region1 NameOfYourSFFfile1.sff > MIDyieldR1.txt
                            sfffile -s Y -mcf Yscheme.txt -o region2 NameOfYourSFFfile2.sff > MIDyieldR2.txt

                            */
                            Y
                            {
                            mid = "Y3", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Y5", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Y8", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Y9", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Y10", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Y11", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Ya1", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            mid = "Ya2", "xxxxxxxxxx", 1, "xxxxxxxxxx";
                            }
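The example commands redirect sfffile's output to MIDyieldR1.txt and MIDyieldR2.txt, presumably a per-MID yield report. Once reads have been assigned to MIDs by whatever means, a similar table is easy to produce. The helper names below are hypothetical, and the code does not reproduce sfffile's actual output format.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def mid_yield(assignments: Iterable[Optional[str]],
              mid_names: List[str]) -> List[Tuple[str, int]]:
    """Count reads per MID in set order, plus a final row for None ('unassigned')."""
    counts = Counter(a if a is not None else "unassigned" for a in assignments)
    rows = [(name, counts[name]) for name in mid_names]
    rows.append(("unassigned", counts["unassigned"]))
    return rows


def format_yield(rows: List[Tuple[str, int]]) -> str:
    """Render rows as tab-separated name, count, and percentage of all reads."""
    total = sum(n for _, n in rows)
    lines = []
    for name, n in rows:
        pct = 100.0 * n / total if total else 0.0
        lines.append(f"{name}\t{n}\t{pct:.1f}%")
    return "\n".join(lines)


print(format_yield(mid_yield(["Y3", "Y3", None, "Y5"], ["Y3", "Y5", "Y8"])))
```

Listing MIDs with zero reads (Y8 here) alongside the unassigned count makes a failed ligation or a wrong MID set easy to spot.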


                            • JackieBadger
                              Senior Member
                              • Mar 2009
                              • 385

                              #15
                              Originally posted by sklages
                              [...] And Geneious, that's my impression too, very nice looking but too slow for NGS.
                              Their latest release, 5.4.1, is supposed to be designed for NGS, but yes, I agree that loading and moving files around is SLOW and can cause the program to hang!
                              jMHC handles barcode parsing much better.

                              Cheers
                              j
