  • How to sort with sfffile/sffinfo...

    Hey all!
    I've got an unusual request. I want to be able to access just the reads that do not match any barcodes. I can easily sort reads into .sffs based on barcodes, but I want to look at the reads that don't match any barcodes. Is there a way to do that?
    Thank you!!

  • #2
    You could split your .sff file by MID, then use sffinfo with -a to output just the accession numbers from each of the resulting .sff files and the original .sff file. Then use the lists from the split files to subtract from the list from the original file. There are probably several ways to do that, but a simple one would be to just put all of the lists into one column in Excel and use the "remove duplicates" function. Finally, use that list with the -i option of sfffile to get a .sff file with just those.

    Now that I think about it, you could just combine all of the accession number lists from the split files and use that list with the -e option of sfffile. That's a bit simpler.
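
The list subtraction described above can be done without Excel. Here is a minimal Python sketch, assuming each accession list is a plain text file with one accession per line (as written by sffinfo with -a, per the post). The function names and file names are hypothetical, not part of the 454 tools.

```python
from pathlib import Path


def read_accessions(path):
    """Return the set of non-empty accession IDs listed one per line in `path`."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def unmatched_accessions(all_path, split_paths):
    """Accessions in the original list that appear in none of the per-MID lists."""
    matched = set()
    for p in split_paths:
        matched |= read_accessions(p)
    # Set difference: everything in the original file minus every matched read.
    return sorted(read_accessions(all_path) - matched)


def write_accessions(accessions, out_path):
    """Write one accession per line, ready to pass to sfffile with -i."""
    Path(out_path).write_text("".join(f"{a}\n" for a in accessions))
```

For example, `write_accessions(unmatched_accessions("all.txt", ["mid1.txt", "mid2.txt"]), "unmatched.txt")` would produce the list to use with the -i option. For the simpler -e route, concatenating the per-MID lists is enough and no subtraction is needed.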

    • #3
      Thank you! I'll give that a try.

      • #4
        Hi Anthony, I did this a year or so ago, and the Fastx toolkit can do what you want. The barcode splitter in that toolkit puts the unmatched reads into a separate file. I have in my notes that the key was figuring out that the 454 .fna file isn’t really fasta format. You may have to use "fasta formatter" first. No guarantee, but I hope this helps.
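
To illustrate the idea of a splitter that keeps unmatched reads in their own bin, here is a rough Python sketch. It is not the Fastx toolkit's implementation: it only does exact prefix matching, and the function name, labels, and barcodes are made up for the example.

```python
def split_by_barcode(reads, barcodes, unmatched_label="unmatched"):
    """Bin (name, sequence) pairs by exact barcode prefix.

    `barcodes` maps a sample label to its barcode sequence. Reads whose
    sequence starts with none of the barcodes go into `unmatched_label`.
    """
    bins = {label: [] for label in barcodes}
    bins[unmatched_label] = []
    for name, seq in reads:
        upper = seq.upper()
        for label, barcode in barcodes.items():
            if upper.startswith(barcode.upper()):
                bins[label].append((name, seq))
                break
        else:
            # No barcode matched this read.
            bins[unmatched_label].append((name, seq))
    return bins
```

Calling `split_by_barcode(reads, {"s1": "ACGT", "s2": "TTTT"})["unmatched"]` would return just the reads that match neither barcode. A real splitter would normally also allow a few mismatches, which this sketch leaves out.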

        • #5
          Have you realized that a read seemingly lacking a MID, in a dataset where *all* reads are MID-tagged, means that you or the software failed to identify the MID? ;-) At the very least you should throw away the first 4+10 or 4+11 nt of your reads (provided you mean the left end of the reads, right after the sequencing key).

          Did you use the RapidLibrary protocol or one of those with GSMIDs/TiMIDs? If Rapid, then also mask the rcRLMIDs somewhere on the right side of each read. Or was it a multiplexing setup with different TiMIDs/GSMIDs on each side? Then treat it the same way as when RapidLib is involved.

          Finally, if you sequenced beads with some MID-tagged samples and some without MID tags, then the above still applies. It is better to throw away the nucleotides where a MID, possibly masked by sequencing errors, might reside. And next time separate the samples into different regions. ;-)

          In brief, just don't do this next time, that is my best advice, really.
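
The left-end trimming suggested above (4 nt of key plus a 10 or 11 nt MID) could look like this in Python. This is only a sketch: it assumes the read sequence still includes the sequencing key at its left end, and the function name and defaults are hypothetical.

```python
KEY_LEN = 4  # length of the sequencing key, per the "4+10 or 4+11" above


def trim_left(seq, mid_len=10, key_len=KEY_LEN):
    """Drop the key and the MID-sized stretch from the left end of a read."""
    if mid_len < 0 or key_len < 0:
        raise ValueError("lengths must be non-negative")
    # Slicing past the end of a short read simply yields an empty string.
    return seq[key_len + mid_len:]
```

Use `mid_len=11` for the longer MIDs. If the key has already been trimmed from your sequences, pass `key_len=0`.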

          • #6
            I actually ran across what is probably the simplest way to do this a couple of days ago. This option is not in the documentation for some reason, but you can get it if you call up the sfffile help at the command line. If you do that, you will see the -umid option, which allows you to specify a name under which to report all reads that don't match any of the specified MIDs. So, forget all that list stuff, just add in -umid <name> to the options list when you use sfffile and it's done.

            • #7
              Number6, I'll look into that. That could be helpful.

              Martin2, I think you misunderstood the question. I want to be able to look at reads that don't sort to any barcode. It doesn't matter how they were prepared. And, no, I did not mix MID-tagged with non-tagged samples. I can't imagine how that would be useful...is that what you were referring to in that last line?

              aj, you're using a newer software version than I, I'm afraid...that option doesn't seem to be available in 2.5.3. Someday we'll get upgraded.

              • #8
                Ah, yes. I do have 2.8. I don't know when this option showed up, but there have been several versions between 2.5.3 and 2.8, so it's hard to say.

                You might want to talk to your FAS about getting a newer version. In my experience, it doesn't take any more than an email and they'll send you a link to download it. The latest versions, 2.7 (Jr.) and 2.8 (FLX), have some new options and features in the analysis programs that might be useful to you.
