
  • How to sort with sfffile/sffinfo...

    Hey all!
    I've got an unusual request. I want to be able to access just the reads that do not match any barcodes. I can easily sort reads into .sffs based on barcodes, but I want to look at the reads that don't match any barcodes. Is there a way to do that?
    Thank you!!

  • #2
    You could split your .sff file by MID, then use sffinfo with -a to output just the accession numbers from each of the resulting .sff files and the original .sff file. Then use the lists from the split files to subtract from the list from the original file. There are probably several ways to do that, but a simple one would be to just put all of the lists into one column in Excel and use the "remove duplicates" function. Finally, use that list with the -i option of sfffile to get a .sff file with just those.

    Now that I think about it, you could just combine all of the accession number lists from the split files and use that list with the -e option of sfffile. That's a bit simpler.
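The list subtraction described above could also be done without Excel. Here is a minimal Python sketch, assuming each accession list is plain text with one accession per line (as the post describes for sffinfo -a output). The function name and the example IDs are made up for illustration.

```python
def unmatched_accessions(all_ids, matched_lists):
    """Return accessions from all_ids that appear in none of matched_lists.

    all_ids: iterable of lines (e.g. an open file) listing every read
             in the original .sff file.
    matched_lists: iterable of such iterables, one per MID-split .sff file.
    Blank lines and surrounding whitespace are ignored; input order is kept.
    """
    matched = set()
    for ids in matched_lists:
        matched.update(line.strip() for line in ids if line.strip())

    result = []
    for line in all_ids:
        accession = line.strip()
        if accession and accession not in matched:
            result.append(accession)
    return result


if __name__ == "__main__":
    # Placeholder accessions, not real read names.
    everything = ["READ_A\n", "READ_B\n", "READ_C\n", "READ_D\n"]
    mid1 = ["READ_A\n"]
    mid2 = ["READ_C\n"]
    for accession in unmatched_accessions(everything, [mid1, mid2]):
        print(accession)
```

Open file handles can be passed in directly, since iterating over a file yields its lines. The result, written out one accession per line, is the kind of list the post suggests giving to the -i option of sfffile.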


    • #3
      Thank you! I'll give that a try.


      • #4
        Hi Anthony, I did this a year or so ago, and the Fastx toolkit can do what you want. The barcode splitter in that toolkit puts the unmatched reads into a separate file. I have in my notes that the key was figuring out that the 454 .fna file isn’t really fasta format. You may have to use "fasta formatter" first. No guarantee, but I hope this helps.
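To illustrate the idea of a splitter that keeps an "unmatched" bin, here is a rough Python sketch. It is not how the Fastx barcode splitter is implemented: it assumes exact barcode matches at the start of each read, and the function name, sample names and barcodes are invented for the example.

```python
def split_by_barcode(reads, barcodes, unmatched_label="unmatched"):
    """Group read names by the barcode found at the start of each sequence.

    reads: iterable of (name, sequence) pairs.
    barcodes: dict mapping a sample label to its barcode sequence.
    Reads whose sequence starts with none of the barcodes are collected
    under unmatched_label (assumed not to clash with a sample label).
    """
    bins = {label: [] for label in barcodes}
    bins[unmatched_label] = []
    for name, seq in reads:
        seq = seq.upper()
        for label, barcode in barcodes.items():
            if seq.startswith(barcode.upper()):
                bins[label].append(name)
                break
        else:
            # No barcode matched: this is the bin the original question asks for.
            bins[unmatched_label].append(name)
    return bins


if __name__ == "__main__":
    # Toy data only; real MIDs are longer and real tools tolerate mismatches.
    reads = [("r1", "ACGTAAAA"), ("r2", "TTGGCCCC"), ("r3", "GGGGGGGG")]
    print(split_by_barcode(reads, {"s1": "ACGT", "s2": "TTGG"}))
```

Exact prefix matching is the simplest possible rule. A read with a sequencing error in its barcode would land in the unmatched bin here, which is the caveat raised later in the thread.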


        • #5
          Have you considered that, in a dataset made up entirely of MID-tagged reads, a read that seems to lack a MID just means that you or the software failed to identify it? ;-) At the very least, you should throw away the first 4+10 or 4+11 nt of your reads (assuming you mean the left end of the read, right after the sequencing key).

          Did you use the RapidLibrary protocol or one of those with GSMIDs/TiMIDs? If Rapid, then also mask the rcRLMIDs somewhere on the right side of each read. Or was it a multiplexing setup with different TiMIDs/GSMIDs on each side? Then treat it the same way as when RapidLib is involved.

          Finally, if you sequenced beads with some MID-tagged samples and some without MID tags, the above still applies. It is better to throw away the nucleotides where a MID, possibly masked by sequencing errors, might reside. And next time, separate the samples into different regions. ;-)
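The trimming suggested above (dropping the first 4+10 or 4+11 nt) might look like this in Python. It is only a sketch: the function name is invented, the lengths are taken from the post, and the example sequence is a placeholder.

```python
def trim_key_and_mid(sequence, key_len=4, mid_len=10):
    """Drop the sequencing key and the MID region from the left end of a read.

    key_len and mid_len default to the 4+10 nt mentioned in the post;
    pass mid_len=11 for the 4+11 case. Reads shorter than the trimmed
    region come back as an empty string.
    """
    return sequence[key_len + mid_len:]


if __name__ == "__main__":
    # Placeholder read: 4 nt key, 10 nt MID-like region, then the insert.
    read = "TCAG" + "ACGAGTGCGT" + "GGGGCCCC"
    print(trim_key_and_mid(read))
```

This removes a fixed number of bases whether or not a MID was recognized, which is the point of the advice. If the reads have already had the key removed by upstream software, key_len would need to be set to 0.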

          In brief, just don't do this next time, that is my best advice, really.


          • #6
            I actually ran across what is probably the simplest way to do this a couple of days ago. This option is not in the documentation for some reason, but you can get it if you call up the sfffile help at the command line. If you do that, you will see the -umid option, which allows you to specify a name under which to report all reads that don't match any of the specified MIDs. So, forget all that list stuff, just add in -umid <name> to the options list when you use sfffile and it's done.


            • #7
              Number6, I'll look into that. That could be helpful.

              Martin2, I think you misunderstood the question. I want to be able to look at reads that don't sort to any barcode. It doesn't matter how they were prepared. And, no, I did not mix MID-tagged with non-tagged samples. I can't imagine how that would be useful...is that what you were referring to in that last line?

              aj, you're using a newer software version than I, I'm afraid...that option doesn't seem to be available in 2.5.3. Someday we'll get upgraded.


              • #8
                Ah, yes. I do have 2.8. I don't know when this option showed up, but there have been several versions between 2.5.3 and 2.8, so it's hard to say.

                You might want to talk to your FAS about getting a newer version. In my experience, it doesn't take any more than an email and they'll send you a link to download it. The latest versions, 2.7 (Jr.) and 2.8 (FLX), have some new options and features in the analysis programs that might be useful to you.
