  • How to sort with sfffile/sffinfo...

    Hey all!
    I've got an unusual request. I want to be able to access just the reads that do not match any barcodes. I can easily sort reads into .sffs based on barcodes, but I want to look at the reads that don't match any barcodes. Is there a way to do that?
    Thank you!!

  • #2
    You could split your .sff file by MID, then use sffinfo with -a to output just the accession numbers from each of the resulting .sff files and from the original .sff file. Then subtract the accessions in the split-file lists from the list for the original file. There are several ways to do that; a simple one is to put all of the lists into one column in Excel and keep only the values that occur exactly once (Excel's "Remove Duplicates" on its own won't do it, because it keeps one copy of each repeated value). Finally, pass that list to sfffile with the -i option to get a .sff file containing just those reads.

    Now that I think about it, you could just combine all of the accession number lists from the split files and use that list with the -e option of sfffile. That's a bit simpler.
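Both approaches in this post come down to set operations on the accession lists, so a few lines of Python can replace the spreadsheet step. This is a minimal sketch, assuming each list file holds one accession per line (as written by sffinfo -a); the function names are made up for illustration.

```python
"""Sketch: build accession lists for sfffile from `sffinfo -a` output.

Assumes one accession per line in each input file; all names are hypothetical.
"""
from pathlib import Path


def read_accessions(path):
    """Return the set of accessions in a one-per-line text file, ignoring blank lines."""
    text = Path(path).read_text()
    return {line.strip() for line in text.splitlines() if line.strip()}


def write_accessions(path, accessions):
    """Write accessions one per line, sorted so the output is reproducible."""
    Path(path).write_text("".join(f"{acc}\n" for acc in sorted(accessions)))


def matched_accessions(split_sets):
    """Union of every per-MID list: the exclusion list for `sfffile -e`."""
    return set().union(*split_sets)


def unmatched_accessions(all_ids, split_sets):
    """Accessions from the original run that landed in no MID file.

    This is the inclusion list for `sfffile -i`.
    """
    return all_ids - matched_accessions(split_sets)
```

For example, read the original run's list and each per-MID list with read_accessions, then write either matched_accessions(...) for use with -e or unmatched_accessions(...) for use with -i. Both routes should select the same reads; the -e route simply skips reading the original list.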

    • #3
      Thank you! I'll give that a try.

      • #4
        Hi Anthony, I did this a year or so ago, and the FASTX-Toolkit can do what you want: its barcode splitter puts the unmatched reads into a separate file. According to my notes, the key was figuring out that the 454 .fna file isn’t really in FASTA format, so you may have to run it through fasta_formatter first. No guarantee, but I hope this helps.
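The idea behind a barcode splitter with an "unmatched" output can be sketched in plain Python. This is not the FASTX-Toolkit's implementation; it assumes each barcode sits at the very start of the read, tolerates at most one mismatch by default, and treats ties as unmatched. It also joins sequences wrapped over several lines, which is assumed (not confirmed here) to be the kind of formatting problem fasta_formatter fixes in 454 .fna files. Barcode names and sequences are hypothetical.

```python
"""Sketch: split FASTA reads by a barcode at the read start, keeping unmatched reads.

Not the FASTX-Toolkit implementation; barcode names/sequences are hypothetical.
"""
from collections import defaultdict


def parse_fasta(lines):
    """Yield (header, sequence), joining sequences wrapped over several lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def mismatches(a, b):
    """Count differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_by_barcode(records, barcodes, max_mismatches=1):
    """Bin records by the barcode at the start of the read.

    A read goes to a barcode only if that barcode is the single best match
    with at most max_mismatches; misses and ties go to "unmatched".
    """
    bins = defaultdict(list)
    for header, seq in records:
        scores = {
            name: mismatches(seq[: len(code)], code)
            for name, code in barcodes.items()
            if len(seq) >= len(code)
        }
        best = min(scores.values(), default=None)
        winners = [name for name, score in scores.items() if score == best]
        if best is not None and best <= max_mismatches and len(winners) == 1:
            bins[winners[0]].append((header, seq))
        else:
            bins["unmatched"].append((header, seq))
    return bins
```

With real data you would pass an open .fna file to parse_fasta; bins["unmatched"] then holds the reads that matched no barcode.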

        • #5
          Have you realized that, in a dataset made up entirely of MID-tagged reads, a read that seemingly lacks a MID means that you or the software failed to identify the MID? ;-) At the very least you should throw away the first 4+10 or 4+11 nt of your reads (provided you are talking about reads with the MID on the left end, right after the sequencing key).

          Did you use the RapidLibrary protocol or one of those with GSMIDs/TiMIDs? If Rapid, then also mask the rcRLMIDs somewhere near the right end of each read. Or was it a multiplexing setup with different TiMIDs/GSMIDs on each side? Then treat it the same way as when RapidLib was involved.

          Finally, if you sequenced beads carrying some MID-tagged samples and some without MID tags, the above still applies: throw away the nucleotides where a MID, possibly obscured by sequencing errors, might reside. And next time, put the samples into separate regions. ;-)

          In brief: just don't do this next time. That is my best advice, really.
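The trimming advice in this post can be sketched as two string operations. This is a minimal sketch, not any 454 tool's behavior: it assumes reads still begin with the 4-nt sequencing key (set key_len=0 if your export has already removed it), that MIDs are 10 or 11 nt as stated above, and that an exact match of the MID's reverse complement is good enough on the right end. Real reads with sequencing errors in the MID would need a tolerant match. The MID sequence in any example is made up.

```python
"""Sketch: strip the left-end key+MID region and a right-end rcMID from reads.

Assumptions (not from any 454 tool): reads start with a 4-nt key, MIDs are
10 or 11 nt, and the right-end MID is found by exact reverse-complement match.
"""

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_left(seq, mid_len=10, key_len=4):
    """Drop the first key_len + mid_len bases, i.e. 4+10 or 4+11 nt."""
    return seq[key_len + mid_len:]


def clip_right(seq, mid, window=None):
    """Cut the read at the last exact hit of the MID's reverse complement.

    If window is given, only the last `window` bases are searched.
    """
    rc = reverse_complement(mid)
    start = 0 if window is None else max(0, len(seq) - window)
    pos = seq.rfind(rc, start)
    return seq if pos == -1 else seq[:pos]


def clean_read(seq, right_mid=None, mid_len=10, key_len=4, window=None):
    """Apply the left trim, then the optional right-end clip."""
    trimmed = trim_left(seq, mid_len, key_len)
    return clip_right(trimmed, right_mid, window) if right_mid else trimmed
```

Limiting the right-end search with window keeps an internal, coincidental match from cutting a read far too short.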

          • #6
            I actually ran across what is probably the simplest way to do this a couple of days ago. For some reason this option is not in the documentation, but you can see it if you call up the sfffile help at the command line. There you will find the -umid option, which lets you specify a name under which to report all reads that don't match any of the specified MIDs. So forget all that list stuff: just add -umid <name> to the options when you run sfffile, and you're done.

            • #7
              Number6, I'll look into that. That could be helpful.

              Martin2, I think you misunderstood the question. I want to be able to look at reads that don't sort to any barcode. It doesn't matter how they were prepared. And, no, I did not mix MID-tagged with non-tagged samples. I can't imagine how that would be useful...is that what you were referring to in that last line?

              aj, you're using a newer software version than I, I'm afraid...that option doesn't seem to be available in 2.5.3. Someday we'll get upgraded.

              • #8
                Ah, yes. I do have 2.8. I don't know when this option showed up, but there have been several versions between 2.5.3 and 2.8, so it's hard to say.

                You might want to talk to your FAS about getting a newer version. In my experience, it doesn't take any more than an email and they'll send you a link to download it. The latest versions, 2.7 (Jr.) and 2.8 (FLX), have some new options and features in the analysis programs that might be useful to you.
