  • JohnK
    replied
    Originally posted by westerman View Post
    Perhaps because the processing software would then have to deal with different names for F3 and R3 (or F3/F5) files. [...] You have a non-typical case and thus are charting a non-typical path.
    Thanks, westerman. That's exactly my issue: I can't tell which primary.* directory is the right one, and there's no real way for me to tell unless no reanalysis was done and there is only one directory to choose from. I've had cases where reanalyses were done and the latest one was invalid, as you said, which produced terrible mapping results and wasted both my time and our cluster's time. I'll talk to my SOLiD tech to try and figure out a way to analyze the heat maps and cycle scans and pass that along to our technician. That has to be the solution. Thanks a lot!



  • westerman
    replied
    Originally posted by KevinLam View Post
    I can't understand why they can't put the timestamp in the filename instead of the directory as well.
    such a pain
    Perhaps because the processing software would then have to deal with different names for F3 and R3 (or F3/F5) files. In other words, currently the files look something like:

    primary.date1/reads/run_F3_sample.csfasta
    primary.date2/reads/run_R3_sample.csfasta

    Or however you have your experiment set up. The point is that the file names (not the directories) look similar thus making it easy to switch files in and out of the pipeline. Whereas with file stamps in the file names it would look like:

    primary.date1/reads/run_F3_sample_date1.csfasta
    primary.date2/reads/run_R3_sample_date2.csfasta

    Not much of a change, but perhaps something that Bioscope does not want to handle. Although, granted, it would make the individual files a lot more trackable.

    Anyway the above is just being "the devil's advocate". Getting back to the original question:

    Our people are unable to tell me which primary.* files to use and the 'latest' primary.* file doesn't always have the 'latest' write-to time stamp (unix time attribute) off the machine- nor is it the best quality.
    The only time you should get multiple primary.* directories for the same primer is when you are doing, as you say, a reanalysis mid-run. In that case the primary.* directory with the latest date in its name should be the reanalysis ... which is presumably the analysis you want. If it turns out not to be the one you want (e.g., because of lower quality scores), then use a previous one.

    In our (Purdue Genomics) case the only time we do a reanalysis mid-run is when something is screwed up -- i.e., we are desperately trying to save a run from a total meltdown. Fortunately the SOLiD technology is really good at being able to redo parts of a run and extract data from a potentially failed run. However, there have been cases where we redid part of a run, did not like that, re-did that part again, and ended up with even worse results. At that point we had a jumble of primary.* directories from which to choose the best of many poor results. You may be in a similar situation -- i.e., none of the results will be that great. But one of "your people" should be able to say, by looking at the heat maps and the cycle scans, which primary.* has the best chance of containing good data.

    I know that the above doesn't really answer your question "... [a] way to ID the best-run and best primary* file to use ..." but this is in part because any time a reanalysis is done, it implies that there is likely to be no "best". You have a non-typical case and thus are charting a non-typical path.
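    If you just want the "latest date in the directory name" rule scripted, a minimal POSIX sh sketch is below. It sorts the primary.* names lexically, which works because the embedded dates are YYYYMMDD; the fixture path is illustrative, not the real export layout.

    ```shell
    # Print the primary.* directory with the latest date in its NAME
    # (not its mtime, which the thread notes can be misleading).
    latest_primary() {
      ls -d "$1"/primary.* 2>/dev/null | sort | tail -n 1
    }
    ```

    Lexical sort only works while the date format stays fixed-width; if the naming scheme differs, adjust the glob accordingly.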



  • KevinLam
    replied
    Sorry, not quite getting you.
    But if you mean how to set a rule to find the proper F3.csfasta files to use,
    then yes, I know what you mean.

    I usually do a
    find . -iname '*.stats'

    only the correct reads dir with a .stats file contains the F3 that I need.

    I can't understand why they can't put the timestamp in the filename instead of the directory as well.
    such a pain
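    That .stats rule can be sketched as a small shell helper: among multiple primary.* trees, use the reads directory that actually contains a .stats file. The directory layout in the test fixture is an assumption, not the exact SOLiD export.

    ```shell
    # Print the reads directory whose analysis completed, i.e. the one
    # that contains a .stats file (per KevinLam's rule). If several do,
    # the lexically latest wins.
    pick_reads_dir() {
      stats=$(find "$1" -name '*.stats' 2>/dev/null | sort | tail -n 1)
      [ -n "$stats" ] && dirname "$stats"
    }
    ```

    Note the quoted glob ('*.stats'): unquoted, the shell may expand it against the current directory before find ever sees it.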



  • JohnK
    replied
    Thanks, mrawlins. Maybe someone can also shed some light on the primary* issue...



  • mrawlins
    replied
    If you disable the auto-export and instead export the files manually using scp, you can rename them as you copy. This doesn't solve the problem of finding which primary.* to use, though in our case it's always been the one with the largest number (most recent timestamp) in its name.
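    A sketch of that rename-on-copy idea: pull the date out of the primary.* directory name and bake it into the destination filename. The host and paths in the commented scp line are placeholders, not a real setup.

    ```shell
    # Build a timestamped destination name from a primary.DATE/... source
    # path, so manually exported csfasta files stay distinguishable.
    dest_name() {
      stamp=${1%%/*}                 # primary.20101016
      stamp=${stamp#primary.}        # 20101016
      echo "$(basename "$1" .csfasta)_${stamp}.csfasta"
    }
    # e.g. (placeholder host/path):
    # src=primary.20101016/reads/run_F3_sample.csfasta
    # scp "solid-host:/data/run1/$src" "./$(dest_name "$src")"
    ```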



  • JohnK
    started a topic: Finding optimal reads generated on SOLiD

    Hi,

    We're having some issues with the data coming off the SOLiD. Our people are unable to tell me which primary.* files to use, and the 'latest' primary.* file doesn't always have the 'latest' write-to time stamp (unix time attribute) off the machine, nor is it the best quality. Similarly, the primary.* directories may differ across the run from sample to sample, so one sample might use primary.20101016etc while another sample of the same exp/lib/run uses primary.20101014, with both directories present in each. These are all attributable to doing reanalysis mid-seq during a run.

    To make matters worse, the way the files are being copied creates an entire system/network of directories that I have to search through to find the proper primary.* file for mapping. Using the wrong one results in very poor mapping. Could someone point me to a manual, or a way to ID the best run and best primary* file to use off a run? Finally, is there a way to name these directories prior to export off the SOLiD? Thanks.

    J
