  • kmcarr
    replied
    Originally posted by silin284
    I can't blame the core for it too much, they have been stretched a lot recently. But I do need to find a way to sort the data...
    Perhaps I was a little under-caffeinated earlier and my initial reaction reflected that. I can certainly understand being stretched, as I'm the only person dealing with the data stream from one HiSeq, one GAIIx, one 454 and soon an IonTorrent PGM.
    With demultiplexing, the read ID in the FASTQ file looks like:
    @D3B4KKQ1_0176:3:1101:9745:2659#ATCACG/1

    without demultiplexing:
    @D3B4KKQ1_0182:3:1101:11400:2655#0/1
    It's obvious from the format of the read IDs that your core is not using the latest versions of the Illumina software; the ID format changed with CASAVA 1.8.
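
    For illustration, the two read IDs quoted above can be pulled apart with a regular expression. This is only a sketch: the field names (instrument, lane, tile, x, y, index, read) follow the commonly described pre-1.8 layout, the function names are made up for this example, and treating an index of "0" as "no barcode recorded" is an assumption based on the non-demultiplexed ID shown above.

```python
import re

# Pre-CASAVA-1.8 style read ID, assumed layout:
#   @instrument:lane:tile:x:y#index/read
OLD_ID = re.compile(
    r"^@(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+):"
    r"(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/]+)/(?P<read>\d)$"
)

def parse_old_read_id(header):
    """Return the header fields as a dict of strings, or None if the layout differs."""
    match = OLD_ID.match(header.strip())
    return match.groupdict() if match else None

def embedded_barcode(header):
    """Return the barcode stored in the header, or None if there is none.

    Assumption: an index field of "0" means the barcode was not recorded.
    """
    fields = parse_old_read_id(header)
    if fields is None or fields["index"] == "0":
        return None
    return fields["index"]

print(embedded_barcode("@D3B4KKQ1_0176:3:1101:9745:2659#ATCACG/1"))  # ATCACG
print(embedded_barcode("@D3B4KKQ1_0182:3:1101:11400:2655#0/1"))      # None
```

    A header in the newer CASAVA 1.8 layout does not match this pattern, so parse_old_read_id returns None for it.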

    You indicate that you are using TruSeq, so your index (barcode) read is separate from your sequence read(s). Is your core providing you a FASTQ file of the index read? If they expect you to do the demultiplexing, they will have to. If all you have as inputs are FASTQ files of the sequence read(s) and index read, then no Illumina script will help; their scripts perform demultiplexing prior to writing the FASTQ files. Let's assume that you have one FASTQ file with the index read and one FASTQ file with the sequence read (single read in this example). A second assumption is that the order of the reads in the two files is identical; this should be a safe assumption if the two files were produced concurrently by the Illumina software. The process is conceptually simple in this case: 1) read the first entry from the index FASTQ and compare the sequence to a list of your tags to decide which one it is, 2) read the first sequence from your read FASTQ and then write it out to the appropriate file depending on the barcode, 3) repeat 200 million times for each lane of HiSeq data.
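
    The three steps above could be sketched in Python roughly as follows. This is a minimal sketch under several assumptions, not a tested pipeline: both files are uncompressed FASTQ with exactly four lines per record, all barcodes have the same length, the barcode sits at the start of the index read, only exact matches count, and the text before "#" in the read ID is the same for the index read and the sequence read of one cluster. The function names, the out_prefix argument and the "undetermined" bin are made up for this example.

```python
from itertools import zip_longest

def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from an open FASTQ file."""
    while True:
        record = tuple(handle.readline().rstrip("\n") for _ in range(4))
        if not record[0]:  # end of file
            return
        yield record

def cluster_key(header):
    """Part of the read ID assumed to be shared by all reads of one cluster."""
    return header.split("#", 1)[0]

def demultiplex(index_path, reads_path, barcodes, out_prefix):
    """Walk both files in lockstep and write each read to its sample's file.

    barcodes maps barcode sequence -> sample name (hypothetical input).
    Returns a dict of read counts per output bin.
    """
    tag_len = len(next(iter(barcodes)))  # assumes equal-length barcodes
    outputs, counts = {}, {}
    try:
        with open(index_path) as idx, open(reads_path) as reads:
            pairs = zip_longest(read_fastq(idx), read_fastq(reads))
            for idx_rec, read_rec in pairs:
                if idx_rec is None or read_rec is None:
                    raise ValueError("index and read files differ in length")
                if cluster_key(idx_rec[0]) != cluster_key(read_rec[0]):
                    raise ValueError("read order differs at " + read_rec[0])
                # Step 1: decide which tag this is (exact match on the prefix).
                sample = barcodes.get(idx_rec[1][:tag_len], "undetermined")
                # Step 2: write the sequence read to the matching file.
                if sample not in outputs:
                    outputs[sample] = open(f"{out_prefix}{sample}.fastq", "w")
                outputs[sample].write("\n".join(read_rec) + "\n")
                counts[sample] = counts.get(sample, 0) + 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts
```

    A call might look like demultiplex("index.fastq", "read1.fastq", {"ATCACG": "sample1"}, "lane3_"), where the file names and sample name are placeholders. The ID check costs little and catches the case where the "same order" assumption does not hold.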

  • silin284
    replied
    Hi Kmcarr and Heisman

    Thanks for the replies. I can't blame the core for it too much, they have been stretched a lot recently. But I do need to find a way to sort the data...

    With demultiplexing, the read ID in the FASTQ file looks like:
    @D3B4KKQ1_0176:3:1101:9745:2659#ATCACG/1

    without demultiplexing:
    @D3B4KKQ1_0182:3:1101:11400:2655#0/1

    Without demultiplexing, the ID does not contain the barcode sequence, so I can't write a script to sort the reads unless they send me the FASTQ file for the index read. With that file I could paste the two together and sort.

    I am not sure how the core processes the raw image files and converts them to FASTQ. Is there a way for them to generate a single FASTQ (or 2 FASTQ files: read 1 + index read 2) that we can sort ourselves?

    They are "tired" of sending data because they don't use FTP! They just scp each file to us. I am pretty sure they don't know how to use scp -r either.

  • kmcarr
    replied
    Silin,

    I'm frankly a little appalled at the attitude displayed by your sequencing core facility. Delivering data to the client in a usable format is the job they contracted to do; if they find this job too onerous then perhaps they should get out of the business. Our core demultiplexes run data regularly, sometimes at much higher levels than 10 samples, and would never think of shifting this task to the researcher. The latest versions of the Illumina software, specifically CASAVA 1.8.x, make demultiplexing dead simple. They have to run CASAVA anyway to convert the .bcl files to FASTQ, so there's no excuse to complain about doing it.

    O.K., rant over.

    As Heisman stated and I mentioned above, CASAVA makes demultiplexing easy but it uses the .bcl files as input, not FASTQs. To run this yourself you would need access to the entire run directory which is unreasonable. For you to demultiplex from FASTQ files would require writing (or finding) some custom scripts to read the FASTQ of the index read, store the IDs associated with each tag and then parse through the read file(s) to sort them. Doing it this way just seems so silly to me when I know how easy it is to do with CASAVA.
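
    To make the custom-script route concrete, here is a rough two-pass sketch in Python: first read the index FASTQ and remember which tag each read ID belongs to, then go through a read file and sort its records by looking up their IDs. Everything here is an assumption made for illustration: uncompressed four-line FASTQ records, equal-length barcodes matched exactly at the start of the index read, read IDs that agree up to the "#", and made-up function names. Holding one dictionary entry per read may need a lot of memory for a full HiSeq lane.

```python
def fastq_records(path):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ file."""
    with open(path) as handle:
        while True:
            lines = tuple(handle.readline().rstrip("\n") for _ in range(4))
            if not lines[0]:  # end of file
                return
            yield lines

def read_key(header):
    """Text before '#', assumed identical for every read of one cluster."""
    return header.split("#", 1)[0]

def build_tag_map(index_path, barcodes):
    """First pass: map each read ID to a sample name using the index read.

    barcodes maps barcode sequence -> sample name (hypothetical input).
    Reads whose index matches no barcode are left out of the map.
    """
    tag_len = len(next(iter(barcodes)))  # assumes equal-length barcodes
    tag_map = {}
    for header, sequence, _, _ in fastq_records(index_path):
        sample = barcodes.get(sequence[:tag_len])
        if sample is not None:
            tag_map[read_key(header)] = sample
    return tag_map

def sort_reads(reads_path, tag_map, out_prefix):
    """Second pass: write every read to the file of its sample.

    Reads missing from tag_map go to an 'undetermined' file.
    Returns a dict of read counts per output bin.
    """
    handles, counts = {}, {}
    try:
        for record in fastq_records(reads_path):
            sample = tag_map.get(read_key(record[0]), "undetermined")
            if sample not in handles:
                handles[sample] = open(f"{out_prefix}{sample}.fastq", "w")
            handles[sample].write("\n".join(record) + "\n")
            counts[sample] = counts.get(sample, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

    Because the lookup is by ID, the same tag_map can be reused for a second read file from a paired-end run, and the read file does not have to be in the same order as the index file.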

    You indicated that they "tire" of "sending" the large number of files to you. How are they doing this? If they were smart they would post user data to an FTP server with individual client directories and logons to download their data. This is how we do it at our core and it is no more difficult to post 20 files than it is to post 1.

  • Heisman
    replied
    There is a demultiplexer script that comes with CASAVA that you will probably find useful. I actually got it from a friend in a different lab, so I am unsure how to find it online, but if you cannot find it I will send it to you somehow.

  • silin284
    started a topic can user do the demultiplexing (truseq)

    Hi

    Our sequencing core is getting "tired" of demultiplexing the data for us, since we put 20 homemade TruSeq indexes into each lane. They think it is a pain to send 7 x 20 FASTQ files to us each time.

    Could they instead send us just 2 FASTQ files, one for read 1 and one for read 2 (the index read)? If that can be done, we can sort the reads ourselves.

    Has anyone done that before? Or is there a better way to do it?

    Thanks in advance
    silin
