  • Barcodes in HiSeq FASTQ Files

    Hi,

    For HiSeq paired-end barcoded data, I am splitting the FASTQ files into multiple
    per-sample FASTQ files. I am wondering exactly where the barcodes are stored in the sequence string of the FASTQ files. At the beginning? At the end? At the beginning for the first mate and at the end for the
    second mate?

    For example, given

    mate_1=ATCGTAGA.....................TTAGACGA
    mate_2=GCATGATG.....................ATCGATAG
    which substrings are the barcodes?

    Where are the barcodes stored in qseq.txt files?

    Is there a web page or white paper that explains this clearly?

    Thanks,

  • #2
    Hi wdt,

    I have exactly the same question. Did you find an answer, or are you still looking?
    If you did, could you let me know the answer or where to find it? I would be very grateful.
    Thanks a lot,



    • #3
      Barcoding can be done in a couple of different ways. If you are using Illumina barcodes, they are generally sequenced as a separate index read, so you will not see this read in the final sequence data. The Illumina CASAVA (pre-processing/de-multiplexing) pipeline uses this third read (and a fourth if you are using dual indexing) when de-multiplexing the samples.
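With CASAVA 1.8 and later, the index read is not part of the sequence line, but each FASTQ header normally ends with the index sequence, e.g. `@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG`. A minimal Python sketch that splits such a header, assuming that 1.8+ layout (older pipelines wrote headers differently and will not parse here):

```python
def parse_casava18_header(header: str) -> dict:
    """Split a CASAVA 1.8+ style FASTQ header into its fields.

    Assumed layout:
    @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    if not header.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    left, right = header[1:].split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = left.split(":")
    read, filtered, control, index = right.split(":")
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read),            # 1 = first mate, 2 = second mate
        "filtered": filtered == "Y",  # Y = read was filtered out, N = passed
        "control": int(control),
        "index": index,               # barcode sequence taken from the index read
    }


fields = parse_casava18_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")
print(fields["lane"], fields["read"], fields["index"])
```

Because the barcode lives in the header rather than the sequence, splitting files by sample only needs this last field; the read sequence itself is left untouched.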

      If you are using "inline" or custom barcodes, it will be your responsibility to do the de-multiplexing, since the barcode sequence will be part of the actual read.
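For the inline case, a minimal Python sketch of exact-match de-multiplexing, assuming the barcode sits at the 5' start of the read, all barcodes have the same length, and FASTQ records are unwrapped 4-line entries. The barcodes and sample names below are hypothetical:

```python
import io
from collections import defaultdict
from typing import Dict, Iterator, List, TextIO, Tuple

# (header, sequence, '+' line, quality)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def demultiplex_inline(handle: TextIO, barcodes: Dict[str, str]) -> Dict[str, List[Record]]:
    """Assign reads to samples by an exact-match barcode at the 5' end.

    `barcodes` maps barcode sequence -> sample name. The barcode bases and
    their quality values are trimmed off assigned reads; reads that match
    no barcode are kept untrimmed under "unassigned".
    """
    out: Dict[str, List[Record]] = defaultdict(list)
    for header, seq, plus, qual in read_fastq(handle):
        for bc, sample in barcodes.items():
            if seq.startswith(bc):
                out[sample].append((header, seq[len(bc):], plus, qual[len(bc):]))
                break
        else:  # no barcode matched
            out["unassigned"].append((header, seq, plus, qual))
    return dict(out)


# Hypothetical 4-nt barcodes and reads, for illustration only
fastq = io.StringIO(
    "@r1\nACGTTTGGCCAA\n+\nIIIIIIIIIIII\n"
    "@r2\nTGCAGGGAAATT\n+\nIIIIIIIIIIII\n"
    "@r3\nNNNNAAAACCCC\n+\nIIIIIIIIIIII\n"
)
result = demultiplex_inline(fastq, {"ACGT": "sampleA", "TGCA": "sampleB"})
for sample, records in sorted(result.items()):
    print(sample, [r[1] for r in records])
```

For paired-end data where only mate 1 carries the barcode, the two files would be read in lockstep (e.g. with `zip`) so that each mate 2 record goes to the same sample as its mate 1. Real tools usually also tolerate one or two mismatches, which this exact-match sketch does not.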

      There is a primer at this link: http://www.umassmed.edu/uploadedFile...Sequencing.pdf



      • #4
        Hi GenoMax,

        Thanks a lot for your reply.
        For our RNA-seq experiment, our libraries were prepared using the Epicentre (an Illumina company) ScriptSeq™ v2 RNA-Seq Library Preparation Kit, with their ScriptSeq™ Index PCR Primers for barcoding, and were then sent to the BGI for paired-end sequencing. So from my understanding of your reply, the de-multiplexing of my samples will probably be done by the BGI with Illumina CASAVA, am I right?
        Thanks a lot,



        • #5
          Originally posted by Nicolas Nalpas
          Hi GenoMax,

          So from my understanding of your reply, the de-multiplexing of my samples will probably be done by the BGI with Illumina CASAVA, am I right?
          Thanks a lot,
          Correct. For standard Illumina tags, a CASAVA run produces files in the following format.

          There are two files for each sample/tag combination (provided the results are concatenated into a single file per read).

          SampleID_TAGSEQ_L00#_R1_001.fastq.gz (read 1)
          SampleID_TAGSEQ_L00#_R2_001.fastq.gz (read 2)

          SampleID: the sample name you provided
          TAGSEQ: the tag (barcode) sequence
          L00#: the lane number on the flowcell
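If the files arrive unconcatenated (several _001, _002, ... chunks), a minimal Python sketch for grouping them by sample, tag and lane, assuming names follow exactly the SampleID_TAGSEQ_L00#_R1_001.fastq.gz pattern above. The file names in the example are hypothetical, and anything not matching the pattern is skipped:

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# SampleID_TAGSEQ_L00#_R1_001.fastq.gz, as described above
CASAVA_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<tag>[ACGTN]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)

Key = Tuple[str, str, int]  # (sample, tag, lane)


def group_casava_files(names: List[str]) -> Dict[Key, Dict[int, List[str]]]:
    """Group file names by (sample, tag, lane), then by read number (1 or 2).

    Names that do not match the pattern are skipped.
    """
    groups: Dict[Key, Dict[int, List[str]]] = defaultdict(lambda: defaultdict(list))
    for name in names:
        m = CASAVA_NAME.match(name)
        if m is None:
            continue
        key = (m["sample"], m["tag"], int(m["lane"]))
        groups[key][int(m["read"])].append(name)
    # Sorting puts the _001, _002, ... chunks in order for concatenation
    return {k: {r: sorted(v) for r, v in reads.items()} for k, reads in groups.items()}


# Hypothetical file names
names = [
    "Liver1_ATCACG_L002_R2_001.fastq.gz",
    "Liver1_ATCACG_L002_R1_002.fastq.gz",
    "Liver1_ATCACG_L002_R1_001.fastq.gz",
    "Liver1_ATCACG_L002_R2_002.fastq.gz",
    "README.txt",
]
for key, reads in group_casava_files(names).items():
    print(key, reads[1], reads[2])
```

The R1 and R2 lists come out in the same chunk order, so concatenating each list keeps the mates of a pair at matching positions in the two output files.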



          • #6
            Dear GenoMax,
            I want to add a barcode file (a list of cultivars) to my bowtie2 command. How can I do that?
            Thank you very much,



            • #7
              Originally posted by maivantan
              Dear GenoMax,
              I want to add a barcode file (a list of cultivars) to my bowtie2 command. How can I do that?
              Thank you very much,
              Can you clarify what it is you are trying to do? Are your samples already de-multiplexed (i.e. are they in separate files?)?



              • #8
                Yes,
                I sequenced 5 cultivars and they are in separate files.
                I also prepared key.txt for the 5 barcodes.
                So, do I need to add the key.txt file to bowtie2?

                One more question: after aligning and running samtools, I found that the reference base is not available (shown in red):

                user$ ./bowtie2 -x ~/tan_analysis/rice1 -U ~/tan_analysis/analysis20140111/20140111_A1_PE1.fastq -S wrc20140118.sam
                2398641 reads; of these:
                2398641 (100.00%) were unpaired; of these:
                2094522 (87.32%) aligned 0 times
                228676 (9.53%) aligned exactly 1 time
                75443 (3.15%) aligned >1 times
                12.68% overall alignment rate
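The overall rate in this summary is the reads aligned exactly once plus those aligned more than once, divided by all reads: (228676 + 75443) / 2398641 ≈ 12.68%, so roughly 87% of the reads found no alignment at all. A minimal Python sketch that recomputes it from the summary text, assuming an unpaired (-U) run; paired-end summaries use different wording and are not handled:

```python
import re

# Unpaired (-U) bowtie2 summary, copied from the post above
SUMMARY = """\
2398641 reads; of these:
2398641 (100.00%) were unpaired; of these:
2094522 (87.32%) aligned 0 times
228676 (9.53%) aligned exactly 1 time
75443 (3.15%) aligned >1 times
12.68% overall alignment rate
"""

LINE = re.compile(r"(\d+) \([\d.]+%\) aligned (0 times|exactly 1 time|>1 times)")


def overall_alignment_rate(summary: str) -> float:
    """Percent of unpaired reads aligned at least once (unique + multi-mapped)."""
    counts = {m.group(2): int(m.group(1)) for m in LINE.finditer(summary)}
    total = sum(counts.values())
    aligned = counts["exactly 1 time"] + counts[">1 times"]
    return 100.0 * aligned / total


print(f"{overall_alignment_rate(SUMMARY):.2f}% overall alignment rate")
```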

                CHROM POS ID REF ALT QUAL FILTER INFO FORMAT wrc20140118.sorted.bam.bam
                chr01 27400 . N G 68 . DP=22;VDB=3.856020e-04;AF1=1;AC1=2;DP4=0,0,0,8;MQ=29;FQ=-51 GT:PL:GQ 1/1:101,24,0:45
                chr01 27401 . N C,G 75 . DP=22;VDB=1.934810e-04;AF1=1;AC1=2;DP4=0,0,0,9;MQ=28;FQ=-51 GT:PL:GQ 1/1:108,24,0,104,10,101:45
                chr

                Please give me your suggestions.
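Every record in the excerpt has REF = N, meaning the caller had no reference base at those positions. One common cause is that the reference FASTA was not passed to samtools mpileup with -f; another is that those positions really are N in the FASTA. A minimal Python sketch for checking how widespread this is, assuming plain-text VCF lines (the header here lost its leading '#' in the post, so both forms are skipped):

```python
from typing import Iterable, Tuple


def count_unknown_ref(vcf_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (records whose REF is N, total records) for VCF text lines."""
    unknown = total = 0
    for line in vcf_lines:
        # Skip blank lines and header lines ("#CHROM ..." in a real VCF)
        if not line.strip() or line.startswith(("#", "CHROM")):
            continue
        fields = line.split()  # real VCFs are tab-separated; split() also copes with spaces
        if len(fields) < 5:
            continue  # truncated line
        total += 1
        if fields[3].upper() == "N":
            unknown += 1
    return unknown, total


excerpt = """\
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT wrc20140118.sorted.bam.bam
chr01 27400 . N G 68 . DP=22;VDB=3.856020e-04;AF1=1;AC1=2;DP4=0,0,0,8;MQ=29;FQ=-51 GT:PL:GQ 1/1:101,24,0:45
chr01 27401 . N C,G 75 . DP=22;VDB=1.934810e-04;AF1=1;AC1=2;DP4=0,0,0,9;MQ=28;FQ=-51 GT:PL:GQ 1/1:108,24,0,104,10,101:45
"""
print(count_unknown_ref(excerpt.splitlines()))
```

If nearly every record has REF = N, a missing or mismatched reference in the calling step is the more likely explanation than genuine N runs in the genome.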



                • #9
                  Are you trying to do multi-sample SNP calling with samtools?

                  Your alignment rate looks low in the example posted above.



                  • #10
                    Yes, I am trying to do multi-sample SNP calling with samtools.

                    I don't know why the overall alignment rate is so low.

                    Please give me your suggestions.



                    • #11
                      Is the rice variety you are sequencing very different from what you used as the reference? Did you do any QC on your reads (FastQC, trimming, etc.) before doing the alignments?

                      If you have already completed your alignments as independent files, you can use them as shown in the samtools mpileup example to do SNP calls across multiple files. Is your question about key.txt related to @RG records (for when a BAM file contains multiple samples)?
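Multi-sample calling amounts to passing all the sorted BAMs to one pileup command. In current releases this step is done with bcftools mpileup piped into bcftools call rather than samtools mpileup. A minimal Python sketch that builds (and can optionally run) that pipeline; the reference and BAM file names are hypothetical, and running it assumes bcftools is on PATH:

```python
import shlex
import subprocess
from typing import List, Sequence, Tuple


def mpileup_commands(reference: str, bams: Sequence[str], out_vcf: str) -> Tuple[List[str], List[str]]:
    """Build `bcftools mpileup ... | bcftools call ...` over all BAMs at once."""
    if not bams:
        raise ValueError("need at least one BAM file")
    # -Ou: uncompressed BCF to the pipe; -f: reference FASTA (supplies the REF base)
    pileup = ["bcftools", "mpileup", "-Ou", "-f", reference, *bams]
    # -m: multiallelic caller; -v: variant sites only; -Oz: bgzipped VCF output
    call = ["bcftools", "call", "-mv", "-Oz", "-o", out_vcf]
    return pileup, call


def run_pipeline(pileup: List[str], call: List[str]) -> None:
    """Run pileup | call and raise if either stage fails."""
    p1 = subprocess.Popen(pileup, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(call, stdin=p1.stdout)
    p1.stdout.close()  # lets p1 receive SIGPIPE if p2 exits early
    rc2 = p2.wait()
    rc1 = p1.wait()
    if rc1 != 0 or rc2 != 0:
        raise RuntimeError("bcftools pipeline failed")


pileup, call = mpileup_commands("rice1.fa", ["A1.sorted.bam", "A2.sorted.bam"], "cultivars.vcf.gz")
print(shlex.join(pileup), "|", shlex.join(call))
```

Giving every BAM a distinct read-group sample name (SM) is what lets the caller report one genotype column per cultivar.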



                      • #12
                        The rice varieties that I sequenced are indica and the reference is japonica; I don't think they are very different. I did not do any QC before doing the alignments.

                        I aligned only one file, so I would like to ask how I can align all the files together with bowtie2.

                        My second question is: do I need to add the barcode file to the bowtie2 command?

                        Thank you very much
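Since the cultivars are already in separate files, bowtie2 itself needs no barcode file; the sample names can instead travel as read groups (bowtie2's --rg-id and --rg options write an @RG header line and tag each record). Note that the command posted above passed only the PE1 file with -U, so the reads were aligned as unpaired. A minimal Python sketch that builds one paired-end command per sample, assuming a hypothetical key file with one "file_prefix<TAB>cultivar" pair per line and hypothetical matching *_PE2.fastq files:

```python
import shlex
from typing import Dict, List


def read_key(text: str) -> Dict[str, str]:
    """Parse a hypothetical key file: one 'file_prefix<TAB>cultivar' per line."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        prefix, cultivar = line.split("\t")
        mapping[prefix] = cultivar
    return mapping


def bowtie2_commands(index: str, fastq_dir: str, key: Dict[str, str]) -> List[List[str]]:
    """One paired-end bowtie2 command per sample, tagged with a read group.

    --rg-id sets the @RG ID, and --rg adds SM:<cultivar> to that @RG line,
    so each SAM file records which cultivar its reads came from.
    """
    cmds = []
    for prefix, cultivar in key.items():
        cmds.append([
            "bowtie2", "-x", index,
            "-1", f"{fastq_dir}/{prefix}_PE1.fastq",
            "-2", f"{fastq_dir}/{prefix}_PE2.fastq",
            "--rg-id", prefix,
            "--rg", f"SM:{cultivar}",
            "-S", f"{prefix}.sam",
        ])
    return cmds


# Hypothetical prefixes and cultivar labels
key = read_key("20140111_A1\tcultivar1\n20140111_A2\tcultivar2\n")
for cmd in bowtie2_commands("rice1", "analysis20140111", key):
    print(shlex.join(cmd))
```

The printed commands can be pasted into a shell script or run one by one; each resulting SAM file is then sorted and passed to the multi-sample calling step.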
