
  • Demultiplexing Illumina RNASeq paired reads

    Hello everyone,

    BGI normally provides us with demultiplexed reads, but this time we received our fastq files before demultiplexing. Can anyone recommend software to perform the demultiplexing? Also, where can I get the fastq files for the Illumina barcodes?

    Thank you very much in advance.

    Bruno

  • #2
    What kind of run is this (HiSeq/MiSeq)? Is that data truly non-demultiplexed? Do you see barcodes in the Fastq ID Header? Can you post a few example sequences?

    You won't be able to get fastq files for the barcodes unless the instrument was set up to run (or the post-processing of the data was done) in a special way.



    • #3
      You should use bcl2fastq from Illumina to demultiplex your data. Download and use the version that matches the sequencing instrument used to obtain the data.



      • #4
        Originally posted by GenoMax
        What kind of run is this (HiSeq/MiSeq)? Is that data truly non-demultiplexed? Do you see barcodes in the Fastq ID Header? Can you post a few example sequences?

        You won't be able to get fastq files for the barcodes unless the instrument was set up to run (or the post-processing of the data was done) in a special way.
        This is a HiSeq run (2000, I believe, but I need to double-check) and I do see barcodes in the Fastq ID. Does that mean the data has effectively been demultiplexed and just needs to be split?
        Here is the head of one of the files:
        head 140916_I607_FCC5618ACXX_L4_CHKPEI14080416_1.fq
        Code:
        @FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/1
        NCCCAAACGCGCGTGACTTCACAATAATTAGCCCGTACCTGCTGGTTACGTGGCGGCACCGTGTACAATACCCTAGGCATCAGGGTTAGGCATGGTTACT
        +
        BP\ceeeegggggghiiiiiiiiihiiiiihiiiiiiiiiiiiiifgggggeeeccaccaccaacdcccccbccccbccccccbc[`accccccc`bccc
        @FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/1
        NCCCACCAAAACCGGAAAATGCAGGCCCTGTCGTCTCGCGTGAACATCGCGGCCAAGCCCCAGCGCGCTCAGCGCCTGGTGGTCCGCGCCGAGGAGGTTA
        +
        BP\ccecegggggiihhhiegghhhhihihgiihhiiihighfhiihfggecaacca_acccccZ]]]aaXb]]aX]ac]^_]bccccccc]_a___QW`
        @FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/1
        NAACCAGGCGAACGGTTGGCGTCGGGATTCGGGACGCAAGCATGGCGCTGACCAGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCCGAAGCT
        head 140916_I607_FCC5618ACXX_L4_CHKPEI14080416_2.fq
        Code:
        @FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/2
        CTCCGGTGTCAAGTAACCATGCCTAACCCTGATGCCTAGGGTATTGTACACGGTGCCGCCACGTAACCAGCAGGTACGGGCTAATTATTGTGAAGTCACG
        +
        _bbeeecegggggihiiiiiiiiiiiiiiiiiiiiiihhiicffhhhhhighieghhhhiggeeeecddccccccccccccccbbcdddcdcbdbbbbcc
        @FCC5618ACXX:4:1101:1308:1827#ATGTCAAT/2
        CGGGGCGCAGGATCTTCACCAGCGAGCCGCGCTTGGGGCCGACCTCCTTCTTGGGGGCAGCCTTAACCTCCTCGGCGCGGACCACCAGGCGCTGAGCGCG
        +
        ab_ceeeef`geghhiiihhiiihiihhiigeeca`accccccccccccc]bbcacW[acccccbbccccccb__cccaaccc^aa[[_`accca^baac
        @FCC5618ACXX:4:1101:1465:1834#TCCCGAAT/2
        CCTGGTCAGCGCCATGCTTGCGTCCCGAATCCCGACGCCAACCGTTCGCCTGGTTCAGATCGGAAGAGCGTCGTGTAGGGA
        Last edited by Bacms; 01-13-2015, 08:54 AM.



        • #5
          The reads in the fastq file have the same barcode, which should have been demultiplexed.



          • #6
            @Bacms: Unless you have access to the original flowcell folder de-multiplexing these files will require a custom script/grep solution. Is there a chance you can go back and ask BGI to do the de-multiplexing? It would be easy for them to do.

            One reason this could have happened is if you had not provided them with the barcodes for your samples. Was that the case?



            • #7
              Originally posted by GenoMax
              @Bacms: Unless you have access to the original flowcell folder de-multiplexing these files will require a custom script/grep solution. Is there a chance you can go back and ask BGI to do the de-multiplexing? It would be easy for them to do.

              One reason this could have happened is if you had not provided them with the barcodes for your samples. Was that the case?
              This is the only data we got from BGI. They normally do the demultiplexing, but this was at the end of the agreement between BGI and our University, and apparently demultiplexing was not included in the cost of the contract even though they had been doing it for a year. I wrote a quick Python script to look for the barcode sequence in the ID (perfect matching), and the diversity of barcodes in the sample is ridiculous. It includes some other barcodes that Illumina provides but that we did not use, so I suspect some cross-contamination with someone else's samples. I need to pull those sequences and see what they match to.

              The main question is whether I also need to cut the barcode sequence from the sequence itself or not?
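A minimal sketch of the kind of tally described above, not the poster's actual script. It assumes headers shaped like the ones in the head output (`@...#BARCODE/1`) and unwrapped four-line FASTQ records; the function names and the tiny in-memory sample are hypothetical.

```python
from collections import Counter
import io


def barcode_from_header(header):
    """Return the barcode between '#' and '/' in a header such as
    @FCC5618ACXX:4:1101:1415:1818#GTGGCCAT/1
    """
    tag = header.rstrip().rsplit("#", 1)[1]
    return tag.split("/", 1)[0]


def count_barcodes(handle):
    """Tally barcodes over a FASTQ stream, looking at header lines only."""
    counts = Counter()
    for line_number, line in enumerate(handle):
        # Use the line position, not a leading '@': quality strings
        # may also start with '@'.
        if line_number % 4 == 0:
            counts[barcode_from_header(line)] += 1
    return counts


# Made-up records for illustration only (note the quality line starting with '@').
sample = io.StringIO(
    "@FC:4:1101:1:1#GTGGCCAT/1\nACGT\n+\nIIII\n"
    "@FC:4:1101:2:2#ATGTCAAT/1\nACGT\n+\n@III\n"
    "@FC:4:1101:3:3#GTGGCCAT/1\nACGT\n+\nIIII\n"
)
for barcode, n in count_barcodes(sample).most_common():
    print(barcode, n)
```

Sorting the tally by count makes it easy to see whether a handful of barcodes dominate or whether the unexpected ones are a long tail of rare tags.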



              • #8
                You will only get barcodes in the reads for those reads where the insert is short and the read runs into the Illumina adapter, all the way through the first part of the adapter and into the barcode.

                If you trim your reads with something like Trimmomatic, the barcodes will be removed when Illumina adapter sequences are removed.

                As for having a lot of different barcodes in the file: as well as perfect matches to the barcode, demultiplexing usually allows a one-base mismatch to the barcode sequence, and at the end you are usually left with a small number of reads that don't match any of the barcodes because they have too many sequencing errors.
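The one-mismatch rule mentioned above can be sketched as a Hamming-distance lookup. This is an illustration under stated assumptions, not the behaviour of any particular demultiplexer: the `expected` set below simply reuses the three barcodes visible in the head output and is not a real sample sheet, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(observed, expected, max_mismatches=1):
    """Return the single expected barcode within max_mismatches of
    `observed`, or None if there is no match or the match is ambiguous.
    """
    hits = [
        bc for bc in expected
        if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical expected set, taken from the example headers for illustration.
expected = {"GTGGCCAT", "ATGTCAAT", "TCCCGAAT"}

print(assign_barcode("GTGGCCAT", expected))  # exact match
print(assign_barcode("GTGGCCAN", expected))  # one miscalled base
print(assign_barcode("GTGGCCNN", expected))  # two mismatches, left unassigned
```

Returning None on ambiguous hits is a deliberate choice here: if two expected barcodes are both within one mismatch of an observed tag, guessing would put reads in the wrong sample.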



                • #9
                  Originally posted by Bacms
                  The main question is whether I also need to cut the barcode sequence from the sequence itself or not?
                  In Illumina sequencing, the barcode sequence is *never* part of the actual read (when the reads are pre-processed, which yours appear to be). Did you get files with generic names (like lane1_undetermined*)? What you could have is adapter contamination in the reads, which an appropriate trimming program can take care of.

                  If you have written a Python script to enumerate the tags, extend it to separate the reads (4 lines per record) into one file per tag. Remember to keep R1 and R2 in the same order in the two output files so the pairs stay in sync.

                  Note: If you have "not expected" barcodes present (after allowing for one error as Mastal pointed out) there may be some other issue going on.
                  Last edited by GenoMax; 01-14-2015, 09:47 AM.
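One way to do the split suggested above while keeping R1 and R2 in step is to read both files in lockstep and write each pair together. This is a rough sketch, assuming unwrapped four-line records, the `#BARCODE/1` and `#BARCODE/2` header style shown earlier in the thread, and exact barcode matching. The function names, the output naming scheme (`<barcode>_1.fq`, `<barcode>_2.fq`) and the `undetermined` bucket are all hypothetical choices.

```python
import os


def read_records(handle):
    """Yield FASTQ records as lists of four lines."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return
        yield record


def split_pairs(r1_handle, r2_handle, expected, out_dir):
    """Write each read pair to per-barcode R1/R2 files, in input order.

    Pairs whose barcode is not in `expected` go to an 'undetermined'
    bucket, which also keeps the number of open files bounded.
    Returns the number of pairs written per bucket.
    """
    outputs = {}
    counts = {}
    try:
        for rec1, rec2 in zip(read_records(r1_handle), read_records(r2_handle)):
            # Strip the trailing /1 or /2 so the two IDs can be compared.
            id1 = rec1[0].rstrip().rsplit("/", 1)[0]
            id2 = rec2[0].rstrip().rsplit("/", 1)[0]
            if id1 != id2:
                raise ValueError("R1/R2 out of sync: %s vs %s" % (id1, id2))
            barcode = id1.rsplit("#", 1)[1]
            label = barcode if barcode in expected else "undetermined"
            if label not in outputs:
                outputs[label] = (
                    open(os.path.join(out_dir, label + "_1.fq"), "w"),
                    open(os.path.join(out_dir, label + "_2.fq"), "w"),
                )
            out1, out2 = outputs[label]
            out1.writelines(rec1)
            out2.writelines(rec2)
            counts[label] = counts.get(label, 0) + 1
    finally:
        for out1, out2 in outputs.values():
            out1.close()
            out2.close()
    return counts
```

Because both mates are written in the same loop iteration, the n-th record of each `_1.fq` file always pairs with the n-th record of the matching `_2.fq` file. The ID check stops early if the two inputs were already out of order; note that `zip` ends silently at the shorter input, so comparing record counts beforehand is a sensible extra check.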
