  • jdanderson
    Member
    • Sep 2010
    • 45

    FASTX Toolkit barcode splitter issue

    Hello All,

    I've been trying to use the FASTX Toolkit barcode splitter to demultiplex my Illumina reads. The following command runs without any errors:

    [cat /home/johnathon/jda_ev_extended.txt | fastx_barcode_splitter.pl --bcfile /home/johnathon/mybarcodes.txt --bol --mismatches 1 --prefix /home/johnathon/split_bc/jda_ev_split_bc --suffix ".txt"]

    But none of the output files contain any reads except for the mismatched file.

    The following is my mybarcodes.txt file:

    [#I hope the following is the appropriate format for this txt file, it should contain the barcode identifier and the barcode sequence itself in a tab delimited fashion--Johnathon David Anderson
    BC1 ACCC
    BC2 CGTA
    BC3 GAGT
    BC4 TTAG]


    However, when I look at the extended.txt file I can see the right barcodes at the 5' end. I have also tried the export.txt file, to no avail; apparently it is not formatted appropriately. I get an error message saying that the first character is an "S" instead of an "<" or an "@".

    I have not converted these files from Solexa to Sanger Fastq. Could this be the issue?

    For my first data set, which was not barcoded, I used the export2std command of the MAQ fq_all2std.pl script to convert the export.txt file. It worked just fine and I was able to visualize the data in IGV. I haven't had much success with the MAQ ill2sanger patch. If the quality format is the issue with the FASTX Toolkit, can anyone recommend a user-friendly conversion script? I am using Solexa pipeline 1.6.

    Is anyone familiar with the FASTX Toolkit? Is the likely problem that the Illumina files need to be converted to Sanger FASTQ first?

    Any guidance would be most appreciated.

    Regards,
    Johnathon
  • jdanderson
    Member
    • Sep 2010
    • 45

    #2
    Hello All,

    I am updating my progress in case this may help someone in the future.

    As previously mentioned, I used the FASTX Toolkit on the export.txt and extended.txt files from Illumina pipeline 1.6 with minimal success, and I suspected a formatting error in these files. I just tried the same barcode splitting module on the sequence.txt file (prior to reformatting to Sanger FASTQ) and it seems to have worked fine, with the caveat that there appear to be more reads in the unmatched file than I had expected (199,524 out of 28,223,602, or 0.7%), but perhaps this is normal. For reference, I had used the NuGen Ovation and Encore Kits for library prep.

    Regards,
    Johnathon


    • KevinLam
      Senior Member
      • Nov 2009
      • 204

      #3
      Sorry to hijack your thread, but would the FASTX Toolkit be able to demultiplex SOLiD reads as well?
      http://kevin-gattaca.blogspot.com/


      • hyjkim
        Member
        • Apr 2010
        • 18

        #4
        The FASTX Toolkit does not work for SOLiD data. I wrote some Perl scripts to demultiplex some SOLiD data a few months back. The code and the syntax weren't pretty. If you're interested, I can dig the scripts up and post them.


        • jdanderson
          Member
          • Sep 2010
          • 45

          #5
          Hello Kevin,

          I am not sure. I cannot tell directly from the documentation; however, I don't see any mention of color-space reads. Maybe you could query the Hannon Lab if you don't get an immediate answer on here ([email protected]).

          -
          Johnathon


          • 2007lab
            Member
            • Mar 2009
            • 14

            #6
            Bump for the SOLiD part of this thread.
            Once I run solid2fastq.pl to convert my csfasta and qual files to a fastq.gz file, can I use FASTX to do QC on my SOLiD PE reads?


            • upendra_35
              Senior Member
              • Apr 2010
              • 102

              #7
              Hi jdanderson,
              Your command looks good to me, so I suspect the problem is with the barcode file. Try opening the barcode file with vi and see if there is anything weird going on. Sometimes you see ^M at the end of each line; if so, you can fix this manually and re-run the command. Good luck.
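
The ^M check suggested above can also be done without opening an editor. A minimal sketch, assuming the barcode file is a plain tab-delimited text file; the file names and the demo contents are placeholders, not the poster's actual file:

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
bcfile="$workdir/mybarcodes.txt"

# Demo barcode file with Windows (CRLF) line endings, so the sketch is runnable.
printf 'BC1\tACCC\r\nBC2\tCGTA\r\n' > "$bcfile"

# Count the lines that contain a carriage return (displayed as ^M in vi).
cr_lines=$(grep -c "$(printf '\r')" "$bcfile" || true)
echo "lines with carriage returns: $cr_lines"

# Write a cleaned copy with every carriage return removed.
tr -d '\r' < "$bcfile" > "$workdir/mybarcodes.clean.txt"

# Confirm that the cleaned copy no longer contains carriage returns.
clean_cr=$(grep -c "$(printf '\r')" "$workdir/mybarcodes.clean.txt" || true)
echo "after cleaning: $clean_cr"
```

If the first count is non-zero, pointing --bcfile at the cleaned copy is one thing to try. A file with old Mac-style line endings (carriage return only, no line feed) would need `tr '\r' '\n'` instead of `tr -d '\r'`.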


              • carmeyeii
                Senior Member
                • Mar 2011
                • 137

                #8
                Hi everyone,

                I've been using the FastX Barcode Splitter successfully, but regarding the --partial option, I have realized I'm losing some reads with a particular problem:

                With --partial 1

                The barcode

                Code:
                CGCGTCAGCATTGTTCATAC
                will pick up the read

                Code:
                GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACT
                since it is missing just one base at the left end to match the barcode exactly.

                However, the read:

                Code:
                CCGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACT
                will not be taken as matching the barcode, since it has one extra base at the beginning. Unfortunately, there are many reads that fall into this category, but not all of them begin with the extra 'C'.

                Do you use anything else to get around this?

                Thanks!
                Carmen
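
One way to see how many reads are affected, before changing any splitter options, is to count the reads that carry the exact barcode at offset 0 versus offset 1. A rough sketch using awk and grep on toy reads; the read IDs, quality strings and file names are made up for illustration, and only the barcode comes from the post:

```shell
barcode="CGCGTCAGCATTGTTCATAC"
workdir=$(mktemp -d)
reads="$workdir/reads.fastq"

# make_read ID SEQUENCE: print one FASTQ record with a placeholder quality string.
make_read() {
  printf '@%s\n%s\n+\n%s\n' "$1" "$2" "$(printf '%s' "$2" | tr 'ACGTN' 'IIIII')"
}

{
  make_read exact   "${barcode}AAAGCTAC"     # barcode starts at the first base
  make_read shifted "A${barcode}AAAGCTAC"    # one extra base before the barcode
  make_read short   "${barcode#?}AAAGCTAC"   # first barcode base missing
} > "$reads"

# Sequence lines are the 2nd line of every 4-line FASTQ record.
at_offset0=$(awk 'NR % 4 == 2' "$reads" | grep -c -E "^${barcode}" || true)
at_offset1=$(awk 'NR % 4 == 2' "$reads" | grep -c -E "^.${barcode}" || true)

echo "barcode at offset 0: $at_offset0"
echo "barcode at offset 1: $at_offset1"
```

On real data, a large count at offset 1 would suggest that trimming or an offset-aware tool is worth the effort; this check only looks for exact barcode copies and ignores mismatches.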


                • chadn737
                  Senior Member
                  • Jan 2009
                  • 392

                  #9
                  A quick and dirty solution would be to trim off the first base of all your reads and then use the FastX barcode splitter with --partial.
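
Trimming the first base can be sketched with awk alone, assuming a standard 4-line FASTQ file (no wrapped sequence lines). The toolkit's own fastx_trimmer with -f 2 (first base to keep) should do the same job; check its help output for your version. The file names and the toy read below are placeholders:

```shell
workdir=$(mktemp -d)
in_fq="$workdir/reads.fastq"
out_fq="$workdir/reads.trim1.fastq"

# Toy read with one extra base (A) in front of the start of the barcode.
printf '@read2\nACGCGTCAGC\n+\nIIIIIIIIII\n' > "$in_fq"

# Drop the first character of the sequence line (2nd of 4) and of the
# quality line (4th of 4); header and "+" lines pass through unchanged.
awk 'NR % 4 == 2 || NR % 4 == 0 { print substr($0, 2); next } { print }' \
  "$in_fq" > "$out_fq"

cat "$out_fq"
```

Reads that originally started with a perfect barcode are now one base short after this step, so --partial is still needed when splitting, as the follow-up post shows.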


                  • carmeyeii
                    Senior Member
                    • Mar 2011
                    • 137

                    #10
                    Thank you, chadn!

                    Of course this was the easiest solution.

                    The barcode is:
                    Code:
                    REVERSEPRIMER	CGCGTCAGCATTGTTCATAC
                    Read 1 begins with a perfect match to the barcode.

                    Code:
                    @HWI-M00149:16:000000000-A12VK:1:2114:17873:29127 2:N:0:
                    CGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    Read 2 has an extra base at the beginning, followed by a perfect match to the barcode.

                    Code:
                    @HWI-M00149:16:000000000-A12VK:1:2114:17873:29128 2:N:0:
                    ACGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    Read 3 is missing the first base of the barcode.

                    Code:
                    @HWI-M00149:16:000000000-A12VK:1:2114:17873:29129 2:N:0:
                    GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    By trimming the first base of every read,

                    we are left with

                    Code:
                    Read 1 [now missing 1 base at the beginning]
                    
                    GCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    
                    Read 2 [now perfect match]
                    
                    CGCGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    
                    Read 3 [now missing 2 bases at the beginning]
                    
                    CGTCAGCATTGTTCATACAAAGCTACTTAGTTGCTACGAAGCAATACATTGTTAGTTGTTAACTACTCCCCCCTCTTGTTTTNNNCNNTNNNNNNNNNNNNNNNNNNNNNNNNNNTNNNNNNNNNNNNNNNNNNNNNNNNNN
                    and by using

                    Code:
                    --mismatch 4 --partial 4
                    all reads will be matched to the barcode.

                    The value of 4 doesn't make sense to me, as I thought 2 would be enough, but this is the only thing that gets it to work, so...

                    Thanks a lot!

                    Carmen


                    • vivi7
                      Member
                      • Mar 2014
                      • 10

                      #11
                      fastx_barcodes_splitter issue with run

                      Hi,

                      I saw the post and I hope some of you can help me.

                      When I run fastx_barcode_splitter.pl with this command:

                      /usr/local/bin/fastx_barcode_splitter.pl --bcfile ./Barcodes9nt.txt --prefix ./Rescued9nt --suffix .fq --bol

                      On the command line it looks like it is running (no error message, no > sign); see attachment for screenshot.
                      However, it is not running at all: I can see with top that it is not using any memory or CPU, and it has been 'running' for days on a very small file without producing any results.
                      The input file is in the STDIN folder, as it is supposed to be.

                      I would be very grateful if you could suggest what might be wrong.
                      Thanks in advance
                      Vivi


                      • smitra
                        Member
                        • May 2013
                        • 20

                        #12
                        Hi vivi7,
                        I guess you need to provide your fastq or fasta file; you haven't provided it.
                        Use it like this:
                        Code:
                        cat File.fastq | /usr/local/bin/fastx_barcode_splitter.pl --bcfile mybarcodes.txt ...other options if you want.
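
The reason the command appears to hang is that fastx_barcode_splitter.pl reads the sequences from standard input; with nothing piped or redirected in, it waits for input from the terminal. A small guard like the following can catch that mistake early. It is only a sketch: the function name split_from_stdin and the SPLITTER variable are hypothetical helpers, not part of the toolkit:

```shell
# Command to run; overridable so the guard can be tried without the toolkit installed.
SPLITTER="${SPLITTER:-fastx_barcode_splitter.pl}"

split_from_stdin() {
  # [ -t 0 ] is true when standard input is a terminal,
  # i.e. nothing was piped or redirected into the command.
  if [ -t 0 ]; then
    echo "error: no reads on stdin; use 'cat reads.fq | ...' or '... < reads.fq'" >&2
    return 1
  fi
  # Pass all options through unchanged; the reads arrive on stdin.
  "$SPLITTER" "$@"
}
```

It would be called with the same options as before plus a redirect, for example `split_from_stdin --bcfile ./Barcodes9nt.txt --prefix ./Rescued9nt --suffix .fq --bol < reads.fq`, where reads.fq stands in for the actual input file.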


                        • smitra
                          Member
                          • May 2013
                          • 20

                          #13
                          Hi Everybody,
                          I came back to this thread because I am getting a problem very similar to the first post by jdanderson.

                          My command runs without errors:
                          cat test_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 1
                          But none of the output files contain any reads except for the mismatched file.

                          We got this data from Mr.DNA as one raw fastq file containing 10 samples together, which I need to split. Johnathon's later suggestion didn't help.
                          Can anybody help, please?
                          Thanks,
                          smitra


                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            #14
                            Can you post a few lines of your fastq file and the mapping file?


                            • smitra
                              Member
                              • May 2013
                              • 20

                              #15
                              Thanks for replying, GenoMax. Here is the mapping file:

                              Code:
                              #SampleID	BarcodeSequence
                              AP1E	CGTAACCA
                              AP25E	CGTACCCA
                              AP5D	CGTAAGAA
                              AP8C	CGTAGATA
                              P29F	CGTAGGCT
                              P30N	CGTATTCA
                              P31B	CGTCAAGA
                              P35C	CGTATTTC
                              V2A	CGTCCAGG
                              V3J	CGTCACAG
                              The fastq file looks like this (I assume the bold red part, at the start of each read, is the barcode with one N):

                              mitras$ less test_R1.fastq

                              Code:
                              @M02542:124:000000000-AKFBJ:1:1101:13841:1000 1:N:0:5
                              NGTACCCAAGGGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACANNCNNGTCGAACGGTAGCNCAGAGAGCTTGCTCTNGGNTGACGAGTGGCGGACGGGNGANTAATGTCTGGGAAACTGCCCGATGGAGGGGGATANCTACTGGANANNGNNGCTAATACCGCATAACGNCGCAAGACCAAAGAGGGNGANNTCAGGGCCTCTTGNCATCGGATGNNCCCAGATGGGATNGGCTTGTAGGTGAGGTAAGNGCTCACGCNGGCGACGATCCCTAGCTTGGNNGNGAGG
                              +
                              #8ABCFGGGGGGGGEEGGGGGGGG<FGGGFFGFGFGFGGEG@FGEEGGCFGGGGG?##:##6:CFFGGGDG<CG#:CCFFGEGGGGFAFG#:<#:BBFF7FFGDGGGGGGGD#8+#+:BFGGGGGGGCFFGDGG<FGGGECCGDEGGGF@#611:D,>>#6##6##66<1CF@7FFFGEGF7E#41=8=EGFFG7*?CF>>#22##2*2;@;8C8CFC<#/2AC=E*:5##/2:CFCG+8**+#*1*1552<+*+0+8D6D4+#1**)**)*#*15/*//7>5:5<.*,*)0)##1#..73
                              @M02542:124:000000000-AKFBJ:1:1101:12174:1002 1:N:0:5
                              NGTAACCAAGGGTTTGATCCTGGCTCAGGATGAACGCTAGCTACAGGCTTAACACANNCNNGTCGAGGGGCAGCATTTCAGTTTGCTTGCNAANTGGAGATGGCGACCGGCGNACNGGTGAGTAACACGTATCCAACCTGCCGATAACTCNGGGATAGCNTNNCNNAAGAAAGATTGATACCCNATGGTATAATCAGACCGNATGGTCTTATTATTAAANAATTTCGGTNNTCGATGGGGATGNGTTCCATTAGGCAGTTGGTGTGTTAATGNCGCACCAAACCTTCCTGTGANNGNGTTT
                              +
                              #8ACCGGGGGGGCFGGGGGGGGGGGGGGGGGGGFGGGGDGGGGGGGGGFGGGGGGG##:##6:CFGFDEGGGGDGGGFGGGFGGGGGGGG#:C#66=,CFFFGGGG@FGEE7#++#:BBFFGGGFCFGGGGGGCGDGGGFGGGGGGGGC=#8@<<<FGG#5##8##86DCF<FCCC:BFCFFF#6>F>FGG92;@CFFGF@#116*=CF<CG?@CFFFG#3;5375:CG##212**<5C5/::#11:91A>+<>C6CE<FC:*****0:FB<#1*)//75<F30762*-2)**##1#0)0.


                              But as you can see, the first base is an N, so maybe I need to allow 1 mismatch for the barcode.
                              So I tried the command as:
                              cat test_R1.fastq | fastx_barcode_splitter.pl --bcfile mapping2_bcfile.txt --prefix /Volumes/Cristina/Mr.DNA_2016/fastq_files/testdata/ --bol --mismatches 1
                              Thanks for helping
                              smitra
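
A quick independent check can show whether the reads really do start with the listed barcodes within one mismatch. The awk sketch below is a diagnostic under stated assumptions, not the splitter's own algorithm: it compares only the start of each read (as --bol does), counts an N as a mismatch, gives ties to the first barcode listed, and ignores any columns after the second. The toy input uses two barcodes from the mapping file and the first 19 bases of the two reads shown above; IDs, qualities and file names are placeholders:

```shell
workdir=$(mktemp -d)

# Two barcodes from the mapping file above, plus its header line.
printf '#SampleID\tBarcodeSequence\nAP1E\tCGTAACCA\nAP25E\tCGTACCCA\n' > "$workdir/bcfile.txt"

# Two toy reads with placeholder IDs and quality strings.
printf '@r1\nNGTACCCAAGGGTTTGATC\n+\nIIIIIIIIIIIIIIIIIII\n' >  "$workdir/reads.fastq"
printf '@r2\nNGTAACCAAGGGTTTGATC\n+\nIIIIIIIIIIIIIIIIIII\n' >> "$workdir/reads.fastq"

summary=$(awk -v max_mm=1 '
  # First file: barcode table. Strip stray carriage returns, skip comments and short lines.
  NR == FNR {
    sub(/\r$/, "")
    if ($0 ~ /^#/ || NF < 2) next
    name[++n] = $1
    bc[n] = $2
    next
  }
  # Second file: FASTQ. Only the sequence line (2nd of each 4-line record) is examined.
  FNR % 4 == 2 {
    best = ""
    best_mm = max_mm + 1
    for (i = 1; i <= n; i++) {
      mm = 0
      len = length(bc[i])
      # Count positions where the read start differs from this barcode.
      for (j = 1; j <= len; j++)
        if (substr($0, j, 1) != substr(bc[i], j, 1)) mm++
      if (mm < best_mm) { best_mm = mm; best = name[i] }
    }
    count[(best == "") ? "unmatched" : best]++
  }
  END { for (k in count) print k "\t" count[k] }
' "$workdir/bcfile.txt" "$workdir/reads.fastq" | sort)

echo "$summary"
```

If a check like this assigns reads to samples while the splitter still puts everything in the unmatched file, the barcode file's formatting (for example the line endings mentioned earlier in the thread) is a reasonable next thing to inspect.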
                              Last edited by GenoMax; 01-25-2016, 09:10 AM. Reason: added CODE tags to improve readability
