
  • #31
    deinterleave with singletons

    Hi!

    I have an interleaved fastq containing unmapped reads produced by segemehl -u. I want to deinterleave it into the two mate files and save the singletons into a separate file.

    Currently, reformat.sh cannot deal with it, even if I give outsingle= as a parameter. The header contains the read (mate) information (e.g. 2:N:0:2).

    Is there some way to get at least the paired reads extracted without singletons in between?

    --
    Kind regards,
    Mathias



    • #32
      You could use `repair.sh` to separate the singletons out afterwards.



      • #33
        Originally posted by GenoMax View Post
        You could use `repair.sh` to separate the singletons out afterwards.
        Thanks for pointing me in this direction. Unfortunately, repair.sh did not produce well-ordered files. Fortunately, bbsplitpairs.sh could be used instead of the reformat.sh/repair.sh combination; it extracted the correctly paired reads and wrote the singletons into a separate file.
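For anyone who wants to see the logic rather than rely on a tool, here is a minimal Python sketch of deinterleaving with singletons. It is not how bbsplitpairs.sh works internally. It assumes mates are adjacent in the file, that the first record of a pair is read 1, and that mates share the same name before the first space (optionally with a trailing /1 or /2). The function names and the `DEMO` data are made up for illustration.

```python
import io

def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual

def pair_key(header):
    """Read name without the leading '@', the comment, or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name

def split_interleaved(handle):
    """Return (r1, r2, singletons); adjacent records with the same name form a pair."""
    r1, r2, singles = [], [], []
    pending = None  # record still waiting for its mate
    for rec in read_fastq(handle):
        if pending is None:
            pending = rec
        elif pair_key(pending[0]) == pair_key(rec[0]):
            r1.append(pending)
            r2.append(rec)
            pending = None
        else:
            singles.append(pending)  # previous record had no mate
            pending = rec
    if pending is not None:
        singles.append(pending)
    return r1, r2, singles

# Hypothetical input: read "b" has lost its mate.
DEMO = (
    "@a 1:N:0:2\nACGT\n+\nIIII\n"
    "@a 2:N:0:2\nTTTT\n+\nIIII\n"
    "@b 2:N:0:2\nGGGG\n+\nIIII\n"
    "@c 1:N:0:2\nCCCC\n+\nIIII\n"
    "@c 2:N:0:2\nAAAA\n+\nIIII\n"
)

r1, r2, singles = split_interleaved(io.StringIO(DEMO))
print(len(r1), len(r2), len(singles))
```

If the mates are not adjacent (for example after sorting), this approach does not apply and the reads would have to be matched by name first.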



        • #34
          I'm confused about sam/bam options

          In version 37.52, the parameters under "Sam and bam processing options" are confusing to me:
          Sam and bam processing options:

          mappedonly=f Toss unmapped reads.
          unmappedonly=f Toss mapped reads.
          pairedonly=f Toss reads that are not mapped as proper pairs.
          unpairedonly=f Toss reads that are mapped as proper pairs.
          primaryonly=f Toss secondary alignments. Set this to true for sam to fastq conversion.

          If 'mappedonly' is false, shouldn't that mean to KEEP both unmapped and mapped reads?
          Likewise, 'pairedonly' false (to me) means KEEP both unpaired and paired reads.

          In the end, I want my bam to only contain paired reads, so I've been running it with 'pairedonly=t' , but reformat.sh says 'input is being processed as unpaired' for my bam file.
          Last edited by milw; 02-26-2019, 11:31 AM.
          Scott Monsma
          Sr Scientist at Lucigen
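One way to read that help text is that `=f` is the default value (filter switched off) and the description says what happens when the option is set to true. Under that reading, the filters can be sketched against the SAM FLAG bits as follows. This is only my interpretation of the help text, not BBTools source code; the `keep` function is hypothetical, while the bit values (0x2 proper pair, 0x4 unmapped, 0x100 secondary) come from the SAM specification.

```python
UNMAPPED = 0x4       # SAM flag bit: read unmapped
PROPER_PAIR = 0x2    # SAM flag bit: mapped in a proper pair
SECONDARY = 0x100    # SAM flag bit: secondary alignment

def keep(flag, mappedonly=False, unmappedonly=False,
         pairedonly=False, unpairedonly=False, primaryonly=False):
    """Return True if a record with this SAM flag survives the enabled filters.

    With every option left at False nothing is tossed, which matches the
    reading that 'mappedonly=f' keeps both mapped and unmapped reads.
    """
    mapped = not (flag & UNMAPPED)
    proper = bool(flag & PROPER_PAIR)
    if mappedonly and not mapped:
        return False
    if unmappedonly and mapped:
        return False
    if pairedonly and not proper:
        return False
    if unpairedonly and proper:
        return False
    if primaryonly and (flag & SECONDARY):
        return False
    return True

# Flag 99 = paired + proper pair + mate reverse + first in pair.
print(keep(99, pairedonly=True))
# Flag 97 = paired + mate reverse + first in pair, but NOT flagged as a proper pair.
print(keep(97, pairedonly=True))
```

Note that "paired" here means the proper-pair bit is set by the aligner, which is stricter than "the read has a mate in the file".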



          • #35
            BBMap not retrieving fastq quality values from bam file

            Hi,

            I'm trying to use BBMap version 38.08 to retrieve fastq sequences from a bam file. However, I keep getting a problem where the quality output is merely: JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

            Here are some lines from the bam file:

            Code:
            HISEQ:378:C7F64ANXX:3:1207:13039:83924	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	15	60	125M	=	644	754	GGGGGAGTGATAAAAATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGA	bbbbbfffffffffffffffffffffffffffffffffffffbffffffffef_ebffffffffffffffdfcefffffbfffffffbbfffffffdfefd_\ebOdefOWZW_bWefffdWce[	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
            HISEQ:378:C7F64ANXX:3:1106:10647:86342	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	30	60	125M	=	669	764	ATATATTTATTTCATCTAACTGATGAAATAACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAG	aabbbbffffffffffffffbbeffffffffffePaaebcfeefffffffeffYeffff\efPe]PePPPbc\bbPedP^PeaP]\dYbc]edcfOPOYd_bfeOcfOYOZ\\OdefffNObeOf	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
            HISEQ:378:C7F64ANXX:3:1208:17933:95359	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	32	60	125M	=	620	713	ATATTTATTTCATCCAATTGATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAA	aab_`dedbefe]ZPPP^PePdZbbPPY_cbffbfffdfPePPbPP[d][effdfffcebbffcbYPbP\P[PYdb\d]Pd\NdP]P]\eaP[aeePec_OYYOOOYea\O]_O_^dW]edcfef	NM:i:11	MD:Z:14T2C9A1C23T8A0C11G3G23G10G10	MC:Z:125M	AS:i:70	XS:i:0
            HISEQ:378:C7F64ANXX:3:2305:17850:4846	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	52	60	125M	=	625	690	ATGAAATGATGTTTTTGCTCTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGG	abbbbffffffffffffffffffffffffffffeffffffffffffeffffff]db\aeffffffffffffffP]ecaeffff]efffeeff_fffffffffffeffffffffffffffffffef	NM:i:9	MD:Z:7A1C23T8A0C11G3G23G10G30	MC:Z:117M8S	AS:i:80	XS:i:0
            HISEQ:378:C7F64ANXX:3:2205:15096:32122	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	59	60	125M	=	675	741	AACGTTTTTGCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGGTTGAGGATGCATCAAACCTTGGGAAGGAATAAGT	`aa_`ffaefP^Z\bPdP^_cebPPYbadfffbfbecac_a_Pef]P\Y[PbedPP[e\ed_facYPefff_efePbYYbP]\PP[deO\NN]e[aOOZbOOYaeef]bcb_OeOWb]ZWbOObO	NM:i:2	MD:Z:90T5A28	MC:Z:125M	AS:i:115	XS:i:0
            HISEQ:378:C7F64ANXX:3:1307:17567:99979	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	68	60	125M	=	718	775	GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC	ababbfffffffffffffffffffffffffffffffffffcffdfeffff]efffffefcffffffffefffffffecfffaefffffffffffffffffffeffffffffffffffb]e]fa]e	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
            HISEQ:378:C7F64ANXX:3:2211:20485:88833	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	68	60	125M	=	654	711	GCTCTTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCC	aabaaecfffffffffffffffffffffffffffffdefffffefffffffdeffffdefffffffffdffffefffffffefffffffff]cfdfdffffffffffcffffffffbeffffeff	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
            HISEQ:378:C7F64ANXX:3:1212:7300:28109	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	71	60	125M	=	636	690	CTTACAACTAATAGCTAAATACAGTAGAACTTGGATAATGCGTATGTGTTTGAGTTTTTAAAATATTGAGAGTGGAAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTC	bbbbbfffffffffffffffffffffffffffffffffffffaffffeefffff^efffffffffcffPeff]efffffffffffefffffffffffffffdfffffdfffff]bfdffffffff	NM:i:7	MD:Z:14T8A0C11G3G23G10G49	MC:Z:125M	AS:i:90	XS:i:0
            HISEQ:378:C7F64ANXX:3:2303:16430:40702	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	72	60	125M	=	694	747	TTACAACTAATAGTTAAATACAACAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCC	abbbaffffffffffffdffffffeffffffffffeefffeffffefffefffcffffffffffdffffff_fffffaefffffff]edfffffffff]fffffffffdffcefffffff]ae_b	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
            HISEQ:378:C7F64ANXX:3:2116:16496:33002	97	smaller_kp_promoter_region_upstream_of_atg_with_chloroplast_insertion_removed_1814_upstream_of_insertion_and_1089_downstream_upstream__5_prime_end_adjacent_to_a_stretch_of_n_residues	95	60	125M	=	722	752	CAGAACTTGGATGATGGGTATGTGTTTGAGTTTTTAAAATGTTGAGAGTGGGAGTTTGAGAATGCATCAAACCTTGGGAAGGAATAAGTCTTTTGGCCTTCCAAAACTATATAGATAGATAGAGC	bbbbbffffffffffffdffffeeffffffdffffffeffffffffffdfffff]ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff	NM:i:0	MD:Z:125	MC:Z:125M	AS:i:125	XS:i:0
            Here's the command and STDERR (NB in this case I was using qin=64 qout=33 for troubleshooting but I get the same result without these flags):

            Code:
            /home/xub/host/opt/bbmap/bbmap/reformat.sh qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
            java -ea -Xmx200m -cp /home/xub/host/opt/bbmap/bbmap/current/ jgi.ReformatReads qin=64 qout=33 requiredbits=16 overwrite=t in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz
            Executing jgi.ReformatReads [qin=64, qout=33, requiredbits=16, overwrite=t, in=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.1.bam, out=/home/xub/host/opt/findMatesAndRepair/output/150901_80_small/150901_80_small.bothEndsMapped.reverse.1.fq.gz]
            
            Could not find sambamba.
            Found samtools 1.8
            Input is being processed as unpaired
            Input:                  	464 reads          	58000 bases
            Output:                 	230 reads (49.57%) 	28750 bases (49.57%)
            
            Time:                         	0.634 seconds.
            Reads Processed:         464 	0.73k reads/sec
            Bases Processed:       58000 	0.09m bases/sec


            Here's some of the output:

            Code:
            @HISEQ:378:C7F64ANXX:3:2205:16922:87749
            CAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTTGAGCCTATT
            +
            JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
            @HISEQ:378:C7F64ANXX:3:1210:20568:23121
            AAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGGTCCGATTTTATAATTT
            +
            JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
            @HISEQ:378:C7F64ANXX:3:2212:11893:40357
            GGTTTATGGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAAT
            +
            JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
            @HISEQ:378:C7F64ANXX:3:2210:7117:7877
            GGTTTGACTTGGTTTGAAATTTATCCAAATGACAACCTAAATTGTAAACTGTTTTTTTAAAATCTACTAACCCAAACTGAATCATTTTATAAACCAAATCAAACTATAATTTTTAAATGGTTTGG
            +
            JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

            Thanks for any advice.

            Cheers.
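A possible explanation, offered as a sketch rather than a confirmed diagnosis: SAM text uses Phred+33 for the QUAL column, so characters such as `b` and `f` in the records above read as Q65 to Q69. If the tool clamps scores to a maximum of 41 (an assumption on my part; the cap value and the helper names below are illustrative), every base comes out as `J`, which is Q41 in Phred+33. The same characters read as Phred+64 give ordinary scores in the 15 to 38 range, which suggests the BAM was built from Phred+64 FASTQ without converting the qualities.

```python
def decode(qual, offset):
    """Turn a quality string into integer Phred scores for the given ASCII offset."""
    return [ord(c) - offset for c in qual]

def encode(scores, offset=33, cap=41):
    """Turn scores back into a string, clamping to [0, cap] (the cap is assumed)."""
    return "".join(chr(min(max(q, 0), cap) + offset) for q in scores)

sample = "bf_O"  # characters taken from the QUAL column quoted above

# Read as Phred+33 (the SAM convention): 65, 69, 62, 46 -> all clamp to 41 -> 'J'
print(decode(sample, 33), encode(decode(sample, 33)))

# Read as Phred+64: 34, 38, 31, 15 -> re-encoded as Phred+33 without clamping
print(decode(sample, 64), encode(decode(sample, 64)))
```

If that is what is happening, one workaround would be to rescale the qualities in the BAM (or convert the original FASTQ to Phred+33 before mapping) so that the stored scores are correct.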



            • #36
              Could not find or load main class

              Hi,

              First-time poster, so my apologies for any horrid faux pas, and also for the thread necromancy!

              I'm trying to convert FASTQ files to an unmapped SAM (ultimately to a CRAM, to test the space saving and to check that no data is lost when it's converted back to FASTQ) and I saw a thread suggesting BBMap. However, I'm having the same issue as pepe84.

              The command:
              reformat.sh in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam

              results in:
              "java -ea -Xm200m -cp /opt/science/BBMap/sh/current/ jgi.ReformatReads in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam
              Error: Could not find or load main class jgi.ReformatReads"

              When I copy/paste the full java line quoted in the error but remove the space between "/current/" and "jgi.ReformatReads" I instead get:
              "Error: Could not find or load main class in=.opt.science.blah.fastq.gz"

              I've tried it with the fastq file both in and out of the BBMap directory to see if it would help, but got the same error.

              Any advice would be gratefully accepted

              Roy.

              Originally posted by pepe84 View Post
              here is the command:
              java -cp C:\BBMap\current\jgi.ReformatReads in=“C:\BBMap\resources\SRRXXXXX.fastq” out1=EFB_R1.fq out2=EFB_R2.fq

              And here is the error:
              Error: Could not find or load main class in=C:\BBMap\resources\SRRXXXXX.fastq

              Just an FYI I am using the command line on windows.

              Thanks, I appreciate any help



                • #38
                  Did you move any of the bbmap folder contents after you downloaded and uncompressed bbmap code?

                  Make sure the top level directory with BBMap is in your $PATH. Something like
                  Code:
                  export PATH=$PATH:/opt/science/BBMap
                  would work.
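On the folder-layout question: the java line in the error uses `-cp /opt/science/BBMap/sh/current/`, while in the earlier post a script at `.../bbmap/reformat.sh` produced `-cp .../bbmap/current/`. That pattern suggests the wrapper expects `current/` next to the script itself, so a script living in an `sh/` subfolder would point Java at a directory that may not exist. Java resolves `jgi.ReformatReads` to `jgi/ReformatReads.class` under the classpath directory, which can be checked with a small sketch like this one (the helper names are hypothetical, and the temporary folder only mimics the layout):

```python
import os
import tempfile

def class_file_for(classpath_dir, main_class):
    """Path where Java would look for main_class under a directory classpath."""
    return os.path.join(classpath_dir, *main_class.split(".")) + ".class"

def can_load(classpath_dir, main_class):
    """True if the compiled class file exists under the classpath directory."""
    return os.path.isfile(class_file_for(classpath_dir, main_class))

# Mimic an install where current/ sits at the top level, not under sh/.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "current", "jgi"))
    open(os.path.join(top, "current", "jgi", "ReformatReads.class"), "w").close()

    print(can_load(os.path.join(top, "current"), "jgi.ReformatReads"))
    print(can_load(os.path.join(top, "sh", "current"), "jgi.ReformatReads"))
```

The second error in the quoted post fits the same picture: with the space removed, Java takes the next token (`in=/opt/...`) as the class name and reports it with the slashes turned into dots.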


                  Originally posted by Oomjah View Post
                  Hi,

                  first time poster so my apologies for any horrid faux pas', and also for the thread necromancy!

                  I'm trying to convert fastq's to an unmapped SAM (ultimately to a cram to test space saving and to check no loss of data when it's reconverted back to fastq's again) and I saw a thread suggesting BBMAP. However I'm having the same issue as pepe84.

                  The command:
                  reformat.sh in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam

                  results in:
                  "java -ea -Xm200m -cp /opt/science/BBMap/sh/current/ jgi.ReformatReads in=/opt/science/blah.fastq.gz out=/opt/science/blah.sam
                  Error: Could not find or load main class jgi.ReformatReads"

                  When I copy/paste the full java line quoted in the error but remove the space between "/current/" and "jgi.ReformatReads" I instead get:
                  "Error: Could not find or load main class in=.opt.science.blah.fastq.gz"

                  I've tried it with the fastq file both in and out of the BBMap directory to see if it would help, but got the same error.

                  Any advice would be gratefully accepted

                  Roy.



                  • #39
                    Inaccurate sampling rate

                    Hello,

                    I have been using the reformat.sh script for a while (nice stuff!) but am running into an issue.

                    I need to get a specific number of reads from a file and am using the `samplerate=` option to do that. For example, if a file has 100 reads and I need 10, I set the sample rate to 0.1. Unfortunately, it seems that for large files with very specific sample rates, the actual number of reads returned is not the product of the total reads and the sample rate. Here is an example output:

                    Code:
                    Executing jgi.ReformatReads [samplerate=0.5582187961,
                     in1=SRR2976833.fastq.gz, in2=, out=tempFile1.fastq.gz]
                    
                    Input is being processed as unpaired
                    Input:                          509774 reads            122293939 bases
                    Processed:                      284893 reads            68330214 bases
                    Output:                         284893 reads (55.89%)   68330214 bases (55.87%)
                    
                    Time:                           2.295 seconds.
                    Reads Processed:        284k    124.12k reads/sec
                    Bases Processed:      68330k    29.77m bases/sec
                    As you can see, the sample rate is set to 0.5582187961. Using the total number of reads, that would be 509774 * 0.5582187961 = 284565 reads (rounded down) once finished. However, the actual read count is 284893, with the percentage being 55.89% instead of the 55.82% that was set.

                    Please let me know why this is happening and if there is a solution.

                    Thanks!
                    PJ
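This is what you would expect if each read is kept independently with probability equal to the sample rate, which is my assumption about how `samplerate` behaves, not something confirmed from the source. In that case the output count is a random variable: for 509774 reads at rate 0.5582187961 the expected count is about 284565 with a standard deviation of roughly 355, so 284893 is less than one standard deviation away. A minimal sketch (function names are illustrative):

```python
import math
import random

def expected_count(n, rate):
    """Mean number of reads kept when each is kept with probability rate."""
    return n * rate

def std_dev(n, rate):
    """Standard deviation of the kept count (binomial distribution)."""
    return math.sqrt(n * rate * (1 - rate))

def bernoulli_sample(n, rate, rng):
    """Keep each of n reads independently; the returned count varies run to run."""
    return sum(1 for _ in range(n) if rng.random() < rate)

def exact_sample(n, k, rng):
    """Pick exactly k distinct read indices out of n."""
    return sorted(rng.sample(range(n), k))

n, rate = 509774, 0.5582187961
z = (284893 - expected_count(n, rate)) / std_dev(n, rate)
print(round(expected_count(n, rate)), round(std_dev(n, rate)), round(z, 2))

rng = random.Random(1)
print(bernoulli_sample(n, rate, rng))         # close to, but rarely equal to, the mean
print(len(exact_sample(n, 284565, rng)))      # always exactly the requested number
```

If you need an exact number of reads, check the reformat.sh help for your version: I believe it lists a `samplereadstarget` option for that purpose, but please verify before relying on it.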



                      • #41
                        Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0

                        Hi everyone!

                        I'm trying to use reformat.sh because I want to merge my forward.fastq and reverse.fastq sequences into an interleaved paired-end file. When using the following command I get this error. What is happening?

                        reformat.sh in1 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R1_001.fastq in2 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R2_001.fastq out = T3_interleaved.fastq
                        java -ea -Xmx200m -cp /home/alba/miniconda3/opt/bbmap-38.18/current/ jgi.ReformatReads in1 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R1_001.fastq in2 = /media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R2_001.fastq out = T3_interleaved.fastq
                        Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
                        at shared.PreParser.<init>(PreParser.java:71)
                        at shared.PreParser.<init>(PreParser.java:30)
                        at jgi.ReformatReads.<init>(ReformatReads.java:55)
                        at jgi.ReformatReads.main(ReformatReads.java:45)

                        Thanks in advance!
                        Alba



                        • #42
                          Alba: You need to make sure there are no spaces around the `=` in each option; every `key=value` must be a single word on the command line. Try the following.

                          Code:
                          reformat.sh in1=/media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R1_001.fastq in2=/media/alba/TOSHIBA/Metagenomes_SP/PS_trimmed_seqs/T3/f_P_171901_171082_T3_1822_S12_R2_001.fastq out=T3_interleaved.fastq
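To see why the spaces matter, here is a small Python sketch of how a shell splits the command into arguments. The `parse_args` function is a toy stand-in, not BBTools' actual parser, and the short file names are placeholders. With spaces, `in1`, `=` and the path arrive as three separate arguments, none of which is a usable `key=value` pair. A bare `=` is a plausible cause of the `ArrayIndexOutOfBoundsException: 0`, since in Java `"=".split("=")` returns an empty array.

```python
import shlex

def parse_args(command):
    """Split a command line like a POSIX shell and collect key=value options.

    Returns (parsed, stray): parsed maps keys to values, stray lists the
    arguments that are not well-formed key=value words.
    """
    tokens = shlex.split(command)[1:]  # drop the program name
    parsed, stray = {}, []
    for tok in tokens:
        key, sep, value = tok.partition("=")
        if sep and key and value:
            parsed[key] = value
        else:
            stray.append(tok)
    return parsed, stray

bad = "reformat.sh in1 = R1.fastq in2 = R2.fastq out = interleaved.fastq"
good = "reformat.sh in1=R1.fastq in2=R2.fastq out=interleaved.fastq"

print(parse_args(bad))
print(parse_args(good))
```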



                          • #43
                            Chastity filter processing

                            I posted this message as a ticket in the BBMap repository, but since I saw very little movement there I'm cross-posting the same issue here. I hope I'm not bothering anyone.

                            When processing Illumina >1.8 reads, each read is marked as filtered out or not. This is known as the chastity filter. Usually, the filtered reads are removed and not used, but sometimes they are found in the FastQ files for some reason.

                            When using the reformat.sh tool to convert FastQ files to SAM files, there's a parameter that allows us to discard reads that contain ' 1:Y:' or ' 2:Y:'. But when the reads are not discarded, they are included in the SAM file and this information is lost. I consider this a bug: there is a place in the SAM file to keep this information, and with the current implementation the value written there is wrong.

                            All reads whose chastity filter is 'Y' should have the SAM flag 512 set (which means "read fails platform/vendor quality checks"). All other reads should have this flag unset. This should also work in the opposite direction, where a read with this flag set should generate a FastQ record with a 'Y'.

                            Related to this bug I have another comment. The documentation for the chastityfilter parameter says that it will discard all reads with ' 1:Y:' or ' 2:Y:'. That's good, but what happens with reads with other numbers like ' 3:Y:'? I have files with this nomenclature, so it would be better to really parse the fields and discard reads with a 'Y' in the second field, keeping the first field as is.

                            Has anyone had these issues? How did you overcome them?
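To make the request concrete, here is a minimal Python sketch of the behaviour described above: parse the comment fields properly (so any read number works, not only 1 or 2) and translate the filter field to and from SAM flag 512 (0x200). This illustrates the idea and is not reformat.sh code; the function names are made up, and it assumes the comment has the usual `<read>:<Y|N>:<control>:<index>` layout.

```python
QC_FAIL = 0x200  # SAM flag 512: read fails platform/vendor quality checks

def parse_comment(header):
    """Split '@name <read>:<Y|N>:<control>:<index>' into its parts."""
    name, _, comment = header[1:].partition(" ")
    read_end, filtered, control, index = comment.split(":", 3)
    return name, int(read_end), filtered == "Y", int(control), index

def apply_qc(flag, filtered):
    """Set bit 0x200 for filtered reads and clear it for all others."""
    return flag | QC_FAIL if filtered else flag & ~QC_FAIL

def filter_char(flag):
    """Reverse direction: derive the FastQ 'Y'/'N' field from a SAM flag."""
    return "Y" if flag & QC_FAIL else "N"

name, read_end, filtered, control, index = parse_comment(
    "@HISEQ:378:C7F64ANXX:3:1207:13039:83924 3:Y:0:ACGT"
)
flag = apply_qc(4, filtered)  # 4 = unmapped
print(read_end, filtered, flag, filter_char(flag))
```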



                            • #44
                              That's good, but what happens with reads with other numbers like ' 3:Y:'? I'm having files with this nomenclature
                              Files with this nomenclature have become available in the last few years, as technologies like 10x create separate files for index reads. While this does not solve the problem permanently, you could change "3:Y" to "2:Y" temporarily and then change it back after using reformat.sh.

                              Comment


                              • #45
                                Originally posted by GenoMax View Post
                                Files with this nomenclature have become available in the last few years, as technologies like 10x create separate files for index reads. While this does not solve the problem permanently, you could change "3:Y" to "2:Y" temporarily and then change it back after using reformat.sh.
                                Sure. In fact, this is not a big deal for me. We decided to ignore this number and generate it in a more standard way (first end -> 1, second end -> 2, first UMI -> 3, second UMI -> 4), regardless of the original numbering.

                                For context: we are converting FastQ files into unmapped CRAMs for storage, and the FastQ to SAM intermediate conversion is done with reformat.sh. My main issue here is keeping the QC vendor flag in place.

                                I already have a workaround, as I also have to keep other things like the UMIs (if present) and the barcode. Those bits of information require adding tags, which are optional, so I'm not complaining about them. But the QC vendor flag is not optional: it is always there, and not filling it in means you are assigning a "QC vendor pass" regardless of the information in the input.

                                In any case, if someone wants to look at how to carry all the information from the FastQ file over into a SAM file, the four fields of the comment in the ID line are the candidates:
                                • The read end, which could be deduced later if you agree to standardize the output
                                • The QC vendor flag, which will be encoded in the FLAGS field
                                • The control bits, which should be zero and can potentially be ignored (there is no clear place to store them in case they are needed)
                                • The index barcode, which should be stored as the BC:Z: tag


                                The other data present in the FastQ file is already present in the SAM (read name, sequence and qualities).
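Putting the four fields together, here is a minimal Python sketch of a FastQ record turned into an unmapped SAM line. This is only an illustration of the mapping above, not the reformat.sh implementation and not my actual workaround; the function name is hypothetical, it assumes a single space between read name and comment, it handles only read ends 1 and 2 when `paired=True`, and it drops the control bits.

```python
QC_FAIL = 0x200  # SAM flag 512: read fails platform/vendor quality checks


def fastq_to_unmapped_sam(header, seq, qual, paired=False):
    """Build one unmapped SAM line from a FastQ record.

    Assumed header layout: '@<name> <read>:<filter>:<control>:<index>'.
    """
    name, _, comment = header[1:].partition(" ")
    read_end, is_filtered, _control, barcode = comment.split(":", 3)

    flag = 0x4  # segment unmapped
    if paired:
        flag |= 0x1 | 0x8  # paired, mate unmapped
        if read_end == "1":
            flag |= 0x40   # first segment in the template
        elif read_end == "2":
            flag |= 0x80   # last segment in the template
        else:
            raise ValueError("unsupported read end: %r" % read_end)
    if is_filtered == "Y":
        flag |= QC_FAIL    # keep the QC vendor flag

    # RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN are empty for unmapped reads
    cols = [name, str(flag), "*", "0", "0", "*", "*", "0", "0", seq, qual]
    if barcode:
        cols.append("BC:Z:" + barcode)  # index barcode as an optional tag
    return "\t".join(cols)
```

With this, a single-end record with comment `1:Y:0:ACGT` gets flag 516 (4 + 512) and a `BC:Z:ACGT` tag, while a passing second-in-pair read gets 141.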

                                Comment
