BFAST input format for paired end reads


  • BFAST input format for paired end reads

    I would like to know how to input paired end reads to the BFAST software. The script provided, qseq2fastq.pl, requires qseq files, but I only have the sequence files from the Illumina machine, named
    e.g.
    s_1_1_sequence.txt containing reads from 1st read in lane 1
    s_1_2_sequence.txt containing reads from 2nd read in lane 1

    I can easily convert to FASTQ using maq sol2sanger, but I understand that the BFAST FASTQ format is different from standard FASTQ. The manual states that it requires the pairs to be listed in order of 5' to 3' and on the same strand. It also requires both reads of a pair to have the same name, whereas at the moment they have the same names but slightly different suffixes: #0/1 for read 1 and #0/2 for read 2.

    Does anyone know a quick way to get my reads into the right format? Thanks

  • #2
    Just to clarify - you want to interleave the two paired FASTQ files (F1,F2,F3,... and R1,R2,R3,...) into one file where they alternate (F1,R1,F2,R2,F3,R3,...) but also remove the "/1" and "/2" suffixes on the forward and reverse read identifiers to make them the same?

    I'd write a script to do this in your language of choice (e.g. Perl perhaps with BioPerl, or Python with Biopython, etc).



    • #3
      If you like Python, you could try something like this (untested):

      # This Python script requires Biopython 1.51 or later (Python 3 syntax)
      from Bio import SeqIO

      # Set up variables (could parse command line args instead)
      file_f = "s_1_1_sequence.txt"
      file_r = "s_1_2_sequence.txt"
      file_out = "interleaved.fastq"

      def interleave(iter1, iter2):
          # zip stops at the shorter input, so both files should hold the same number of reads
          for forward, reverse in zip(iter1, iter2):
              assert forward.id.endswith("/1")
              assert reverse.id.endswith("/2")
              # Remove the /1 and /2 from the identifiers
              forward.id = forward.id[:-2]
              reverse.id = reverse.id[:-2]
              assert forward.id == reverse.id
              # Clear the descriptions so the old suffixed names are not written out
              forward.description = ""
              reverse.description = ""
              yield forward
              yield reverse

      records_f = SeqIO.parse(open(file_f), "fastq-illumina")
      records_r = SeqIO.parse(open(file_r), "fastq-illumina")

      with open(file_out, "w") as handle:
          count = SeqIO.write(interleave(records_f, records_r), handle, "fastq-sanger")
      print("%i records written to %s" % (count, file_out))
      Based on the Biopython example here:
      http://news.open-bio.org/news/2009/1...ith-biopython/

      Note - I'm assuming you have Illumina 1.3+ FASTQ files, not Solexa style FASTQ files. See http://en.wikipedia.org/wiki/FASTQ_format and http://nar.oxfordjournals.org/cgi/co...stract/gkp1137 or search the forum for details.
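For readers without Biopython, the quality re-encoding implied by reading "fastq-illumina" and writing "fastq-sanger" can be sketched in plain Python. This is a minimal illustration, assuming Illumina 1.3+ files (PHRED scores with an ASCII offset of 64) converted to Sanger style (offset 33); the function name `illumina_to_sanger` is made up for this example and is not part of any library. Solexa style files use a different score scale and would need a different conversion.

```python
# Sketch: re-encode an Illumina 1.3+ quality string (PHRED+64) as Sanger (PHRED+33).
# The helper name is hypothetical; it is not a Biopython or BFAST function.
ILLUMINA_OFFSET = 64
SANGER_OFFSET = 33

def illumina_to_sanger(quality):
    """Return the Sanger-encoded version of a PHRED+64 quality string."""
    converted = []
    for char in quality:
        phred = ord(char) - ILLUMINA_OFFSET
        if phred < 0:
            # Characters below '@' suggest Solexa style or already-Sanger data
            raise ValueError("quality character %r is below the PHRED+64 range" % char)
        converted.append(chr(phred + SANGER_OFFSET))
    return "".join(converted)

print(illumina_to_sanger("hhhB"))  # prints III#
```

The sequence line is unchanged by this step; only the quality line of each FASTQ record is rewritten.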
      Last edited by maubp; 12-16-2009, 07:18 AM. Reason: Adding link



      • #4
        ok, thanks, I am learning to use perl, not familiar with python, so I will write a script to do this in perl.

        Just to clarify, do you know whether I need to alter the reverse read to make the sequence on the same strand as the forward read?

        e.g. if I have in the sequence.txt file
        Forward read AAAATTT
        Reverse read CCGGGG

        I need to interleave them as:
        AAAATTT
        CCCCGG
        is this correct?



        • #5
          Originally posted by lindseyjane:
          ok, thanks, I am learning to use perl, not familiar with python, so I will write a script to do this in perl.
          Fair enough - you may find BioPerl helpful, it has built-in FASTQ support.
          Originally posted by lindseyjane:
          Just to clarify, do you know whether I need to alter the reverse read to make the sequence on the same strand as the forward read?
          Maybe. I haven't used BFAST so I don't know. It shouldn't be too hard to do if required. Again, BioPerl will have built-in reverse complement code, but don't forget to reverse the qualities too.
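As a rough illustration of that last point, here is a plain Python sketch (kept in the same language as the script in #3) that reverse complements a read and reverses its quality string so each score stays attached to its base. Whether BFAST actually needs the reverse read flipped is not confirmed in this thread, and the function name `reverse_complement_read` and the example quality string are made up for the illustration.

```python
# Sketch: reverse complement a read and reverse its qualities to match.
# The helper name and the example quality string are hypothetical.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement_read(sequence, quality):
    """Return (reverse-complemented sequence, reversed quality string)."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Complement each base, then reverse; qualities are only reversed
    return sequence.translate(COMPLEMENT)[::-1], quality[::-1]

seq, qual = reverse_complement_read("CCGGGG", "IIIHG#")
print(seq, qual)  # prints CCCCGG #GHIII
```

With the reverse read from #4, CCGGGG becomes CCCCGG, which matches the sequence proposed there.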



          • #6
            Thanks for all your help and rapid responses to my queries

