  • Problem with cmpfastq, can't process my .fastq /1 and /2 files

    Hi,

    I'm having a problem with cmpfastq, even though I've been using it reliably for months.

    Normally, I can take my trimmed 1_1.fastq and 1_2.fastq, process them through cmpfastq, and get my .common.out and .unique.out files for downstream processing. However, a couple of data sets are really giving me trouble: cmpfastq prints an error message for every line of the .fastq files and fails to generate the appropriate output files.

    Here is a sample of the output data:

    BEGIN cmpfastq3 on TpruniS3_1.trimmed TpruniS3_2.trimmed at Wed Oct 10 15:00:08 EDT 2012
    Could not match the sequence ID from the name: @M00649:2:000000000-A1721:1:1101:17085:1532/2
    Could not match the sequence ID from the name: TACTCCTACTGCGCAGCAATATTATTCTTTCGTTAGAGCTAAAAGGCAGAGTGGGAATCGAACCCACTTCGTTAGATTTGCAATC
    Could not match the sequence ID from the name: +
    Could not match the sequence ID from the name: 555??BBDDDDDDBDCCFFFFEFI;BEFHIIHFHFHH@@GHHIFHHHFEFH8CD@@BFD@EFHCEEHECFFHIIFHHDFGHIIHH
    Could not match the sequence ID from the name: @M00649:2:000000000-A1721:1:1101:16787:1535/2
    Could not match the sequence ID from the name: TAGACGTTTAAGTGACACCGAAAGAAGAAAGAGCTTTGTAGATGCTTAGCGCGGTCTACGAGCCTGGCGGATCAGAAAGCGGAAG
    Could not match the sequence ID from the name: +
    Could not match the sequence ID from the name: 5<?????DDDDDBDBFFFFFFHDACFHFHHB=CFDGHHHEDGGFGFGGHIHHC>EDEHHHHHHHB@?DHHCHHFFHHD=F;A@EE
    Could not match the sequence ID from the name: @M00649:2:000000000-A1721:1:1101:14795:1537/2
    Could not match the sequence ID from the name: AACGGAGCGAAGGATTTTAGCTTCACGAATTTCCCAAACTTGGCGAGGTCCTGTGTCGATTCCCGGACTTCCTTGGTCTTTGCGCC
    Could not match the sequence ID from the name: +
    Could not match the sequence ID from the name: 5<????@DDDDBDDBFFFFFFIIIHIIHHEHIHIIIFHHH/AFFCH++?EE?EFGGHHFF-CA-5CEEAGH,CCDF@DBGDFFCEE


    Does anyone have an idea?

    Thanks for the help!

  • #2
    Neverending Illumina format changes

    I don't really know anything about 'cmpfastq' but I've had a look at the source code:


    From what I can tell, it expects the ID line to match this pattern /^@(.*)#.*/
    which means an @ followed by some chars, then a # followed by some chars.

    Your IDs do not fit this pattern, because you don't have the #xxxxx part.

    Illumina used to use #AGCTCG to denote barcodes in multiplex samples. These days it uses a different format, or doesn't print it at all.

    To make it work with your data, change it to /^@(.*)(#.*)?/ or /^@(.*)/
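To see how the choice of pattern affects the captured ID, here is a minimal Python sketch. It is not cmpfastq's code: `ID_PATTERN`, `GREEDY_PATTERN` and `read_id` are hypothetical names, and the assumption is that both mates should share one ID once an optional `#barcode` and a trailing `/1` or `/2` are dropped. Note that a greedy `(.*)` captures the whole rest of the line, including `/1` or `/2`, so the two mates of a pair end up with different captured text.

```python
import re
from typing import Optional

# Hypothetical pattern, not taken from cmpfastq: capture the read ID lazily,
# then allow an optional "#barcode" and an optional "/1" or "/2" mate suffix.
ID_PATTERN = re.compile(r"^@(.+?)(?:#[^/\s]*)?(?:/[12])?$")

# The greedy alternative: everything after "@" is captured, suffix included.
GREEDY_PATTERN = re.compile(r"^@(.*)")


def read_id(header: str) -> Optional[str]:
    """Return the mate-independent read ID, or None if the line does not start with '@'."""
    match = ID_PATTERN.match(header.rstrip("\r\n"))
    return match.group(1) if match else None


if __name__ == "__main__":
    header = "@M00649:2:000000000-A1721:1:1101:17085:1532/2"
    print(read_id(header))                        # ID without the /2 suffix
    print(GREEDY_PATTERN.match(header).group(1))  # ID still ends in /2
```

If a tool pairs reads by comparing the captured text, only the first form gives the `/1` and `/2` reads the same key.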

    Good luck.

    • #3
      Thank you very much for the reply. You have correctly identified the problem, and I can now resolve it to work with MiSeq reads. Thanks again for the insight!

      • #4
        Hello. I'm having the same problem. I tried changing the pattern to match my headers, but it wrote all my reads to the unique files, whereas the common files remain empty. Please help.

        • #5
          What exactly are you trying to do? I have a program called "filterbyname" that can probably do it...

          • #6
            Pairing of fastq files (F/R)

            I'm trying to pair my fastq files after quality filtering and trimming of those files via FASTQC. My files look like this:

            ==> mexD1B_filt_trim_1.fastq <==
            @MexD1BSRR1562087.10.1/1
            GAGCTAGATCAGCACCATATATTACACGATGATCAGCTGTAACATTTACCTGCATCTGGTTCTTCATTCCTATCCGACCATCCTTGG
            +SRR1562087.10.1/1
            JJJJJJIIJJJJJJJJIJJJJJJJJJJJJJJJIJJJJJJJGIIJJJJIJJJJJJJJJIJJJJDHIHHHHHHHFDFFDDDDDDDDD>C
            @MexD1BSRR1562087.11.1/1
            AGGTTGACTATGGTCCAGGCCATGCCAGGAGAGCAACCGAAAACAGAGAGAACGGTAAGCCAGGAGAAGAACAGTATGAGTATATAG
            +SRR1562087.11.1/1
            IJJGHIJIIIFIBHHGAFHGGIHJIJGJEGIGGGHGIJJJJHHGFEFEDACEEDDBDBCCCDDDDDDBDDDCDDCADDDCCCDDDDD
            @MexD1BSRR1562087.15.1/1
            TAACATCCACAATCTCCTTCTACCCAAGAAGTCTGGAACTTCAGCATCAAAGGCTGGTGATGACGACAACTAATCCATTTACTGAAT



            ==> mexD1B_filt_trim_2.fastq <==
            @MexD1BSRR1562087.7.2/2
            CCTGTAGATATACGTACTGCCAAAGGGTAGATAGTTGCCCATCTCAGAAAACACAACTTCAACAGCCAAGATTAATATCCATGTGAT
            +SRR1562087.7.2/2
            IJJJGGJBHIJJGHHHIIHJJGJGJIIDFHIJIJJJGHJJJJJJJIJGIGH@FHJIJIHIIIHHH=BDFFAEECCEEFDEDDCDCA>
            @MexD1BSRR1562087.9.2/2
            GTAATCCAAATAAGGTATACTCACTCATCGGAGGATTTTGTGCTTCCCCTGTGAATTTCCACGCTAAGGATGGCTCCGGCTATAAAT
            +SRR1562087.9.2/2
            JIJIIJJJGGIIJIBC@FH@HHJGIJGCHGIEGIFHDFHJIJIJIHHIIIIJGGHHHHHCDDFDDDBDDDDDDDCDBDDBD@CDCEE
            @MexD1BSRR1562087.11.2/2
            GAAACACTGATTGGTTCACGTATCCAGGTGTATGGACCACCTATATACTCATACTGTTCTTCTCCTGGCTTACCGTTCTCTCTGTTT

            • #7
              @safina: You should use a program called repair.sh, which is part of the BBMap package. Brian has an example posted here: http://seqanswers.com/forums/showpos...0&postcount=45

              Your command would look something like this:
              Code:
              $ repair.sh in1=mexD1B_filt_trim_1.fastq in2=mexD1B_filt_trim_2.fastq out1=mexD1B_filt_trim_1_fixed.fq out2=mexD1B_filt_trim_2_fixed.fq outsingle=single.fq
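
For anyone curious what re-pairing by name involves, here is a rough Python sketch of the idea. It is not how repair.sh is implemented; `parse_fastq`, `pair_key` and `repair` are hypothetical helpers. It assumes plain 4-line records, that mates differ only in a trailing `/1` or `/2` (optionally preceded by `.1` or `.2`, as in the headers shown above), and that both files fit in memory.

```python
import re
from typing import List, Tuple

Record = Tuple[str, ...]  # header, sequence, plus line, quality

# Assumption: the mate suffix is "/1" or "/2", optionally preceded by ".1"/".2".
MATE_SUFFIX = re.compile(r"(?:\.[12])?/[12]$")


def parse_fastq(text: str) -> List[Record]:
    """Split FASTQ text into 4-line records, ignoring a truncated final record."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    usable = len(lines) - len(lines) % 4
    return [tuple(lines[i:i + 4]) for i in range(0, usable, 4)]


def pair_key(header: str) -> str:
    """Reduce a header to a key that both mates of a pair should share."""
    name = header[1:].split()[0]  # drop "@" and any trailing comment
    return MATE_SUFFIX.sub("", name)


def repair(fq1: str, fq2: str) -> Tuple[List[Record], List[Record], List[Record]]:
    """Return (paired reads 1, paired reads 2, singletons), pairs in file-1 order."""
    reads1 = {pair_key(r[0]): r for r in parse_fastq(fq1)}
    reads2 = {pair_key(r[0]): r for r in parse_fastq(fq2)}
    shared = [k for k in reads1 if k in reads2]
    out1 = [reads1[k] for k in shared]
    out2 = [reads2[k] for k in shared]
    singles = [r for k, r in reads1.items() if k not in reads2]
    singles += [r for k, r in reads2.items() if k not in reads1]
    return out1, out2, singles
```

With the headers from post #6, `@MexD1BSRR1562087.11.1/1` and `@MexD1BSRR1562087.11.2/2` both reduce to `MexD1BSRR1562087.11`, so they come out as a pair, while reads whose mate was filtered out end up in the singletons list.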
