  • sp24
    Member
    • May 2013
    • 14

    CAP3 for forward and reverse reads

    Hi Everyone,

    I'm trying to use CAP3. I have two small files of paired reads: one with forward reads and the other with reverse reads. I've been reading the manual to figure out how to specify forward and reverse reads, but I'm a little confused by it. Has anyone else used this and could help me out? I just don't understand what they mean by "dots" in the read names or how to set that up.

    The information from the manual that pertains to what I'm doing:


    Input to CAP3

    CAP3 takes as input a file of sequence reads in FASTA format.
    If the names of reads contain a dot ('.'), CAP3 requires that
    the names of reads sequenced from the same subclone contain
    the same substring up to the first dot.
    CAP3 takes two optional files: a file of quality values
    in FASTA format and a file of forward-reverse constraints.

    The file of quality values must be named "xyz.qual", and
    the file of forward-reverse constraints must be named "xyz.con",
    where "xyz" is the name of the sequence file.
    CAP3 uses the same format of a quality file as Phrap.

    Each line of the constraint file specifies one forward-reverse constraint
    of the form:

    ReadA ReadB MinDistance MaxDistance

    where ReadA and ReadB are names of two reads, and
    MinDistance and MaxDistance are distances (integers) in base pairs.
    The constraint is satisfied if ReadA in forward orientation occurs
    in a contig before ReadB in reverse orientation, or
    ReadB in forward orientation occurs in a contig before ReadA
    in reverse orientation, and their distance is between MinDistance
    and MaxDistance.
    CAP3 works better if a lot more constraints are used.

    We have a separate program named "formcon" to generate
    a constraint file from the sequence file.
    The program takes an input file of fragments in FASTA format
    and two integers (minimum distance and maximum distance in bp).
    The minimum and maximum distances specify a lower and
    an upper limit on the subclone length, respectively.
    It produces a file of forward-reverse constraints for CAP3.
    It is assumed that a pair of forward and reverse reads must
    contain a dot in their names and a pair of forward and reverse reads
    have a common name up to the first dot.
    Because CAP3 uses reads whose ends are clipped, instead of raw reads,
    to measure their distance, the distance seen by CAP3 could be different
    from the insert size by 1000 to 1500 bp. For example,
    if the insert size is 2000 to 3000 bp, we recommend that you use
    500 for the minimum distance and 4000 for the maximum distance.
    The results are in the file with name ending in ".con".

    Any help would be appreciated, thanks!
  • sklages
    Senior Member
    • May 2008
    • 628

    #2
    Hmm, something like this:

    cat mySeq.fasta
    >MyReadA.f
    ACGT
    >MyReadA.r
    TCGA
    >MyReadB.f
    ACGT
    >MyReadB.r
    TCGA
    plus the quality file.

    cat mySeq.con
    MyReadA.f MyReadA.r 1000 2000
    MyReadB.f MyReadB.r 1000 2000
    You can use whatever suffixes are appropriate for forward/reverse; I just used 'f' and 'r' as an example.
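
For a file with many reads, the constraint lines can be generated from the read names instead of typed by hand. The following is a minimal Python sketch of that idea, not the actual formcon program: it groups names by the substring up to the first dot and writes one line per complete pair. The function names `read_names` and `build_constraints` are made up for this example, and the 'f'/'r' suffixes and the 1000/2000 distances are just the values from the example above.

```python
from collections import defaultdict


def read_names(fasta_lines):
    """Yield the read name (first word after '>') of each FASTA record."""
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def build_constraints(names, min_dist, max_dist, fwd="f", rev="r"):
    """Pair reads that share the text before the first dot.

    Assumes forward reads end in '.f' and reverse reads in '.r';
    change fwd/rev if your names use other suffixes.
    """
    pairs = defaultdict(dict)
    for name in names:
        base, dot, suffix = name.partition(".")
        if dot and suffix in (fwd, rev):
            pairs[base][suffix] = name
    lines = []
    for mates in pairs.values():
        # Only emit a constraint when both mates are present.
        if fwd in mates and rev in mates:
            lines.append(f"{mates[fwd]} {mates[rev]} {min_dist} {max_dist}")
    return lines


example = """>MyReadA.f
ACGT
>MyReadA.r
TCGA
>MyReadB.f
ACGT
>MyReadB.r
TCGA
""".splitlines()

con_lines = build_constraints(read_names(example), 1000, 2000)
print("\n".join(con_lines))
```

Saving the printed lines as mySeq.con next to mySeq.fasta follows the naming rule quoted from the manual.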

    • sp24
      Member
      • May 2013
      • 14

      #3
      Thanks! Now I'm having issues creating that .con file.

      If I have those reads in FASTA format, is there a script that would create the .con file for me? And would I need to change the headers? The real headers look like this; I've simplified them in the example below.

      >D3NH4HQ1:107:C0LN7ACXX:1:1101:10356:54822 1:N:0:ATCACG

      My fasta file is:
      >xyz /1
      ATGC
      >xyz /2
      GCCC
      >abc /1
      TAAT
      >abc /2
      GGGC

      So with a file of hundreds of reads, how can I extract that information?

      I would want:

      xyz /1 xyz /2 200 500
      abc /1 abc /2 200 500

      • sklages
        Senior Member
        • May 2008
        • 628

        #4
        You have Illumina reads... are you sure you want to use CAP3?

        Maybe you should have a look at MIRA (http://sourceforge.net/apps/mediawiki/mira-assembler/).

        If you still want to use CAP3, you have to write a tiny Perl script (or sh or awk) to do this job for you.
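
As a rough illustration of such a script (in Python rather than Perl, purely as a sketch), the code below renames the headers so each pair shares a name up to the first dot and builds the matching constraint lines. It assumes the mate number appears either as a trailing "/1" or "/2" or as the start of the second header field, as in "1:N:0:ATCACG"; other header layouts would need a different pattern. The names `parse_header` and `convert` are hypothetical, and 200/500 are the distances from the post above.

```python
import re


def parse_header(header):
    """Return (base_name, mate) with mate '1' or '2', or None if unknown."""
    fields = header.lstrip(">").split()
    if not fields:
        return None
    name = fields[0]
    # Form "xyz/1": mate number attached to the name.
    m = re.fullmatch(r"(.+)/([12])", name)
    if m:
        return m.group(1), m.group(2)
    if len(fields) > 1:
        # Forms "xyz /1" and "xyz 1:N:0:ATCACG": mate number in second field.
        m = re.match(r"/?([12])(?::|$)", fields[1])
        if m:
            return name, m.group(1)
    return None


def convert(fasta_lines, min_dist, max_dist):
    """Return (renamed FASTA lines, constraint lines) for paired reads."""
    suffix = {"1": "f", "2": "r"}
    out, seen = [], {}
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parsed = parse_header(line)
            if parsed is None:
                raise ValueError(f"cannot tell mate from header: {line}")
            base, mate = parsed
            # Assumption: replace stray dots so the pair suffix follows the first dot.
            base = base.replace(".", "_")
            seen.setdefault(base, set()).add(mate)
            out.append(f">{base}.{suffix[mate]}")
        else:
            out.append(line)
    con = [f"{b}.f {b}.r {min_dist} {max_dist}"
           for b, mates in seen.items() if mates == {"1", "2"}]
    return out, con


example = """>xyz /1
ATGC
>xyz /2
GCCC
>abc /1
TAAT
>abc /2
GGGC
""".splitlines()

renamed, con = convert(example, 200, 500)
print("\n".join(renamed))
print("\n".join(con))
```

If the forward and reverse reads are in two separate files, concatenating them first should work with this sketch, since pairing is done by name and not by position. The renamed reads and the constraint lines would then be written to a sequence file and a ".con" file named after it, as the manual excerpt describes.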

        • sp24
          Member
          • May 2013
          • 14

          #5
          Actually, the reads represent genes. So one file of /1 reads and the other of /2 reads together represent one gene, not a whole transcriptome.
