  • SSPACE help

    Hi everyone,

    I was just wondering what an f_tig means in the .evidence file produced by SSPACE. Are these the original contigs that are put into a scaffold, or something else? Thanks!

  • #2
    These are the original contigs (numbered by the order of the contigs in the FASTA file). The 'f' indicates that the contig has forward orientation in the final scaffold; the 'r' indicates reverse orientation.

    • #3
      Hi boetsie, I have seen your reply. I am about to construct a contig graph from the output of SSPACE, because SSPACE produces a file that stores the contig link information. For example, f3545 has 23 links with r3245 and a gap of -68 bases. I take this to mean that there is a gap of 68 bases between f3545 and r3245, but what does the "-" in front of the 68 mean? I do not understand it.
      I would also like to ask another question about scaffolds. Below is a record. From it, I can see that there is a gap between r_tig3042 and f_tig3539 with a size of 615, but why does it say merged15? How should I interpret that?
      scaffold6|size68841|tigs7
      f_tig3325|size6927|links7|gaps-694|merged25
      f_tig3146|size3331|links6|gaps-623
      r_tig3405|size10398|links5|gaps-621
      f_tig3266|size5457|links15|gaps-649
      f_tig3358|size8089|links8|gaps383
      r_tig3042|size2074|links5|gaps-615|merged15
      f_tig3539|size32219

      Thank you very much. I look forward to hearing from you.
      Best wishes,

      Yue Xu
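
A minimal Python sketch of how the contig lines in a record like the one above could be split into fields. The field layout (`size`, `links`, `gaps`, `merged`, separated by `|`) is taken only from the sample record in this post; the names `ContigEntry` and `parse_contig_line` are hypothetical and are not part of SSPACE.

```python
import re
from typing import NamedTuple, Optional


class ContigEntry(NamedTuple):
    orientation: str        # "forward" for f_tig, "reverse" for r_tig
    contig_number: int      # the number after "tig"
    size: int
    links: Optional[int]    # absent on the last contig of a scaffold
    gap: Optional[int]      # may be negative
    merged: Optional[int]   # only present on some lines


def parse_contig_line(line: str) -> ContigEntry:
    """Parse one contig line such as 'r_tig3042|size2074|links5|gaps-615|merged15'."""
    fields = line.strip().split("|")
    name = re.fullmatch(r"([fr])_tig(\d+)", fields[0])
    if name is None:
        raise ValueError(f"not a contig line: {line!r}")

    # Every remaining field is a lowercase key directly followed by an integer.
    values = {}
    for field in fields[1:]:
        key_value = re.fullmatch(r"([a-z]+)(-?\d+)", field)
        if key_value is None:
            raise ValueError(f"unexpected field {field!r} in {line!r}")
        values[key_value.group(1)] = int(key_value.group(2))

    return ContigEntry(
        orientation="forward" if name.group(1) == "f" else "reverse",
        contig_number=int(name.group(2)),
        size=values["size"],
        links=values.get("links"),
        gap=values.get("gaps"),
        merged=values.get("merged"),
    )


entry = parse_contig_line("r_tig3042|size2074|links5|gaps-615|merged15")
print(entry.orientation, entry.contig_number, entry.gap, entry.merged)
```

The sketch keeps `merged` as a plain integer without interpreting it, since its meaning is not explained in this thread.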

      • #4
        The negative gap indicates a potential overlap between the two contigs. However, a 615 bp overlap between the contigs seems unlikely, which suggests that the insert size you've provided in the library file is not correct.

        To illustrate how the gap is estimated:

        Say you have two contigs, contig1 of 1000 bp and contig2 of 2000 bp. One read of a pair aligns at position 900 on contig1 and the other at position 100 on contig2.

        If you set the insert size to 210 bp, the estimated gap is:
        Provided insert size - ((size of contig1)-(position of read1 on contig1)) + (position of read2 on contig2). In this case it is:
        210 - (1000-900) + 100 = 10

        So the gap is 10 bp. If we change the insert size to 2000, it is:

        2000 - (1000-900) + 100 = 1800

        If we change the insert size to 100, it is:

        100 - (1000-900) + 100 = -100

        As you can see, the estimated gap depends directly on the insert size provided by the user.

        In your case I see a number of large negative gaps, which is highly unusual. You should probably lower your insert size by 600 bases.

        Regards,
        Boetsie
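
The worked examples in this post can be reproduced with a short Python sketch. It assumes the subtraction applies to the sum of both distances (the part of contig1 after read 1 plus the part of contig2 before read 2), since that is the reading that yields the three results given above. The function name `estimate_gap` and its parameter names are hypothetical and are not taken from the SSPACE source code.

```python
def estimate_gap(insert_size: int, contig1_size: int,
                 read1_pos: int, read2_pos: int) -> int:
    """Estimate the gap between two contigs from one read pair (illustrative only)."""
    # Bases between read 1's position and the end of contig1.
    tail_on_contig1 = contig1_size - read1_pos
    # Bases between the start of contig2 and read 2's position.
    head_on_contig2 = read2_pos
    # Whatever part of the insert is not covered by the contigs is the gap;
    # a negative value suggests the contigs overlap.
    return insert_size - (tail_on_contig1 + head_on_contig2)


# contig1 is 1000 bp; the reads align at position 900 on contig1
# and position 100 on contig2, as in the example above.
for insert_size in (210, 2000, 100):
    print(insert_size, estimate_gap(insert_size, 1000, 900, 100))
# prints:
# 210 10
# 2000 1800
# 100 -100
```

Under this reading the estimate is linear in the insert size: changing the provided insert size by some number of bases changes every estimated gap by the same amount.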

        • #5
          Hi, thank you for your detailed reply. Thanks to it, I now understand how the gap between contigs is calculated in SSPACE.
          But I have a question about the formula you wrote:
          Provided insert size - (((size of contig1)-(position of read1 on contig1)) + (position of read2 on contig2))
          Is it missing a pair of brackets, which I have marked here in bold and italic?

          Yours sincerely,
          Yue Xu
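
The two readings of the formula can be compared using the numbers from the first example in post #4. The symbols are introduced here only for illustration: insert size I = 210, contig1 length L_1 = 1000, read positions p_1 = 900 and p_2 = 100. Only the version with the extra pair of brackets gives the 10 bp stated in that post:

```latex
\begin{align*}
I - \bigl((L_1 - p_1) + p_2\bigr) &= 210 - \bigl((1000 - 900) + 100\bigr) = 10 \\
I - (L_1 - p_1) + p_2 &= 210 - (1000 - 900) + 100 = 210
\end{align*}
```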
