extracting predicted gene from scaffold: end position precedes start position

  • extracting predicted gene from scaffold: end position precedes start position

    I am trying to extract sequences for a list of predicted genes from genomic scaffolds. The list of predicted genes, with scaffold IDs, start and end positions, and other info, comes from published supplementary data. My script to extract the sequences fails because, for some genes, the start position is larger than the end position (the fourth-to-last and third-to-last columns below). Here is an example (numbers have been changed from the original):
    geneID Gene_family Class ScaffoldID start_position end_position Number_of_exons Annotation_status
    CSP1 cs Protein candidate gi|294506227|gb|GL650210.1| 61498 52100 2 intact
    CSP10 cs Protein candidate gi|294507212|gb|GL649715.1| 293074 297989 2 intact
    CSP2 cs Protein candidate gi|294507210|gb|GL650017.1| 234944 236074 2 intact
    CSP3 cs Protein candidate gi|294507295|gb|GL649612.1| 323100 323743 2 intact
    CSP4 cs Protein candidate gi|294506227|gb|GL650210.1| 41911 40888 2 intact
    CSP5 cs Protein candidate gi|294507205|gb|GL649712.1| 274408 272617 2 intact
    I am new to working with annotated genomes. Does it make sense that some "starts" come after the "ends"? Is this because the ORF for these genes is on the opposite strand of the scaffold? If so, and I want to obtain that sequence, what's the best way to get it: should I extract the scaffold sequence between the two numbers and then take its reverse complement?

    Thanks for any pointers.

  • #2
    Some genes are transcribed from the opposite strand of the DNA, which results in reversed coordinates. You can add an extra column (e.g. strand) containing '+' when start_position < end_position and '-' when start_position > end_position.
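The strand rule above, together with the reverse-complement step asked about in the question, could be sketched roughly as follows. This is only an illustration under stated assumptions: the coordinates are taken to be 1-based and inclusive (worth checking against the paper's supplementary notes), the scaffold sequence is assumed to be already loaded as a plain string, and the function names `strand_of`, `reverse_complement`, and `extract_gene` are made up for this example.

```python
# Minimal sketch: infer the strand from coordinate order, then extract the gene.
# Assumption: coordinates are 1-based and inclusive (verify for your dataset).

# Translation table for A/C/G/T/N only; other IUPAC ambiguity codes would pass
# through unchanged, so use a fuller table if your scaffolds contain them.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def strand_of(start: int, end: int) -> str:
    """Return '+' if start <= end, otherwise '-'."""
    return "+" if start <= end else "-"


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_gene(scaffold_seq: str, start: int, end: int) -> str:
    """Slice the region between the two coordinates; reverse-complement
    it when the coordinates indicate the minus strand."""
    lo, hi = min(start, end), max(start, end)
    region = scaffold_seq[lo - 1:hi]  # convert 1-based inclusive to a Python slice
    if strand_of(start, end) == "-":
        return reverse_complement(region)
    return region


if __name__ == "__main__":
    toy_scaffold = "AACCGTTAGC"  # toy sequence, not real data
    print(strand_of(61498, 52100))            # coordinates from the CSP1 row
    print(extract_gene(toy_scaffold, 3, 6))   # plus-strand style coordinates
    print(extract_gene(toy_scaffold, 6, 3))   # minus-strand style coordinates
```

For real scaffolds, a library such as Biopython (whose `Seq` objects provide a `reverse_complement()` method) may be a more robust choice than the hand-written translation table used here.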
