  • Annotation for contigs from de novo assembly

    Hi,

    I want to annotate my assembled contigs (from a de novo assembly). I used BLASTX and only got hits for 10-20% of them (e-value cutoff 1e-5). As a result, none of my differentially expressed contigs (genes) have an annotation. At the very least, I want to know what kinds of genes these are, e.g., signaling, transmembrane, etc.

    Thanks a lot!
    Victoria

  • #2
    I'd give Prokka a try:


    • #3
      Provided Victoria is working with a prokaryotic genome

      NCBI has a eukaryotic annotation pipeline: http://www.ncbi.nlm.nih.gov/genome/a...n_euk/process/ and a prokaryotic one: https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ If I recall correctly, though, you will have to make the sequence public at some point if you use these.

      Other eukaryotic options (have not used myself):

      Pasa: http://pasa.sourceforge.net/
      Maker: http://www.gmod.org/wiki/MAKER


      • #4
        I think Blast2GO would also be useful


        • #5
          I've also had good experiences with Blast2GO. It doesn't require installation and is quite easy to use. Also, they have updated the rather ugly colours of their pie charts.


          • #6
            Hi,

            Thank you for your replies. As I understand it, Blast2GO (see the link below) just uses the BLAST results, so it won't annotate any more contigs than the BLASTX search I already ran. Is that correct?

            Provides bioinformatics support mainly in second generation DNA sequencing data analysis.


            The organism I want to annotate is the protist Oxyrrhis marina.

            Thank you!
            Victoria


            • #7
              RAST annotation.
              Krishna


              • #8
                Hi Victoria, I guess you could search several databases to increase your chances of annotation. Which databases have you used? I don't have experience with protists, but in general a good start would be to compare against GenBank and UniProt's Swiss-Prot and TrEMBL protein databases. Have you tried a less conservative e-value? Also try downloading annotated sequences from related species so you can compare against them directly. This reference may help you:

                Background: Anopheles funestus is one of the primary vectors of human malaria, which causes a million deaths each year in sub-Saharan Africa. Few scientific resources are available to facilitate studies of this mosquito species and relatively little is known about its basic biology and evolution, making development and implementation of novel disease control efforts more difficult. The An. funestus genome has not been sequenced, so in order to facilitate genome-scale experimental biology, we have sequenced the adult female transcriptome of An. funestus from a newly founded colony in Burkina Faso, West Africa, using the Illumina GAIIx next generation sequencing platform.

                Methodology/Principal Findings: We assembled short Illumina reads de novo using a novel approach involving iterative de novo assemblies and “target-based” contig clustering. We then selected a conservative set of 15,527 contigs through comparisons to four Dipteran transcriptomes as well as multiple functional and conserved protein domain databases. Comparison to the Anopheles gambiae immune system identified 339 contigs as putative immune genes, thus identifying a large portion of the immune system that can form the basis for subsequent studies of this important malaria vector. We identified 5,434 1∶1 orthologues between An. funestus and An. gambiae and found that among these 1∶1 orthologues, the protein sequence of those with putative immune function were significantly more diverged than the transcriptome as a whole. Short read alignments to the contig set revealed almost 367,000 genetic polymorphisms segregating in the An. funestus colony and demonstrated the utility of the assembled transcriptome for use in RNA-seq based measurements of gene expression.

                Conclusions/Significance: We developed a pipeline that makes de novo transcriptome sequencing possible in virtually any organism at a very reasonable cost ($6,300 in sequencing costs in our case). We anticipate that our approach could be used to develop genomic resources in a diversity of systems for which full genome sequence is currently unavailable. Our An. funestus contig set and analytical results provide a valuable resource for future studies in this non-model, but epidemiologically critical, vector insect.


                Dave
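
To see how much a less conservative e-value would actually help, one option is to count the contigs with at least one hit at several cutoffs. The sketch below assumes the BLASTX search was saved in the standard 12-column tabular format (`-outfmt 6`), where column 1 is the query ID and column 11 is the e-value. The function name and the three example rows are made up for illustration.

```python
def contigs_with_hits(blast_lines, evalue_cutoff):
    """Return the set of query IDs with at least one hit at or below the cutoff.

    Expects lines in BLAST tabular format (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= evalue_cutoff:
            hits.add(query_id)
    return hits


# Made-up rows, only to show the expected layout.
EXAMPLE = """\
contig1\tsp|P1|A\t45.0\t200\t110\t0\t1\t600\t1\t200\t1e-40\t150
contig2\tsp|P2|B\t30.0\t80\t56\t0\t1\t240\t1\t80\t2e-4\t40
contig3\tsp|P3|C\t28.0\t60\t43\t0\t1\t180\t1\t60\t0.5\t30
"""

for cutoff in (1e-5, 1e-3, 10):
    n = len(contigs_with_hits(EXAMPLE.splitlines(), cutoff))
    print(f"e-value <= {cutoff:g}: {n} contigs with a hit")
```

For a real run, pass an open file handle instead of `EXAMPLE.splitlines()`. Hits that only appear at relaxed cutoffs are weaker evidence and are worth checking by hand.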


                • #9
                  You can try the Trinotate pipeline. It combines several tools (TransDecoder to get plausible ORFs, Pfam, HMMER, SignalP, TMHMM, RNAmmer) to produce a fairly complete annotation report. The website gives a lot of detail on how to use it.
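
To give a feel for the "plausible ORFs" step, here is a deliberately simplified sketch that reports the longest open reading frame starting at a methionine across all six frames of a contig. It is not TransDecoder's actual method, which is considerably more involved; it assumes the standard genetic code, and the function names are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement a DNA string; non-ACGT characters are left unchanged."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq):
    """Return (protein, strand, frame) for the longest M-initiated ORF.

    ORFs that run off the end of the contig without a stop codon are kept
    as partial ORFs. Codons containing ambiguous bases translate to 'X'.
    """
    best = ("", "+", 0)
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = "".join(
                CODON_TABLE.get(s[i:i + 3], "X")
                for i in range(frame, len(s) - 2, 3)
            )
            # Each stretch between stop codons is a candidate reading frame.
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best[0]):
                    best = (segment[start:], strand, frame)
    return best


protein, strand, frame = longest_orf("CCATGGCTAAATGA")
print(protein, strand, frame)  # MAK + 2
```

The resulting protein sequences are what downstream domain searches (Pfam via HMMER, for example) take as input.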


                  • #10
                    Run a gene prediction tool (e.g. Prodigal) over it, run the predicted proteins through InterProScan, and check whether you get anything interesting for your analysis.
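
Once InterProScan has run, its TSV output can be condensed into a per-protein list of domain and signature descriptions, which is often enough to tell whether something is, say, a kinase or a transmembrane protein. The sketch below assumes the standard InterProScan TSV layout (column 1 protein accession, column 4 analysis, column 6 signature description, with `-` for missing values). The function name and example rows are made up for illustration.

```python
from collections import defaultdict


def summarize_interproscan(tsv_lines):
    """Map each protein ID to a sorted list of (analysis, description) pairs."""
    summary = defaultdict(set)
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        protein, analysis, description = fields[0], fields[3], fields[5]
        if description and description != "-":
            summary[protein].add((analysis, description))
    return {protein: sorted(pairs) for protein, pairs in summary.items()}


# Made-up rows, only to show the expected layout.
EXAMPLE = [
    "contig1_orf1\tmd5a\t310\tPfam\tPF00069\tProtein kinase domain\t12\t280\t1e-50\tT\t01-01-2023",
    "contig1_orf1\tmd5a\t310\tSMART\tSM00220\t-\t12\t280\t1e-45\tT\t01-01-2023",
    "contig2_orf1\tmd5b\t150\tPfam\tPF00153\tMitochondrial carrier protein\t5\t95\t1e-20\tT\t01-01-2023",
]

for protein, pairs in summarize_interproscan(EXAMPLE).items():
    for analysis, description in pairs:
        print(f"{protein}\t{analysis}\t{description}")
```

Proteins with no rows at all in the TSV had no signature match in any of the member databases.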

                    It would also be good to know how long the contigs are.
                    There is not much point in trying to annotate anything considerably shorter than 900 bp.
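
A quick way to check the length distribution is to split the assembly at the 900 bp mark mentioned above. This is a minimal sketch that reads FASTA-formatted text; the function names and the toy sequences are made up for illustration, and the threshold is only the rule of thumb from this post.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_length(records, min_len=900):
    """Split (name, sequence) records into those >= min_len and those shorter."""
    keep, short = [], []
    for name, seq in records:
        (keep if len(seq) >= min_len else short).append((name, len(seq)))
    return keep, short


# Toy input: one 1000 bp contig and one 300 bp contig.
EXAMPLE = [">contig1 len=1000", "A" * 600, "A" * 400, ">contig2", "C" * 300]

keep, short = split_by_length(read_fasta(EXAMPLE))
print("kept:", keep)
print("too short:", short)
```

For a real assembly, pass an open file handle to `read_fasta` instead of the `EXAMPLE` list.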
