  • De-Novo Transcript assembly for RNA-Seq

    Hi all,

    I have RNA-Seq data and have run TopHat and Cufflinks on it. My ultimate goal is to make a transcript FASTA file from the Cufflinks assembly. I have the GTF output now and am stuck at that point. Could someone explain how to continue from here?

    Thank you in advance.
    Deepak

  • #2
    You should check out bedtools to go from a set of coordinates like in a gtf file to a set of fasta sequences.

    • #3
      Hi,
      I did not understand what is meant by bedtools. Can you please name some of them?
      Thank you.
      Deepak

      • #4
        Sure. Bedtools is a suite of programs for comparing and manipulating genomic coordinates in various ways.



        It works with GTF files as well, and the fastaFromBed command takes a set of coordinates and extracts the corresponding sequences from a FASTA file. In your case, you'd use the GTF file from Cufflinks and the genome FASTA file you used for mapping with TopHat.
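To make the idea of extracting sequences by coordinates concrete, here is a minimal Python sketch. It is not how bedtools is implemented; it only illustrates the lookup, assuming GTF coordinates are 1-based and inclusive. The toy genome, the GTF line, and the function names are made up for illustration, and reverse-complementing minus-strand features is a choice made in this sketch.

```python
# Sketch: pull one sequence per GTF feature line out of a genome.
# Assumption: GTF start/end are 1-based and inclusive on both ends.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_gtf_line(line):
    """Return (seqname, feature, start, end, strand) from one GTF line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[2], int(fields[3]), int(fields[4]), fields[6]


def extract_feature(genome, chrom, start, end, strand="+"):
    """Return the bases of one feature; genome maps sequence name -> string."""
    seq = genome[chrom][start - 1:end]  # convert to 0-based, end-exclusive
    return reverse_complement(seq) if strand == "-" else seq


# Hypothetical toy data, only for demonstration.
genome = {"Contig1": "AAACCCGGGTTT"}
gtf_line = ('Contig1\tCufflinks\texon\t1\t6\t.\t-\t.\t'
            'gene_id "CUFF.1"; transcript_id "CUFF.1.1";')

chrom, feature, start, end, strand = parse_gtf_line(gtf_line)
print(feature, extract_feature(genome, chrom, start, end, strand))
```

Note that this returns one sequence per feature line. Joining the exons of a transcript into a single spliced sequence would be a separate step.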

        Although, if you are really trying to do a de novo assembly, a program like Trinity or Oases might be better depending on your organism, genome size and complexity, read depth, etc.

        • #5
          Thanks a lot.
          I will go through Oases and the link you mentioned,
          and post again if I have any further questions.

          • #6
            Hi, I tried using bedtools fastaFromBed to make transcripts from the GTF file, and when I give it the genome FASTA file it gives this error:
            index file supercontigs.fa.fai not found, generating...
            ERROR: mismatched line lengths at line 11214 within sequence Contig200
            File not suitable for fasta index generation.
            Please help with this.
            Thank you.
            Deepak

            • #7
              You should post this on the Bedtools discussion group here:

              • #8
                The Cufflinks package has a very good binary called "gffread" for extracting transcript sequences. The most common command would be
                "gffread YOURFILE.gtf -g GENOME.fa -s CHROM.size -w YOURFILE.fa"

                Here the CHROM.size file simply lists each chromosome name and its size in bp (tab separated), e.g.
                chr1 2345671
                chr2 6765516

                YOURFILE.fa is your output file containing the sequences of the transcripts. Have a look at the gffread options for further help ("gffread --help").
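If you do not already have a CHROM.size file, one way to build it is to count the bases of each record in the genome FASTA. Below is a minimal Python sketch under that assumption; the function names and file paths are hypothetical, and the record name is taken as the header text up to the first whitespace.

```python
def fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first word of the header as the sequence name.
            words = line[1:].split()
            name, length = (words[0] if words else ""), 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def write_chrom_sizes(fasta_path, out_path):
    """Write one 'name<TAB>length' line per FASTA record."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for name, length in fasta_lengths(fasta):
            out.write(f"{name}\t{length}\n")


# Example call (paths are placeholders):
# write_chrom_sizes("GENOME.fa", "CHROM.size")
```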

                • #9
                  Hi swaraj,
                  Thank you for the info.
                  I have done as directed, but now I get the following error:

                  No fasta index found for smed_contigs.fa. Rebuilding, please wait..
                  Error: sequence lines in a FASTA record must have the same length!
                  Can anyone please help me address this?
                  Thank you in advance.

                  • #10
                    It is a problem with the formatting of your FASTA file. You should stick to using the genome FASTA file downloaded from UCSC for your organism. If the genome is not available there, reformat your FASTA file so that every line within a sequence has the same number of bases (only the last line of a sequence may be shorter). The problem arises when you have a situation like SeqA below, where the middle line is longer than the others:

                    >SeqA
                    ATTTCAGGGG
                    ATTCGGCGGGATT
                    AGGGCTCTCT
                    >SeqB
                    ATTTCGGAATT
                    ATTCCGGATAG
                    ATTGCTCC

                    Try using BioPerl SeqIO to reformat your file.
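If BioPerl is not at hand, the same reformatting can be sketched in plain Python: join all the lines of each record and re-wrap them at a fixed width. This is only an illustration, not a replacement for a proper parser; the function names, the default width of 60, and the file paths are assumptions.

```python
def wrap(seq, width):
    """Split a sequence string into chunks of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def rewrap_fasta(lines, width=60):
    """Yield FASTA lines with every sequence re-wrapped to a fixed width."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            if header is not None:
                yield header
                yield from wrap("".join(chunks), width)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header
        yield from wrap("".join(chunks), width)


def rewrap_file(in_path, out_path, width=60):
    """Read a FASTA file and write a copy with uniform line lengths."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in rewrap_fasta(src, width):
            dst.write(line + "\n")


# Example call (paths are placeholders):
# rewrap_file("smed_contigs.fa", "smed_contigs.rewrapped.fa")
```

Applied to SeqA above with width=10, the 33 bases come out as three lines of 10 followed by a final line of 3, which is the layout the FASTA indexers accept.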

                    • #11
                      I have transcripts from Trinity for human data, and I also have transcripts from TopHat and Cufflinks for the same human data. I need to find the novel transcripts among these. How can I proceed? Can someone help me?

                      • #12
                        Try using the cuffcompare utility from the Cufflinks package to compare your transcripts GTF file against a reference GTF of known genes. Cuffcompare produces a tracking file that can be parsed to identify the novel transcripts. Look into the cuffcompare documentation for more details.
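As a rough idea of what parsing the tracking file could look like, here is a minimal Python sketch. It assumes the tab-separated layout described in the cuffcompare documentation, where the first four columns are the transfrag ID, the locus ID, the reference ID and the class code. Which class codes count as "novel" is a judgment call; the default below ("u" for unknown intergenic, "j" for potentially novel isoform) is only an assumption, and the function name and file path are hypothetical. Please check the documentation for your version before relying on it.

```python
def novel_transfrags(lines, novel_codes=("u", "j")):
    """Yield (transfrag_id, locus_id, class_code) for selected class codes.

    Assumes each line is tab-separated with the class code in column 4.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        transfrag_id, locus_id, _ref_id, class_code = fields[:4]
        if class_code in novel_codes:
            yield transfrag_id, locus_id, class_code


# Example call (path is a placeholder):
# with open("cuffcmp.tracking") as handle:
#     for transfrag_id, locus_id, code in novel_transfrags(handle):
#         print(transfrag_id, locus_id, code)
```

The resulting transfrag IDs can then be used to pull the matching records out of the combined GTF.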
