Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    It turns out that a rather nice one is shipped with EVidence Modeler:


    .. in the utilities directory.

    Comment


    • #17
      There is a new utility included with cufflinks now, called "gffread", which should be able to extract spliced transcript sequences from a GFF/GTF. This is now documented on this page (scroll down to the "Extracting transcript sequences" section at the end):

      Comment


      • #18
        Originally posted by adarob View Post
        There is a new utility included with cufflinks now, called "gffread", which should be able to extract spliced transcript sequences from a GFF/GTF. This is now documented on this page (scroll down to the "Extracting transcript sequences" section at the end):

        http://cufflinks.cbcb.umd.edu/gff.html
        The problem with this script is that it returns the reference DNA sequence (from the genome fasta file used for mapping) for the transcripts and not the assembly from the mapped reads.

        Does anyone know of a tool that can give the assembled transcript sequence from the RNA-seq data? Or can someone help me if I'm misunderstanding GFFRead?

        Thanks!

        Comment


        • #19
          Originally posted by Anelda View Post
          The problem with this script is that it returns the reference DNA sequence (from the genome fasta file used for mapping) for the transcripts and not the assembly from the mapped reads.

          Does anyone know of a tool that can give the assembled transcript sequence from the RNA-seq data? Or can someone help me if I'm misunderstanding GFFRead?
          Here is the process that I am using:

          samtools mpileup -uf ref.fa accepted_hits.bam | bcftools view -cg - | vcfutils.pl vcf2fq | fq2fa.pl > new_ref.fa
          gffread -w transcripts.fa -g new_ref.fa transcripts.gtf

          where fq2fa.pl is a bioperl script to convert from fq to fasta

          I also have an email into the cufflinks developers to see if there is a way that the gffread utility can be enhanced to get the consensus sequence from the bam file and not the reference genome.

          Comment


          • #20
            the cufflinks only give us the rebuild assembled sequence based on the reference genome, but i want to know how to get unmapped contigs also , only that we could get some new transcripts.

            Comment


            • #21
              Originally posted by huangjun View Post
              the cufflinks only give us the rebuild assembled sequence based on the reference genome, but i want to know how to get unmapped contigs also , only that we could get some new transcripts.
              Its my understanding that nothing tophat/cufflinks will do this. You'll something like trinity, ABySS/trans-ABySS, or Velvet/Oases to do de novo transcriptome assembly. Tophat and cufflinks require the reference sequence.

              Comment


              • #22
                Hi guys,
                this thread was really useful for creating my script for extracting consensus sequences from the bam file. I tried to use the new samtools mpileup but it didnt do the job that i wanted. I tried to use vcf-tools's vcf-consensus but it can only extract the whole chromosome. I couldnt make it extract only a region. So went back to samtools-0.1.16 'pileup -c' option. I didnt use the reference sequnce tho cos is more complicated. So i just parse the result of the command:

                samtools pileup -c bam.file

                I take the transcript.gtf from cufflinks and for every transcript extract the consensus from the pileup. Well this task by it self sounds simple enough but its not. Cos cufflinks doesnt not provide the starts and ends as found in the bam for each found transcript but the start and end found in the annotation. So one has to search for the 'real' coordinates of the transcripts in the pile up first then extractions is easy. Well i have to admit i didnt try to run cufflinks without reference and then see the output. Maybe then transcripts.gtf will contain the true coordinates.

                Now my script is working but is slow for big bams cos then the pileup is also big and parsing it takes time. Reading about multi-threading at the moment. Or using PDL's PP programming to make it a bit faster.

                Comment


                • #23
                  Originally posted by kenietz View Post
                  Hi guys,
                  this thread was really useful for creating my script for extracting consensus sequences from the bam file. I tried to use the new samtools mpileup but it didnt do the job that i wanted. I tried to use vcf-tools's vcf-consensus but it can only extract the whole chromosome. I couldnt make it extract only a region. So went back to samtools-0.1.16 'pileup -c' option. I didnt use the reference sequnce tho cos is more complicated. So i just parse the result of the command:

                  samtools pileup -c bam.file

                  I take the transcript.gtf from cufflinks and for every transcript extract the consensus from the pileup. Well this task by it self sounds simple enough but its not. Cos cufflinks doesnt not provide the starts and ends as found in the bam for each found transcript but the start and end found in the annotation. So one has to search for the 'real' coordinates of the transcripts in the pile up first then extractions is easy. Well i have to admit i didnt try to run cufflinks without reference and then see the output. Maybe then transcripts.gtf will contain the true coordinates.

                  Now my script is working but is slow for big bams cos then the pileup is also big and parsing it takes time. Reading about multi-threading at the moment. Or using PDL's PP programming to make it a bit faster.
                  Hi Kenietz

                  Can you upload your script for doing the things it will be great as I am also trying the same thing to do

                  Regards

                  Comment


                  • #24
                    Hi figo1019,
                    here it is. Hope it does the job for you.

                    I've put some explanations inside the script. Please take a look before using it.

                    Comments and questions are also welcomed

                    Cheers
                    Attached Files

                    Comment


                    • #25
                      Hi,
                      just figured out that my work around does not work for paired-end reads
                      Somehow pileup -c removes some reads except these ones:

                      0x0100 s the alignment is not primary
                      0x0200 f the read fails platform/vendor quality checks
                      0x0400 d the read is either a PCR or an optical duplicate

                      I suppose i will have to use directly the command: samtools pileup BAM > pileup.out, then parse the output. Its a bit more tricky but i think will do the job.
                      Will update when im done.

                      Comment


                      • #26
                        Originally posted by kenietz View Post
                        Hi,
                        just figured out that my work around does not work for paired-end reads
                        Somehow pileup -c removes some reads except these ones:

                        0x0100 s the alignment is not primary
                        0x0200 f the read fails platform/vendor quality checks
                        0x0400 d the read is either a PCR or an optical duplicate

                        I suppose i will have to use directly the command: samtools pileup BAM > pileup.out, then parse the output. Its a bit more tricky but i think will do the job.
                        Will update when im done.
                        Hi
                        I am just wondering have you fixed your script for paired end reads? I would like to try your script to extract the consensus sequences from bam file based on cufflinks gtf coordinates.

                        Comment


                        • #27
                          Hi,
                          well i created a script which extracts a consensus seq. I think it works but you may try and see if it works for you. Comments and suggestions are welcomed.

                          I attach here the script.

                          But you need to modify bam_pileup.c in samtools 0.1.18 directory and then recompile, then rename the compiled binary to samtools_ndsm(the name i used in the script) then put the binary in your PATH.

                          In BAM_PILEUP.C you have to modify the function bam_plp_t bam_plp_init. Just after the line:

                          iter->flag_mask = BAM_DEF_MASK

                          you have to add the following lines:

                          iter->flag_mask &= -BAM_FSECONDARY;
                          iter->flag_mask &= -BAM_FDUP;

                          Else provide me with email and i will send all of it to you. I tried to upload but the binary is bigger than the allowed size.

                          Hope it does the job for you
                          Cheers
                          Attached Files

                          Comment


                          • #28
                            Originally posted by kenietz View Post
                            Hi,
                            well i created a script which extracts a consensus seq. I think it works but you may try and see if it works for you. Comments and suggestions are welcomed.

                            I attach here the script.

                            But you need to modify bam_pileup.c in samtools 0.1.18 directory and then recompile, then rename the compiled binary to samtools_ndsm(the name i used in the script) then put the binary in your PATH.

                            In BAM_PILEUP.C you have to modify the function bam_plp_t bam_plp_init. Just after the line:

                            iter->flag_mask = BAM_DEF_MASK

                            you have to add the following lines:

                            iter->flag_mask &= -BAM_FSECONDARY;
                            iter->flag_mask &= -BAM_FDUP;

                            Else provide me with email and i will send all of it to you. I tried to upload but the binary is bigger than the allowed size.

                            Hope it does the job for you
                            Cheers
                            Thanks. I have sent a PM to you........

                            Comment


                            • #29
                              Originally posted by kenietz View Post
                              Hi,
                              well i created a script which extracts a consensus seq. I think it works but you may try and see if it works for you. Comments and suggestions are welcomed.

                              I attach here the script.

                              But you need to modify bam_pileup.c in samtools 0.1.18 directory and then recompile, then rename the compiled binary to samtools_ndsm(the name i used in the script) then put the binary in your PATH.

                              In BAM_PILEUP.C you have to modify the function bam_plp_t bam_plp_init. Just after the line:

                              iter->flag_mask = BAM_DEF_MASK

                              you have to add the following lines:

                              iter->flag_mask &= -BAM_FSECONDARY;
                              iter->flag_mask &= -BAM_FDUP;

                              Else provide me with email and i will send all of it to you. I tried to upload but the binary is bigger than the allowed size.

                              Hope it does the job for you
                              Cheers
                              Hi Kenietz

                              I have send you a personal message

                              Regard

                              Comment


                              • #30
                                Originally posted by figo1019 View Post
                                Hi Kenietz

                                I have send you a personal message

                                Regard
                                Hi Kenietz

                                I've tried to implement what you've suggested by adding the extra lines to samtools-0.1.18, saving it as samtools_ndsm and re-compiling it by cd into the bin where I keep samtools, typing
                                make
                                make[2]: Nothing to be done for `lib'.
                                make[2]: Nothing to be done for `lib'.
                                make[2]: Nothing to be done for `lib'.
                                gcc -g -Wall -O2 -o samtools bam_tview.o bam_plcmd.o sam_view.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o kaln.o bam2bcf.o bam2bcf_indel.o errmod.o sample.o cut_target.o phase.o bam2depth.o -Lbcftools libbam.a -lbcf -lcurses -lm -lz
                                gcc -g -Wall -O2 -o bcftools call1.o main.o ../kstring.o ../bgzf.o ../knetfile.o ../bedidx.o -L. -lbcf -lm -lz
                                make[1]: Nothing to be done for `all'.

                                I don't know what this means?

                                I continued with this command: samtools mpileup -m 100000 accepted_hits.bam > parsed.bam
                                [mpileup] 1 samples in 1 input files
                                <mpileup> Set max per-file depth to 8000

                                I do get a file parsed.bam, which I then used with the script.

                                Followed by this script which I've downloaded:
                                perl get_consensus_bam_batch_v3.pl -b parsed.bam -t transcripts.gtf > consensus_sequences
                                sh: samtools-0.1.18/samtools_ndsm: Permission denied

                                I do get a file with this in it:
                                E2_Tophat/parsed.bam;MDC000004.264:487-777;/Data_Analysis/E2_data/E2_Tophat/parsed.bam.v3.pileup

                                Clearly there's something wrong somewhere? Any pointers???

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Non-Coding RNA Research and Technologies
                                  by seqadmin




                                  Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                                  Nobel Prize for MicroRNA Discovery
                                  This week,...
                                  10-07-2024, 08:07 AM
                                • seqadmin
                                  Recent Developments in Metagenomics
                                  by seqadmin





                                  Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
                                  09-23-2024, 06:35 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 10-02-2024, 04:51 AM
                                0 responses
                                104 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-01-2024, 07:10 AM
                                0 responses
                                112 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 09-30-2024, 08:33 AM
                                1 response
                                115 views
                                0 likes
                                Last Post EmiTom
                                by EmiTom
                                 
                                Started by seqadmin, 09-26-2024, 12:57 PM
                                0 responses
                                21 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X