  • camelbbs
    Member
    • Jun 2011
    • 49

    Two questions

    1. I want to ask a question about BAM files.

    I have two sequencing libraries from the same sample, which gave two FASTQ files with read lengths of 50 bp and 36 bp, respectively.
    When I run TopHat I need to specify -r, so I cannot merge the two FASTQ files. But once I have the accepted_hits.bam files, can I merge those BAM files with samtools merge?

    I need to run cufflinks and cuffdiff on the merged BAM files.

    2. I see that the usage of cuffdiff is
    cuffdiff transcripts.gtf 1.bam 2.bam

    Is this transcripts.gtf the output of cufflinks, or just the reference transcript annotation?

    Thanks, everyone.
  • cjp
    Member
    • Jun 2011
    • 58

    #2
    Originally posted by camelbbs
    1. I want to ask a question about BAM files.

    I have two sequencing libraries from the same sample, which gave two FASTQ files with read lengths of 50 bp and 36 bp, respectively.
    When I run TopHat I need to specify -r, so I cannot merge the two FASTQ files. But once I have the accepted_hits.bam files, can I merge those BAM files with samtools merge?

    I need to run cufflinks and cuffdiff on the merged BAM files.

    2. I see that the usage of cuffdiff is
    cuffdiff transcripts.gtf 1.bam 2.bam

    Is this transcripts.gtf the output of cufflinks, or just the reference transcript annotation?

    Thanks, everyone.
    I guess the sequences are not paired-end, so you can't align the FASTQ files in the same TopHat command. In that case, you can always merge the two BAM files with 'samtools merge' or 'picard MergeSamFiles'.
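A minimal sketch of that merge step, written as a bash function; the function name and paths are hypothetical. samtools merge expects coordinate-sorted inputs, which TopHat's accepted_hits.bam files normally are.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_library_bams OUT.bam IN1.bam [IN2.bam ...]
# Hypothetical helper: combine the per-library TopHat alignments of ONE sample
# into a single BAM. Inputs must be coordinate-sorted.
merge_library_bams() {
  local out=$1
  shift
  samtools merge "$out" "$@"
}

# Example (placeholder paths):
#   merge_library_bams sample1.bam lib50bp/accepted_hits.bam lib36bp/accepted_hits.bam
# Picard alternative (check the argument syntax for your Picard version):
#   java -jar picard.jar MergeSamFiles I=lib50bp/accepted_hits.bam I=lib36bp/accepted_hits.bam O=sample1.bam
```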



    You can use either a reference GTF file or the output from cufflinks. If you want novel transcripts, run cufflinks first; if you only want expression of known genes, you can run cuffdiff directly with a GTF file downloaded from Ensembl, UCSC, etc.
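For the known-genes-only route, a minimal sketch of running cuffdiff directly against a downloaded reference GTF. The function name, sample labels and paths are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# diff_with_annotation ANNOTATION.gtf OUTDIR SAMPLE1.bam SAMPLE2.bam
# Hypothetical helper: run cuffdiff using ANNOTATION.gtf as the transcript set.
# With a downloaded reference GTF, only known genes/isoforms are tested.
diff_with_annotation() {
  local gtf=$1 outdir=$2 bam1=$3 bam2=$4
  # -o: output directory; -L: sample labels used in the output tables
  cuffdiff -o "$outdir" -L sample1,sample2 "$gtf" "$bam1" "$bam2"
}

# Example (placeholder paths):
#   diff_with_annotation hg19_ucsc.gtf diff_known sample1.bam sample2.bam
```

To include novel transcripts, the same call would take the GTF produced by cufflinks (merged across samples, e.g. with cuffmerge) as its first argument instead.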

    Chris


    • camelbbs
      Member
      • Jun 2011
      • 49

      #3
      Thanks very much. But the sequences are paired-end. One sample has several libraries, and the read length differs between libraries. So we first get a BAM file for each library with tophat -r xxx -G hg19_ucsc.gtf ERR001_1.fastq ERR001_2.fastq

      and then merge all the BAM files that come from different libraries of the same sample. Is that right? Thanks
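A minimal sketch of that per-library approach, assuming each library has its own mate inner distance. The Bowtie index name, -r values, file and directory names are placeholders, not values from this thread. Note that TopHat also needs the Bowtie index basename as a positional argument before the read files, which the command above leaves out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# align_library INDEX INNER_DIST GTF OUTDIR READS_1.fastq READS_2.fastq
# Hypothetical helper: align ONE paired-end library. -r (expected mate inner
# distance) is a per-library value, so each library gets its own TopHat run
# and its own output directory.
align_library() {
  local index=$1 inner_dist=$2 gtf=$3 outdir=$4 r1=$5 r2=$6
  # TopHat takes the Bowtie index basename before the read files.
  tophat -r "$inner_dist" -G "$gtf" -o "$outdir" "$index" "$r1" "$r2"
}

# merge_sample OUT.bam TOPHAT_OUTDIR [TOPHAT_OUTDIR ...]
# Merge the accepted_hits.bam of every library that belongs to one sample.
merge_sample() {
  local out=$1
  shift
  local -a bams=()
  local dir
  for dir in "$@"; do
    bams+=("$dir/accepted_hits.bam")
  done
  samtools merge "$out" "${bams[@]}"
}

# Example (index name, -r values and file names are placeholders):
#   align_library hg19 50 hg19_ucsc.gtf lib1_out ERR001_1.fastq ERR001_2.fastq
#   align_library hg19 80 hg19_ucsc.gtf lib2_out ERR002_1.fastq ERR002_2.fastq
#   merge_sample sample1.bam lib1_out lib2_out
```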
      Last edited by camelbbs; 10-24-2011, 12:01 PM.


      • camelbbs
        Member
        • Jun 2011
        • 49

        #4
        Originally posted by cjp
        I guess the sequences are not paired-end, so you can't align the FASTQ files in the same TopHat command. In that case, you can always merge the two BAM files with 'samtools merge' or 'picard MergeSamFiles'.

        You can use either a reference GTF file or the output from cufflinks. If you want novel transcripts, run cufflinks first; if you only want expression of known genes, you can run cuffdiff directly with a GTF file downloaded from Ensembl, UCSC, etc.

        Chris
        And if we use the output from cufflinks, there will be two GTF files when we work on two samples. So how do we input these two files into cuffdiff? Thanks very much for your help.


        • cjp
          Member
          • Jun 2011
          • 58

          #5
          Originally posted by camelbbs
          Thanks very much. But the sequences are paired-end. One sample has several libraries, and the read length differs between libraries. So we first get a BAM file for each library with tophat -r xxx -G hg19_ucsc.gtf ERR001_1.fastq ERR001_2.fastq

          and then merge all the BAM files that come from different libraries of the same sample. Is that right? Thanks
          Yes, you can merge BAM files from multiple sequencing runs of the same sample, even if they have different read lengths.
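One quick sanity check after merging, sketched under the assumption that samtools is on the PATH (the function name is hypothetical): tabulate read lengths in the merged BAM and confirm that both the 50 bp and 36 bp libraries are represented. It counts alignment records, so a multi-mapped read is counted once per alignment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# read_length_counts MERGED.bam
# Print "<read length> <count>" for every read length found in a BAM.
read_length_counts() {
  # Field 10 of a SAM record is the read sequence; skip records stored as "*".
  samtools view "$1" \
    | awk -F'\t' '$10 != "*" { n[length($10)]++ } END { for (len in n) print len, n[len] }' \
    | sort -n
}

# Example (placeholder path):
#   read_length_counts sample1.bam
```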


          • cjp
            Member
            • Jun 2011
            • 58

            #6
            Originally posted by camelbbs

            And if we use the output from cufflinks, there will be two GTF files when we work on two samples. So how do we input these two files into cuffdiff? Thanks very much for your help.

            Cufflinks comes with a utility called gffread; gffread -h lists these options:

            -M/--merge : cluster the input transcripts into loci, collapsing matching
            transcripts (those with the same exact introns and fully contained)
            --cluster-only: same as --merge but without collapsing matching transcripts
            -K for -M option: also collapse shorter, fully contained transcripts
            with fewer introns than the container
            -Q for -M option, remove the containment restriction:
            (multi-exon transcripts will be collapsed if just their introns match,
            while single-exon transcripts can partially overlap (80%))

            I've never used it myself, so I'm not sure it does what you want. You could also convert to BED format and then use BEDtools, whose intersectBed reports the overlaps between two input BED files as a single BED file. To get a final GTF file from that BED file, I found this link on seqAnswers:



            But converting between GTF and BED is not always easy, as you can lose data.
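To make the point about losing data concrete, here is a minimal GTF-to-BED6 conversion of exon lines in awk, wrapped in a hypothetical bash function. It is an illustration only, not the method from the linked thread: chromosome, coordinates, one name (transcript_id), score and strand survive, while gene_id, exon_number, source, frame and every other attribute are discarded.

```shell
#!/usr/bin/env bash
set -euo pipefail

# gtf_exons_to_bed6 < IN.gtf > OUT.bed
gtf_exons_to_bed6() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    $3 == "exon" {
      name = "."
      # Pull the transcript_id value out of the attribute column (column 9).
      if (match($9, /transcript_id "[^"]*"/)) {
        name = substr($9, RSTART + 15, RLENGTH - 16)
      }
      # GTF is 1-based and end-inclusive; BED is 0-based and end-exclusive.
      print $1, $4 - 1, $5, name, ($6 == "." ? 0 : $6), $7
    }'
}

# Example (placeholder paths):
#   gtf_exons_to_bed6 < transcripts.gtf > transcripts.bed
```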

            Chris


            • camelbbs
              Member
              • Jun 2011
              • 49

              #7
              Originally posted by cjp
              I guess the sequences are not paired-end, so you can't align the FASTQ files in the same TopHat command. In that case, you can always merge the two BAM files with 'samtools merge' or 'picard MergeSamFiles'.

              You can use either a reference GTF file or the output from cufflinks. If you want novel transcripts, run cufflinks first; if you only want expression of known genes, you can run cuffdiff directly with a GTF file downloaded from Ensembl, UCSC, etc.

              Chris
              Thanks a lot, Chris.
              Actually, my goal is to find and compare alternative splicing events between two samples.

              My workflow is as follows:

              First, I got the two merged BAM files for the two samples from TopHat. Then I ran

              cuffdiff hg19_ucsc.gtf sample1.bam sample2.bam

              and got some results, but they don't contain the novel transcripts assembled by cufflinks.

              So I ran cufflinks to get the novel transcripts:

              cufflinks -g hg19_ucsc.gtf sample1.bam
              cufflinks -g hg19_ucsc.gtf sample2.bam

              This gave me a transcripts.gtf file for each of the two samples.
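A sketch of the per-sample cufflinks runs with separate output directories (the function and directory names are hypothetical). cufflinks names its assembly transcripts.gtf inside the output directory, which is the current directory by default, so two runs in the same place would overwrite each other.

```shell
#!/usr/bin/env bash
set -euo pipefail

# assemble_sample REFERENCE.gtf SAMPLE.bam OUTDIR
# Hypothetical helper: reference-guided (-g) assembly of one sample into its
# own directory; prints the path of the resulting assembly.
assemble_sample() {
  local gtf=$1 bam=$2 outdir=$3
  cufflinks -g "$gtf" -o "$outdir" "$bam"
  echo "$outdir/transcripts.gtf"
}

# Example (placeholder directories):
#   assemble_sample hg19_ucsc.gtf sample1.bam sample1_clout
#   assemble_sample hg19_ucsc.gtf sample2.bam sample2_clout
```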

              Then I merged the two transcripts.gtf files, transcript1.gtf and transcript2.gtf, with the reference annotation:

              cuffmerge -g hg19_ucsc.gtf -o merged gtf_list.txt

              where gtf_list.txt lists transcript1.gtf and transcript2.gtf, one per line. Then I ran cuffdiff:

              cuffdiff merged/merged.gtf sample1.bam sample2.bam
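A minimal sketch of the cuffmerge and cuffdiff steps, assuming the two per-sample assemblies already exist; the function name, list file name and output directory names are placeholders. cuffmerge reads a text file listing one assembly GTF per line, takes the reference annotation through -g rather than in that list, and writes merged.gtf inside its -o directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_and_diff REFERENCE.gtf SAMPLE1.bam SAMPLE2.bam ASSEMBLY.gtf [ASSEMBLY.gtf ...]
merge_and_diff() {
  local ref=$1 bam1=$2 bam2=$3
  shift 3
  # One assembly path per line, as cuffmerge expects.
  printf '%s\n' "$@" > assemblies.txt
  cuffmerge -g "$ref" -o merged assemblies.txt
  # Use the merged transcript set (known plus novel) for differential testing.
  cuffdiff -o diff_out -L sample1,sample2 merged/merged.gtf "$bam1" "$bam2"
}

# Example (placeholder paths):
#   merge_and_diff hg19_ucsc.gtf sample1.bam sample2.bam sample1_clout/transcripts.gtf sample2_clout/transcripts.gtf
```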

              Is that the right workflow for comparing novel alternatively spliced transcripts, and their expression, between the two samples?

              But I see there is also a program called cuffcompare. If I run

              cuffcompare -r hg19_ucsc.gtf transcript1.gtf transcript2.gtf

              I can also get the different alternatively spliced transcripts. So does that mean

              cufflinks + cuffcompare == cuffdiff ?
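For comparison, a sketch of the cuffcompare call with the reference passed through -r; without -r, the reference GTF is treated as just another input assembly. The function name and output prefix are hypothetical. cuffcompare classifies transcript structures against the reference and writes <prefix>.combined.gtf alongside files such as <prefix>.tracking and <prefix>.stats, but as far as I know it runs no expression or differential splicing tests; those come from cuffdiff.

```shell
#!/usr/bin/env bash
set -euo pipefail

# compare_assemblies REFERENCE.gtf PREFIX ASSEMBLY.gtf [ASSEMBLY.gtf ...]
compare_assemblies() {
  local ref=$1 prefix=$2
  shift 2
  # -r: reference annotation to classify against; -o: prefix for output files
  cuffcompare -r "$ref" -o "$prefix" "$@"
  # The union of the input transcripts, with class codes, ends up here:
  echo "$prefix.combined.gtf"
}

# Example (placeholder names):
#   compare_assemblies hg19_ucsc.gtf cmp transcript1.gtf transcript2.gtf
```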

              Thanks a lot!!!
              Last edited by camelbbs; 10-25-2011, 01:35 PM.


              • cjp
                Member
                • Jun 2011
                • 58

                #8
                Sounds like you've got a better method than the one I suggested; I have never used cuffcompare or cuffmerge.

                cuffdiff seems to always be the last program to run, whether you want FPKMs (expression levels) for known or novel transcripts. It writes its results as tab-delimited text files that open nicely in a spreadsheet, and it runs some useful statistical tests as well.
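cuffdiff's tables (for example gene_exp.diff, isoform_exp.diff and splicing.diff, the last being the most relevant to alternative splicing) are tab-delimited. Because the column layout has changed between Cufflinks versions, this hypothetical helper finds the column named significant by its header rather than by position; it assumes that column holds yes/no.

```shell
#!/usr/bin/env bash
set -euo pipefail

# significant_rows FILE.diff
# Print the header plus every row whose "significant" column is "yes".
significant_rows() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "significant") col = i
      if (!col) { print "no significant column in header" > "/dev/stderr"; exit 1 }
      print
      next
    }
    $col == "yes"
  ' "$1"
}

# Example (placeholder path):
#   significant_rows diff_out/splicing.diff
```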

                Chris


                • tiffany081126
                  Member
                  • Jun 2010
                  • 10

                  #9
                  Originally posted by camelbbs
                  Thanks a lot, Chris.
                  Actually, my goal is to find and compare alternative splicing events between two samples.

                  My workflow is as follows:

                  First, I got the two merged BAM files for the two samples from TopHat. Then I ran

                  cuffdiff hg19_ucsc.gtf sample1.bam sample2.bam

                  and got some results, but they don't contain the novel transcripts assembled by cufflinks.

                  So I ran cufflinks to get the novel transcripts:

                  cufflinks -g hg19_ucsc.gtf sample1.bam
                  cufflinks -g hg19_ucsc.gtf sample2.bam

                  This gave me a transcripts.gtf file for each of the two samples.

                  Then I merged the two transcripts.gtf files, transcript1.gtf and transcript2.gtf, with the reference annotation:

                  cuffmerge -g hg19_ucsc.gtf -o merged gtf_list.txt

                  where gtf_list.txt lists transcript1.gtf and transcript2.gtf, one per line. Then I ran cuffdiff:

                  cuffdiff merged/merged.gtf sample1.bam sample2.bam

                  Is that the right workflow for comparing novel alternatively spliced transcripts, and their expression, between the two samples?

                  But I see there is also a program called cuffcompare. If I run

                  cuffcompare -r hg19_ucsc.gtf transcript1.gtf transcript2.gtf

                  I can also get the different alternatively spliced transcripts. So does that mean

                  cufflinks + cuffcompare == cuffdiff ?

                  Thanks a lot!!!
                  I did the same a few days ago. In my project I used only the merged.gtf for cuffdiff, and it went well (there were "u" class codes). My workmate, however, found no "u" class codes at all in her merged.gtf, so she ran cuffcompare with merged.gtf and known.gtf (the species was not human) and finally used the resulting combined.gtf for cuffdiff.

                  So I am still a little confused about the difference between merged.gtf and combined.gtf. Any help would be appreciated.
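A quick way to check the class codes discussed here, as a sketch with a hypothetical function name: count transcripts per class_code attribute in a GTF from cuffmerge (run with -g) or cuffcompare. In those files "=" marks a transcript whose intron chain matches a reference transcript, "j" a potential novel isoform sharing at least one splice junction with a reference transcript, and "u" an unknown, intergenic transcript.

```shell
#!/usr/bin/env bash
set -euo pipefail

# class_code_counts FILE.gtf
# Print "<class_code> <number of transcripts>", counting each transcript once.
class_code_counts() {
  awk -F'\t' '
    match($9, /transcript_id "[^"]*"/) {
      tid = substr($9, RSTART + 15, RLENGTH - 16)
      if (tid in seen) next
      if (match($9, /class_code "[^"]*"/)) {
        seen[tid] = 1
        n[substr($9, RSTART + 12, RLENGTH - 13)]++
      }
    }
    END { for (c in n) print c, n[c] }
  ' "$1" | LC_ALL=C sort
}

# Example (placeholder path):
#   class_code_counts merged/merged.gtf
```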


                  • camelbbs
                    Member
                    • Jun 2011
                    • 49

                    #10
                    Hi, I just want to know what you mean by combined.gtf.


                    • camelbbs
                      Member
                      • Jun 2011
                      • 49

                      #11
                      Originally posted by tiffany081126
                      I did the same a few days ago. In my project I used only the merged.gtf for cuffdiff, and it went well (there were "u" class codes). My workmate, however, found no "u" class codes at all in her merged.gtf, so she ran cuffcompare with merged.gtf and known.gtf (the species was not human) and finally used the resulting combined.gtf for cuffdiff.

                      So I am still a little confused about the difference between merged.gtf and combined.gtf. Any help would be appreciated.
                      I want to ask what you mean by combined.gtf.
