  • Two questions

    1. I have a question about BAM files.

    I have two sequencing libraries from the same sample, which gives me two FASTQ files with read lengths of 50 bp and 36 bp, respectively.
    When I run TopHat, I need to specify -r, so I cannot merge the two FASTQ files. But once I have the accepted_hits.bam files, can I merge them with samtools merge?

    I need to run cufflinks and cuffdiff on the merged BAM files.

    2. I see that cuffdiff is invoked as
    cuffdiff transcripts.gtf 1.bam 2.bam

    Is this transcripts.gtf the output of cufflinks, or just the reference transcript annotation?


    Thanks, everyone.

  • #2
    Originally posted by camelbbs
    1. I have a question about BAM files.

    I have two sequencing libraries from the same sample, which gives me two FASTQ files with read lengths of 50 bp and 36 bp, respectively.
    When I run TopHat, I need to specify -r, so I cannot merge the two FASTQ files. But once I have the accepted_hits.bam files, can I merge them with samtools merge?

    I need to run cufflinks and cuffdiff on the merged BAM files.

    2. I see that cuffdiff is invoked as
    cuffdiff transcripts.gtf 1.bam 2.bam

    Is this transcripts.gtf the output of cufflinks, or just the reference transcript annotation?


    Thanks, everyone.
    I guess the sequences are not paired end, so you can't align the FastQ files in the same TopHat command. In that case, you can always merge two BAM files with 'samtools merge' or 'picard MergeSamFiles':
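
A minimal sketch of such a merge with samtools, assuming hypothetical directory and file names (tophat_lib50, tophat_lib36, sample1_merged.bam). The `run` helper is an illustrative dry-run wrapper, not part of samtools: it prints the command and only executes it when DRY_RUN=0. Picard's MergeSamFiles does the same job with INPUT= and OUTPUT= arguments.

```shell
# DRY_RUN=1 (default) only prints the command; set DRY_RUN=0 to execute it.
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# samtools merge takes the output BAM first, followed by the input BAMs.
merge_bams() {
    run samtools merge "$@"
}

# Hypothetical paths: one TopHat output directory per library of the same sample.
merge_bams sample1_merged.bam \
    tophat_lib50/accepted_hits.bam \
    tophat_lib36/accepted_hits.bam
```

The merged file can then be passed to cufflinks or cuffdiff like any single TopHat output.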



    You can use either a reference GTF file or the output from cufflinks. If you want novel transcripts, run cufflinks first; if you only want expression of known genes, you can run cuffdiff directly with a GTF file downloaded from Ensembl, UCSC, etc.

    Chris



    • #3
      Thanks very much. But the sequences are paired-end. One sample has several libraries, and the read length differs between the libraries. So we first get the BAM files with tophat -r xxx -G hg19_ucsc.gtf ERR001_1.fastq ERR001_2.fastq

      and then merge all the BAM files that come from different libraries but belong to the same sample. Is that right? Thanks
      Last edited by camelbbs; 10-24-2011, 12:01 PM.



      • #4
        Originally posted by cjp
        I guess the sequences are not paired end, so you can't align the FastQ files in the same TopHat command. In that case, you can always merge two BAM files with 'samtools merge' or 'picard MergeSamFiles':



        You can use either a reference GTF file or the output from cufflinks. If you want novel transcripts, run cufflinks first; if you only want expression of known genes, you can run cuffdiff directly with a GTF file downloaded from Ensembl, UCSC, etc.

        Chris
        And if we use the output from cufflinks, there will be two GTF files when we work on two samples. How do we pass these two files to cuffdiff? Thanks very much for your help.



        • #5
          Originally posted by camelbbs
          Thanks very much. But the sequences are paired-end. One sample has several libraries, and the read length differs between the libraries. So we first get the BAM files with tophat -r xxx -G hg19_ucsc.gtf ERR001_1.fastq ERR001_2.fastq

          and then merge all the BAM files that come from different libraries but belong to the same sample. Is that right? Thanks
          Yes, you can merge BAM files from multiple sequencing runs if they come from the same sample, even if the read lengths differ.
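
Because -r is TopHat's expected mean inner distance between mates (the fragment length minus both read lengths), libraries with different read lengths generally need different -r values, which is why they are aligned separately and merged afterwards. A small worked calculation, assuming a hypothetical 200 bp mean fragment length for both libraries:

```shell
# Expected inner distance between mates = mean fragment length - 2 * read length.
inner_dist() {
    fragment_len=$1
    read_len=$2
    echo $(( fragment_len - 2 * read_len ))
}

# The 200 bp fragment length is an assumed value; use your own library's size.
r50=$(inner_dist 200 50)   # 50 bp paired-end library
r36=$(inner_dist 200 36)   # 36 bp paired-end library

echo "50 bp library: tophat -r $r50"
echo "36 bp library: tophat -r $r36"
```

Each library is then aligned with its own -r value, and the resulting BAM files for the same sample are merged.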



          • #6
            Originally posted by camelbbs

            And if we use the output from cufflinks, there will be two GTF files when we work on two samples. How do we pass these two files to cuffdiff? Thanks very much for your help.

            Cufflinks ships with a utility called gffread. From gffread -h, these options look relevant:

            -M/--merge : cluster the input transcripts into loci, collapsing matching
            transcripts (those with the same exact introns and fully contained)
            --cluster-only: same as --merge but without collapsing matching transcripts
            -K for -M option: also collapse shorter, fully contained transcripts
            with fewer introns than the container
            -Q for -M option, remove the containment restriction:
            (multi-exon transcripts will be collapsed if just their introns match,
            while single-exon transcripts can partially overlap (80%))

            I've never used it myself, so I'm not sure whether it does what you want. You could also convert to BED format and then use BEDtools, which has a tool called intersectBed that produces one BED file from two input BED files. To get a final GTF file from this BED file, I found this link on SEQanswers:

            Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


            But converting between GTF and BED is not always easy, as you can lose data.

            Chris



            • #7
              Originally posted by cjp
              I guess the sequences are not paired end, so you can't align the FastQ files in the same TopHat command. In that case, you can always merge two BAM files with 'samtools merge' or 'picard MergeSamFiles':



              You can use either a reference GTF file or the output from cufflinks. If you want novel transcripts, run cufflinks first; if you only want expression of known genes, you can run cuffdiff directly with a GTF file downloaded from Ensembl, UCSC, etc.

              Chris
              Thanks a lot Chris,
              Actually, my goal is to find and compare alternative splicing events between two samples.

              My workflow is like this:

              First I got the two merged BAM files for the two samples from TopHat. Then I ran

              cuffdiff hg19_ucsc.gtf sample1.bam sample2.bam

              I got some results, but they don't contain the novel transcripts assembled by cufflinks.

              So I ran cufflinks to get the novel transcripts:

              cufflinks -g hg19_ucsc.gtf sample1.bam
              cufflinks -g hg19_ucsc.gtf sample2.bam

              That gave me two transcripts.gtf files, one per sample.

              Then I merged the two files, transcript1.gtf and transcript2.gtf, with the reference annotation:

              cuffmerge -o merged gtf_list (hg19_ucsc.gtf, transcript1.gtf, transcript2.gtf)

              Then I ran cuffdiff:

              cuffdiff merged.gtf sample1.bam sample2.bam

              Is that the right workflow for comparing novel alternatively spliced transcripts and their expression between the two samples?

              But I see there is also a program called cuffcompare. If I run

              cuffcompare hg19_ucsc.gtf transcript1.gtf transcript2.gtf

              I can also get the differing alternatively spliced transcripts. So does that mean

              cufflinks + cuffcompare == cuffdiff ?

              Thanks a lot!!!
              Last edited by camelbbs; 10-25-2011, 01:35 PM.
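
The workflow above can be sketched as a script. This is a hedged outline rather than a tested pipeline: the output directory names (cufflinks_sample1, cufflinks_sample2, merged_asm, diff_out) and assemblies.txt are hypothetical, and `run` is an illustrative dry-run wrapper that only executes commands when DRY_RUN=0. It relies on cufflinks writing transcripts.gtf into its -o directory, and on cuffmerge taking a text file that lists one assembly GTF per line, with the reference annotation supplied through -g.

```shell
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

ref_gtf="hg19_ucsc.gtf"

# 1. Assemble each sample into its own directory so that one
#    transcripts.gtf does not overwrite the other.
run cufflinks -g "$ref_gtf" -o cufflinks_sample1 sample1.bam
run cufflinks -g "$ref_gtf" -o cufflinks_sample2 sample2.bam

# 2. List the assemblies, one GTF path per line, and merge them
#    together with the reference annotation.
printf '%s\n' \
    cufflinks_sample1/transcripts.gtf \
    cufflinks_sample2/transcripts.gtf > assemblies.txt
run cuffmerge -g "$ref_gtf" -o merged_asm assemblies.txt

# 3. Test for differences using the merged annotation and both BAM files.
run cuffdiff -o diff_out merged_asm/merged.gtf sample1.bam sample2.bam
```

cuffcompare, by contrast, classifies assembled transcripts against a reference annotation and does not test for expression differences, so cufflinks plus cuffcompare is not a substitute for cuffdiff.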



              • #8
                Sounds like you've got a better method than the one I suggested, as I have never used cuffcompare or cuffmerge before.

                cuffdiff always seems to be the last program to run, whether you want FPKMs (expression levels) for known or novel transcripts. It writes its results as tab-delimited tables that open cleanly in a spreadsheet, and it runs some useful statistical tests as well.

                Chris
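
A small sketch of pulling the significant rows out of one of those cuffdiff tables from the command line. It assumes a tab-delimited file with a header row and a column named "significant" holding yes/no; the column is looked up by name because its position may differ between Cufflinks versions. The file example.diff and its contents are made up for illustration.

```shell
# Print the header plus every row whose "significant" column is "yes".
significant_rows() {
    awk -F '\t' '
        NR == 1 {
            # Find the column called "significant" in the header row.
            for (i = 1; i <= NF; i++) if ($i == "significant") col = i
            print
            next
        }
        col && $col == "yes"
    ' "$1"
}

# Tiny made-up table standing in for a real cuffdiff output file.
printf 'test_id\tgene\tp_value\tsignificant\n' > example.diff
printf 'XLOC_000001\tGENE_A\t0.0001\tyes\n' >> example.diff
printf 'XLOC_000002\tGENE_B\t0.7\tno\n' >> example.diff

significant_rows example.diff
```

The same function can be pointed at a real results file in the cuffdiff output directory instead of the example table.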



                • #9
                  Originally posted by camelbbs
                  Thanks a lot Chris,
                  Actually, my goal is to find and compare alternative splicing events between two samples.

                  My workflow is like this:

                  First I got the two merged BAM files for the two samples from TopHat. Then I ran

                  cuffdiff hg19_ucsc.gtf sample1.bam sample2.bam

                  I got some results, but they don't contain the novel transcripts assembled by cufflinks.

                  So I ran cufflinks to get the novel transcripts:

                  cufflinks -g hg19_ucsc.gtf sample1.bam
                  cufflinks -g hg19_ucsc.gtf sample2.bam

                  That gave me two transcripts.gtf files, one per sample.

                  Then I merged the two files, transcript1.gtf and transcript2.gtf, with the reference annotation:

                  cuffmerge -o merged gtf_list (hg19_ucsc.gtf, transcript1.gtf, transcript2.gtf)

                  Then I ran cuffdiff:

                  cuffdiff merged.gtf sample1.bam sample2.bam

                  Is that the right workflow for comparing novel alternatively spliced transcripts and their expression between the two samples?

                  But I see there is also a program called cuffcompare. If I run

                  cuffcompare hg19_ucsc.gtf transcript1.gtf transcript2.gtf

                  I can also get the differing alternatively spliced transcripts. So does that mean

                  cufflinks + cuffcompare == cuffdiff ?

                  Thanks a lot!!!
                  I did the same a few days ago. In my project, I used only merged.gtf for cuffdiff, and it went well (there are "u" entries among the class codes). My colleague, however, found no "u" class codes in her merged.gtf, so she ran cuffcompare with merged.gtf and known.gtf (the species was not human), and then used the resulting combined.gtf for cuffdiff as well.

                  So I am still a little confused about the difference between merged.gtf and combined.gtf. Any help would be appreciated.
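
One way to check whether a GTF contains "u" (unknown, intergenic) transcripts is to count transcripts per class code. The sketch below assumes the file carries transcript_id and class_code attributes in column 9, as in cuffmerge's merged.gtf and in the combined.gtf file that cuffcompare writes; example_merged.gtf and its records are made up for illustration.

```shell
# Count distinct transcripts per class code in a Cufflinks-style GTF.
class_code_counts() {
    awk -F '\t' '
        {
            tid = ""; code = ""
            # Pull the quoted values out of the attribute column.
            if (match($9, /transcript_id "[^"]*"/)) tid = substr($9, RSTART + 15, RLENGTH - 16)
            if (match($9, /class_code "[^"]*"/)) code = substr($9, RSTART + 12, RLENGTH - 13)
            # Count each transcript once, even if it has several exon lines.
            if (tid != "" && code != "" && !(tid in seen)) { seen[tid] = 1; n[code]++ }
        }
        END { for (c in n) print c, n[c] }
    ' "$1" | sort
}

# Made-up records: TCONS_1 has two exons, TCONS_2 is a novel intergenic transcript.
printf 'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "=";\n' > example_merged.gtf
printf 'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "=";\n' >> example_merged.gtf
printf 'chr2\tCufflinks\texon\t500\t900\t.\t-\t.\tgene_id "XLOC_2"; transcript_id "TCONS_2"; class_code "u";\n' >> example_merged.gtf

class_code_counts example_merged.gtf
```

Running it on both merged.gtf and combined.gtf shows directly whether either file carries novel "u" transcripts into cuffdiff.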



                  • #10
                    Hi, I just want to know what you mean by combined.gtf.



                    • #11
                      Originally posted by tiffany081126
                      I did the same a few days ago. In my project, I used only merged.gtf for cuffdiff, and it went well (there are "u" entries among the class codes). My colleague, however, found no "u" class codes in her merged.gtf, so she ran cuffcompare with merged.gtf and known.gtf (the species was not human), and then used the resulting combined.gtf for cuffdiff as well.

                      So I am still a little confused about the difference between merged.gtf and combined.gtf. Any help would be appreciated.
                      I want to ask what you mean by combined.gtf.
