Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • A script for Cuffdiff: extract differential expression gene sequences from genome.fa

    Hi all,

    I uploaded a script that I used for my own research just in case anyone else feel it is useful.
    This script takes two input file: one is gene_exp.diff generated by cuffdiff; the other is multifasta file (genome file) that your reads were mapped to. And the output is differential expression gene sequences (in fasta format). It takes advantage of the gene positions information of gene_exp.diff to extract the sequence in genome file.

    usage: perl extractGene.pl [-d gene_exp.diff] [-g genome.fasta] [--help]
    Attached Files

  • #2
    Hi,

    First of all, thanks a lot for posting this script! It is exactly what I am looking for, however I am having issues getting it working. I would try to fix it myself and post the results, but I use Python and am completely clueless with Perl. Anyway, I get this error.

    Code:
    $ perl ~/Downloads/extractGene.pl -d ../diff_out/gene_exp.diff -g genome_file.fa
    Use of uninitialized value in split at /home/vinay/Downloads/extractGene.pl line 43, <DIFFGENE> line 1.
    It just hangs there. Do you have any idea what might be causing this issue? I looked at line 43, but as stated above, ended up quite confused without being able to understand any Perl syntax.

    Comment


    • #3
      Hi,

      Sorry I didn't make it clear. But you need to do a little filtering before running this script.
      Try this:

      grep yes ../diff_out/gene_exp.diff > ../diff_out/gene_exp.diff.yes
      $ perl ~/Downloads/extractGene.pl -d ../diff_out/gene_exp.diff.yes -g genome_file.fa

      Let me know if it works or not.

      Comment


      • #4
        Thanks for the quick reply! In fact, I had made a "yes" file using grep for a Python script I was trying to make. Anyway, I tried that, and after a while of working (at least more than 20 minutes) it gave me this:
        Code:
        Write failed: Broken pipe

        Comment


        • #5
          Originally posted by vinay427 View Post
          Thanks for the quick reply! In fact, I had made a "yes" file using grep for a Python script I was trying to make. Anyway, I tried that, and after a while of working (at least more than 20 minutes) it gave me this:
          Code:
          Write failed: Broken pipe
          How large is your *_gene.diff? It shouldn't have taken 20 minutes. Can you send me part of your input files? Just for a testing. [email protected]

          Comment


          • #6
            Hi foxriverlin,

            Did you make any changes after extracting "yes" using "grep" command like header or some thing else. Script was executing but still it seems an empty file.

            Thanks.
            Krishna

            Comment


            • #7
              Originally posted by foxriverlin View Post
              How large is your *_gene.diff? It shouldn't have taken 20 minutes. Can you send me part of your input files? Just for a testing. [email protected]
              Sent, thanks again!

              Comment


              • #8
                Originally posted by vinay427 View Post
                Sent, thanks again!
                I see the problem. Your Refbeet.fa is a muti-scaffold fasta file. In other words, it uses scaffold as each sequence unit. However, in your gene_exp_yes.diff the locus is contig100180:871-997. The format is different. Did you use the same file, "Refbeet.fa" for alignment? In my script it will use exactly the locus information in gene_diff to extract the nucleotide sequence in reference.fa. If they do not match with each other, it would fail.

                Comment


                • #9
                  Originally posted by Krish_143 View Post
                  Hi foxriverlin,

                  Did you make any changes after extracting "yes" using "grep" command like header or some thing else. Script was executing but still it seems an empty file.

                  Thanks.
                  No I didn't make any changes after "grep". If you use "grep" before running the extract.pl, the header line should have been removed already. But if the header line exist, it also works. Maybe you ran into the same issue as vinay427 did. If not, please send me part of your input file to have a look at.

                  Comment


                  • #10
                    I have mutli-scaffold fasta in my genome, then how to get fasta in this case.. i have to give each scaffold at every time !!
                    Krishna

                    Comment


                    • #11
                      Originally posted by foxriverlin View Post
                      I see the problem. Your Refbeet.fa is a muti-scaffold fasta file. In other words, it uses scaffold as each sequence unit. However, in your gene_exp_yes.diff the locus is contig100180:871-997. The format is different. Did you use the same file, "Refbeet.fa" for alignment? In my script it will use exactly the locus information in gene_diff to extract the nucleotide sequence in reference.fa. If they do not match with each other, it would fail.
                      If I understand what you're saying, yes, I used the same genome for alignment. My .diff and .fa files have both contigs and scaffolds in them, but the parts I trimmed from the top may have looked like that. For example, there are also contigXXXXXX headers in my .fa genome file.

                      Comment


                      • #12
                        Originally posted by vinay427 View Post
                        If I understand what you're saying, yes, I used the same genome for alignment. My .diff and .fa files have both contigs and scaffolds in them, but the parts I trimmed from the top may have looked like that. For example, there are also contigXXXXXX headers in my .fa genome file.
                        The problem was caused by header line of .fa genome file. Mine is little bit different than yours, doesn't including length=***. I fixed that problem and uploaded a new version of script which should work. Thanks for pointing out the bug in my script!

                        Ruolin
                        Attached Files

                        Comment


                        • #13
                          Originally posted by Krish_143 View Post
                          I have mutli-scaffold fasta in my genome, then how to get fasta in this case.. i have to give each scaffold at every time !!
                          I posted a new version. Please let me know if the new one is still not working.

                          Comment


                          • #14
                            Originally posted by foxriverlin View Post
                            The problem was caused by header line of .fa genome file. Mine is little bit different than yours, doesn't including length=***. I fixed that problem and uploaded a new version of script which should work. Thanks for pointing out the bug in my script!

                            Ruolin
                            I should be thanking you! Anyway, let me try this out. I'll report back soon. Thanks again!

                            Comment


                            • #15
                              It worked perfectly! Thank you so much, it really saved my day.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              11 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              10 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              51 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              67 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X