  • StephaniePi83
    Member
    • Sep 2011
    • 52

    read annotation with bowtie

    Hello everyone,
    I am a beginner bioinformatician and I have to analyse iCLIP data in order to identify the mRNAs and piRNAs that are CLIPed to a protein. I have a .bed file and I would like to annotate the reads in it. As a first step, I'd like to remove the reads annotated as tRNA, rRNA, snoRNA and snRNA. To do so, I downloaded .fasta files from http://intermine.modencode.org/relea...customQuery.do and created a single file containing all the sequences for tRNA, rRNA, etc. I was thinking of using Bowtie (which I have never used) to do the annotation, but is that possible with a .bed file and a .fasta file? How can I do the alignment?
    Thanks in advance for your response,
    Stéphanie
  • cascoamarillo
    Senior Member
    • Oct 2010
    • 164

    #2
    Hi,

    I'm not sure about iCLIP data analysis, but Bowtie is an alignment tool.

    Does it take .bed files? I don't think so. It takes your reads (FASTA or FASTQ) and your reference (an index built from a FASTA file) and generates an output alignment file. You can then probably feed this output and the .bed file into an annotation program or similar (I don't know much about this part).
    What you CAN do with Bowtie is take your tRNA, rRNA, snoRNA and snRNA FASTA file as the reference and eliminate the reads that match those sequences (the --un option writes the reads that did not align to a separate file).

    Hope it helps.


    • StephaniePi83
      Member
      • Sep 2011
      • 52

      #3
      Dear cascoamarillo,
      Thanks for your reply.
      This is exactly what I want to do: eliminate the reads that match tRNA, rRNA, etc. But is it possible with my .bed file, or do I have to convert it to another format?


      • arvid
        Senior Member
        • Jul 2011
        • 156

        #4
        Having the reads in .bed format suggests that they are probably already aligned. In that case, you could go a couple of different routes:

        1. If the genome is well annotated, get a GFF file, extract the lines describing the tRNA/rRNA etc. locations, and filter the bed file against them (get bedtools and check out "intersectBed -v").
        2. You can extract the FASTA sequence for each bed feature (each read, I suppose, in your case) from the bed/FASTA combination with the fastaFromBed tool in bedtools. Then align the reads in FASTA format to your ncRNAs with Bowtie or another aligner.

        Good luck,
        Samuel
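
To make route 1 concrete, here is a minimal sketch of what "intersectBed -v" is expected to do, written with plain awk so it runs without bedtools. The file names, coordinates and read names are made up for illustration, and the loop compares every read against every feature, so it is only meant for understanding the logic, not for replacing bedtools on real data. It assumes GFF coordinates are 1-based and inclusive while BED coordinates are 0-based with an exclusive end.

```shell
set -eu
workdir=$(mktemp -d)

# Hypothetical annotation: one rRNA on 2L, GFF coordinates 100..200 (1-based, inclusive).
printf '2L\tFlyBase\trRNA\t100\t200\t.\t+\t.\tID=r1\n' > "$workdir/ncRNA.gff"

# Hypothetical reads in BED format (0-based start, exclusive end).
{
    printf '2L\t120\t150\tread1\n'   # inside the rRNA
    printf '2L\t200\t250\tread2\n'   # starts just after the rRNA
    printf '3R\t10\t60\tread3\n'     # different chromosome
    printf '2L\t50\t99\tread4\n'     # ends just before the rRNA
} > "$workdir/reads.bed"

awk -F'\t' '
    # First file (GFF): store each feature as a 0-based, half-open interval.
    NR == FNR { n++; chrom[n] = $1; lo[n] = $4 - 1; hi[n] = $5 + 0; next }
    # Second file (BED): print the read only if it overlaps no stored feature.
    {
        hit = 0
        for (i = 1; i <= n; i++)
            if ($1 == chrom[i] && ($2 + 0) < hi[i] && ($3 + 0) > lo[i]) { hit = 1; break }
        if (!hit) print
    }
' "$workdir/ncRNA.gff" "$workdir/reads.bed" > "$workdir/filtered.bed"

kept=$(wc -l < "$workdir/filtered.bed" | tr -d ' ')
echo "reads kept: $kept"
```

Only read1 overlaps the rRNA, so it is the only read dropped. Note that the chromosome names must match exactly between the two files for any overlap to be found.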


        • StephaniePi83
          Member
          • Sep 2011
          • 52

          #5
          Dear Arvid,
          Thanks for your reply.
          I have a question concerning the 2nd solution: do I have to create a FASTA file that contains the sequence of each chromosome (I work on the fly D. melanogaster, so I need the sequences of the 4 chromosomes)?
          I tried the 1st solution, but the fly .gff file seems to be corrupted; I can't unzip it.
          Last edited by StephaniePi83; 10-25-2011, 12:23 AM.


          • arvid
            Senior Member
            • Jul 2011
            • 156

            #6
            For the 2nd solution:
            Yes, you need the FASTA file with the chromosome sequences that the bed is associated with (to which the reads were aligned). Then issue the following bedtools (http://code.google.com/p/bedtools/) command:

            fastaFromBed -name -fi [your_fasta_file] -bed [your_bed_file] -fo output.fasta

            That should give you the reads in "output.fasta".

            For the first solution, grab the compressed gff from FlyBase (ftp://ftp.flybase.net/releases/FB201...l-r5.41.gff.gz).

            Then grep the file for rRNA, tRNA, snoRNA etc. E.g.:
            gzip -cd dmel-all-r5.41.gff.gz | grep rRNA > dmel-rRNA-r5.41.gff
            gzip -cd dmel-all-r5.41.gff.gz | grep tRNA > dmel-tRNA-r5.41.gff
            and so on.

            Then intersect your bed with these gffs:
            intersectBed -wa -a [your_bed_file] -b dmel-rRNA-r5.41.gff > rRNA_reads.bed
            intersectBed -wa -a [your_bed_file] -b dmel-tRNA-r5.41.gff > tRNA_reads.bed
            etc.

            Or if you need to filter out those ncRNA reads:
            intersectBed -v -a [your_bed_file] -b dmel-rRNA-r5.41.gff | intersectBed -v -a stdin -b dmel-tRNA-r5.41.gff > no-rRNA-no-tRNA_reads.bed

            Look through the bedtools web site for more examples...

            Enjoy,
            Samuel
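
One caveat with the grep step above: a plain "grep tRNA" matches the pattern anywhere on the line, including gene names in the attributes column. A possible refinement, sketched below on made-up records, is to test the feature type in column 3 with awk. It assumes the GFF uses the type names "rRNA", "tRNA", "snoRNA" and "snRNA" in that column, which should be checked against the actual file; the sample gene name is hypothetical.

```shell
set -eu
workdir=$(mktemp -d)
gff="$workdir/sample.gff"

# Three made-up, tab-separated GFF records (9 columns each).
printf '2L\tFlyBase\trRNA\t100\t200\t.\t+\t.\tID=r1\n' > "$gff"
printf '2L\tFlyBase\tgene\t300\t400\t.\t+\t.\tID=g1;Name=hypothetical-tRNA-like\n' >> "$gff"
printf '3R\tFlyBase\ttRNA\t500\t600\t.\t-\t.\tID=t1\n' >> "$gff"

# grep counts every line that mentions "tRNA" anywhere, including the gene record.
grep_hits=$(grep -c tRNA "$gff")

# awk counts only records whose feature type (column 3) is exactly "tRNA".
typed_hits=$(awk -F'\t' '$3 == "tRNA"' "$gff" | wc -l | tr -d ' ')

# Collect all unwanted ncRNA classes into one annotation file.
awk -F'\t' '$3 == "rRNA" || $3 == "tRNA" || $3 == "snoRNA" || $3 == "snRNA"' \
    "$gff" > "$workdir/ncRNA.gff"
ncrna_records=$(wc -l < "$workdir/ncRNA.gff" | tr -d ' ')

echo "grep: $grep_hits, column 3: $typed_hits, ncRNA records: $ncrna_records"
```

For a compressed file, the same awk filter can read from "gzip -cd file.gff.gz |" instead of a file name, as in the grep commands above.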


            • StephaniePi83
              Member
              • Sep 2011
              • 52

              #7
              Originally posted by arvid
              For the first solution, grab the compressed gff from FlyBase (ftp://ftp.flybase.net/releases/FB201...l-r5.41.gff.gz).
              This is exactly the file I can't unzip...


              • arvid
                Senior Member
                • Jul 2011
                • 156

                #8
                It is in gzip format, so you shouldn't "unzip" it. If you would like to decompress it, run "gunzip dmel-all-r5.41.gff.gz". You don't need to do that for the commands I suggested, though, as they decompress the file on the fly.

                If it is corrupted, download it again. Also make sure that your reads were aligned to the same version of the genome, since the chromosome sequences may have changed between releases (check the readmes on FlyBase for such information).
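
Before downloading again, it can help to confirm whether the archive really is damaged. The sketch below uses "gzip -t", which tests a compressed file without writing any output. The sample files are created on the spot purely for illustration, and the truncated copy stands in for an interrupted download; check_gz is a helper name invented here.

```shell
set -eu
workdir=$(mktemp -d)

# Build a small, valid .gz file from one made-up GFF record.
printf '2L\tFlyBase\trRNA\t100\t200\t.\t+\t.\tID=r1\n' > "$workdir/sample.gff"
gzip -c "$workdir/sample.gff" > "$workdir/sample.gff.gz"

# Keep only the first 30 bytes to imitate an incomplete download.
head -c 30 "$workdir/sample.gff.gz" > "$workdir/truncated.gff.gz"

# check_gz (hypothetical helper): report whether gzip can read the whole archive.
check_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo ok
    else
        echo corrupt
    fi
}

good=$(check_gz "$workdir/sample.gff.gz")
bad=$(check_gz "$workdir/truncated.gff.gz")
echo "intact file: $good, truncated file: $bad"
```

On the real download, the equivalent check would simply be "gzip -t dmel-all-r5.41.gff.gz".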


                • StephaniePi83
                  Member
                  • Sep 2011
                  • 52

                  #9
                  For the 2nd solution, do I need the same chromosome sequences that the bed file is associated with? I think the data from FlyBase are not the ones that were used, because "intersectBed" only removes 1 entry...


                  • arvid
                    Senior Member
                    • Jul 2011
                    • 156

                    #10
                    Yes, you need the same chromosome sequences... Can't you find out what produced that bed file? Then you should be able to get the needed information, and possibly raw fastq files to do your own alignments...
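
While the origin of the bed file is unknown, one quick thing to check is whether the two files even use the same chromosome names, since intersectBed only reports overlaps between identically named sequences. The sketch below is a guess at one possible cause, not a diagnosis of this data set: it assumes, purely for illustration, that the bed file uses names like "chr2L" while the GFF uses "2L". All file contents are made up.

```shell
set -eu
workdir=$(mktemp -d)

# Hypothetical bed file with "chr"-prefixed names and GFF without the prefix.
printf 'chr2L\t100\t150\tread1\nchr3R\t500\t550\tread2\n' > "$workdir/reads.bed"
printf '2L\tFlyBase\trRNA\t100\t200\t.\t+\t.\tID=r1\n' > "$workdir/ncRNA.gff"
printf '3R\tFlyBase\ttRNA\t500\t600\t.\t-\t.\tID=t1\n' >> "$workdir/ncRNA.gff"

# List the distinct sequence names used by each file (column 1).
cut -f1 "$workdir/reads.bed" | sort -u > "$workdir/bed_chroms.txt"
grep -v '^#' "$workdir/ncRNA.gff" | cut -f1 | sort -u > "$workdir/gff_chroms.txt"

# Count the names that appear in both lists.
shared_before=$(comm -12 "$workdir/bed_chroms.txt" "$workdir/gff_chroms.txt" | wc -l | tr -d ' ')

# If only the prefix differs, strip it from the bed file and compare again.
sed 's/^chr//' "$workdir/reads.bed" > "$workdir/reads.renamed.bed"
cut -f1 "$workdir/reads.renamed.bed" | sort -u > "$workdir/renamed_chroms.txt"
shared_after=$(comm -12 "$workdir/renamed_chroms.txt" "$workdir/gff_chroms.txt" | wc -l | tr -d ' ')

echo "shared names before: $shared_before, after renaming: $shared_after"
```

Matching names do not guarantee matching coordinates: if the reads were aligned to a different genome release, the positions can still be shifted, so renaming is only a first check.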


                    • StephaniePi83
                      Member
                      • Sep 2011
                      • 52

                      #11
                      Yes, I'd like to, but the person who performed the primary analysis doesn't answer me! Thank you for your reply, it helps me.
