
  • #16
    Just download the GTF annotation and use gtf2bed from bedops. That should keep the gene names (or some other useful identifier).

    Edit: The GTF annotation is available from the UCSC table browser, in case you weren't aware.



    • #17
      Originally posted by dpryan
      Just download the GTF annotation and use gtf2bed from bedops. That should keep the gene names (or some other useful identifier).

      Edit: The GTF annotation is available from the UCSC table browser, in case you weren't aware
      Thanks Devon,
      Very helpful so far.
      I downloaded this annotation from Ensembl. In BioMart I simply left the filter sections empty and used the whole output as my list. This is right, no?

      PS. I just asked a question on Biostar about flagstat, and I think it was answered by you, but I am not sure.



      • #18
        If you're aligning against mm10, then don't use an annotation file from Ensembl (the chromosome names are different). That will cause no end of issues. If you use the Ensembl annotation, just align against the genome that you can download from Ensembl (the Ensembl annotation is better anyway).

        Yeah, that was probably me; there's a large overlap between the people here and on Biostars.



        • #19
          I am using mm9, and I downloaded the bowtie index from their webpage. I had to change the chromosome names, as Ensembl uses 1 instead of chr1. But I don't know whether just adding chr in front of the chromosome names will sort it out. Should I make other changes in addition?



          • #20
            That should be sufficient, just make sure that the annotation doesn't mention any chromosomes/contigs missing from the reference fasta file (I can't recall if that's the case or not).
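The renaming discussed above can be sketched with a single awk command. This is only a minimal sketch under stated assumptions: the file names demo_ensembl.gtf and demo_ucsc.gtf and the three demo lines are made up for illustration, and it assumes the annotation calls the mitochondrion MT while the reference calls it chrM. Contigs that have no counterpart in the reference would still need to be checked separately.

```shell
# Tiny stand-in for an Ensembl-style GTF (hypothetical file name and records).
printf '#!genome-build demo\n' > demo_ensembl.gtf
printf '1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n' >> demo_ensembl.gtf
printf 'MT\tdemo\texon\t10\t90\t.\t+\t.\tgene_id "g2";\n' >> demo_ensembl.gtf

# Pass "#" comment lines through unchanged; on every other line rewrite
# column 1: MT becomes chrM (assumption), anything else gets a "chr" prefix.
awk 'BEGIN { FS = OFS = "\t" }
     /^#/  { print; next }
     { $1 = ($1 == "MT") ? "chrM" : ("chr" $1); print }' demo_ensembl.gtf > demo_ucsc.gtf

# Show the resulting chromosome names.
cut -f 1 demo_ucsc.gtf
```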



            • #21
              Do you mean manually checking whether all chromosomes are there? I do not know what contigs are or how to check them.

              I think chrX and chrY are missing in the GTF file. But how can I find information for missing chromosomes and complete it?



              • #22
                No need to do that manually.

                Just:
                Code:
                grep ">" reference.fa
                on the reference fasta file to get a list of the contigs and then:
                Code:
                grep -v "^#" annotation.gtf | cut -f 1 | sort -u
                on the GTF or GFF file. They should be the same, possibly with a different order (and the output from the grep command will all start with ">", which you can ignore).
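The comparison described above can also be automated with comm rather than read by eye. The following is a rough sketch, not a tested pipeline: the demo file names and their contents are invented so that the commands run as-is, and it assumes the contig name is the first whitespace-delimited word of each FASTA header.

```shell
# Tiny stand-in files (hypothetical names and contents) so the sketch runs as-is.
printf '>chr1 demo\nACGT\n>chrM\nACGT\n' > demo_reference.fa
printf 'chr1\tdemo\texon\t1\t4\t.\t+\t.\tgene_id "g1";\n' > demo_annotation.gtf
printf 'NT_123456\tdemo\texon\t1\t4\t.\t+\t.\tgene_id "g2";\n' >> demo_annotation.gtf

# Contig names from the FASTA headers: drop ">" and anything after the first space.
grep '>' demo_reference.fa | sed 's/^>//' | cut -d ' ' -f 1 | sort -u > fasta_contigs.txt

# Contig names from column 1 of the GTF, skipping "#" comment lines.
grep -v '^#' demo_annotation.gtf | cut -f 1 | sort -u > gtf_contigs.txt

# Names present only in the GTF (absent from the reference).
comm -13 fasta_contigs.txt gtf_contigs.txt > only_in_gtf.txt
# Names present only in the FASTA.
comm -23 fasta_contigs.txt gtf_contigs.txt > only_in_fasta.txt

cat only_in_gtf.txt only_in_fasta.txt
```

If only_in_gtf.txt is empty, every contig mentioned in the annotation exists in the reference.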



                • #23
                  I did the check.
                  As you suggested the order is different.

                  The only differences are that I have chrM in the fa file and MN in the gtf file.
                  Also, in the gtf file I have lots of 'NT_123456'-type entries, and I do not know what they are.

                  Do these differences cause any problem?



                  • #24
                    Originally posted by roll
                    I did the check.
                    As you suggested the order is different.

                    The only differences are that I have chrM in the fa file and MN in the gtf file.
                    Also, in the gtf file I have lots of 'NT_123456'-type entries, and I do not know what they are.

                    Do these differences cause any problem?
                    That will generally still cause issues with tophat. If you downloaded your bowtie indices from iGenomes (likely via a link on the bowtie webpage), then they came with an appropriate reference annotation file. Just use that one.



                    • #25
                      Originally posted by dpryan
                      Assuming that you're using the mm10 reference, you can download the repeatmasker output here (mm9 is here). The general idea is to extract the type of feature(s) you want from the repeatmasker .out file and convert that to bed format and use "bedtools intersect ..." to get a count of how many reads align there. There are many other ways to do this, but that should work.

                      In fact, a more straightforward way might be simply to run cufflinks on your alignments and then intersect the novel transcripts it finds with the repeatmasker output file. That might end up being easier.
                      From the repeat list, I am trying to use only the repeats on the forward strand. Do you know how to extract these from the .out file?
                      The column headers are like

                         SW   perc   perc   perc  query     position in query                   matching  repeat         position in repeat
                      score   div.   del.   ins.  sequence  begin    end      (left)          repeat    class/family   begin   end   (left)  ID

                        687   17.4    0.0    0.0  chr1      3000002  3000156  (194195276)  C  L1_Mur2   LINE/L1        (4310)  1567  1413    1
                        917   21.4   11.4    4.5  chr1      3000238  3000733  (194194699)  C  L1_Mur2   LINE/L1        (4488)  1389  913     1
                        215    3.1    0.0    3.0  chr1      3000734  3000766  (194194666)  +  (TTTG)n   Simple_repeat  2       33    (0)     2
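One possible way to pull out the forward-strand entries is sketched below with awk. In the rows quoted above, the strand is the ninth whitespace-separated field ("+" for forward, "C" for complement). The file names demo_rmsk.out and forward_repeats.bed are made up for this illustration, and the choice of BED columns (repeat name as the name, SW score as the score) is an assumption rather than a requirement.

```shell
# Tiny stand-in for a RepeatMasker .out file (hypothetical file name),
# built from the three records quoted above plus the header lines.
cat > demo_rmsk.out <<'EOF'
   SW  perc perc perc  query     position in query     matching  repeat        position in repeat
score  div. del. ins.  sequence  begin  end  (left)    repeat    class/family  begin  end  (left)  ID

  687 17.4  0.0  0.0  chr1  3000002 3000156 (194195276) C L1_Mur2 LINE/L1       (4310) 1567 1413 1
  917 21.4 11.4  4.5  chr1  3000238 3000733 (194194699) C L1_Mur2 LINE/L1       (4488) 1389  913 1
  215  3.1  0.0  3.0  chr1  3000734 3000766 (194194666) + (TTTG)n Simple_repeat      2   33  (0) 2
EOF

# Keep data rows (first field is a number) whose 9th field is "+".
# The .out coordinates are 1-based, BED starts are 0-based, hence "$6 - 1".
# Note: the SW score can exceed the 0-1000 range that BED nominally expects.
awk 'BEGIN { OFS = "\t" }
     $1 ~ /^[0-9]+$/ && $9 == "+" { print $5, $6 - 1, $7, $10, $1, "+" }' \
    demo_rmsk.out > forward_repeats.bed

cat forward_repeats.bed
```

The resulting BED file could then be passed to bedtools intersect, as suggested in the quoted post. A reverse-strand version would test for "C" in the ninth field and print "-" instead.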
