  • How do I automate the graphing of these data?

    Hi, everybody,

    I have result files generated by blastn, which were then sorted by the second field. A typical file looks like:

    360 miR156a
    1 miR156a
    9 miR156a
    1 miR156a
    10 miR156a
    7 miR156a
    1 miR156a
    705 miR157a
    2 miR157a
    1 miR157a
    5 miR157a
    4 miR157a
    67 miR157a
    5 miR157a
    11 miR157a
    2 miR157a
    34 miR159
    3 miR162
    3 miR166a
    17 miR166a
    4 miR166a
    103 miR167a
    1 miR167a
    ... .....

    The first column is the deep-sequencing read count for each unique sequence. The second column is the miR ID that the sequence aligns to.
    I would like to:
    1)
    Sum the read counts for each miR ID (e.g. for miR156a, sum rows 1 to 7);
    then generate a bar graph showing the total read count for each miR ID.


    I have more than 20 files like this, so I would like to automate the process. R came to mind,
    but I have not used R before. Can you give me some tips or suggestions about which R packages or tools to use? (I can then learn those and figure it out.)


    2)
    If possible, generate a table that summarizes the total read counts from all 20 files.
    The table that I would like to have is as follows:

    miRID sample1 sample2 sample3 ......... sample 20
    miR156 103 300 450 .......... 33
    miR157 205 300 ..........
    miR167 .....
    .... .......


    Thanks a lot!!

    Jian
    Last edited by yangjianhunt; 06-29-2012, 09:14 AM.

  • #2
    For 1), the bar plot part is easy in R; just use barplot()!

    Summing the counts can be done in a lot of different ways. Here is one that is maybe a bit cryptic but will teach you the table() command. Assume you have the table you pasted in a text file called mirna.txt. Try to run the following in R, with the mirna.txt file in the current working directory:

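    m <- read.table("mirna.txt")                  # V1 = read count, V2 = miRNA ID
    q <- table(m)                                 # rows: distinct count values; columns: miRNA IDs
    totcounts <- as.numeric(rownames(q)) %*% q    # (count value) x (times it occurs), summed per miRNA
    barplot(totcounts)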

    There are of course more transparent ways of summing the counts, but I'm too lazy to type them out :-)
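
    One of the "more transparent ways" could look like the sketch below, which uses tapply() to sum the counts within each miRNA ID and then checks the result against the table() approach. The column names count and mirna are labels chosen here for illustration, and the inline data is just the first few rows from the question so that the snippet runs on its own.

```r
# Inline copy of a few rows from the example file so the sketch is self-contained.
# With a real file: m <- read.table("mirna.txt", col.names = c("count", "mirna"))
example_text <- "360 miR156a
1 miR156a
9 miR156a
1 miR156a
10 miR156a
7 miR156a
1 miR156a
34 miR159
3 miR162"
m <- read.table(text = example_text, col.names = c("count", "mirna"))

# Sum the read counts within each miRNA ID; the result is named by miRNA ID.
totcounts <- tapply(m$count, m$mirna, sum)
print(totcounts)

# Cross-check against the table()-based approach from the post above.
q <- table(m)
via_table <- as.numeric(rownames(q)) %*% q

# Because totcounts carries names, the bars are labelled with the miRNA IDs.
barplot(totcounts, las = 2, ylab = "Total read count")
```

    Both routes should give the same totals; tapply() just makes the "group, then sum" step explicit.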

    • #3
      Thanks a lot, kopi-o.

      This looks awesome. I will try it out.

      Jian

      • #4
        solved

        I eventually used:
        list.files() to get all the files
        lapply() to apply the processing to every file
        read.table() to read each file into a data.frame
        tapply(SeqCounts, miRNA, sum) to get the total count for each miRNA "class"
        write.table() with append=TRUE to write the data to a file
        paste() and cat() to write a name before each appended block
        barplot() to draw the plot
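
        The steps listed above could be strung together roughly as in the sketch below. It is not the poster's actual script: the two demo files, the helper name sum_counts, and the output file names are all made up for illustration, and the temporary directory stands in for wherever the real files live.

```r
# Demo setup (assumption): two tiny files in a temporary directory stand in
# for the real blastn-derived result files.
indir <- file.path(tempdir(), "mirna_demo")
dir.create(indir, showWarnings = FALSE)
writeLines(c("360 miR156a", "1 miR156a", "34 miR159"),
           file.path(indir, "sample1.txt"))
writeLines(c("705 miR157a", "2 miR157a", "3 miR162"),
           file.path(indir, "sample2.txt"))

# Hypothetical helper: read one file and return the total count per miRNA ID.
sum_counts <- function(path) {
  d <- read.table(path, col.names = c("SeqCounts", "miRNA"))
  tapply(d$SeqCounts, d$miRNA, sum)
}

# Collect the input files and process each one.
files <- list.files(indir, pattern = "\\.txt$", full.names = TRUE)
totals <- lapply(files, sum_counts)
names(totals) <- basename(files)

# Append each file's totals to one output file, preceded by the file name.
outfile <- file.path(indir, "totals.tsv")
unlink(outfile)  # start fresh so reruns do not keep appending
for (f in names(totals)) {
  cat(paste0("# ", f, "\n"), file = outfile, append = TRUE)
  out <- data.frame(miRNA = names(totals[[f]]), total = as.vector(totals[[f]]))
  write.table(out, file = outfile, append = TRUE, quote = FALSE,
              sep = "\t", row.names = FALSE, col.names = FALSE)
}

# One bar chart per input file, collected in a single multi-page PDF.
pdf(file.path(indir, "totals.pdf"))
for (f in names(totals)) {
  barplot(totals[[f]], main = f, las = 2, ylab = "Total read count")
}
invisible(dev.off())
```

        Writing the plots to a PDF rather than the screen keeps the whole thing non-interactive, which is handy when there are 20 or more files.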

        It took me a couple of days to learn the introductory basics of R. But it was fun and will be useful in the future I hope.

        Again, thanks to Kopi-o for pointing the way. I haven't learned how to use the table() function yet, but I feel confident that I can learn it now.
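
        For part 2) of the original question (one table with a row per miR ID and a column per sample), nobody in the thread posted a solution, so the following is only one possible sketch. It stacks the per-sample data into one long table and cross-tabulates it with xtabs(). The sample names, column names, and numbers are invented demo values, not data from the thread.

```r
# Demo input (assumption): per-sample data frames shaped like what read.table()
# returns; with real files this list could be built with lapply() over the files.
samples <- list(
  sample1 = data.frame(count = c(360, 1, 34),
                       mirna = c("miR156a", "miR156a", "miR159")),
  sample2 = data.frame(count = c(705, 2, 3),
                       mirna = c("miR157a", "miR157a", "miR156a"))
)

# Stack all samples into one long table, adding a column that names the sample.
long <- do.call(rbind, lapply(names(samples), function(s) {
  cbind(samples[[s]], sample = s)
}))

# Cross-tabulate: rows are miRNA IDs, columns are samples, cells are summed
# read counts. A miRNA that is absent from a sample gets 0.
summary_table <- xtabs(count ~ mirna + sample, data = long)
print(summary_table)

# Save as a tab-separated file; col.names = NA leaves a blank header cell
# above the row names so the columns line up in a spreadsheet.
write.table(as.data.frame.matrix(summary_table),
            file = file.path(tempdir(), "mirna_summary.tsv"),
            sep = "\t", quote = FALSE, col.names = NA)
```

        Note that xtabs() fills missing combinations with 0; whether 0 or NA is the better marker for "miRNA not seen in this sample" depends on what the table is used for afterwards.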
