  • How do I automate the graphing of these data?

    Hi, everybody,

    I have result files generated by blastn, which were then sorted by the second field. A typical file looks like:

    360 miR156a
    1 miR156a
    9 miR156a
    1 miR156a
    10 miR156a
    7 miR156a
    1 miR156a
    705 miR157a
    2 miR157a
    1 miR157a
    5 miR157a
    4 miR157a
    67 miR157a
    5 miR157a
    11 miR157a
    2 miR157a
    34 miR159
    3 miR162
    3 miR166a
    17 miR166a
    4 miR166a
    103 miR167a
    1 miR167a
    ... .....

    The first column is the deep-sequencing read count for each unique sequence. The second column is the miR ID that the sequence aligns to.
    I would like to:
    1)
    Sum the read counts for each miR ID (e.g. for miR156a, sum rows 1-7);
    Generate a bar graph showing the total read count for each miR ID.


    I have more than 20 files like this, so I would like to automate the process. R came to mind,
    but I have not used R before. Can you give me some tips or suggestions about which R packages or tools to use? (I can then learn those and figure it out.)


    2)
    If possible, generate a table that summarizes the total read counts from all 20 files.
    The table I would like to have looks like this:

    miRID sample1 sample2 sample3 ......... sample 20
    miR156 103 300 450 .......... 33
    miR157 205 300 ..........
    miR167 .....
    .... .......


    Thanks a lot!!

    Jian
    Last edited by yangjianhunt; 06-29-2012, 09:14 AM.

  • #2
    For 1), the bar plot part is easy in R; just use barplot()!

    Summing the counts can be done in a lot of different ways. Here is one that is maybe a bit cryptic but will teach you the table() command. Assume you have the table you pasted in a text file called mirna.txt. Try to run the following in R, with the mirna.txt file in the current working directory:

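    # Read the two columns: V1 = read count, V2 = miR ID
    m <- read.table("mirna.txt")
    # Cross-tabulate: rows are the distinct count values, columns are the miR IDs
    q <- table(m)
    # Weight each row by its count value and add up: one total per miR ID
    totcounts <- as.numeric(rownames(q)) %*% q
    barplot(totcounts)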

    There are of course more transparent ways of summing the counts, but I'm too lazy to type them out :-)
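
    For reference, one of those more transparent ways might look like the sketch below, which uses tapply() to sum the first column within each miR ID. The inlined rows (copied from the first post) and the column names count and mirna are assumptions for illustration; with a real file you would read it with read.table("mirna.txt", col.names = c("count", "mirna")) instead.

    ```r
    # A few rows from the first post, inlined so the sketch runs on its own.
    # The column names "count" and "mirna" are made up for this example.
    m <- read.table(text = c("360 miR156a", "1 miR156a", "9 miR156a",
                             "34 miR159",
                             "3 miR166a", "17 miR166a", "4 miR166a"),
                    col.names = c("count", "mirna"))

    # Sum the counts within each miR ID; the result is named by miR ID.
    totcounts <- tapply(m$count, m$mirna, sum)
    print(totcounts)

    # One bar per miR ID; las = 2 rotates the labels so long IDs stay readable.
    barplot(totcounts, las = 2, ylab = "Total read count")
    ```

    Unlike the table() version, this never builds the full cross-tabulation, so it is probably easier to follow when there are many distinct count values.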



    • #3
      Thanks a lot, kopi-o.

      This looks awesome. I will try it out.

      Jian



      • #4
        solved

        I eventually used:
        list.files() to get all the files
        lapply() to run the processing over multiple files
        read.table() to read each file into a data.frame
        tapply(SeqCounts, miRNA, sum) to get the total count for each "class" (miR ID)
        write.table() with append=TRUE to write the data into a file
        paste() and cat() to write a name before each appended block
        barplot() to draw the plot
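
        The steps listed above could be strung together roughly as in the sketch below. It is not the actual script from this post: the toy input files, the directory and file names, and the helper sum_one() are all hypothetical, and the last part (a miR ID by sample table, as asked for in the first post) is one possible approach rather than what was actually done.

        ```r
        # Toy input so the sketch runs on its own: two small files in a temp directory.
        # (Made-up data and file names; point indir at your real result files instead.)
        indir <- file.path(tempdir(), "mir_counts")
        dir.create(indir, showWarnings = FALSE)
        writeLines(c("360 miR156a", "1 miR156a", "34 miR159"), file.path(indir, "sample1.txt"))
        writeLines(c("5 miR156a", "3 miR162", "2 miR162"), file.path(indir, "sample2.txt"))

        # 1. Collect the files.
        files <- list.files(indir, pattern = "\\.txt$", full.names = TRUE)

        # 2. Read each file and sum the counts per miR ID (hypothetical helper).
        sum_one <- function(f) {
          d <- read.table(f, col.names = c("SeqCounts", "miRNA"))
          tot <- tapply(d$SeqCounts, d$miRNA, sum)
          setNames(as.vector(tot), names(tot))  # plain named numeric vector
        }
        totals <- lapply(files, sum_one)
        names(totals) <- sub("\\.txt$", "", basename(files))

        # 3. Append each sample's totals to one file, with the sample name written first.
        outfile <- file.path(tempdir(), "mir_totals.tsv")
        if (file.exists(outfile)) file.remove(outfile)
        for (s in names(totals)) {
          cat(paste0("# ", s, "\n"), file = outfile, append = TRUE)
          write.table(data.frame(miRNA = names(totals[[s]]), total = totals[[s]]),
                      file = outfile, append = TRUE, quote = FALSE, sep = "\t",
                      row.names = FALSE, col.names = FALSE)
        }

        # 4. Optional: one miR ID x sample table; IDs missing from a sample get 0.
        ids <- sort(unique(unlist(lapply(totals, names))))
        wide <- sapply(totals, function(x) {
          v <- x[ids]
          v[is.na(v)] <- 0
          as.vector(v)
        })
        rownames(wide) <- ids
        print(wide)

        # 5. Grouped bar plot: one group per miR ID, one bar per sample.
        barplot(t(wide), beside = TRUE, legend.text = TRUE, las = 2)
        ```

        Filling missing IDs with 0 is a design choice; leaving them as NA may be preferable if "not detected" should stay distinguishable from a true zero.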

        It took me a couple of days to learn the basics of R, but it was fun and I hope it will be useful in the future.

        Again, thanks to kopi-o for pointing the way. I haven't learned how to use the table() function yet, but I feel confident that I can learn it now.
