  • UpSet R plot, input data format wrong?

    Hi!

    I processed 3 BAM files that were generated from 3 different pipelines, so in total 9 BAM files by writing scripts in bash and python. I extracted the mapped reads from the BAM files and stored them in python sets. Then, I performed pair-wise intersection operations to see which reads are common in which BAM files (despite different pipelines).

    The output 3x3 matrix was written into a tsv file:

    14659 14659 14647
    14659 15731 15709
    14647 15709 15709

    Numbers correspond to the number of reads that are in one intersection between 2 files.

    Now I wanted to load the matrix into R and create an UpSetR plot. I know that a Venn diagram would also work, but later on I will have more pipelines to compare, so I chose UpSetR plots. I tried this code:

    upset(test_df, sets = 'reconstructed', 'shuffled', 'trimmed',
    number.angles = 30, point.size = 3.5, line.size = 2,
    mainbar.y.label = "Read Intersections", sets.x.label = "Blabla",
    text.scale = c(1.3, 1.3, 1, 1, 2, 0.75), mb.ratio = c(0.55, 0.45),
    order.by = 'sets', keep.order = TRUE)

    But an error occurred:
    Error in start_col:end_col : argument of length 0

    Unfortunately, I am only a beginner in R without much experience.
    Maybe someone here has more experience with R or the UpSetR package.

    Greetings!
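A note on the input format: UpSetR works from a per-element membership table (one row per read, one 0/1 column per set), not from a matrix of pairwise counts. As far as I can tell, the `start_col:end_col` error is what `upset()` reports when it finds no 0/1 columns in the data frame, which would be the case for the count matrix above. The sketch below uses made-up read IDs (the vectors and names are placeholders, not the poster's data) to show how the two representations relate in base R: the pairwise count matrix can be derived from the membership table, but not the other way round.

```r
# Toy read IDs; placeholders, not real data
sets <- list(
  trimmed       = c("r1", "r2", "r3", "r4"),
  shuffled      = c("r2", "r3", "r4", "r5"),
  reconstructed = c("r3", "r4", "r5", "r6")
)

# Membership table: one row per distinct read, one 0/1 column per set
reads <- unique(unlist(sets))
membership <- sapply(sets, function(s) as.integer(reads %in% s))
rownames(membership) <- reads

# Pairwise intersection sizes (set sizes on the diagonal),
# i.e. the kind of 3x3 matrix that was written to the tsv file
pairwise <- crossprod(membership)

# Exclusive intersections, which are what the UpSet bars show
pattern <- apply(membership, 1, paste, collapse = "")
exclusive <- table(pattern)

print(pairwise)
print(exclusive)
```

The pairwise matrix loses the information about which reads are shared by all three sets, so the per-read table (or the raw lists of read IDs) is the thing to keep.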

  • #2
    I run UpSetR by inputting individual sets as a list and then the program calculates overlap itself (I am not aware whether it allows you to "manually" input the overlaps, never tried that).

    #make input
    list.Input = list(set1=data1,set2=data2,set3=data3)
    #run upsetr
    upset(fromList(list.Input),sets=c("set1","set2","set3"))

    ... and then I just add further arguments (keep.order, nintersects, etc.) as needed.
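For anyone following along, here is a self-contained version of that pattern with toy vectors (the contents of data1 to data3 are invented here). Each list element has to be a vector holding the observations themselves; `fromList()` then turns the list into a 0/1 data frame. The UpSetR calls are wrapped in a check so the snippet still runs where the package is not installed.

```r
# Invented observations; any character vectors would do
data1 <- c("A344D", "B120K", "C777Q")
data2 <- c("B120K", "C777Q", "D001L")
data3 <- c("C777Q", "D001L", "E555P")

list.Input <- list(set1 = data1, set2 = data2, set3 = data3)

# Every element should be a vector of observations
str(list.Input)

# Overlap computed directly, as a cross-check on the plot
n_all_three <- length(Reduce(intersect, list.Input))
print(n_all_three)

if (requireNamespace("UpSetR", quietly = TRUE)) {
  # 0/1 data frame with one column per set
  binary <- UpSetR::fromList(list.Input)
  print(head(binary))
  print(UpSetR::upset(binary, sets = c("set1", "set2", "set3")))
}
```

Comparing a directly computed overlap with the corresponding bar in the plot is a quick way to confirm the input was read as intended.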


    • #3
      Tried it out, but...

      Thank you, Meyana.

      I tried your idea, but it still won't work.
      What do your input data look like?

      I just input 3 text files that each contain one column (read identifier from BAM files).
      The upset output plot shows me the three sets, but no intersections.
      Any suggestions?

      Many greetings


      • #4
        My data1/data2/data3 are just vectors of the observations, which I then store in the list list.Input, nothing special. The observations themselves can have any format; mine look something like "A344D".

        Did you store your data in the list?


        • #5
          This is what I've done:

          #imported
          library(UpSetR)

          #make input
          list.Input = list(set1 = "trimmed_bismark_bt2_pe.bam_mapped_reads.txt",
          set2 = "shuffled_bismark_bt2_pe.bam_mapped_reads.txt",
          set3 = "reconstructed_bismark_bt2_pe.bam_mapped_reads.txt")

          upset(fromList(list.Input), sets = c("set1", "set2", "set3"),
          number.angles = 30, point.size = 3.5, line.size = 2,
          mainbar.y.label = "Read Intersections", sets.x.label = "Blabla",
          text.scale = c(1.3, 1.3, 1, 1, 2, 0.75), mb.ratio = c(0.55, 0.45),
          order.by = 'freq', keep.order = TRUE)

          So, I think that I stored the sets in a list. I also checked it with print(class(list.Input)).
          Maybe, the package does not accept my input... three text files, one column each, just read identifier...
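One thing that stands out in the snippet above: each list element is a file name (a single string), not the contents of that file. `fromList()` would then see three sets of one element each with nothing in common, which matches a plot that shows three sets and no intersections. A minimal sketch of the difference, using temporary files with invented read IDs (file names and IDs are placeholders); `readLines()` is one way among several to read a one-column text file.

```r
# Write two small one-column files to a temp location (placeholder data)
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("read_1", "read_2", "read_3"), f1)
writeLines(c("read_2", "read_3", "read_4"), f2)

# A list of file names: each "set" is just one string
by_name <- list(set1 = f1, set2 = f2)
n_shared_names <- length(intersect(by_name$set1, by_name$set2))
print(n_shared_names)

# A list of file contents: each set holds the read IDs
by_content <- lapply(list(set1 = f1, set2 = f2), readLines)
shared <- intersect(by_content$set1, by_content$set2)
print(shared)
```

Checking `lengths(by_content)` against the line counts of the files is an easy way to confirm that the IDs, not the paths, ended up in the list.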


          • #6
            Your code works fine on my data.
            Could you post a snippet of your data?


            • #7
              Works now!

              Hi Meyana,

              it works now!
              You were absolutely right about building a list of sets and using the fromList function.
              I was not aware that fromList creates a binary data frame that is compatible with the UpSetR package.

              For other forum users, here is my working code:

              library(UpSetR)

              trimmed_df <- read.csv(file = "tri.txt", header = FALSE, sep = "\n")
              shuffled_df <- read.csv(file = "shu.txt", header = FALSE, sep = "\n")
              reconstructed_df <- read.csv(file = "rec.txt", header = FALSE, sep = "\n")

              trimmed <- as.vector(trimmed_df$V1)
              shuffled <- as.vector(shuffled_df$V1)
              reconstructed <- as.vector(reconstructed_df$V1)

              read_sets = list(
              trimmed_reads = trimmed,
              shuffled_reads = shuffled,
              reconstructed_reads = reconstructed)

              upset(fromList(read_sets),
              sets = c("trimmed_reads", "shuffled_reads", "reconstructed_reads"),
              number.angles = 20, point.size = 2.5, line.size = 1.5,
              mainbar.y.label = "read intersection", sets.x.label = "read set size",
              text.scale = c(1.5, 1.5, 1.25, 1.25, 1.5, 1.5), mb.ratio = c(0.65, 0.35),
              group.by = "freq", keep.order = TRUE)

              Again, thank you Meyana!


              • #8
                Great, happy to see it working for you!

                In addition to the UpSetR package, there's also the SuperExactTest package, which you may also find interesting (though the graphical output is not the prettiest)


                • #9
                  Upset error

                  hi,

                  I tried making an UpSet plot for three VCF files from different pipelines. I extracted the variant column (SNPs) from each and imported the resulting one-column CSV files into R with this code:

                  set1 <- read.csv("set1.vcf", sep="")
                  set2 <- read.csv("set2.vcf", sep="")
                  set3 <- read.csv("set3.vcf", sep="")

                  set1 <- as.vector(set1$V1)
                  set2 <- as.vector(set2$v1)
                  set3 <- as.vector(set3$V1)

                  read_sets = list(set1_reads = set1,
                  set2_reads = set2,
                  set3_reads = set3)

                  upset(fromList(read_sets),
                  sets = c("set1_reads", "set2_reads", "set3_reads"),
                  number.angles = 20, point.size = 2.5, line.size = 1.5,
                  mainbar.y.label = "read intersection", sets.x.label = "read set size",
                  text.scale = c(1.5, 1.5, 1.25, 1.25, 1.5, 1.5), mb.ratio = c(0.65, 0.35),
                  group.by = "freq", keep.order = TRUE)

                  It gives an intersection plot, but the SNP counts in the UpSet plot are much lower than the vcf-compare results for the same VCF files. I am not sure why I am getting different numbers with the UpSet plot.
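Without the input files it is hard to say where the discrepancy comes from, but two things may be worth checking. First, `read.csv()` defaults to `header = TRUE`, so the first variant becomes the column name and `$V1` does not exist unless `header = FALSE` is set; R is also case-sensitive, so `$v1` and `$V1` are different columns. Second, `fromList()` works on unique values, so if the extracted column is not unique per variant (for example an ID column where most entries are "."), many variants collapse into a single element. The sketch below illustrates both points with an invented one-column file. The `chr:pos:ref:alt` key format and the helper `summarise_set` are assumptions made for illustration, not part of UpSetR or of the poster's data.

```r
# Invented one-column file of variant keys (placeholder data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("chr1:100:A:G", "chr1:200:C:T", "chr2:50:G:A"), tmp)

with_header <- read.csv(tmp, sep = "")                  # first line used as header
no_header   <- read.csv(tmp, sep = "", header = FALSE)  # column is named V1

print(is.null(with_header$V1))  # TRUE: there is no V1 column
print(nrow(no_header))          # all three variants kept

# Hypothetical helper: how many values survive de-duplication?
summarise_set <- function(x) {
  c(n = length(x), n_unique = length(unique(x)), n_missing = sum(is.na(x)))
}

ids <- c(".", ".", "rs123", ".")  # e.g. an ID column with missing IDs
print(summarise_set(ids))
print(summarise_set(as.vector(no_header$V1)))
```

If `n_unique` is far below the number of records that vcf-compare reports, the set elements are probably not unique per variant, and a key built from several columns would be a better input.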
