Hey everyone,
I am working on an R script that will pull contig ID and ranges of differentially expressed transcripts out of the genomic reference I used as input for Cuffdiff. I have the script working, but when it runs into transcripts with the same contig ID (Different ranges) it will only count the first one it encounters and skips the redunancy.
The format of my contig name and range is formatted like this:
NAME START END
gi340012932gbAFJA01012285.1 0 524
gi340012938gbAFJA01012279.1 10334 12405
gi340012938gbAFJA01012279.1 2260 7496
gi340012938gbAFJA01012279.1 69 2164
gi340012938gbAFJA01012279.1 9069 10084
gi340013005gbAFJA01012212.1 5696 9927
gi340013006gbAFJA01012211.1 4079 11139
As you can see, there is redundancy in the contig ID, but the ranges are different.
I am getting warning messages looking something like this:
Warning messages:
1: In St:Ed : numerical expression has 2 elements: only the first used
2: In St:Ed : numerical expression has 2 elements: only the first used
3: In St:Ed : numerical expression has 2 elements: only the first used
4: In St:Ed : numerical expression has 2 elements: only the first used
I have copied my script here:
setwd("C:/Users/Insect1/Documents/Alex/fasta/Treatment_Specific_Analysis")
files <- list.files(".") # Will list all your files
library("seqinr") # load the seqinr package
dnafile <- ("Mrotcontig.fsa") # change the fasta file into an R object
data <- read.csv("R_Index_N6NF_Apr_9_2013.csv") #reads the csv file "Index" into R
myseq <- read.fasta(file = dnafile) # creates a fasta file
for(i in 1:length(data$NAME)){seqname <-data$NAME[i]
#seqnameB <- noquote(seqname)
seqB <- getSequence(eval(parse(text=paste("myseq$", seqname, sep = "")))) # the eval and paste functions are needed to construct the search oommand "myseq$MORT00010950" without spaces or quotes mark which make the command nonfunctional
St <- data[which(data$NAME == seqname), 2]
Ed <- data[which(data$NAME == seqname), 3]
trimmedseq <- seqB[St:Ed]
seqnameC <- paste(seqname,St,"-",Ed)
write.fasta(sequences = trimmedseq, names = seqnameC, file.out = "Blast.txt", open = "a")
}
Blast <-read.fasta("blast.txt", as.string = TRUE)
Is there an easy fix for making the script address the redundancy in the contig IDs?
Thanks,
Alex
I am working on an R script that will pull contig ID and ranges of differentially expressed transcripts out of the genomic reference I used as input for Cuffdiff. I have the script working, but when it runs into transcripts with the same contig ID (Different ranges) it will only count the first one it encounters and skips the redunancy.
The format of my contig name and range is formatted like this:
NAME START END
gi340012932gbAFJA01012285.1 0 524
gi340012938gbAFJA01012279.1 10334 12405
gi340012938gbAFJA01012279.1 2260 7496
gi340012938gbAFJA01012279.1 69 2164
gi340012938gbAFJA01012279.1 9069 10084
gi340013005gbAFJA01012212.1 5696 9927
gi340013006gbAFJA01012211.1 4079 11139
As you can see, there is redundancy in the contig ID, but the ranges are different.
I am getting warning messages looking something like this:
Warning messages:
1: In St:Ed : numerical expression has 2 elements: only the first used
2: In St:Ed : numerical expression has 2 elements: only the first used
3: In St:Ed : numerical expression has 2 elements: only the first used
4: In St:Ed : numerical expression has 2 elements: only the first used
I have copied my script here:
setwd("C:/Users/Insect1/Documents/Alex/fasta/Treatment_Specific_Analysis")
files <- list.files(".") # Will list all your files
library("seqinr") # load the seqinr package
dnafile <- ("Mrotcontig.fsa") # change the fasta file into an R object
data <- read.csv("R_Index_N6NF_Apr_9_2013.csv") #reads the csv file "Index" into R
myseq <- read.fasta(file = dnafile) # creates a fasta file
for(i in 1:length(data$NAME)){seqname <-data$NAME[i]
#seqnameB <- noquote(seqname)
seqB <- getSequence(eval(parse(text=paste("myseq$", seqname, sep = "")))) # the eval and paste functions are needed to construct the search oommand "myseq$MORT00010950" without spaces or quotes mark which make the command nonfunctional
St <- data[which(data$NAME == seqname), 2]
Ed <- data[which(data$NAME == seqname), 3]
trimmedseq <- seqB[St:Ed]
seqnameC <- paste(seqname,St,"-",Ed)
write.fasta(sequences = trimmedseq, names = seqnameC, file.out = "Blast.txt", open = "a")
}
Blast <-read.fasta("blast.txt", as.string = TRUE)
Is there an easy fix for making the script address the redundancy in the contig IDs?
Thanks,
Alex
Comment