I have calculations written to a file, which I read into a data frame arranged as follows:
    sequence_1  sequence_2  identity
    cp010953    cp010953    100
    cp010953    cp012689    73.9
    cp010953    cp000025    73.86
    cp010953    cp012149    73.77
    cp010953    he978252    73.72999999999999
    cp010953    cp009043    83.35000000000001
The data comes from a calculation (in Python) that computes the number of character matches between 2 strings divided by the length of one of the strings (both strings have the same length). It seemed like a good idea at the time, when I did the calculations, to use itertools.combinations_with_replacement to make them quicker. So, if comparing 3 strings (a, b, c), I compare a&b, a&c and b&c (plus the self-comparisons a&a, b&b and c&c), but not b&a, c&a and c&b, since those have the same values as a&b, a&c and b&c respectively. The problem is that when I read the data into R and plot a heatmap, I end up with this:
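To illustrate why only one orientation of each pair ends up in the file, here is a minimal R sketch of the same idea. The original calculation was done in Python; the helper `pct_identity` and the toy sequences below are hypothetical. Percent identity is the number of position-wise matches divided by the length, and enumerating pairs "with replacement" keeps only index pairs with i <= j, so a&a, a&b, a&c, b&b, b&c and c&c appear, while b&a, c&a and c&b never do.

```r
# Hypothetical helper: percent of positions at which two equal-length strings match
pct_identity <- function(s1, s2) {
  stopifnot(nchar(s1) == nchar(s2))
  chars1 <- strsplit(s1, "")[[1]]
  chars2 <- strsplit(s2, "")[[1]]
  100 * sum(chars1 == chars2) / length(chars1)
}

# Toy sequences, made up for illustration
seqs <- c(a = "ACGT", b = "ACGA", c = "TCGA")

# All ordered index pairs, then keep i <= j: the R analogue of
# itertools.combinations_with_replacement(seqs, 2)
idx <- expand.grid(i = seq_along(seqs), j = seq_along(seqs))
idx <- idx[idx$i <= idx$j, ]

pairs_df <- data.frame(
  sequence_1 = names(seqs)[idx$i],
  sequence_2 = names(seqs)[idx$j],
  identity   = mapply(pct_identity, seqs[idx$i], seqs[idx$j], USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
print(pairs_df)
```

Only 6 of the 9 possible ordered pairs are produced, which is exactly why a heatmap drawn from this table fills only one triangle.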
That is a bunch of gaps. You may be able to see the values that need to be there: for example, al111168 and cp000538 (both located on the lower left-hand side) have a value on the y axis, but not on the x axis!
Is there a way to fill in the gaps with the appropriate values in R? I could do it in a loop, but that is not very R-esque. This has probably been asked before, but I don't think I'm using the right search terms.
Here is a bit of the code:
    library(ggplot2)

    args <- commandArgs(trailingOnly = TRUE)
    file_name <- args[1]
    gene_name <- args[2]
    image_name <- paste(gene_name, '.png', sep = '')

    mydf <- read.csv(file_name, header = TRUE, sep = '\t')
    my_palette <- colorRampPalette(c('red', 'yellow', 'green'))  # not used below

    png(filename = image_name, width = 3750, height = 2750, res = 300)
    par(mar = c(9.5, 4.3, 4, 2))  # affects base graphics only, not ggplot2
    print(corpus <- qplot(x = sequence_1, y = sequence_2, data = mydf, fill = identity, geom = 'tile') +
      geom_text(aes(label = identity), color = 'black', size = 3) +
      scale_fill_gradient(limits = c(0, 100), low = 'gold', high = 'green4') +
      labs(title = 'Campylobacter pair-wise sequence identity comparison', x = NULL, y = NULL) +
      guides(fill = guide_legend(title = 'sequence\nsimilarity %',
                                 title.theme = element_text(size = 15, angle = 0))) +
      theme(legend.text = element_text(size = 12)) +
      theme(axis.text.x = element_text(angle = 45, size = 14, hjust = 1, colour = 'black'),
            axis.text.y = element_text(size = 14, hjust = 1, colour = 'black')))
    dev.off()
Thanks in advance.
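One way to fill the gaps is to make a copy of the data frame with the two sequence columns swapped and bind it to the original. rbind() matches data frame columns by name, so the renamed copy supplies the mirrored pairs:

    mdf <- mydf
    colnames(mdf)[1] <- 'sequence_2'
    colnames(mdf)[2] <- 'sequence_1'
    newdf <- rbind(mdf, mydf)

Then plot newdf instead of mydf.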
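A self-contained check of this approach, using a few rows copied from the question's table (the `unique()` call is an addition not in the answer above): mirroring produces the missing orientation of every pair, and because self-comparisons such as cp010953 vs cp010953 are their own mirror, they appear twice after the bind, so dropping exact duplicates avoids drawing the same tile and label twice.

```r
# A few rows copied from the question's table (only one orientation per pair)
mydf <- data.frame(
  sequence_1 = c("cp010953", "cp010953", "cp010953"),
  sequence_2 = c("cp010953", "cp012689", "cp000025"),
  identity   = c(100, 73.9, 73.86),
  stringsAsFactors = FALSE
)

# Copy with the two name columns swapped; rbind() matches data frame columns by name
mdf <- mydf
names(mdf)[1:2] <- c("sequence_2", "sequence_1")
newdf <- rbind(mydf, mdf)

# Self-comparisons are now duplicated; keep a single copy of each identical row
newdf <- unique(newdf)
print(newdf)
```

Passing newdf to the same qplot call should then draw tiles in both triangles of the heatmap.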