heatmap - R: Filling out a dataframe to create a symmetric identity plot


I have calculations written to a file, which I read into a dataframe arranged as follows:

    sequence_1  sequence_2  identity
    cp010953    cp010953    100
    cp010953    cp012689    73.9
    cp010953    cp000025    73.86
    cp010953    cp012149    73.77
    cp010953    he978252    73.72999999999999
    cp010953    cp009043    83.35000000000001

The data comes from a calculation (in Python) that counts the number of character matches between two strings and divides it by the length of one of the strings (both strings have the same length). It seemed like a good idea at the time: when I did the calculations, I used itertools.combinations_with_replacement to make them quicker. So, if I am comparing three strings (a, b, c), I compare a&b, a&c and b&c (plus each string with itself), but not b&a, c&a and c&b, since those have the same values as a&b, a&c and b&c respectively. The problem is that when I read the data into R and plot a heatmap, I end up with this:
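To see where the gaps come from, here is a small R sketch that mimics the Python step. The `percent_identity()` helper, the toy sequences and their names (`a`, `b`, `c`) are all hypothetical; pairs are enumerated only for i <= j, which is what `combinations_with_replacement` produces. The resulting data frame has a row for a&b but none for b&a, which is exactly the missing half of the heatmap.

```r
# Hypothetical helper: percent of positions at which two equal-length
# strings carry the same character (mirrors the Python calculation).
percent_identity <- function(s1, s2) {
  stopifnot(nchar(s1) == nchar(s2))
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  100 * sum(a == b) / length(a)
}

# Made-up toy sequences, for illustration only.
seqs <- c(a = "ACGTACGT", b = "ACGTTCGT", c = "TCGTACGA")
n <- length(seqs)

# All (i, j) index pairs, then keep only i <= j, like
# combinations_with_replacement: a&a, a&b, b&b, a&c, b&c, c&c.
pairs <- expand.grid(i = seq_len(n), j = seq_len(n))
pairs <- pairs[pairs$i <= pairs$j, ]

half_df <- data.frame(
  sequence_1 = names(seqs)[pairs$i],
  sequence_2 = names(seqs)[pairs$j],
  identity   = mapply(percent_identity, seqs[pairs$i], seqs[pairs$j],
                      USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

print(half_df)  # 6 rows: b&a, c&a and c&b never appear
```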

[heatmap image not included]

That has a bunch of gaps (you may be able to see which values need to go there; for example, al111168 and cp000538, both located on the lower left-hand side, have a value on the y axis but not on the x axis)!

Is there a way to fill in the gaps with the appropriate values in R? I could do it in a loop, but that is not very R-esque. This has probably been asked before, but I don't think I'm using the right search terms.
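One loop-free option, sketched here as an assumption rather than as the author's method, is to pivot the pairs into a square matrix, copy every value into its transposed cell, and convert back to long format for ggplot. The IDs (`s1`, `s2`, `s3`) and values are made up; any pair that is missing in both directions would remain `NA`.

```r
# Stand-in for mydf (made-up IDs and values): each unordered pair appears once.
mydf <- data.frame(
  sequence_1 = c("s1", "s1", "s1", "s2", "s2", "s3"),
  sequence_2 = c("s1", "s2", "s3", "s2", "s3", "s3"),
  identity   = c(100, 73.9, 83.35, 100, 70.1, 100),
  stringsAsFactors = FALSE  # character columns are needed for name-based indexing
)

ids <- sort(unique(c(mydf$sequence_1, mydf$sequence_2)))

# All-NA square matrix with the sequence IDs as row and column names;
# a two-column character matrix indexes cells by those names.
m <- matrix(NA_real_, length(ids), length(ids), dimnames = list(ids, ids))
m[cbind(mydf$sequence_1, mydf$sequence_2)] <- mydf$identity

# Mirror: every cell that is still NA takes the value of its transposed cell.
m[is.na(m)] <- t(m)[is.na(m)]

# Back to long format: one row per (sequence_1, sequence_2) cell.
full_df <- as.data.frame(as.table(m), responseName = "identity")
names(full_df)[1:2] <- c("sequence_1", "sequence_2")
```

`full_df` can then be passed to the existing `qplot()` call in place of `mydf`.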

Here is a bit of my code:

    library(ggplot2)

    args <- commandArgs(trailingOnly = TRUE)
    file_name <- args[1]
    gene_name <- args[2]
    image_name <- paste(gene_name, '.png', sep = '')

    mydf <- read.csv(file_name, header = TRUE, sep = '\t')

    my_palette <- colorRampPalette(c('red', 'yellow', 'green'))  # not used below

    png(filename = image_name, width = 3750, height = 2750, res = 300)
    par(mar = c(9.5, 4.3, 4, 2))  # base-graphics margins; ggplot ignores this
    print(corpus <- qplot(x = sequence_1, y = sequence_2, data = mydf, fill = identity, geom = 'tile') +
            # round() hides float artifacts such as 73.72999999999999 in the labels
            geom_text(aes(label = round(identity, 2)), color = 'black', size = 3) +
            scale_fill_gradient(limits = c(0, 100), low = 'gold', high = 'green4') +
            labs(title = 'Campylobacter pair-wise sequence identity comparison', x = NULL, y = NULL) +
            guides(fill = guide_legend(title = 'sequence\nsimilarity %',
                                       title.theme = element_text(size = 15, angle = 0))) +
            theme(legend.text = element_text(size = 12)) +
            theme(axis.text.x = element_text(angle = 45, size = 14, hjust = 1, colour = 'black'),
                  axis.text.y = element_text(size = 14, hjust = 1, colour = 'black')))
    dev.off()

Thanks in advance.

I figured out a way to do it:

    mdf <- mydf
    colnames(mdf)[1] <- 'sequence_2'
    colnames(mdf)[2] <- 'sequence_1'
    newdf <- rbind(mdf, mydf)

Then plot newdf instead of mydf.
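The same swap-and-rbind idea as a self-contained sketch with made-up IDs. It relies on `rbind()` for data frames matching columns by name, so the renamed copy lines up correctly. One small refinement, offered as a suggestion: with the plain `rbind(mdf, mydf)`, the self-comparison rows (e.g. cp010953 vs cp010953) appear twice, so their labels are overplotted on the same tile; dropping them from the swapped copy avoids that.

```r
# Stand-in for mydf (made-up IDs and values); self-comparisons are
# included, as combinations_with_replacement produces them.
mydf <- data.frame(
  sequence_1 = c("s1", "s1", "s2"),
  sequence_2 = c("s1", "s2", "s2"),
  identity   = c(100, 73.9, 100),
  stringsAsFactors = FALSE
)

# Swapped copy: same rows with the two sequence column names exchanged.
mdf <- mydf
names(mdf)[1:2] <- c("sequence_2", "sequence_1")

# rbind() matches data frame columns by name; the diagonal rows are
# filtered out of the copy so each self-comparison appears only once.
newdf <- rbind(mydf, mdf[mdf$sequence_1 != mdf$sequence_2, ])

print(newdf)  # 4 rows: s1/s1, s1/s2, s2/s2 and the mirrored s2/s1
```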

