I'm trying to figure out how to pull the following data into R:
http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=3&sch=on&format=0
The following works, but I want to eliminate the junk at the top and bottom and keep only the scores:
```r
read.fwf('http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=3&sch=on&format=0',
         widths = c(11, 26, 3, 26, 3, 4, 21), skip = 8)
```
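To see what `widths`, `skip` and `n` do without hitting the site, here is a small sketch on a made-up local file. The layout, the placeholder team names and the widths `c(11, 12, 3, 12, 3)` are all hypothetical and are not the real Massey page. `skip` drops lines at the top. `n` caps how many lines are read after that, which is one way to leave off junk at the bottom when you know how many rows there are.

```r
# Sketch on a made-up local file (not the real Massey page): two junk lines,
# two fixed-width game rows, then one junk footer line.
make_row <- function(date, team1, runs1, team2, runs2) {
  # Pad each field to a fixed width: 11 + 12 + 3 + 12 + 3 characters
  sprintf("%-11s%-12s%3d%-12s%3d", date, team1, runs1, team2, runs2)
}
page <- c("Page title (junk)",
          "-----------------",
          make_row("2016-04-03", "@TeamA", 4L, "TeamB", 1L),
          make_row("2016-04-04", "TeamC", 3L, "@TeamD", 5L),
          "Footer (junk)")
tmp <- tempfile(fileext = ".txt")
writeLines(page, tmp)

# skip = 2 drops the junk at the top; n = 2 stops before the footer.
# strip.white = TRUE (passed on to read.table) trims the padding spaces.
games <- read.fwf(tmp, widths = c(11, 12, 3, 12, 3), skip = 2, n = 2,
                  col.names = c("date", "team1", "runs1", "team2", "runs2"),
                  strip.white = TRUE, stringsAsFactors = FALSE)
unlink(tmp)
print(games)
```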
First of all, welcome to Stack Exchange! I changed some things in your code. You only needed 6 widths, because you had an extra column, so I got rid of that. When pulling in the data online, I noticed the first row was pretty strange. I got rid of it and manually added it back later.
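```r
data <- read.fwf('http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=3&sch=on&format=0',
                 widths = c(10, 26, 3, 26, 3, 4), sep = "\t", header = FALSE, skip = 8,
                 stringsAsFactors = FALSE)
# This line subsets the data so it doesn't have the "junk" at the bottom,
# and it deletes the row with the HTML tagging.
# (The 2:2424 range is hard-coded for the page at the time; adjust it if the row count changes.)
data <- data[2:2424, ]
data <- data.frame(data)
# Create a vector that has the column headers
# (distinct names for the two runs columns, so data$runs1 and data$runs2 both work)
col_names <- c("date", "team1", "runs1", "team2", "runs2", "something")
colnames(data) <- col_names
# Re-create the first row of data that was deleted
firstrow <- data.frame("2016-04-03", "@Pirates", 4, "Cardinals", 1, "",
                       stringsAsFactors = FALSE)
colnames(firstrow) <- col_names
finaldata <- rbind.data.frame(firstrow, data)
```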
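The `data[2:2424, ]` step hard-codes the row range, so it breaks once the page gains or loses games. An alternative, which is not what the answer above does, is to filter the raw lines by pattern before handing them to `read.fwf`. The sketch below makes two assumptions that I haven't verified against the live page:

- once any leading HTML tags are stripped, every game row starts with a `yyyy-mm-dd` date;
- no junk line starts with one.

`keep_game_lines` is a hypothetical helper name, and the page text is made up.

```r
# Hypothetical helper, not part of the original answer: keep only the lines
# that look like game rows. Assumption: after leading HTML tags are removed,
# every game row starts with a yyyy-mm-dd date and no junk line does.
keep_game_lines <- function(lines) {
  lines <- sub("^(<[^>]*>)+", "", lines)  # strip leading tags such as <pre>
  lines[grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", lines)]
}

# Made-up page text standing in for readLines() on the real URL
make_row <- function(date, team1, runs1, team2, runs2)
  sprintf("%-10s%-26s%3d%-26s%3d", date, team1, runs1, team2, runs2)
raw <- c("<html><body>", "Title and notes (junk)",
         paste0("<pre>", make_row("2016-04-03", "@TeamA", 4L, "TeamB", 1L)),
         make_row("2016-04-04", "TeamC", 3L, "@TeamD", 5L),
         "</pre>", "Footer (junk)")

# Write only the game rows to a temp file and parse them with fixed widths
tmp <- tempfile(fileext = ".txt")
writeLines(keep_game_lines(raw), tmp)
games <- read.fwf(tmp, widths = c(10, 26, 3, 26, 3),
                  col.names = c("date", "team1", "runs1", "team2", "runs2"),
                  strip.white = TRUE, stringsAsFactors = FALSE)
unlink(tmp)
print(games)
```

On the real page you would pass `readLines()` of the URL instead of `raw`. If the assumption holds, the first row needs no special handling and no row range has to be hard-coded.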
For future reference, if you can post a screenshot of what you deem junk, it helps the people attempting the question.
Update
```r
# Note: unlike the first version, nothing here subsets away the junk at the bottom
data <- read.fwf('http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=3&sch=on&format=0',
                 widths = c(10, 26, 3, 26, 3, 4), sep = "\t", header = FALSE, skip = 9,
                 stringsAsFactors = FALSE)
data <- data.frame(data)
# Read only the first row (n = 1, skip = 8); the negative widths skip the
# characters in front of the date
firstrow <- read.fwf('http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=3&sch=on&format=0',
                     widths = c(-8, -1, -1, 9, 26, 3, 26, 3, 4), sep = "\t", header = FALSE,
                     n = 1, skip = 8, stringsAsFactors = FALSE)
firstrow <- data.frame(firstrow, stringsAsFactors = FALSE)
# Put back the leading "2" of the date, which the skipped characters cut off
firstrow[, 1] <- paste("2", firstrow[1, 1], sep = "")
# Create a vector that has the column headers
col_names <- c("date", "team1", "runs1", "team2", "runs2", "something")
colnames(data) <- col_names
colnames(firstrow) <- col_names
finaldata <- rbind.data.frame(firstrow, data)
```
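The negative values in `widths` skip characters to move the data over. I played around with them until it worked out, and the only thing missing in the first row was the "2". So I paste in the "2" and use the rbind function to create the full data frame. Hope this helps out.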
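The arithmetic behind the negative widths can be checked on a made-up line:

- `c(-8, -1, -1, 9, ...)` skips 10 characters and then reads 9 for the date.
- If that cuts off exactly the leading "2", the date must start at character 10.
- That would mean the first line has a 9-character prefix.

The sketch assumes exactly that. The prefix `xxxxxxxxx` is a hypothetical stand-in for whatever HTML sits there, and the team names are placeholders. It compares the answer's widths with `c(-9, 10, ...)`, which under the same assumption keeps the whole date.

```r
# Hypothetical first line: a 9-character stand-in prefix, then the row layout
# used in the answer (date 10, team 26, runs 3, team 26, runs 3)
line1 <- sprintf("%-10s%-26s%3d%-26s%3d", "2016-04-03", "@TeamA", 4L, "TeamB", 1L)
tmp <- tempfile(fileext = ".txt")
writeLines(paste0(strrep("x", 9), line1), tmp)
cols <- c("date", "team1", "runs1", "team2", "runs2")

# The answer's widths: skip 8 + 1 + 1 = 10 characters, then read 9 for the
# date, so the leading "2" of the date is among the skipped characters
a <- read.fwf(tmp, widths = c(-8, -1, -1, 9, 26, 3, 26, 3), col.names = cols,
              strip.white = TRUE, stringsAsFactors = FALSE)

# Skipping only the 9 prefix characters keeps the full 10-character date
b <- read.fwf(tmp, widths = c(-9, 10, 26, 3, 26, 3), col.names = cols,
              strip.white = TRUE, stringsAsFactors = FALSE)
unlink(tmp)

print(a$date)               # "016-04-03"
print(paste0("2", a$date))  # the answer's fix-up: "2016-04-03"
print(b$date)               # "2016-04-03"
```

Both versions put the team and runs columns in the same place, because 10 + 9 and 9 + 10 consume the same 19 characters. The only difference is where the date's first character lands.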
I also tested it on this page, and it worked as expected: http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=2&sch=on&format=0