I have this code:
    from bs4 import BeautifulSoup
    import urllib2
    from lxml import html
    from lxml.etree import tostring

    trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=kjfk&std_trans=translated&chk_metars=on&hoursstr=most+recent+only&chk_tafs=on&submit=submit').read()
    soup = BeautifulSoup(open(trees))
    print soup.get_text()
    item = soup.findAll(id="info")
    print item
However, when I type soup in the window it gives me an error, and when the program runs it prints a long block of HTML. Could anyone help me with this? I would be grateful.
The first problem is in this part:
    trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=kjfk&std_trans=translated&chk_metars=on&hoursstr=most+recent+only&chk_tafs=on&submit=submit').read()
    soup = BeautifulSoup(open(trees))
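trees already holds the page's HTML as a string (that is what .read() returns), so there is no need to call open() on it. open() expects a file name, so it would try to open a file named after the whole HTML document. Fix it like this: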
    soup = BeautifulSoup(trees, "html.parser")
Here we are explicitly setting html.parser as the underlying parser, rather than letting BeautifulSoup pick whichever parser happens to be installed.
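Here is a minimal, self-contained sketch of that fix. It is written for Python 3 (the code in this thread is Python 2) and runs against a small hypothetical HTML string instead of the live page, so it works offline. It shows the markup string going straight into BeautifulSoup with the parser named explicitly, and what happens if the same string is passed to open() instead.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page; urlopen(...).read()
# likewise returns the page as one string, not as a file.
page = "<html><body><p id='info'>Hello, <b>world</b></p></body></html>"

# Passing the string to open() fails: open() expects a file name,
# so it tries to open a file literally named after the HTML.
try:
    open(page)
except OSError as exc:  # usually FileNotFoundError
    print("open() failed:", type(exc).__name__)

# Hand the string straight to BeautifulSoup and name the parser explicitly
# ("html.parser" ships with Python, so no extra install is needed).
soup = BeautifulSoup(page, "html.parser")

print(soup.get_text())            # Hello, world
print(soup.find(id="info").name)  # p
```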
Then, you need to be specific about what you are going to extract from the page. Here is example code that gets the METAR text value:
    from bs4 import BeautifulSoup
    import urllib2

    trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=kjfk&std_trans=translated&chk_metars=on&hoursstr=most+recent+only&chk_tafs=on&submit=submit').read()
    soup = BeautifulSoup(trees, "html.parser")

    # find the "METAR text:" label, jump to the next <strong> tag holding the report,
    # and flatten its text onto a single line
    item = soup.find("strong", text="METAR text:").find_next("strong").get_text(strip=True).replace("\n", "")
    print item
It prints:

    KJFK 220151Z 20016KT 10SM BKN250 24/21 A3007 RMK AO2 SLP183 T02440206
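The same find-then-find_next chain can be tried offline against a small HTML fragment. The fragment is hypothetical: it assumes, as the code above implies, that the page puts the "METAR text:" label in one <strong> tag and the report in the next <strong> tag, with a line break inside the report. This sketch is Python 3 and uses string=, the name BeautifulSoup 4.4+ uses for the older text= argument.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the assumed page layout: the label sits in
# one <strong> tag and the METAR report, broken across two lines, in the next.
page = (
    "<table><tr>"
    "<td><strong>METAR text:</strong></td>"
    "<td><strong>KJFK 220151Z 20016KT 10SM BKN250 24/21\n"
    " A3007 RMK AO2 SLP183 T02440206</strong></td>"
    "</tr></table>"
)

soup = BeautifulSoup(page, "html.parser")

# Find the <strong> tag whose text is exactly the label.
label = soup.find("strong", string="METAR text:")
if label is None:
    raise ValueError("label not found; check the exact (case-sensitive) text")

# Jump forward in document order to the next <strong> tag: the report.
value = label.find_next("strong")

# strip=True trims the ends of the text; replace() drops the inner newline.
metar = value.get_text(strip=True).replace("\n", "")
print(metar)  # KJFK 220151Z 20016KT 10SM BKN250 24/21 A3007 RMK AO2 SLP183 T02440206
```

The None check is there because the string= match is exact and case-sensitive: if the label on the page differs even slightly, find() returns None and chaining find_next() onto it would raise AttributeError instead of a clear message.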