Python - Beautiful Soup Cleaning and Errors


I have this code:

from bs4 import BeautifulSoup
import urllib2
from lxml import html
from lxml.etree import tostring

trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=kjfk&std_trans=translated&chk_metars=on&hoursstr=most+recent+only&chk_tafs=on&submit=submit').read()
soup = BeautifulSoup(open(trees))
print soup.get_text()
item = soup.findAll(id="info")
print item

However, when I type `soup` in the window it gives me an error, and when the program runs it just prints a long block of HTML code.

Any help on this would be greatly appreciated.

The first problem is in this part:

trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=kjfk&std_trans=translated&chk_metars=on&hoursstr=most+recent+only&chk_tafs=on&submit=submit').read()
soup = BeautifulSoup(open(trees))

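`trees` already holds the page content as a string (that is what `.read()` returns), so there is no need to call `open()` on it: `open()` expects a file name, not HTML markup. Fix it: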

soup = BeautifulSoup(trees, "html.parser")

Note that we are also explicitly setting `html.parser` as the underlying parser.
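To check the point about `open()` without hitting the live site, here is a minimal Python 3 sketch. It assumes `bs4` is installed, and the `html` string is a made-up stand-in for what `urlopen(...).read()` returns (in Python 3, `urllib2` became `urllib.request`). It shows that `open()` treats its argument as a path, while `BeautifulSoup` accepts either the markup string itself or a file-like object such as `io.StringIO`.

```python
import io

from bs4 import BeautifulSoup

# Made-up stand-in for the string returned by urlopen(...).read().
html = "<html><body><p id='info'>hello</p></body></html>"

# open() interprets its argument as a file path, so passing markup fails.
open_failed = False
try:
    with open(html):
        pass
except OSError as exc:
    open_failed = True
    print("open() failed with", type(exc).__name__)

# BeautifulSoup accepts the markup string directly...
soup_from_str = BeautifulSoup(html, "html.parser")
# ...or any object with a .read() method, such as an in-memory file.
soup_from_file = BeautifulSoup(io.StringIO(html), "html.parser")

print(soup_from_str.find(id="info").get_text())
print(str(soup_from_str) == str(soup_from_file))
```

Both calls produce the same parse tree, so the only thing `open(trees)` adds in the original code is the error.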


Then, you need to be more specific about what you are going to extract from the page. Here is example code that gets the METAR text value:

from bs4 import BeautifulSoup
import urllib2

trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=kjfk&std_trans=translated&chk_metars=on&hoursstr=most+recent+only&chk_tafs=on&submit=submit').read()
soup = BeautifulSoup(trees, "html.parser")

# Find the "METAR text:" label, then take the text of the next <strong> tag.
item = soup.find("strong", text="METAR text:").find_next("strong").get_text(strip=True).replace("\n", "")
print item

It prints `KJFK 220151Z 20016KT 10SM BKN250 24/21 A3007 RMK AO2 SLP183 T02440206`.
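To see how the `find()` / `find_next()` chain behaves offline, here is a small Python 3 sketch run against a hypothetical snippet; the markup is an assumption about the page's layout, not a copy of it, and `bs4` is assumed to be installed. It uses `string=`, the newer name for the `text=` argument, and collapses internal whitespace with `split()` / `join()` instead of `replace("\n", "")`, since a value wrapped across lines can also carry indentation spaces.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the assumed layout of the report page:
# a <strong> label followed later in the document by a <strong> value.
html = """
<table>
  <tr>
    <td><strong>METAR text:</strong></td>
    <td><strong>KJFK 220151Z 20016KT 10SM
        BKN250 24/21 A3007</strong></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the label by its exact text, then walk forward to the next <strong>.
label = soup.find("strong", string="METAR text:")
value_tag = label.find_next("strong")

# strip=True only trims whitespace at the ends of each string;
# split()/join() also collapses the newline and indentation inside it.
raw = value_tag.get_text(strip=True)
metar = " ".join(raw.split())
print(metar)
```

With `replace("\n", "")` alone, the same input would keep the run of indentation spaces between `10SM` and `BKN250`.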

