I have an application that uses the Selenium WebDriver interface to PhantomJS. To scale things up, I want to run multiple instances of PhantomJS and load balance them with HAProxy. This is for a local application, so I'm not concerned about deployment to a production environment or anything like that.
Here's my docker-compose.yml file:
```yaml
version: '2'
services:
  app:
    build: .
    volumes:
      - .:/code
    links:
      - mongo
      - haproxy
  mongo:
    image: mongo
  phantomjs1:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs2:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs3:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  phantomjs4:
    image: wernight/phantomjs:latest
    ports:
      - 8910
    entrypoint:
      - phantomjs
      - --webdriver=8910
      - --ignore-ssl-errors=true
      - --load-images=false
  haproxy:
    image: haproxy
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    ports:
      - 8910:8910
    links:
      - phantomjs1
      - phantomjs2
      - phantomjs3
      - phantomjs4
```
As you can see, I've got four instances of PhantomJS, one HAProxy instance, and one app (written in Python).
Here's my haproxy.cfg:
```
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend phantomjs_front
    bind *:8910
    stats uri /haproxy?stats
    default_backend phantomjs_back

backend phantomjs_back
    balance roundrobin
    server phantomjs1 phantomjs1:8910 check
    server phantomjs2 phantomjs2:8910 check
    server phantomjs3 phantomjs3:8910 check
    server phantomjs4 phantomjs4:8910 check
```
I know I need to use sticky sessions or something similar in HAProxy for this to work, but I don't know how to do that.
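To make the failure concrete, here is a toy Python model of what `balance roundrobin` does to a WebDriver session. It is not HAProxy or PhantomJS; `ToyPhantom`, `RoundRobinRouter`, `SessionAffinityRouter` and the 404 status are hypothetical stand-ins. Each HTTP request is routed on its own, so the request that creates a session and the requests that use it can land on different instances, and only the instance that created a session knows about it.

```python
import itertools
import uuid


class ToyPhantom:
    """Hypothetical stand-in for one PhantomJS instance; it only knows its own sessions."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def handle(self, method, path):
        if method == "POST" and path == "/session":
            session_id = str(uuid.uuid4())
            self.sessions.add(session_id)
            return 200, session_id
        # Command paths look like /session/<id>/<command>
        session_id = path.split("/")[2]
        if session_id not in self.sessions:
            return 404, "Variable Resource Not Found"  # toy status code
        return 200, "ok"


class RoundRobinRouter:
    """Routes every request independently, like 'balance roundrobin'."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def route(self, method, path):
        backend = next(self._cycle)
        return backend.name, backend.handle(method, path)


class SessionAffinityRouter(RoundRobinRouter):
    """Remembers which backend created each session and sends later requests there."""

    def __init__(self, backends):
        super().__init__(backends)
        self._owner = {}  # session id -> backend that created it

    def route(self, method, path):
        parts = path.split("/")
        if len(parts) > 2 and parts[2] in self._owner:
            backend = self._owner[parts[2]]
        else:
            backend = next(self._cycle)
        status, body = backend.handle(method, path)
        if method == "POST" and path == "/session" and status == 200:
            self._owner[body] = backend  # learn the new id from the reply
        return backend.name, (status, body)


def run_session(router):
    """Create one session, then issue two commands that reuse its id."""
    name, (status, session_id) = router.route("POST", "/session")
    log = [(name, status)]
    for method, command in (("POST", "url"), ("GET", "source")):
        name, (status, _) = router.route(method, "/session/{}/{}".format(session_id, command))
        log.append((name, status))
    return log


def make_backends():
    return [ToyPhantom("phantomjs{}".format(i)) for i in range(1, 5)]


print(run_session(RoundRobinRouter(make_backends())))
# [('phantomjs1', 200), ('phantomjs2', 404), ('phantomjs3', 404)]
print(run_session(SessionAffinityRouter(make_backends())))
# [('phantomjs1', 200), ('phantomjs1', 200), ('phantomjs1', 200)]
```

The round-robin run mirrors the error in this post: the session is created on one instance and the next command is answered by another that has never heard of it.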
Here's the relevant snippet of the Python app code that connects to the service:
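```python
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


def get_page(url):
    # Every command for this session is sent to HAProxy on port 8910
    driver = webdriver.Remote(
        command_executor='http://haproxy:8910',
        desired_capabilities=DesiredCapabilities.PHANTOMJS
    )
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # quit() ends the WebDriver session; close() only closes the current window
        driver.quit()
```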
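For comparison, here is an alternative to proxy-level stickiness (not something the post does). Every command a `webdriver.Remote` object sends goes to the `command_executor` URL it was created with, so picking one PhantomJS instance per driver on the client side keeps each session on a single instance. This is a minimal sketch that assumes the app container can reach the PhantomJS services by their compose names (`phantomjs1` to `phantomjs4` on port 8910) and that the Selenium 2/3-era `desired_capabilities` API from the question is in use. `PHANTOMJS_URLS` and `next_executor` are hypothetical names.

```python
import itertools

# Hypothetical endpoint list; assumes compose service names resolve from the app container
PHANTOMJS_URLS = ["http://phantomjs{}:8910".format(i) for i in range(1, 5)]
_executors = itertools.cycle(PHANTOMJS_URLS)


def next_executor():
    """Pick the next PhantomJS endpoint; the whole WebDriver session then stays on it."""
    return next(_executors)


def get_page(url):
    # Imported here so next_executor() can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    driver = webdriver.Remote(
        command_executor=next_executor(),
        desired_capabilities=DesiredCapabilities.PHANTOMJS,
    )
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

The trade-off is that the client, not HAProxy, owns the balancing, and the `check` health checks in haproxy.cfg no longer apply.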
This is the error I get when I try to run this code:
```
phantomjs2_1 | [ERROR - 2016-07-12T23:35:25.454Z] RouterReqHand - _handle.error - {"name":"Variable Resource Not Found","message":"{\"headers\":{\"Accept\":\"application/json\",\"Accept-Encoding\":\"identity\",\"Connection\":\"close\",\"Content-Length\":\"96\",\"Content-Type\":\"application/json;charset=UTF-8\",\"Host\":\"172.19.0.7:8910\",\"User-Agent\":\"Python-urllib/3.5\"},\"httpVersion\":\"1.1\",\"method\":\"POST\",\"post\":\"{\\\"url\\\": \\\"\\\\\\\"http://www.redacted.com\\\\\\\"\\\", \\\"sessionId\\\": \\\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\\\"}\",\"url\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"urlParsed\":{\"anchor\":\"\",\"query\":\"\",\"file\":\"url\",\"directory\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/\",\"path\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"relative\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"port\":\"\",\"host\":\"\",\"password\":\"\",\"user\":\"\",\"userInfo\":\"\",\"authority\":\"\",\"protocol\":\"\",\"source\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"queryKey\":{},\"chunks\":[\"session\",\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\",\"url\"]}}","line":80,"sourceURL":"phantomjs://code/router_request_handler.js","stack":"_handle@phantomjs://code/router_request_handler.js:80:82"}
phantomjs2_1 |
phantomjs2_1 |   phantomjs://platform/console++.js:263 in error
app_1 | Traceback (most recent call last):
app_1 |   File "selenium_process.py", line 69, in <module>
app_1 |     main()
app_1 |   File "selenium_process.py", line 61, in main
app_1 |     source = get_page(args.url)
app_1 |   File "selenium_process.py", line 52, in get_page
app_1 |     driver.get(url)
app_1 |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 248, in
app_1 |     self.execute(Command.GET, {'url': url})
app_1 |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
app_1 |     self.error_handler.check_response(response)
app_1 |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 163, in check_response
app_1 |     raise exception_class(value)
app_1 | selenium.common.exceptions.WebDriverException: Message: Variable Resource Not Found - {"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"172.19.0.7:8910","User-Agent":"Python-urllib/3.5"},"httpVersion":"1.1","method":"POST","post":"{\"url\": \"\\\"http://www.redacted.com\\\"\", \"sessionId\": \"4eff6a60-4889-11e6-b4ad-095b9e1284ce\"}","url":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","urlParsed":{"anchor":"","query":"","file":"url","directory":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/","path":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","relative":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","queryKey":{},"chunks":["session","4eff6a60-4889-11e6-b4ad-095b9e1284ce","url"]}}
app_1 |
```
So, how is the load balancing not working? What am I missing?
Update

I figured out that I need some kind of session management in HAProxy. Selenium WebDriver and PhantomJS communicate via sessions. The client sends `POST /session` and receives a reply with the session ID in the body. The reply looks like this:
```json
{
  "sessionId": "5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6",
  "status": 0,
  "value": {
    "browserName": "phantomjs",
    "version": "2.1.1",
    "driverName": "ghostdriver",
    "driverVersion": "1.2.0",
    "platform": "linux-unknown-64bit",
    "javascriptEnabled": true,
    "takesScreenshot": true,
    "handlesAlerts": false,
    "databaseEnabled": false,
    "locationContextEnabled": false,
    "applicationCacheEnabled": false,
    "browserConnectionEnabled": false,
    "cssSelectorsEnabled": true,
    "webStorageEnabled": false,
    "rotatable": false,
    "acceptSslCerts": false,
    "nativeEvents": true,
    "proxy": {"proxyType": "direct"}
  }
}
```
Then, as the session progresses, the session ID is sent to the server as part of the URI in subsequent requests, such as `GET /session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source`. How can I grab that and use it for sticky sessions in HAProxy?
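The two places the session ID shows up can be pulled out with a few lines of Python. This is a sketch of the extraction logic only, not HAProxy configuration; `session_id_from_reply` and `session_id_from_path` are hypothetical names, `sessionId` is the key the JSON Wire Protocol uses in the `POST /session` reply, and the sample reply is a trimmed copy of the one above.

```python
import json
import re

# The session id is the path segment right after /session/, e.g.
# /session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source
_SESSION_PATH = re.compile(r"^/session/([^/]+)(?:/|$)")


def session_id_from_reply(body):
    """Return the id from the JSON reply to POST /session."""
    return json.loads(body)["sessionId"]


def session_id_from_path(path):
    """Return the session id in a command path, or None (e.g. for POST /session itself)."""
    match = _SESSION_PATH.match(path)
    return match.group(1) if match else None


reply = '{"sessionId": "5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6", "status": 0, "value": {}}'
print(session_id_from_reply(reply))
print(session_id_from_path("/session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source"))
print(session_id_from_path("/session"))  # None
```

The catch for any proxy is ordering: the ID only exists after a backend has answered `POST /session`, so routing has to remember which backend produced it; hashing the ID on later requests would not, by itself, point back to that backend.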
You should be able to add cookies within the HAProxy config itself:
```
cookie SERVERID insert indirect nocache
server httpd1 10.0.0.19:9443 cookie httpd1 check
server httpd2 10.0.0.18:9443 cookie httpd2 check
```
Then the sessions will stick through HAProxy itself.
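Here is a toy Python model of what `cookie SERVERID insert indirect` does (not HAProxy's implementation; `CookieInsertProxy` and `send_requests` are hypothetical names). A request without a valid `SERVERID` cookie is load balanced and the response tags it with the chosen server; a request that carries the cookie goes back to that server. The stickiness therefore depends on the HTTP client storing and resending the cookie, which is worth checking for Selenium's Python remote connection before relying on this setup.

```python
import itertools


class CookieInsertProxy:
    """Toy model of 'cookie SERVERID insert indirect'; not HAProxy's implementation."""

    def __init__(self, server_cookies):
        self._servers = list(server_cookies)  # cookie values, e.g. httpd1, httpd2
        self._cycle = itertools.cycle(self._servers)

    def route(self, request_cookies):
        """Return (server, cookies the response sets) for one request."""
        server = request_cookies.get("SERVERID")
        if server in self._servers:
            # Valid cookie: stick to that server; with 'indirect' nothing is re-sent
            return server, {}
        # No (or unknown) cookie: load balance, then tag the response with the choice
        server = next(self._cycle)
        return server, {"SERVERID": server}


def send_requests(proxy, count, keep_cookies):
    """Send `count` requests; optionally store and resend cookies like a browser would."""
    jar = {}
    chosen = []
    for _ in range(count):
        server, set_cookies = proxy.route(jar)
        chosen.append(server)
        if keep_cookies:
            jar.update(set_cookies)
    return chosen


servers = ["httpd1", "httpd2"]
print(send_requests(CookieInsertProxy(servers), 3, keep_cookies=True))
# ['httpd1', 'httpd1', 'httpd1']
print(send_requests(CookieInsertProxy(servers), 3, keep_cookies=False))
# ['httpd1', 'httpd2', 'httpd1']
```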