python - How do I load balance PhantomJS using docker-compose and HAProxy?


I have an application that uses the Selenium WebDriver interface to PhantomJS. To scale things up, I want to run multiple instances of PhantomJS and load balance them with HAProxy. This is for a local application, so I'm not concerned about deployment to a production environment or anything like that.

Here's my docker-compose.yml file:

    version: '2'
    services:
      app:
        build: .
        volumes:
          - .:/code
        links:
          - mongo
          - haproxy
      mongo:
        image: mongo
      phantomjs1:
        image: wernight/phantomjs:latest
        ports:
          - 8910
        entrypoint:
          - phantomjs
          - --webdriver=8910
          - --ignore-ssl-errors=true
          - --load-images=false
      phantomjs2:
        image: wernight/phantomjs:latest
        ports:
          - 8910
        entrypoint:
          - phantomjs
          - --webdriver=8910
          - --ignore-ssl-errors=true
          - --load-images=false
      phantomjs3:
        image: wernight/phantomjs:latest
        ports:
          - 8910
        entrypoint:
          - phantomjs
          - --webdriver=8910
          - --ignore-ssl-errors=true
          - --load-images=false
      phantomjs4:
        image: wernight/phantomjs:latest
        ports:
          - 8910
        entrypoint:
          - phantomjs
          - --webdriver=8910
          - --ignore-ssl-errors=true
          - --load-images=false
      haproxy:
        image: haproxy
        volumes:
          - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
        ports:
          - 8910:8910
        links:
          - phantomjs1
          - phantomjs2
          - phantomjs3
          - phantomjs4

As you can see, I've got 4 instances of PhantomJS, 1 HAProxy instance, and 1 app (written in Python).

Here's my haproxy.cfg:

    global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        maxconn 4096
        daemon

    defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option redispatch
        maxconn 2000
        timeout connect 5000
        timeout client 50000
        timeout server 50000

    frontend phantomjs_front
       bind *:8910
       stats uri /haproxy?stats
       default_backend phantomjs_back

    backend phantomjs_back
       balance roundrobin
       server phantomjs1 phantomjs1:8910 check
       server phantomjs2 phantomjs2:8910 check
       server phantomjs3 phantomjs3:8910 check
       server phantomjs4 phantomjs4:8910 check

I know I need to use sticky sessions or something similar in HAProxy for this to work, but I don't know how to do that.

Here's the relevant snippet of the Python app code that connects to this service:

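    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities


    def get_page(url):
        # Open a new WebDriver session through the HAProxy frontend
        driver = webdriver.Remote(
            command_executor='http://haproxy:8910',
            desired_capabilities=DesiredCapabilities.PHANTOMJS
        )

        driver.get(url)
        source = driver.page_source
        driver.close()

        return source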

The error I get when I try to run this code is this:

    phantomjs2_1  | [ERROR - 2016-07-12T23:35:25.454Z] RouterReqHand - _handle.error - {"name":"Variable Resource Not Found","message":"{\"headers\":{\"Accept\":\"application/json\",\"Accept-Encoding\":\"identity\",\"Connection\":\"close\",\"Content-Length\":\"96\",\"Content-Type\":\"application/json;charset=UTF-8\",\"Host\":\"172.19.0.7:8910\",\"User-Agent\":\"Python-urllib/3.5\"},\"httpVersion\":\"1.1\",\"method\":\"POST\",\"post\":\"{\\\"url\\\": \\\"\\\\\\\"http://www.redacted.com\\\\\\\"\\\", \\\"sessionId\\\": \\\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\\\"}\",\"url\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"urlParsed\":{\"anchor\":\"\",\"query\":\"\",\"file\":\"url\",\"directory\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/\",\"path\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"relative\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"port\":\"\",\"host\":\"\",\"password\":\"\",\"user\":\"\",\"userInfo\":\"\",\"authority\":\"\",\"protocol\":\"\",\"source\":\"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url\",\"queryKey\":{},\"chunks\":[\"session\",\"4eff6a60-4889-11e6-b4ad-095b9e1284ce\",\"url\"]}}","line":80,"sourceURL":"phantomjs://code/router_request_handler.js","stack":"_handle@phantomjs://code/router_request_handler.js:80:82"}
    phantomjs2_1  |
    phantomjs2_1  |   phantomjs://platform/console++.js:263 in error
    app_1         | Traceback (most recent call last):
    app_1         |   File "selenium_process.py", line 69, in <module>
    app_1         |     main()
    app_1         |   File "selenium_process.py", line 61, in main
    app_1         |     source = get_page(args.url)
    app_1         |   File "selenium_process.py", line 52, in get_page
    app_1         |     driver.get(url)
    app_1         |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 248, in get
    app_1         |     self.execute(Command.GET, {'url': url})
    app_1         |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    app_1         |     self.error_handler.check_response(response)
    app_1         |   File "/usr/local/lib/python3.5/site-packages/selenium/webdriver/remote/errorhandler.py", line 163, in check_response
    app_1         |     raise exception_class(value)
    app_1         | selenium.common.exceptions.WebDriverException: Message: Variable Resource Not Found - {"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"172.19.0.7:8910","User-Agent":"Python-urllib/3.5"},"httpVersion":"1.1","method":"POST","post":"{\"url\": \"\\\"http://www.redacted.com\\\"\", \"sessionId\": \"4eff6a60-4889-11e6-b4ad-095b9e1284ce\"}","url":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","urlParsed":{"anchor":"","query":"","file":"url","directory":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/","path":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","relative":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/session/4eff6a60-4889-11e6-b4ad-095b9e1284ce/url","queryKey":{},"chunks":["session","4eff6a60-4889-11e6-b4ad-095b9e1284ce","url"]}}
    app_1         |

So, how do I get load balancing working? What am I missing?

Update

I figured out that I need some kind of session management in HAProxy. Selenium WebDriver and PhantomJS communicate via sessions. The client sends a POST /session and receives a reply with the session ID in the body. The reply looks something like this:

    {"sessionId":"5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6","status":0,"value":{"browserName":"phantomjs","version":"2.1.1","driverName":"ghostdriver","driverVersion":"1.2.0","platform":"linux-unknown-64bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}}

Then, as the session progresses, the session ID is sent to the server as part of the URI in subsequent requests, such as GET /session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source. How can I grab this stuff and use it for sticky sessions in HAProxy?
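To make the requirement concrete, here is a minimal Python sketch of the bookkeeping a proxy would have to do: learn the session ID from the body of the reply to POST /session, then look it up in the URI of every later request. This is only an illustration of the logic, not HAProxy configuration. The names StickyTable, route and learn are hypothetical, the two backend names are taken from the compose file, and the reply key is assumed to be spelled "sessionId" as in the JSON Wire Protocol.

```python
import itertools
import json
import re

# Matches /session/<id> and /session/<id>/...; a bare /session does not match
SESSION_PATH = re.compile(r"^/session/([^/]+)")


class StickyTable:
    """Hypothetical in-memory map from WebDriver session ID to backend."""

    def __init__(self, backends):
        self._next_backend = itertools.cycle(backends)
        self._owner = {}

    def route(self, path):
        """Pick the backend for a request path.

        Requests that carry a session ID go to the backend that owns it
        (None if the ID is unknown); anything else is round-robined.
        """
        match = SESSION_PATH.match(path)
        if match:
            return self._owner.get(match.group(1))
        return next(self._next_backend)

    def learn(self, backend, reply_body):
        """Record which backend created the session named in a reply body."""
        session_id = json.loads(reply_body).get("sessionId")
        if session_id:
            self._owner[session_id] = backend


table = StickyTable(["phantomjs1:8910", "phantomjs2:8910"])
backend = table.route("/session")  # new session: plain round robin
table.learn(backend, '{"sessionId": "5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6", "status": 0}')
# Later requests for that session resolve to the same backend
print(table.route("/session/5a27f2b0-48a5-11e6-97d7-7f5820fc7aa6/source") == backend)
```

Hashing the session ID to choose a backend would not be enough here, because the ID is generated by whichever PhantomJS instance happened to receive the POST /session. The mapping has to be learned from the reply.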

You should be able to add cookies within the HAProxy config itself:

    cookie SERVERID insert indirect nocache
    server  httpd1 10.0.0.19:9443 cookie httpd1 check
    server  httpd2 10.0.0.18:9443 cookie httpd2 check

Then the sessions will stick through HAProxy itself.
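One thing to keep in mind with this approach: an inserted cookie only produces stickiness if the client sends it back on later requests. Whether the Selenium remote client does that is not something confirmed here, so it is worth checking. The sketch below uses Python's standard http.cookies module to show what the client would have to return. The Set-Cookie value is a hypothetical example modelled on the SERVERID cookie in the config above, not captured output, and cookie_header_to_return is an illustrative helper name.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie value, modelled on "cookie SERVERID insert" above
set_cookie_header = "SERVERID=httpd1; path=/"


def cookie_header_to_return(set_cookie):
    """Build the Cookie request header a client must send to stay sticky."""
    jar = SimpleCookie()
    jar.load(set_cookie)
    # Only name=value pairs go back to the server; attributes such as path do not
    return "; ".join(
        "{}={}".format(name, morsel.value) for name, morsel in jar.items()
    )


print(cookie_header_to_return(set_cookie_header))
```

If the client never returns this header, every request looks new to HAProxy and is balanced as before.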

