Load Balancing Large Numbers Of OSGeo Components

Dragan Podvezanec (IGEA d.o.o.) with Dalibor Kušić (IN2 d.o.o.), Sasa Cvitkovic (Croatian State Geodetic Administration)

09:30 on Friday 20th September (in Session 27, starting at 9 a.m., Sir Clive Granger Building: A31)

Description: Using the open source HAProxy load balancer, we built a large cluster of PostGIS, GeoServer and GeoWebCache components that serves as one of the cornerstones of the Croatian NSDI.

The Croatian State Geodetic Administration required a new Geoportal that would better fulfill its role as one of the cornerstones of the national SDI. The Geoportal was developed mostly with open source software, including (among others) GeoServer, PostgreSQL/PostGIS, OpenLayers, the WordPress CMS and Linux.

After the initial release, it became clear that the expected number of users had been greatly underestimated. The load on the first day, around 1000 requests per second, clearly showed both the need for fast and reliable access to national spatial data sets and the strong public demand for this kind of data. A quick upgrade of both hardware and software was urgently needed, and it had to be carried out without significant downtime. As of March 2013, the system steadily serves 320 GB of data per day (about 7.5 TB per month), with an average of 19,500 unique visitors each day. These statistics cover all available services (WMS, WFS and WMTS) as well as web page visits.

This case study focuses on best practices for load balancing a large number of OSGeo components with a software load balancer that is itself freely available. Our previous software load balancing setup, based on the Apache HTTP Server and its mod_proxy_balancer module, proved unsuitable for such a large-scale deployment, so an alternative had to be found. HAProxy is a free TCP and HTTP load balancer suited to sites under very heavy load; it can manage thousands of concurrent connections on very modest hardware.
Using HAProxy as a dedicated HTTP balancer, users gain many benefits, such as:
- Protecting GeoServer backends from overloading by imposing service limits
- Defining connection pools for different types of services
- Separating the application layer from the service layer
- Better WMS load balancing for GeoWebCache seeding
- Web GUI for easier understanding of the load on backend servers
- Layer 7 health checks with very fast problem detection
- Wide choice of load balancing algorithms

In this presentation we will demonstrate:
- Advantages of using HAProxy instead of Apache mod_proxy_balancer
- HAProxy configuration
- Common pitfalls
- Benchmarks (Apache mod_proxy_balancer, HAProxy)
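
To make the listed benefits more concrete, the fragment below is a minimal sketch of how such a setup might look in an HAProxy configuration file. It is not the configuration used for the Croatian Geoportal: all server names, IP addresses, ports, URL paths, timeouts and connection limits are hypothetical and chosen only for illustration.

```
# Illustrative sketch only. Every name, address, path and limit is an assumption.
defaults
    mode http
    timeout connect 5s
    timeout client  60s
    timeout server  60s
    timeout queue   30s        # how long a request may wait for a free backend slot

frontend geoportal_in
    bind *:80
    # Route tile requests to a separate pool (the path prefix is an assumption)
    acl is_tiles path_beg /geoserver/gwc
    use_backend gwc_pool if is_tiles
    default_backend geoserver_pool

backend geoserver_pool
    balance leastconn                      # one of several available algorithms
    option httpchk GET /geoserver/web/     # layer 7 health check (hypothetical URL)
    # A per-server maxconn caps concurrent requests; the excess waits in HAProxy's queue
    server gs1 10.0.0.11:8080 check inter 2s maxconn 20
    server gs2 10.0.0.12:8080 check inter 2s maxconn 20

backend gwc_pool
    balance roundrobin
    option httpchk GET /geoserver/gwc/     # hypothetical URL
    server gwc1 10.0.0.21:8080 check inter 2s maxconn 50

listen stats
    bind *:8404
    stats enable                           # built-in web GUI showing backend load
    stats uri /stats
```

In this sketch the per-server maxconn values are what protect the backends: requests beyond the limit are held in HAProxy's queue instead of being passed on to an already busy GeoServer instance. Separate backend sections give each type of service its own connection pool.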