Using NoSQL & HTML5 Libraries To Rapidly Generate Interactive Web Visualisations Of High-volume Spatio-temporal Data

Jack Harrison (Ordnance Survey)

10:00 on Saturday 21st September (in Session 53, starting at 9 a.m., Sir Clive Granger Building: A41)


Description: The challenges and successes in implementing real-time, browser-based social media analysis and visualisation with open source tools.

Over the past few years Twitter has developed into a potent source of public opinion and comment. The service passed 500 million users in June 2012, and those users collectively post hundreds of millions of tweets each day. Several high-profile analyses of this data, such as the Twitter Political Index, which mapped sentiment across the US towards the 2012 presidential candidates over the course of their campaigns, have demonstrated its potential for insight and near-real-time customer feedback.

Handling such large volumes and throughputs of data is a sizeable engineering challenge, however. Several commercial ventures (TweetReach and Tweet Archivist, among many others) have sprung up specifically to deal with this complexity, at a cost. In addition, many existing solutions cannot make proper use of the location data present in a significant proportion of tweets, and so lose the rich geographical context.

This retrospective aims to demonstrate how an informed combination of emerging open-source component technologies can resolve the complex problems of (i) large stored data volumes, (ii) real-time streaming input, (iii) concurrent writes and (iv) geographic querying and visualisation of results, with minimal development outlay. Specifically, it will detail the construction of an open-source process that reads, processes, writes, queries and visualises streaming, geolocated Twitter data using the MongoDB NoSQL database and the D3.js JavaScript library. The focus is on how MongoDB handles real-time spatial data (including spatial indexes and querying) and on the features that make D3 so well suited to visualising and exploring spatial data in the web browser.
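As a rough illustration of the MongoDB side, the sketch below shows one way a geolocated tweet might be stored as a GeoJSON Point and queried through a `2dsphere` index. It is a minimal sketch under stated assumptions, not the pipeline described in the talk: it assumes tweets arrive as dicts with a Twitter-style `coordinates` field in [longitude, latitude] order, and the field name `loc` and the helpers `tweet_to_document`, `bbox_query` and `near_query` are hypothetical. The functions only build documents and query filters as plain dicts; with PyMongo they would be passed to `collection.insert_one(...)`, `collection.create_index([("loc", "2dsphere")])` and `collection.find(...)`, which needs a running MongoDB server.

```python
# Minimal sketch (hypothetical helpers, not the talk's code): shape geolocated
# tweets as GeoJSON and build MongoDB 2dsphere query filters as plain dicts.

# Index specification; with PyMongo: collection.create_index(INDEX_SPEC)
INDEX_SPEC = [("loc", "2dsphere")]


def tweet_to_document(tweet):
    """Return a MongoDB-ready document with a GeoJSON Point, or None.

    Assumes a Twitter-style "coordinates" field holding a GeoJSON Point
    in [longitude, latitude] order; "loc" is a hypothetical field name.
    """
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return None  # tweet has no point location: skip spatial storage
    lon, lat = coords["coordinates"]
    # MongoDB rejects out-of-range coordinates on a 2dsphere-indexed field.
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        return None
    return {
        "text": tweet.get("text", ""),
        "created_at": tweet.get("created_at"),
        "loc": {"type": "Point", "coordinates": [lon, lat]},
    }


def bbox_query(west, south, east, north):
    """Filter for documents whose "loc" lies inside a lon/lat polygon.

    2dsphere treats polygon edges as geodesics, so large boxes will not
    follow lines of constant latitude exactly.
    """
    ring = [
        [west, south], [east, south], [east, north],
        [west, north], [west, south],  # GeoJSON rings must be closed
    ]
    return {"loc": {"$geoWithin": {"$geometry": {
        "type": "Polygon", "coordinates": [ring]}}}}


def near_query(lon, lat, max_metres):
    """Filter returning documents nearest a point, sorted by distance.

    With a GeoJSON point, $maxDistance is measured in metres.
    """
    return {"loc": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [lon, lat]},
        "$maxDistance": max_metres,
    }}}


if __name__ == "__main__":
    raw = {
        "text": "Example tweet",
        "created_at": "Sat Sep 21 10:00:00 +0000 2013",
        "coordinates": {"type": "Point", "coordinates": [-1.19, 52.94]},
    }
    print(tweet_to_document(raw))
    print(near_query(-1.19, 52.94, 5000))
```

Storing location as GeoJSON rather than legacy coordinate pairs lets the `2dsphere` index compute distances on the sphere, which is why `$maxDistance` can be given in metres. Untagged tweets could still be written to the same collection without a `loc` field; the sketch simply returns `None` for them.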