1st - 5th AUGUST 2014

Brisbane Convention & Exhibition Centre

  • Mini-Conferences
    August 1
  • Presentations
    August 2-3
  • Sprints
    August 4-5

Stream Based Processing for Data Analysis

Data analysis and prediction, and the insights gained from them, are making a significant impact across many domains. Python is particularly well suited to data analysis for many reasons, including two well developed and well supported packages: Pandas (http://pandas.pydata.org/) and Scikit-Learn (http://scikit-learn.org/stable/). Another emerging trend in analytics and prediction is real-time, stream based processing. Single shot data analysis methods require all of the data to be available before analysis begins; stream based processing instead lets you perform online data analysis and prediction as data becomes available from a variety of different, and often disparate, sources. In this talk I will introduce the concept of computational graphs and discuss existing implementations (e.g. Apache Storm). Most importantly, I will present a flexible real-time (i.e. event-based) data analysis and prediction architecture and demonstrate a basic implementation in Python that leverages the Python-based data analysis and prediction libraries mentioned above. Finally, I will discuss important related topics, including data provenance and concurrency.
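
To give a rough feel for the idea of a computational graph that processes events as they arrive, here is a minimal sketch in plain Python. It is not the architecture presented in the talk: the names Node, RunningMean and run_pipeline are hypothetical, chosen only for illustration, and the graph is a simple linear chain that filters out negative readings and keeps an online (incremental) mean without ever holding the full data set.

```python
from typing import Callable, List, Optional


class Node:
    """One vertex of a (hypothetical) computational graph.

    A node applies a function to each incoming event and pushes the
    result to every downstream node. Returning None drops the event.
    """

    def __init__(self, func: Callable[[float], Optional[float]]) -> None:
        self.func = func
        self.downstream: List["Node"] = []

    def connect(self, other: "Node") -> "Node":
        # Add an edge to `other`; return it so calls can be chained.
        self.downstream.append(other)
        return other

    def emit(self, event: float) -> None:
        result = self.func(event)
        if result is None:
            return  # event filtered out, or this node is a sink
        for node in self.downstream:
            node.emit(result)


class RunningMean:
    """Online mean: updated one event at a time, no history stored."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def __call__(self, x: float) -> float:
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean


def run_pipeline(readings: List[float]) -> List[float]:
    """Feed readings through source -> filter -> mean -> sink."""
    results: List[float] = []
    source = Node(lambda x: x)
    valid = Node(lambda x: x if x >= 0 else None)  # drop negative readings
    mean = Node(RunningMean())
    sink = Node(lambda x: results.append(x))  # append returns None, so the chain ends here
    source.connect(valid).connect(mean).connect(sink)
    for reading in readings:  # stands in for events arriving over time
        source.emit(reading)
    return results


if __name__ == "__main__":
    print(run_pipeline([2.0, -1.0, 4.0, 6.0]))
```

Each call to source.emit corresponds to one event arriving; the running mean is available after every event rather than only once all of the data has been collected. A real system such as Apache Storm distributes the nodes across processes and machines, which this single-threaded sketch does not attempt.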

Lachlan Blackhall