Clojure has always been good at manipulating data. With the release of spec and Onyx (“a masterless, cloud scale, fault tolerant, high performance distributed computation system”), good became best. In this talk you will learn about a streaming data layer architecture built around Kafka and Onyx that is self-describing, declarative, scalable and convenient for the end user to work with. The focus will be on the power and elegance of describing data and computation with data; the inferences and automations that can be built on top of that; and how and why Clojure is a natural choice for tasks that involve a lot of data manipulation, touching on both functional programming and Lisp specifics such as code-is-data.
We will look at how such an approach can be used to manage a data warehouse by automatically inferring materialized views from raw incoming data or from other views, based on a combination of heuristics, statistical analysis (seasonality, outlier removal, ...) and predefined ontologies. This is a practical way to maintain a large number of views: it increases their availability and abstracts the complexity into declarative rules, rather than into an ETL pipeline with dozens or even hundreds of hand-crafted tasks.
The system described requires relatively little effort upfront but can easily grow with one's needs in terms of both scale and scope. With its good introspection capabilities and strong decoupling, it is, for instance, an excellent substrate for putting machine learning algorithms into production, which is the final use case we will dive into.
Built my first computer out of Lego bricks and learned to program soon after. Emergence, networks, modes of thought, and the limits of language and expression are what make me smile (and keep me up at night). Currently working at GoOpti, making the company data-driven: setting up our analytics infrastructure (end goal: provide any answer stemming from data in 2 minutes or less) and building our predictive-realtime-superduper pricing engine.