Sunday 15:15–16:00 in Audimax

Big Data Systems Performance: The Little Shop of Horrors

Jens Dittrich

Audience level:
Intermediate

Description

The confusion around terms such as NoSQL, Big Data, Data Science, SQL, and Data Lakes often creates more fog than clarity. However, clarity about the underlying technologies is crucial to designing the best technical solution in any field that relies on huge amounts of data, including "data science" and machine learning. In this talk I will try to lift the fog.

Abstract

The confusion around terms such as NoSQL, Big Data, Data Science, SQL, and Data Lakes often creates more fog than clarity. However, clarity about the underlying technologies is crucial to designing the best technical solution in any field that relies on huge amounts of data, including "data science" and machine learning, but also more traditional analytical systems such as data integration, data warehousing, reporting, and OLAP.

In my presentation, I will show that at least three dimensions are often cluttered and confused in discussions about data management: first, buzzwords (labels & terms); second, data design patterns (principles & best practices); and third, software platforms (concrete implementations & frameworks).

Only by keeping these three dimensions apart is it possible to create technically sound architectures in the field of big data analytics.

I will show concrete examples that, through a simple redesign and a wise choice of tools and technologies, run up to 10,000 times faster. This in turn yields tremendous savings in development time, hardware costs, and maintenance effort.
