jueves, 19 de agosto de 2010

What's Essential -- And What's Not -- In Big Data Analytics (Columnar Data Base?)


Far from arguing over the benefits (or drawbacks) of a column-based architecture, shops would be better advised to focus on other, potentially more important issues. Row- or column-based engines marketed by Aster Data, Dataupia, Greenplum Software Inc. (now an EMC Corp. property), Hewlett-Packard Co. (HP), InfoBright, Kognitio, Netezza, ParAccel, Sybase Inc. (now an SAP AG property), Teradata, Vertica, and other vendors (to say nothing of the specialty warehouse configurations marketed by IBM, Microsoft, and Oracle) are by definition architected for Big Analytics.

Analytic database vendors today compete on the basis of several options -- capabilities such as in-database analytics, support for non-traditional (typically non-SQL) query types, sophisticated workload management, and connectivity flexibility.

Every vendor has an option-laden sales pitch, of course -- but few (if any) stories are exactly the same. In-database analytics is particularly hot, according to Eckerson. All analytic database vendors say they support it (to a degree), but some -- such as Aster Data, Greenplum, and (more recently) Netezza, Teradata, and Vertica -- seem to support it "more" flexibly than others.

"With in-database analytics, scoring can execute automatically as new records enter the database rather than in a clumsy two-step process that involves exporting new records to another server and importing and inserting the scores into the appropriate records," he explains.

The twist comes by virtue of (growing) support for non-SQL analytic queries, chiefly in the form of the (increasingly ubiquitous) MapReduce algorithm. Aster Data and Greenplum have supported in-database MapReduce for two years; more recently, both Netezza and Teradata, along with IBM, have announced MapReduce moves. Last month, open source software (OSS) data integration (DI) player Talend announced support for Hadoop (an OSS implementation of MapReduce) in its enterprise DI product. Talend's MapReduce implementation can theoretically support in-database crunching in conjunction with Hadoop-compliant databases.

"[T]echniques like MapReduce make it possible for business analysts, rather than IT professionals, to custom-code database functions that run in a parallel environment," he writes. As implemented by Aster Data and Greenplum, for example, in-database MapReduce permits analysts or developers to write reusable functions in many languages (including the Big Five of Python, Java, C, C++, and Perl) and invoke them by means of SQL calls.

Such flexibility is a harbinger of things to come, according to Eckerson. "[A]s analytical tasks increase in complexity, developers will need to apply the appropriate tool for each task," he notes. "No longer will SQL be the only hammer in a developer's arsenal. With embedded functions, new analytical databases will accelerate the development and deployment of complex analytics against big data."

No hay comentarios:

Publicar un comentario