Data Transformation in Big Data Appliance or Oracle Exadata – a comparison

In a data warehouse environment, new data is often read from CSV files. These CSV files are attached to the database as external tables. In this scenario you need a highly available file system where the text files are stored. The idea is to use HDFS – the Hadoop Distributed File System – as that highly available, high-performance filesystem. HDFS, together with Hadoop, then provides not only the filesystem but also an engine where first transformations can be executed without putting load on the main database.
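As a minimal sketch of this setup, an external table over CSV files in HDFS could be declared with Big Data SQL's ORACLE_HDFS access driver roughly like this. The table name, columns, and HDFS path are illustrative placeholders, and the exact access parameters may differ depending on your Big Data SQL version and configuration:

```sql
-- Hypothetical external table over CSV files stored in HDFS,
-- read through the Big Data SQL ORACLE_HDFS access driver.
CREATE TABLE sales_ext (
  order_id    NUMBER,
  customer_id NUMBER,
  order_date  DATE,
  amount      NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HDFS                    -- access driver for raw HDFS files
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (
    com.oracle.bigdata.fileformat=textfile
    com.oracle.bigdata.rowformat=delimited fields terminated by ','
  )
  LOCATION ('hdfs:/user/oracle/sales/')  -- placeholder HDFS path
)
REJECT LIMIT UNLIMITED;
```

With this in place, the CSV data can be queried from the database like any regular table, while the files themselves sit on the redundant, distributed HDFS storage.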

In this blog post I show how to integrate this data into the data warehouse and perform some simple transformations, such as generating keys. I use Big Data SQL external tables for the connection. On the one hand I create everything in the database; on the other hand I generate the keys in a Hive/Hadoop environment. The keys are based on MD5 hashes, as described in the previous blog post.
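As a sketch of the two variants, an MD5-based key could be derived with Oracle's STANDARD_HASH function (available from Oracle 12c) on the database side, and with Hive's md5() function (available from Hive 1.3.0) on the Hadoop side. Table and column names below are placeholders, not taken from the original post:

```sql
-- Database side: MD5 key computed in Oracle over the external table.
SELECT STANDARD_HASH(customer_id || '|' || order_date, 'MD5') AS row_key,
       t.*
FROM   sales_ext t;

-- Hadoop side: the same key computed in Hive, assuming a Hive table
-- is defined over the same CSV files in HDFS.
SELECT md5(concat(customer_id, '|', order_date)) AS row_key,
       t.*
FROM   sales_csv t;
```

Computing the key in Hive offloads the hashing work to the Hadoop cluster, so the main database only reads the finished result.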