Big Data SQL and Hive – adding MD5 hashes to Oracle and Hive tables

The objective

To use data across several different systems, a unique identifier is needed. Following the Data Vault 2.0 approach (see Dan Linstedt), the best way is to use hashes. MD5 hashes are available on most systems, so it makes sense to use MD5 as the hashing algorithm. If the same key (hash) is available on all systems, we can run key-based queries across the DB and Hive, e.g. using Big Data SQL. Whether to generate the keys in the Oracle DB or on the Hadoop side can be decided based on the available resources. In the next part I describe the creation of the keys in Hive and Oracle DB.
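
To make the idea of a system-independent hash key concrete, here is a minimal Python sketch. It is not the Hive or Oracle implementation from the post; the function name `hash_key` and the normalisation rules (trim, upper-case, pipe delimiter, UTF-8 encoding, upper-case hex output) are illustrative assumptions. The point is only that every system must apply exactly the same rules before hashing, otherwise the keys will not match.

```python
import hashlib


def hash_key(*business_key_parts: str, delimiter: str = "|") -> str:
    """Return an MD5 hex digest for a (possibly composite) business key.

    The normalisation below is an assumption for illustration; whatever
    rules are chosen must be identical on every participating system.
    """
    # Normalise each part so that all systems hash exactly the same string.
    normalised = [part.strip().upper() for part in business_key_parts]
    joined = delimiter.join(normalised)
    # Encode explicitly: different character sets would yield different hashes.
    return hashlib.md5(joined.encode("utf-8")).hexdigest().upper()


# The same business key yields the same hash regardless of padding or case.
print(hash_key(" 4711", "de ") == hash_key("4711", "DE"))
```

Because the digest depends only on the normalised input string, the key can be computed wherever resources are available and still be used as a join key on the other side.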

Converting XSD element sequence to map item with key/value in a XSLT transformation

Connecting to existing web services with Oracle BPM, SOA or OSB is easy. To let your business users work with them, however, you often have to make some changes. The easiest way is to use OSB to change the security level or some XML elements. For larger changes you have to build SOA processes. In this example the source web service uses a key/value structure, which is not easy to use from BPM without detailed XML/XSL know-how. For this example I use SOA Suite 12c and an existing web service on OSB. The XSLT transformation is possible in both SOA Suite and OSB.
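
The kind of mapping involved can be sketched outside XSLT. The following Python example is only an illustration of turning a generic key/value sequence into named elements; it is not the XSLT from the post, and the element names (`parameters`, `item`, `key`, `value`, `customer`) as well as the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical key/value payload as a generic web service might return it.
SAMPLE = """
<parameters>
  <item><key>customerId</key><value>4711</value></item>
  <item><key>country</key><value>DE</value></item>
</parameters>
"""


def items_to_named_elements(xml_text: str, target_tag: str = "customer") -> ET.Element:
    """Turn each <item><key>k</key><value>v</value></item> into <k>v</k>."""
    source = ET.fromstring(xml_text.strip())
    target = ET.Element(target_tag)
    for item in source.findall("item"):
        key = item.findtext("key")
        if not key:
            continue  # skip items without a usable key
        child = ET.SubElement(target, key)
        child.text = item.findtext("value", default="")
    return target


result = items_to_named_elements(SAMPLE)
print(ET.tostring(result, encoding="unicode"))
```

A structure with named elements like this is easier to bind to typed data objects than a generic list of key/value items.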

Virtual Exalytics Infrastructure and Architectural Experiences

Since the beginning of this year it has been possible to run virtual machines on Exalytics and use the InfiniBand stack on them. You can now use the full power of your Exadata with a virtual Exalytics environment. Furthermore, you have the opportunity to license your software per core, depending on the virtual machine size. Oracle calls this trusted partitions. After installing two virtual Exalytics environments, I try to write down a summary of my experiences.

Parallel Execution of R Scripts in Oracle DB

With Oracle R Enterprise it is possible to run R scripts inside an Oracle database. The idea is to use the performance of the database server to make R code run fast. We can also use it on an Oracle Exadata Database Machine. We then have two ways to process data in parallel: on the one hand we can query data in parallel in the database (SQL parallel query); on the other hand we can run more than one external R process. The post below shows how to run R processes in parallel as external processes on the DB server. Whether more than one R process can be run depends on the R code; for scoring, for example, it should be possible.