Long live the “Community Edition”!
Pentaho Data Integration (lovingly known as PDI) is arguably one of the best ETL tools available in a Community Edition variant. I’m often involved in projects where at a high level, the goal is produce “something” that takes data from a source, manipulates it, then stores it in a RDBMS for consumption by an unrelated process. And honestly, whether good or bad, I’ve come to rely on PDI because in that regard, it can do just about anything . Furthermore, since much of what I do is outside of our branded solutions (e.g. custom), I find that my customers are more at ease about taking ownership of a work product that is say a custom ETL job in PDI rather custom application or script.
PDI, to put it simply, is easy mode. And on projects where time, budget, complexity and [lack of] resources are concerns, [for me] it often saves the day. Do I need to read JSON from an endpoint? No problem, PDI has Rest Client and a JSON step. XML? No problem. CSV/Excel/NoSQL/RDBMS? No problem. Strong encryption? Well… that may be a problem…
The thing about “Community” edition tools and products is that their manufacturers generally need to divest them of the “enterprise” features usually found in their non-free variants, otherwise there won’t be any reason for Enterprises to invest in them. For PDI, at least one of these enterprise features is AES encryption, which presents a challenge for the ETL designer when you are forced to handled sensitive data like usernames and passwords. And given the fact that you can pretty much connect to any data source in PDI, the way you handle those credentials inside of the transform; where they get stored, how they are eventually passed to the step, etc is a big deal. So, since I absolutely love PDI but I need strong encryption, I figured out a way to leverage one of its most interesting features to effectively undo the handicap.
User Defined Java Classes
As the Pentaho Data Integration manual states:
You can use the User Defined Java Class step to enter your own Java class to drive the functionality of a complete step
The feature uses Janino (link here) which does not implement the full Java 1.8 spec, but fortunately enough of it to accomplish most goals. Essentially, what this step allows for one to do is use external classes within a step. So for example, if someone perhaps just happened to have a Java class that could decrypt a string encrypted with AES 128 bit encryption then you could use this step to said string inside of the transform before passing it to another step like so:
And the contents of that step:
import com.xxxxxx.services.utils.*; import java.io.File; import java.io.IOException; import org.apache.commons.io.FileUtils; String keyfileField; String saltfileField; String encpasswordField; String passwordField; public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException { if (first) { keyfileField = getParameter("KEY_FILE_FIELD"); saltfileField = getParameter("SALT_FILE_FIELD"); encpasswordField = getParameter("ENC_PASSWORD_FIELD"); passwordField = getParameter("PASSWORD_FIELD"); first=false; } Object[] r = getRow(); if (r == null) { setOutputDone(); return false; } Object[] outputRow = createOutputRow(r, data.outputRowMeta.size()); String salt = null; String key = null; String keyfile = get(Fields.In, keyfileField).getString(r); String saltfile = get(Fields.In, saltfileField).getString(r); String encPassword = get(Fields.In, encpasswordField).getString(r); File key_file = new File(keyfile); File salt_file = new File(saltfile); try { key = new String(FileUtils.readFileToString(key_file)); salt = new String(FileUtils.readFileToString(salt_file)); } catch (IOException e) { throw new AssertionError("Error while accessing the file(s): " + e.getMessage(), e); } PasswordUtils p = new PasswordUtils(); String value = p.decryptPassword(encPassword, key, salt); get(Fields.Out, passwordField).setValue(outputRow, value); putRow(data.outputRowMeta, outputRow); return true; }
So here’s what’s happening. The PasswordUtils class referenced here provides a main class which when run will take a user supplied string to be encrypted (the password) and a user supplied string a keying to produce keying material (the key). The output is the encrypted password string and a randomly generated salt value.
The values of the key and the salt are stored (securely) on the file system where PDI executes. The file name and path plus the encrypted credentials are used as parameters for the job. Those parameters are passed as variables into the transform. The “Add constant” step creates a null field to hold the decrypted password string. The User Defined Java Class step gets the value(s) of the values of the file locations for both the salt and the key and uses commons-io to read the contents of those files to a string. The PasswordUtils class is instantiated and the decryptPassword method is used to decrypt the encrypted password string using the salt and key strings. The resulting string is placed in an output row in the password field created in the “Add constants” step.
So there you have it! A way to get “enterprise” grade encryption in the community edition version of PDI!
Cheers!
-Guru
Hi, I was trying this but I’m getting error:
> Cannot determine simple type name “PasswordUtils”
any chance you can share those java classes?
Thank you.