Thanks to Michael Yoder

The best data protection strategy is to remove sensitive information from everywhere it’s not needed.

Have you ever wondered what sort of “sensitive” information might wind up in Apache Hadoop log files? For example, if you’re storing credit card numbers inside HDFS, might they ever “leak” into a log file outside of HDFS? What about SQL queries? If you have a query like select * from table where creditcard = '1234-5678-9012-3456', where is that query information ultimately stored?

This concern affects anyone managing a Hadoop cluster that contains sensitive information. At Cloudera, we set out to address this problem through a new feature called Sensitive Data Redaction, which is available starting in Cloudera Manager 5.4.0 when operating on a CDH 5.4.0 cluster.

Specifically, this feature addresses the “leakage” of sensitive information into channels unrelated to the flow of data, not the data stream itself. So, for example, Sensitive...