This vendor-written tech primer has been edited to eliminate product promotion, but readers should note it will likely favor the submitter’s approach.
More businesses are embarking on data lake initiatives than ever before, yet Gartner predicts that through 2018, 90% of deployed data lakes will be useless because they are overwhelmed with data that has no clear use case. Organizations may see the value of a single repository that houses all enterprise data, but they often lack the resources, knowledge and processes to ensure the data in the lake is of good quality and genuinely useful to the business.
To derive real, actionable insights from your organization’s data lake, keep these five best practices in mind:
Ensure you’re populating the data lake with all enterprise data, not just the data that’s easiest to get to.
Companies are making massive investments in emerging technologies like Hadoop, Spark and Kafka to build their data lakes, but the value and insight they gain is limited by how well they can move data assets from diverse sources into those environments. Most companies have no trouble ingesting newer data from IoT or mobile devices, but they often miss the mainframe, which is inherently difficult to access yet vital to completing the 360-degree view of a business. Mainframe data serves as key customer reference data and helps make sense of newer sources.
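Part of what makes mainframe data hard to reach is that it often arrives as EBCDIC-encoded, fixed-width records rather than delimited text. The sketch below shows one way to decode such a record into JSON before landing it in the lake; the record layout and field names are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical fixed-width mainframe customer record, EBCDIC-encoded
# (IBM code page 037). Assumed layout: 10-byte customer ID,
# 20-byte name, 8-byte account-open date (YYYYMMDD).
def parse_mainframe_record(raw: bytes) -> dict:
    text = raw.decode("cp037")  # EBCDIC -> Unicode
    return {
        "customer_id": text[0:10].strip(),
        "name": text[10:30].strip(),
        "open_date": text[30:38].strip(),
    }

# Build a sample record by encoding ASCII fields into EBCDIC.
sample = "C000012345".ljust(10) + "JANE DOE".ljust(20) + "20240115"
record = parse_mainframe_record(sample.encode("cp037"))
print(json.dumps(record))
```

Once decoded into a common format like JSON, the record can be published to whatever ingestion pipeline (Kafka, Spark, etc.) feeds the lake.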
Consider compliance needs before beginning any data lake project.
Businesses should first discuss their regulatory needs and, when necessary, create a system that preserves a copy of their data in its original, unaltered format. This is especially important in highly regulated industries like banking, insurance and healthcare, which must maintain data lineage for compliance purposes. To keep up with evolving regulations, businesses also need the flexibility to write and adjust these rules as requirements change, and should look for a vendor that provides it.
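One common pattern for preserving originals is a write-once "raw zone" keyed by content hash, with lineage metadata recorded alongside each record. The sketch below illustrates the idea under simplifying assumptions: the dictionaries stand in for an immutable object store, and all names are illustrative.

```python
import hashlib
import time

raw_zone = {}  # content hash -> original, unaltered bytes
lineage = {}   # content hash -> ingestion metadata for audit/compliance

def land_raw(record: bytes, source: str) -> str:
    """Preserve an incoming record exactly as received, write-once."""
    digest = hashlib.sha256(record).hexdigest()
    if digest not in raw_zone:  # never overwrite an original
        raw_zone[digest] = record
        lineage[digest] = {
            "source": source,
            "ingested_at": time.time(),
            "sha256": digest,
        }
    return digest

key = land_raw(b'{"account": "12345", "balance": "100.00"}',
               source="core-banking")
```

Downstream transformations then operate on copies, so auditors can always trace a derived value back to the untouched original.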
Create clear, consistent rules to catalogue and govern data.
It’s essential not only to document and catalogue data, but also to create enterprise-wide business and technical rules to govern it, preventing misinterpretation by different departments. For example, if a business creates rules to define “mortgage risk” within its data, it can use those rules to report to regulatory authorities. It’s certainly no easy task, as it often involves multiple sources of data across disparate departments and is subject to human error: people often add their own free-form rules to segment data in ways that may not be clear to others. But when done right, it brings value to even the most non-technical person.
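One way to avoid free-form, department-specific segmentation is to express the shared definition as declarative rules in one place. The sketch below shows a hypothetical "mortgage risk" classification; the thresholds and field names are invented for illustration, not regulatory guidance.

```python
# A single, shared definition of "mortgage risk" that every department
# applies identically. Rules are evaluated in order; first match wins.
# Thresholds below are hypothetical.
MORTGAGE_RISK_RULES = [
    ("high",   lambda loan: loan["ltv"] > 0.90 or loan["credit_score"] < 620),
    ("medium", lambda loan: loan["ltv"] > 0.80),
    ("low",    lambda loan: True),  # catch-all
]

def classify_mortgage_risk(loan: dict) -> str:
    for name, predicate in MORTGAGE_RISK_RULES:
        if predicate(loan):
            return name
    return "unclassified"

print(classify_mortgage_risk({"ltv": 0.95, "credit_score": 700}))  # high
print(classify_mortgage_risk({"ltv": 0.85, "credit_score": 700}))  # medium
```

Because the rules live in one catalogued definition rather than in each analyst's head, a report built on "high-risk mortgages" means the same thing everywhere in the enterprise.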