There has been a lot of discussion lately about how data lakes will transform analytics, giving us access to huge volumes of data with a variety and velocity rarely seen in the past. For those of you who don’t spend your days trawling analytics or big data blogs, the concept of a data lake is simple.
With a traditional data warehouse, the repository is heavily structured, so all the work to convert the data from its raw form must be done before the data enters the repository. This makes it expensive to add new data sources and limits the analytics solution to answering only known, pre-determined questions.
A data lake, by contrast, is built on storage platforms like Hadoop that are designed to store just about any data, in its raw state, with little cost or effort. As a result, it becomes cost-effective for organisations to store pretty much everything, on the off chance it might be useful at a later date.
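To make the contrast concrete, here is a minimal Python sketch of the two approaches, often called schema-on-write (the warehouse) and schema-on-read (the lake). It is an illustration only, not how any particular product works: the sample records, the field names and the functions `load_into_warehouse`, `load_into_lake` and `query_lake` are all hypothetical.

```python
import json

# Hypothetical raw events; the field names are illustrative assumptions.
RAW_EVENTS = [
    '{"order_id": 1, "amount": "19.99", "channel": "web"}',
    '{"order_id": 2, "amount": "5.00", "channel": "store", "loyalty_tier": "gold"}',
]

# The warehouse's fixed, pre-determined structure: column name -> type.
WAREHOUSE_SCHEMA = {"order_id": int, "amount": float}


def load_into_warehouse(raw_lines):
    """Schema-on-write: convert each record to the fixed schema before storing it."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        # Anything not in the schema is discarded at load time.
        table.append({col: cast(record[col]) for col, cast in WAREHOUSE_SCHEMA.items()})
    return table


def load_into_lake(raw_lines):
    """Store the records untouched; no conversion work is done up front."""
    return list(raw_lines)


def query_lake(lake, field):
    """Schema-on-read: interpret the raw data only when a question is asked."""
    values = []
    for line in lake:
        record = json.loads(line)
        if field in record:
            values.append(record[field])
    return values


warehouse = load_into_warehouse(RAW_EVENTS)
lake = load_into_lake(RAW_EVENTS)

# A question nobody planned for: which loyalty tiers appear in the orders?
print([row.get("loyalty_tier") for row in warehouse])  # [None, None]
print(query_lake(lake, "loyalty_tier"))                # ['gold']
```

The warehouse can only answer questions about the columns chosen in advance, while the lake can still answer the unplanned one, at the price of parsing the raw data on every query.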
The advantage, from an analytics perspective, is that a data lake gives access to a far larger and richer source of data, which facilitates data mining and data discovery in a way that is simply not possible with a data warehouse. The disadvantage is that the lack of structure presents real challenges for performance, for data governance and for providing the context within which less technical users can be self-sufficient.
These challenges need to be met by those of us who design and build analytics solutions. Here at DataPA, we’ve spent years building a platform that facilitates data governance and context in a live data environment. With our technology and experience, there are few companies better placed to take advantage of this new opportunity. Like most new developments, data lakes will not be a silver bullet that solves every analytics requirement. However, we do think they have a significant part to play in the future of analytics, and we can’t wait to see what opportunities they bring for us and our customers.