Data Integration for Empowering Cloud-Based Analytics
When a CIO is considering cloud as a platform, there are many transformational opportunities for the IT organization to weigh. One of the most significant is shifting the IT workload of business analytics to the cloud. However, most CIOs don’t have the luxury of simple applications or greenfield analytics; instead, they may have hundreds or thousands of business applications and a very complex data architecture to worry about. It is no easy task to ‘lift and shift’ to the cloud. This is where data integration in the cloud can play an enabling role in transforming the organization into a more responsive and lower-cost cloud business.
Running analytics in the cloud may be as simple as operating a reporting tool from your cloud provider’s infrastructure. It could also be far more powerful (and complex): hosting an expansive big data environment for staging, transforming and auditing business data in the cloud. A wide variety of options exist between these two extremes, but they typically involve some kind of reporting, data visualization or advanced statistical analytics (e.g., machine learning) on top of a data mart, data warehouse or data lake that is hosted in the cloud.
In practice, CIOs find that some of the most difficult aspects of enabling analytics in the cloud involve data integration and data governance. The seemingly simple task of bringing application data together and keeping track of it is made incredibly difficult by the breadth and complexity of the existing application data flows that CIOs already manage in their on-premises infrastructure. Therefore, a comprehensive cloud data integration solution must be able to handle three fundamental use cases:
1. Migrating Data from Ground to Cloud – without additional overhead, connect and migrate data continuously from existing application data sources into the cloud
2. Integrating Data for Marts and Warehouses – quickly load, ETL and govern data that is used for reporting and enterprise data warehouses
3. Data Lifecycle for Big Data Lakes – manage the complete lifecycle of data activities from ingestion to retirement within the big data cloud infrastructure
As a pragmatic observation, what often starts out as a relatively simple task of handling “offline reporting” in the cloud (simple migration of data) can quickly escalate into more complex data warehousing (blending data from many sources) and then ultimately big data scenarios (full copies of data, long lifecycle for downstream audits). In a mature organization this evolution is considered as part of the up-front planning process and not left to ad-hoc or happenstance decisions. Crucially, once the up-front planning considers the whole landscape, it becomes clear that a holistic data integration approach is required to be successful.
After helping many CIOs grapple with these challenges, here are some pro-tips to consider:
1. Think Solutions, Not Piece-Parts – with the emergence of cheap and simple IaaS (Infrastructure as a Service) it is tempting to just throw some data in the cloud and try things. But experienced CIOs know that they need to balance self-service requirements with a strong data architecture, and that for analytics in particular, a well-considered solution architecture is necessary to provide reporting and analytics that are trustworthy. Consider teaming with a vendor whose vision spans the Application tier (SaaS), the Platform tier (PaaS) and the Infrastructure tier (IaaS). A vendor with a more expansive view is more likely to be thinking about enterprise solutions and not just selling infrastructure units.
2. Don’t Disrupt Operations – most of the time a CIO will be grappling with IT services that are already in place and running, so the move of analytics to the cloud must work hand-in-glove with the need to minimize disruption to existing IT services. Consider leveraging proven technologies such as CDC (Change Data Capture) and replication tools, which are minimally invasive because they read database logs and move only the changed records. These kinds of tools have been proven in low-downtime data migrations for decades, and a mature cloud data integration vendor will offer these kinds of services in the cloud.
3. Scaffold Data Architecture for the Future – it can sometimes be tempting to start the cloud journey by treating the cloud assets as an appendage to core IT services, but consider for a moment that these new cloud services may actually be the future of your core IT services. This kind of thinking requires a mature and well-considered approach to cloud architecture. For example, your future IT data workloads will increasingly need lower latency, will be closer to real time, and will more often be processed in memory.
4. Assume Workloads Will Shift Around – in the data architecture of yesterday it was commonplace to build “hub-and-spoke” solutions. Data was consolidated to a hub, work was done on it, then the data was extracted from the hub. But a more modern approach, one associated with open-source data platforms, is to “let the data lay” and “bring the workload to the data”. In practice, this means being able to ingest and transform your data assets in various locations, without requiring physical consolidation. Once you are in the cloud, you will appreciate this capability even more: because the infrastructure is hosted, you do not necessarily want to be aware of where the data is located.
5. Don’t Try to Retrofit Governance Afterwards – data governance is the art and science of ensuring that data can be trusted. Specialist activities such as data profiling, glossary / taxonomy maintenance, governance council workflow, and reliable data lineage / provenance are essential to maintain trust in the business data that is at the heart of the analytics. Too often in the past, data governance was forgotten in the early architecture phases and then added on later, which doesn’t work well in practice. Once you’ve made the decision to move some of your data architecture to the cloud, plan for governance from the beginning and you’ll be glad that you did.
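To make tip 2 a little more concrete, here is a minimal sketch of the idea behind CDC: read a change log and move only the records that changed since the last sync. It is an illustration under stated assumptions, not how any particular product works. The `ChangeEvent` format, the `apply_changes` function and the in-memory `cloud_copy` are all hypothetical stand-ins; a real CDC tool reads the database's own transaction log and writes to an actual cloud target.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in a simulated change log (hypothetical format)."""
    lsn: int             # log sequence number, strictly increasing
    op: str              # "insert", "update", or "delete"
    key: int             # primary key of the affected row
    row: Optional[dict]  # new row image; None for deletes


def apply_changes(log: List[ChangeEvent], target: Dict[int, dict], checkpoint: int) -> int:
    """Apply only events newer than `checkpoint` to `target`; return the new checkpoint."""
    for event in log:
        if event.lsn <= checkpoint:
            continue  # already replicated in an earlier run
        if event.op == "delete":
            target.pop(event.key, None)
        elif event.op in ("insert", "update"):
            target[event.key] = dict(event.row)
        else:
            raise ValueError(f"unknown operation: {event.op}")
        checkpoint = event.lsn
    return checkpoint


# Simulated source-side log and an empty cloud-side copy (both assumptions).
log = [
    ChangeEvent(1, "insert", 101, {"customer": "Acme", "total": 250}),
    ChangeEvent(2, "insert", 102, {"customer": "Globex", "total": 90}),
    ChangeEvent(3, "update", 101, {"customer": "Acme", "total": 300}),
]
cloud_copy: Dict[int, dict] = {}

# First sync: nothing replicated yet, so all three events are applied.
checkpoint = apply_changes(log, cloud_copy, checkpoint=0)

# A later change on the source; the second sync moves only this one event.
log.append(ChangeEvent(4, "delete", 102, None))
checkpoint = apply_changes(log, cloud_copy, checkpoint)
```

The checkpoint is what keeps the approach minimally invasive: the source tables are never rescanned, and each run picks up only the log entries it has not yet seen.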
There are many reasons for CIOs to go to the cloud. As the demand for more powerful and less expensive data analytics drives more cloud adoption, the need for data integration in the cloud will only grow. Use cases around data migrations, data warehousing and big data lakes will become the cornerstone capabilities that all cloud data integration solutions must handle well. Once you accomplish these goals, data integration may well become the foundation of a truly next-generation cloud solution that enables business transformation for your organization.