By some estimates, data scientists spend 50-80% of their time cleaning and preparing data for analytics. This work is needed because data is often inconsistent, has gaps, and has to be enriched from multiple external sources to present a full picture. For example, customer transactions gathered from various sources often carry different product SKU names and locations that must be reconciled, as well as duplicate records that must be removed. It also means that enterprises spend a large chunk of their business innovation budget on data management before the data even reaches the data scientists.
In this blog, we’ll outline why this problem occurs and how Actian’s integration with Daml, the open source smart contract platform created by Digital Asset, can greatly alleviate it. Less time spent on preparation means more time spent on actual analytics, which leads to direct business benefits. It also makes for much happier data scientists!
A unifying business process
In a digital economy, the industry is moving towards a collaborative model based not only on shared data, but also shared business processes. Traditionally, applications and organizations have interacted with each other through various means such as APIs and messaging protocols, and have maintained their own copies of the data for reasons that naturally come about as a result of an evolving technology landscape. Multiple copies of data require constant and expensive reconciliations, and a parallel infrastructure of checks and balances to ensure business agility and optimal customer experiences. It is no surprise that any aggregation of all these silos for analysis purposes also needs significant clean up and matching.
It is in this context that applications leveraging smart contracts in Daml give us a new model for enabling such collaboration. Each application that is a party to a smart-contract-based workflow has access to a single golden source of the data, while still enjoying complete flexibility with respect to privacy and isolation of data to meet various regulatory, audit, and compliance requirements. Business functions can be executed with the knowledge that the underlying data layer cannot get out of sync between applications. With this architecture, the traditional problems of reconciliation and data inconsistency can be eliminated to a significant extent.
A New Challenge
While smart contracts and blockchain-based applications provide tremendous business and operational benefits when it comes to clean and consistent data, we also need to be able to analyze the operations, derive insights, and apply those insights to improve future operations. Examples include AML (Anti-Money Laundering) and fraud analytics on financial transactions. One approach is to take a periodic snapshot of the smart contract data store and run these analytics on an offline copy of the database. However, this approach can lead us back to the same data islands and the same reconciliation problems.
Enabling analytics directly on smart contract data, without reintroducing this data synchronization problem, is what Actian set out to achieve with our integration with Daml. Traditionally, Actian’s Data Connect product has addressed the data aggregation challenge through an exhaustive set of connectors to various enterprise applications (ERP, CRM, various custom databases, marketing systems, etc.). Using Data Connect, enterprises have been able to get the data where they want it, when they want it, and in the form they want it.
However, smart contracts and blockchains present a new set of challenges. First, since the Daml runtime has granular privacy guarantees, it is not a simple matter of retrieving all existing data from the blockchain into an offline database - that is neither easily allowed nor desired. Second, the data that is stored in the smart contracts needs to be transformed to make sense to analytics applications. And finally, we do not just want to create an offline copy; we want a copy that stays constantly in sync with the smart contract store to avoid the reconciliation and inconsistency problem.
The Solution Components
We created a couple of real-life applications to demonstrate this exciting capability. The first was a supply chain use case that allowed retailers and manufacturers to share demand data. The second was a financial use case on Swaps using the ISDA CDM, in which a trade lifecycle was executed between brokers, clients, and central counterparties.
Modeling & Business Operations: To make the scenarios realistic, we modeled real-life parties for these use cases, such as multiple brokers and clients for the financial use case. These multi-party Daml applications were then hosted on a scalable cloud backend for Daml applications (Daml Hub) that abstracts away the ledger and other plumbing complexities from the participating entities. These distributed applications generated transaction data as part of typical business operations, for example trade proposals, approved trade instances, and cash movements.
Data Synchronization: The next step in the process was to make this data available for detecting patterns, such as demand trends in the supply chain use case, and for routine business intelligence on trade information in the financial use case. This is where we integrated Actian’s Data Connect product with Daml. We had two options. We could either act on behalf of a participating entity, or we could act as a third-party regulator or agency that requires access to certain parts of the information. While the connection semantics are the same, we highlight this point because even in the latter case, no special access workarounds are needed. This is because Daml models allow for what are called “Observers”, which can also be considered parties on the network. An observer could be a reporting entity or application, a regulator, or a financial intermediary. The Data Connect integration can be built to sync continuously with the data this Observer receives, which means that any update to the contracts on the ledger triggers an update in Data Connect as well. Using this approach also helped us avoid accessing data that reporting systems are not required to store.
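The synchronization idea can be illustrated with a minimal sketch. It assumes a simplified event model (contract "created" and "archived" events, each tagged with the parties allowed to see it); the names Event and ObserverReplica are hypothetical and are not part of the Daml or Data Connect APIs. On a real ledger the visibility filtering is done by the ledger itself, so the check below only simulates it.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical ledger event, invented for illustration only."""
    kind: str                      # "created" or "archived"
    contract_id: str
    visible_to: set = field(default_factory=set)
    payload: dict = field(default_factory=dict)


class ObserverReplica:
    """Keeps a copy of the active contracts one party is allowed to see."""

    def __init__(self, party):
        self.party = party
        self.active = {}           # contract_id -> payload

    def apply(self, event):
        if event.kind == "created":
            # Simulated privacy filter: store only what this party may see.
            if self.party in event.visible_to:
                self.active[event.contract_id] = event.payload
        elif event.kind == "archived":
            # An archived contract is no longer active; drop it if present.
            self.active.pop(event.contract_id, None)


events = [
    Event("created", "c1", {"BrokerA", "ClientX", "Regulator"}, {"notional": 100}),
    Event("created", "c2", {"BrokerA", "ClientY"}, {"notional": 250}),
    Event("archived", "c1"),
    Event("created", "c3", {"BrokerB", "Regulator"}, {"notional": 75}),
]

replica = ObserverReplica("Regulator")
for event in events:
    replica.apply(event)
```

Because the replica is updated event by event rather than from periodic snapshots, it never holds a contract the observer was not entitled to see, and it drops contracts as soon as they are archived.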
Transformation: Smart contract data must be transformed into a format that makes sense for business analytics. We used Data Connect’s built-in data exploration and mapping tool to perform this activity. At the end of this step, the data in Data Connect is in sync with the Daml ledger, but in a form that is readily consumable by analytics applications in real time.
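As a rough illustration of what such a mapping does, the sketch below flattens a nested contract payload into a single flat row of the kind an analytics table expects. The payload shape and field names are assumptions made up for this example; the actual mapping in our prototype was done with Data Connect’s tooling, not with this code.

```python
def flatten(payload, prefix=""):
    """Turn a nested dict into one flat row, joining nested keys with '_'."""
    row = {}
    for key, value in payload.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested records, carrying the parent key as a prefix.
            row.update(flatten(value, prefix=f"{name}_"))
        else:
            row[name] = value
    return row


# Hypothetical trade contract payload (not the ISDA CDM schema).
contract = {
    "trade_id": "T-1",
    "broker": "BrokerA",
    "terms": {"notional": 1000000, "currency": "USD"},
}
row = flatten(contract)
```

Here row contains the columns trade_id, broker, terms_notional, and terms_currency, which can be loaded into a relational table without further reshaping.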
Enrichment: Once we had the base ledger data in the form we needed, we then used Data Connect’s comprehensive set of connectors to pull in information from external applications to enrich the data from the ledger. For example, for the trade information, we pulled up client profiles from an external database (dummied up for the purposes of this demonstration). In an ideal world, all of this data might reside on the ledger as reference data. However, in a practical enterprise technology deployment roadmap, functionality is often deployed in phases, or sometimes various constraints do not allow the ideal point of arrival architecture to be realized. We also have multiple systems such as ERPs, CRMs, billing systems, fraud systems etc. to account for. The data enrichment stage was designed within this prototype to account for those realistic scenarios.
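Conceptually, the enrichment step is a left join between ledger rows and external reference data. The following minimal sketch shows that join in plain code, assuming hypothetical trade rows and a hypothetical client profile lookup keyed by client name; it is not how Data Connect implements its connectors.

```python
def enrich(trade_rows, client_profiles, key="client"):
    """Left-join trade rows with client profiles; unmatched rows are kept."""
    enriched = []
    for row in trade_rows:
        profile = client_profiles.get(row.get(key), {})
        # Prefix profile fields so they cannot overwrite ledger fields.
        extra = {f"client_{name}": value for name, value in profile.items()}
        enriched.append({**row, **extra})
    return enriched


# Hypothetical ledger rows and external profile data.
trades = [
    {"trade_id": "T-1", "client": "ClientX", "notional": 1000000},
    {"trade_id": "T-2", "client": "ClientZ", "notional": 50000},
]
profiles = {"ClientX": {"segment": "institutional", "country": "US"}}

enriched_trades = enrich(trades, profiles)
```

Keeping unmatched rows (T-2 above) rather than dropping them reflects the phased deployments described here: ledger data stays complete even when an external system has no matching record yet.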
Analytics & Insights: Finally, to complete the end-to-end use case, we used Vector, Actian’s high-performance analytics database, as an endpoint. Both ledger and enrichment data were populated into Vector, allowing business users to slice and dice the data as desired. Vector also works with a range of visualization and analytics tools, such as Tableau, Kibana, and others. While we used Actian’s Vector for this integration, we could also have used Avalanche, Actian’s cloud data warehouse, or any other enterprise database an organization already uses, since Data Connect has ready connectors for almost every enterprise system.
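To give a feel for the kind of slicing this enables, here is a small sketch that loads enriched rows into a SQL table and aggregates notional by client segment. It uses Python’s built-in sqlite3 module purely as a stand-in for an analytics database such as Vector; the table and column names are assumptions for this example.

```python
import sqlite3


def notional_by_segment(rows):
    """Load rows into an in-memory table and total notional per segment."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trades (trade_id TEXT, client_segment TEXT, notional REAL)"
    )
    conn.executemany(
        "INSERT INTO trades VALUES (:trade_id, :client_segment, :notional)", rows
    )
    result = conn.execute(
        "SELECT client_segment, SUM(notional) FROM trades "
        "GROUP BY client_segment ORDER BY client_segment"
    ).fetchall()
    conn.close()
    return result


# Hypothetical enriched rows, as they might arrive from the enrichment step.
sample_rows = [
    {"trade_id": "T-1", "client_segment": "institutional", "notional": 1000000},
    {"trade_id": "T-2", "client_segment": "retail", "notional": 50000},
    {"trade_id": "T-3", "client_segment": "institutional", "notional": 2000000},
]
totals = notional_by_segment(sample_rows)
```

The same query could be issued from a BI tool once the data sits in the analytics database; the point is only that the synchronized, enriched data is now queryable with ordinary SQL.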
Alternatives
There are several alternatives to this architecture which are useful in some cases. For example, for simple decisioning at the streaming level - as real-time ledger updates are happening - we can use Daml automations to trigger actions without first pulling the data into a synchronized data store. However, the rules these automations apply are often themselves the output of advanced analytics performed beforehand. That advanced analytics, and the data enrichment it depends on, is what the integration outlined in this post enables.
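The division of labor described above can be sketched as follows: a threshold is derived offline (here it is simply passed in), and a lightweight rule applies it to each ledger update as it arrives. This is plain illustrative code with hypothetical names and update fields; it does not use the Daml automation or trigger APIs.

```python
def make_flagging_rule(threshold):
    """Build a rule from a threshold assumed to come from prior analytics."""
    def rule(update):
        if update["notional"] > threshold:
            return "flag_for_review"
        return None
    return rule


def run_automation(updates, rule):
    """Apply the rule to each update in arrival order and collect actions."""
    actions = []
    for update in updates:
        action = rule(update)
        if action is not None:
            actions.append((update["trade_id"], action))
    return actions


# Hypothetical stream of ledger updates.
stream = [
    {"trade_id": "T-1", "notional": 1000000},
    {"trade_id": "T-2", "notional": 50000},
    {"trade_id": "T-3", "notional": 2500000},
]
actions = run_automation(stream, make_flagging_rule(threshold=900000))
```

The rule itself is trivial to evaluate per update; the hard part is choosing a good threshold, which is where the synchronized and enriched analytics store comes in.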
A second alternative is provided by the Daml SDK itself, and is very useful while testing applications as they are built. The extractor tool within the SDK allows ledger data to be pulled into an offline database such as PostgreSQL. Once we get to any significant analytics and business intelligence, however, we need the advanced transformation and enrichment capabilities that Data Connect provides.
Conclusion
We hope we have outlined an important use case in data analytics and prediction, one that comes up frequently as enterprises move to the next step of their DLT programs. Actian’s integration with Daml is an important step towards enabling the digital transformation journeys of our clients. It also shows how Daml’s open technology stack helps in building an enterprise-ready landscape. While Daml applications can run on multiple ledgers and databases, this integration with Actian’s Data Connect adds a wide array of enterprise integrations for enriching the data.
As enterprises move forward in their DLT and smart contracts journey, they must start thinking in terms of their overall enterprise architecture and data analytics roadmap. Both will need to be defined as more flexible and contextual business processes are created with the help of workflows based on Daml smart contracts.