DSA Position Paper – Data as an Asset Class

Nov 21

How Decentralized Storage serves the new data as an asset class model

Introduction

In 2006, the phrase “Data is the New Oil” became part of the lexicon, and with it came a growing interest in the value of digital information and data. Expansions on this analogy soon followed, such as “like oil, data is valuable, but if unrefined it cannot really be used”. In the years since, enterprises and institutions have struggled to learn how to monetize their data, all the while continuing to store vast amounts of it.

The advent of AI, autonomous vehicles and robots, and other data-driven innovations has renewed interest in this topic to the point that legislation is now being proposed and adopted around the world that enumerates how data can be leveraged as an asset class, along with greater specificity as to the rights of data owners.

In fairness, though, given the concerns surrounding these innovations, data might more aptly be described as ‘uranium yellowcake’ rather than oil, in that it requires proper safeguards and controls before it can be processed into ‘hugely valuable single sources of truth’.

The purpose of this paper is twofold: to make readers aware of the strategic opportunities that arise when data is properly treated as an asset, and to better align network and protocol design and development with market requirements for decentralized data storage and access.

This document briefly describes the characteristics that data held in decentralized storage must have to qualify as a legal asset, and the properties a system requires to provide value to data owners and custodians. We include a short description of the categories of data and the differences in handling each. Finally, we propose some examples of how existing technology can be used to create an initial system, along with expected advancements that will meet the Web3 promise of “Read, Write, Own”.

Legal Requirements

Brazil and the UAE have both created regulatory frameworks for data as an asset class, which are expected to help shape global frameworks for how data is defined and controlled, as well as the rights that may be bestowed upon its creators and/or owners. This enhanced focus on regulatory controls for data seeks to address the implications of AI for data usage, rights and liabilities. While the use cases typically concern bodies of information that might be used to inform a model such as Grok or ChatGPT, the growing ability of AI models to synthesize and make use of a wide variety of data sources makes these frameworks noteworthy to all data users and owners.

Key concepts underlying a data asset:

Data type categories
Data ownership (data subject in privacy terminology)
Data provenance (chain of custody to original creator)
Data access control

Additional concepts this paper doesn’t cover include:

Data monetization
Data retrievability latencies
Data licensing
Data liability
Data trade barriers

Data Characteristics and Properties

Data Sensitivity Categories

Broadly, we can think of four categories that determine the way data is handled.

Public data - data that is free to use, i.e. in the public domain without a declared owner. One example would be an 1880s edition of the works of William Shakespeare; another is government legislation that is expressly placed in the public domain on its creation.
Private data - data that is owned and controlled by a person or entity, but whose exposure would not be catastrophic. Examples would be streamed music or videos that are sold or rented online.
Secret data - data that is owned and deemed highly confidential, the unauthorized exposure of which would be very damaging. Examples include personally identifiable information and individual healthcare records.
Critical data - data that is so sensitive that it would be stored offline and any exposure would be catastrophic. Examples would be pre-IPO plans and documents intended for SEC filings, or the formula for Coca-Cola.
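
As a rough illustration, the four categories can be modeled as an ordered enumeration with a handling rule attached to each. The rule fields and the values assigned below are assumptions made for this sketch; they are not requirements drawn from any regulation or existing protocol.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """The four categories from this paper, ordered by impact of exposure."""
    PUBLIC = 0
    PRIVATE = 1
    SECRET = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Handling:
    # All three fields are illustrative assumptions, not legal requirements.
    encrypt_at_rest: bool
    owner_controls_access: bool
    online_storage_allowed: bool


# Hypothetical mapping from category to handling rules.
HANDLING = {
    Sensitivity.PUBLIC: Handling(
        encrypt_at_rest=False, owner_controls_access=False, online_storage_allowed=True
    ),
    Sensitivity.PRIVATE: Handling(
        encrypt_at_rest=False, owner_controls_access=True, online_storage_allowed=True
    ),
    Sensitivity.SECRET: Handling(
        encrypt_at_rest=True, owner_controls_access=True, online_storage_allowed=True
    ),
    # Critical data is kept offline, so it never reaches a storage network.
    Sensitivity.CRITICAL: Handling(
        encrypt_at_rest=True, owner_controls_access=True, online_storage_allowed=False
    ),
}


def may_store_on_network(category: Sensitivity) -> bool:
    """Return True if data in this category may be placed on an online storage network."""
    return HANDLING[category].online_storage_allowed
```

Ordering the categories lets a system compare them directly, for example to refuse any operation that would move data to a less protective tier.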

Data Ownership

Ownership of data entitles the data owner to classify the data types as enumerated above or as they see fit. It grants the owner the right to control usage of and access to their data, and to decide the degree of control needed to protect it. In the context of decentralized data storage, this requires that data can be positively identified with an owner. [Proof of Ownership]

Data Provenance

Data is not static; it can change hands, it can move around, it can be copied. For data to have legal status as an asset, some form of provenance or chain of custody must be in place. In the old Web2 world, the concept of ‘possession is nine-tenths of the law’ simply meant that if you stored data in your data center or with a cloud provider, then you would be able to claim ownership. This does not work in a decentralized Web3 environment where data may move among decentralized physical infrastructure providers. A proof of ownership linked to a proof of content is far more suitable.
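
One way to picture a chain of custody is as a list of records in which each record carries a hash of the one before it, so that any later alteration becomes detectable. The sketch below is only illustrative: the record fields, the holder names and the `transfer` and `verify` helpers are all hypothetical, and a real system would also need each record to be signed by the party handing the data over.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CustodyRecord:
    content_hash: str  # SHA-256 hex digest of the data itself
    holder: str        # hypothetical identifier of the current owner or custodian
    prev_hash: str     # digest of the previous record, "" for the first record

    def digest(self) -> str:
        """Hash the record's own fields so the next record can point at it."""
        payload = json.dumps(
            [self.content_hash, self.holder, self.prev_hash]
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def transfer(chain: list[CustodyRecord], new_holder: str) -> list[CustodyRecord]:
    """Return a new chain with a record handing the same content to a new holder."""
    last = chain[-1]
    return chain + [CustodyRecord(last.content_hash, new_holder, last.digest())]


def verify(chain: list[CustodyRecord]) -> bool:
    """Check that every record points at its predecessor and the content is unchanged."""
    if not chain or chain[0].prev_hash != "":
        return False
    for prev, cur in zip(chain, chain[1:]):
        if cur.prev_hash != prev.digest() or cur.content_hash != prev.content_hash:
            return False
    return True


data = b"quarterly sensor readings"
chain = [CustodyRecord(hashlib.sha256(data).hexdigest(), "alice", "")]
chain = transfer(chain, "storage-provider-1")
```

Because the content hash travels with every record, the data can move between infrastructure providers while the link back to the original creator stays checkable.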

Data Access Control

With the exception of public data, data access needs to be controlled by the owner of the data, whether to limit access or to track it. This is a mature field in the Web2 world, where centralization controls access, but in the Web3 world decentralization makes this much more challenging. Access control and management are absolute requirements for the concept of owner control and for legal claims.

Example System Architecture

This section is not intended to be prescriptive or to propose a specific solution, but rather to give ideas for how to create a system that would meet the criteria of an asset class and satisfy market and legal requirements. We set aside public data (since it falls outside the full definition of an asset class) and critical data (since it is outside the scope of this document).

One of the best existing Web3 examples of proof of ownership is the widespread use of NFTs. While people often think of an NFT as immutable storage for digital art, it is actually a certificate of ownership for art or anything else that can be referenced digitally.

A non-fungible token (NFT) is a unique digital identifier that is recorded on a blockchain and is used to certify ownership and authenticity. It cannot be copied, substituted, or subdivided.

The beauty of NFTs is that they can be sold or traded, or used to denote one of a series of objects, such as number 25 of 100 prints in a collection. This matters for an asset class because it allows ownership and royalties for an object to be traced, along with the history (provenance) of its lifecycle.

To review, in order to lay the groundwork for digital data ownership, we first start with data that has value to the owner and can be controlled and positively linked to the owner. This allows the legal frameworks to be solidly built around the intellectual property rights of any creation by a person once it has been saved or sent over digital media (memory sticks, drives, texts, video calls, streaming, etc.). In today's world, you typically lose your rights when you post your data online to a third-party system.

In a Web3 context, exercising ownership requires a digital identity. Relying on a secret key or an online ID and password not only jeopardizes the security of data ownership but also makes it unwieldy and prone to loss and misuse. Decentralized IDentifiers (DIDs) are a current way to properly associate the identity of an owner with an asset (whether NFTs or the data assets themselves).

Next, how do we connect the NFT (or other certificate of ownership) to an actual data asset? Luckily, in the Web3 world we are familiar with the concept of Content IDentifiers (CIDs). A CID is derived from a cryptographic hash of the content, so anyone can verify that a piece of digital content is authentic, and the CID can then be associated with an owner. In the legal world, the oldest documented artifact (by date and time) typically determines ownership rights.
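
The idea behind content addressing can be sketched in a few lines. This is a simplified stand-in, not the real CID format: actual CIDs wrap the digest in additional self-describing encoding, whereas the `content_id` helper assumed here just labels a plain SHA-256 digest.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Simplified stand-in for a CID: a labeled SHA-256 digest of the bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def matches(data: bytes, cid: str) -> bool:
    """Return True if the bytes hash to the given identifier."""
    return content_id(data) == cid


original = b"my original manuscript"
cid = content_id(original)
```

Any change to the content, however small, yields a different identifier, which is what lets a certificate of ownership refer to one exact piece of data.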

DID → NFT → CID = legal claim of ownership or custodianship.
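
The chain above can be sketched as a small registry that binds an owner's DID to a content identifier through a token. Everything here is hypothetical: the `Registry` class is an in-memory stand-in for an on-chain registry, the identifiers are placeholder strings (`did:example` is the illustrative method name used in DID examples), and the first-registration-wins rule is an assumption that mirrors the "oldest documented artifact" principle rather than any existing standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OwnershipToken:
    token_id: int
    owner_did: str  # who owns the asset
    cid: str        # which content the token refers to


class Registry:
    """In-memory stand-in for an on-chain token registry (hypothetical)."""

    def __init__(self) -> None:
        self._by_cid: dict[str, OwnershipToken] = {}

    def mint(self, owner_did: str, cid: str) -> OwnershipToken:
        # Assumed rule: first registration wins, so a CID cannot be claimed twice.
        if cid in self._by_cid:
            raise ValueError("content already registered")
        token = OwnershipToken(len(self._by_cid) + 1, owner_did, cid)
        self._by_cid[cid] = token
        return token

    def owner_of(self, cid: str) -> Optional[str]:
        """Resolve CID -> token -> owner DID, or None if the content is unclaimed."""
        token = self._by_cid.get(cid)
        return token.owner_did if token else None


registry = Registry()
token = registry.mint("did:example:alice", "sha256:ab12")
```

Looking up `registry.owner_of("sha256:ab12")` walks the chain in reverse, from content to token to owner.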

Finally, to meet market requirements for monetization and confidentiality, we must provide methods of access control and proof of access. Most current methods require some form of centralization, as decentralized access control is still in its infancy. Tools such as smart contracts can be used to manage policies; however, enforcement of data access needs to be baked into the system and specified as a standard.
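
To make the distinction between policy and proof of access concrete, here is a minimal sketch of an owner-managed policy that records every access request. It assumes a single trusted process, which is exactly the centralization a decentralized design would need to remove; the `AccessPolicy` class and the DID strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    owner_did: str
    allowed: set[str] = field(default_factory=set)
    # Each entry is (requester DID, whether access was granted).
    log: list[tuple[str, bool]] = field(default_factory=list)

    def grant(self, caller_did: str, grantee_did: str) -> None:
        """Only the owner may extend access to another identity."""
        if caller_did != self.owner_did:
            raise PermissionError("only the owner may grant access")
        self.allowed.add(grantee_did)

    def request(self, requester_did: str) -> bool:
        """Decide an access request and record the outcome, granted or not."""
        ok = requester_did == self.owner_did or requester_did in self.allowed
        self.log.append((requester_did, ok))
        return ok


policy = AccessPolicy("did:example:alice")
policy.grant("did:example:alice", "did:example:bob")
```

The policy itself is easy to express; the hard part in a decentralized setting is making storage providers enforce it and making the log trustworthy without a central operator.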

Summary

As of this writing, decentralized storage has focused on the technical needs of storing data, without regard for the legal rights of data owners. Building a system that is both easy to use and provides legal protections is critical for widespread market adoption. The challenges are not insignificant, but the four key concepts outlined above (data type categories, ownership, provenance and access control) need to be taken into account for any Web3-based system to be attractive to the market.

Data as an Asset is a theme that will define the rest of the decade and the years to come, not only as the breadth of what data means becomes fully understood but also because of the uses derived from it, of which we have seen only a small fraction so far.

Tom France