An assurance process for Big Data trustworthiness

Abstract

Modern (industrial) domains are based on large digital ecosystems where huge amounts of data and information need to be collected, shared, and analyzed by multiple actors working within and across organizational boundaries. This data-driven ecosystem poses strong requirements on data management and data analysis, as well as on data protection and system trustworthiness. However, although Big Data has reached functional maturity and is a key enabler for enterprises competing in the global market, the assurance and trustworthiness of Big Data computations (e.g., security, privacy) are still in their infancy. While functionally appealing, Big Data does not provide a transparent environment with clear non-functional properties, which impairs the users’ ability to evaluate its behavior and clashes with modern data-privacy regulations. In this paper, we present a novel assurance process for Big Data, which evaluates Big Data pipelines and the underlying Big Data ecosystem to provide a comprehensive measure of their trustworthiness. To the best of our knowledge, this approach is the first attempt to address the general problem of Big Data trustworthiness in a holistic way. We experimentally evaluate our solution in a real Big Data Analytics-as-a-Service environment, first presenting a detailed walkthrough evaluation, and then showing its feasibility and negligible performance overhead (approximately 1 min).

Publication
Future Generation Computer Systems