DAF ITALIA - Data & Analytics Framework

11/Dec/2018

Developing the interoperability of public data between PAs

As regards the exchange of data between PAs, it is still common practice to stipulate direct agreements between administrations to regulate the exchange of data needed to carry out institutional activities. This practice does not scale and severely limits the sharing of public sector information.

The DAF, which was approved as part of the Three-Year IT Plan for the Public Administration 2017-2019, has the objective of developing and simplifying the interoperability of public data between PAs, standardizing and promoting the dissemination of open data, optimizing the processes of data analysis and knowledge generation, and supporting scientific research initiatives by fostering collaboration with universities and research institutions.

The PA's Big Data Team, set up within the Digital Team, has the task of actively managing the conceptual design and implementation of the infrastructure, together with all phases of the data lifecycle, from ingestion to analysis and application development.

The horizontal scalability of big data management and analysis technologies makes it possible to extract information from the intersection of multiple databases and process it in real time. This provides multiple, timely analytical perspectives on a given phenomenon, significantly amplifying the information assets of the PA.

The DAF promotes and optimizes the exchange of data between PAs, minimizing transaction costs and allowing standardized access to constantly updated data. As a consequence, the use of open data becomes more effective, with public data centralized and redistributed through APIs.

Specifically, the DAF consists of a Big Data Platform and a Data Portal. The platform, in turn, consists of a data lake, a set of big data engines and tools for data communication.

The data lake stores data of potential interest: the databases that PAs generate to fulfil their institutional mandate; the data generated by the IT systems of the Public Administrations, such as logs and usage data; and authorized data from the web and from social networks of potential interest to the Public Administration.

The big data engines are used to harmonize and process, in both batch and real-time mode, the raw data stored in the data lake, and to implement machine learning models. To this end, the DAF provides both data exchange tools (to facilitate the use of processed data by the parties involved) and data analysis and visualization tools offered in self-service mode to DAF users.
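As a purely illustrative sketch of the harmonization step described above (the DAF's actual pipelines and schemas are not documented here, so all source and field names below are hypothetical), a batch job might map heterogeneous raw records from different PAs onto a common schema before they are processed further:

```python
# Illustrative sketch only: harmonize raw records from two hypothetical
# PA sources into a common schema. Field names are invented examples.
from datetime import date


def harmonize(record: dict, source: str) -> dict:
    """Map a source-specific raw record onto a common target schema."""
    if source == "pa_a":  # hypothetical source using Italian field names
        return {
            "municipality": record["comune"].strip().title(),
            "indicator": record["indicatore"],
            "value": float(record["valore"]),
            "reference_date": date.fromisoformat(record["data"]),
        }
    if source == "pa_b":  # hypothetical source with a different layout
        return {
            "municipality": record["city"].strip().title(),
            "indicator": record["metric"],
            "value": float(record["amount"]),
            "reference_date": date.fromisoformat(record["ref_date"]),
        }
    raise ValueError(f"unknown source: {source}")


raw_a = {"comune": " roma ", "indicatore": "pm10", "valore": "31.5", "data": "2018-12-01"}
raw_b = {"city": "MILANO", "metric": "pm10", "amount": "28.0", "ref_date": "2018-12-01"}

harmonized = [harmonize(raw_a, "pa_a"), harmonize(raw_b, "pa_b")]
print(harmonized[0]["municipality"])  # Roma
```

Once records share one schema, the same downstream analysis and visualization tools can serve data from any participating administration.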

The Data Portal allows users to populate their profile with data of interest extracted from the DAF. In particular, the Data Portal consists of: a CKAN-based dataset catalog, which manages the metadata of both the data contained in the DAF and the open data harvested from PA sites; user interfaces to access the analysis and data visualization tools; a module reserved for public administrations to manage the process of ingestion and management of data and metadata in the DAF; and a module for data stories, through which users can publish their own analyses and collaborate with other users.
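CKAN exposes its catalog through a JSON action API (for example the `package_search` action). The sketch below parses a response in the standard CKAN envelope; the sample payload is invented, and a real client would fetch it over HTTP from the catalog's `/api/3/action/package_search` endpoint:

```python
import json

# Sample response in the standard CKAN action-API envelope
# ({"success": ..., "result": {"count": ..., "results": [...]}});
# the dataset entries themselves are invented for illustration.
sample_response = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "qualita-aria-2018", "title": "Air quality 2018"},
            {"name": "anagrafe-scuole", "title": "School registry"},
        ],
    },
})


def list_dataset_titles(raw: str) -> list:
    """Return the titles of the datasets in a CKAN package_search response."""
    payload = json.loads(raw)
    if not payload["success"]:
        raise RuntimeError("CKAN API call failed")
    return [ds["title"] for ds in payload["result"]["results"]]


print(list_dataset_titles(sample_response))  # ['Air quality 2018', 'School registry']
```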

A key role is played by the work, carried out mostly by AGID, on the adoption of a common format, the so-called DCAT (and its Italian version), for describing the general characteristics of a dataset and the files that compose it. For more advanced users, the Data Portal can be accessed by applications such as Metabase, which allows data to be retrieved with an SQL-like language.
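To give a flavour of what a DCAT-style description looks like, the snippet below builds a minimal JSON-LD record using core DCAT and Dublin Core terms (`dct:title`, `dcat:distribution`, ...). The dataset, publisher and URL are invented examples, not real catalog entries:

```python
import json

# Minimal DCAT-style dataset description in JSON-LD.
# The dataset, publisher and URL are invented for illustration.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Air quality measurements (example)",
    "dct:description": "Hypothetical dataset described with DCAT terms.",
    "dct:publisher": "Example Municipality",
    "dcat:keyword": ["environment", "air-quality"],
    "dcat:distribution": [
        {
            "@type": "dcat:Distribution",
            "dct:format": "CSV",
            "dcat:downloadURL": "https://example.org/data/air-quality.csv",
        }
    ],
}

print(json.dumps(dataset, indent=2))
```

Describing every dataset and each of its files (distributions) with the same vocabulary is what allows a single catalog to harvest and index metadata from many different PA sites.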
