Improving bank data quality and training employees to work with new tools

Case study: an innovative bank

Challenge

Our client is a bank specializing in cutting-edge financial services for both individual and corporate clients. The bank approached us to help improve its operations.
The bank took pride in an impressive array of information systems and a corporate data warehouse, but over time it grew concerned about the quality of the data they contained. Furthermore, only a select few technical specialists from the service organization (contractor) had access to the existing repository and the primary data sources, so business users had to rely on intermediaries to obtain the data sets they needed. The many intermediate links in this process drove labor costs up significantly.

Challenges encountered by the client:

  • Business users could not access the data sets they needed directly.
  • The data quality in the repository did not meet business users' standards.
  • Experts did not know where specific information was stored, and there was no documentation on how to operate the information systems.
  • A third party was required to handle the data.
  • Only a few report templates were available, and creating new reports was difficult.

The client established several strategic objectives:

  • Collect data from multiple sources and offer business users access to a unified data warehouse.
  • Evaluate the current level of data quality.
  • Devise plans and techniques for enhancing data quality.
  • Create a knowledge base for stored data and repository objects.
  • Supply a range of tools and technologies to generate BI reports and develop new analytical reports for multiple departments within the bank.

Solution

To meet the client's strategic objectives, our team implemented several projects, each addressing specific tasks.
The tasks were as follows:
  • Create a Data Lake and fill it with business data from multiple primary systems.
  • Develop a data and information system that enables users to locate necessary business data.
  • Establish a tool for measuring and managing data quality.
  • Create a "Golden Record" of the client.
  • Install and configure the necessary BI reporting tools and create a database of reports.
  • Train users on how to use the new tools.
To accomplish these goals, we executed three projects.
The first project involved building a corporate warehouse on the Greenplum MPP database and setting up an automated process for collecting data of business value from the primary source systems.
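To make the ingestion step more concrete, here is a minimal sketch of what one such daily loading pipeline could look like in Apache Airflow (which appears in the technology list below), assuming a recent Airflow 2.x installation. The connection IDs, table, and column names are hypothetical assumptions; treat this as an illustrative sketch rather than the project's actual code.

```python
# A minimal sketch of a daily ingestion DAG, assuming Airflow connections
# named "core_banking_src" and "greenplum_dwh" and a staging table
# "stg.accounts" -- all names here are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


def load_accounts_to_greenplum(**_):
    """Pull recently updated account records from a source system and load
    them into a Greenplum staging table (Greenplum speaks the PostgreSQL
    wire protocol, so the standard Postgres hook works)."""
    src = PostgresHook(postgres_conn_id="core_banking_src")
    dwh = PostgresHook(postgres_conn_id="greenplum_dwh")

    rows = src.get_records(
        "SELECT account_id, client_id, balance, updated_at "
        "FROM accounts WHERE updated_at >= current_date - 1"
    )
    dwh.insert_rows(
        table="stg.accounts",
        rows=rows,
        target_fields=["account_id", "client_id", "balance", "updated_at"],
    )


with DAG(
    dag_id="ingest_core_banking_accounts",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="load_accounts",
        python_callable=load_accounts_to_greenplum,
    )
```

One DAG of this shape per source table is a common way to keep a staging layer topped up on a daily schedule.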
Next, we created a data and information system – a knowledge base that provides users with information on the location and meaning of specific data.
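Apache Atlas appears in the technology list below, and a knowledge base of this kind is typically queried through its REST API. The sketch below is a hedged illustration of that idea, assuming the standard Atlas v2 basic-search endpoint is exposed; the host, credentials, and search term are placeholders.

```python
# A hedged sketch of how an analyst could query the knowledge base,
# assuming it exposes the standard Apache Atlas v2 REST API.
# Host, credentials, and the search term are placeholders.
import requests

ATLAS_URL = "http://atlas.bank.local:21000/api/atlas/v2/search/basic"

response = requests.get(
    ATLAS_URL,
    params={"query": "client_accounts", "limit": 10},
    auth=("analyst", "secret"),  # placeholder credentials
    timeout=30,
)
response.raise_for_status()

# Each search hit describes a registered data asset: its type and
# qualified name, which is exactly the "where does this data live and
# what does it mean" question the knowledge base answers.
for entity in response.json().get("entities", []):
    print(entity["typeName"], entity["attributes"].get("qualifiedName"))
```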
For the second project, we developed an MDM (master data management) system – a toolkit for managing and monitoring data quality. Its purpose was to address issues such as missing information, duplicates, and erroneous data. This system led to the creation of the client's "Golden Record".
The "Golden Record" is the most reliable, consistent, and complete view of each company's data object (customer, product, counterparty, etc.). It contains all the attributes necessary to describe the client's profile. This data can be accessed by employees for relevant information.
Measuring and improving the quality of data in the primary systems allows specialists to identify problem areas in the data sources and eliminate them. To monitor data quality, our team formulated and documented a methodology for calculating a system of quality indicators, which was then implemented and run on a daily basis.
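The actual indicator system was defined by the project's methodology, but the flavor of the daily calculation can be shown with a short, hedged sketch: a couple of completeness and uniqueness indicators computed with SQL against the warehouse. The connection string, schema, table, and column names are assumptions for illustration only.

```python
# A hedged sketch of a daily data quality check run against the warehouse.
# Greenplum is PostgreSQL-compatible, so psycopg2 is used here; the DSN,
# table, and column names are placeholders, and the real system calculated
# a much larger set of indicators defined by the agreed methodology.
import psycopg2

QUERIES = {
    # Completeness: share of customer rows with a phone number filled in.
    "phone_completeness": """
        SELECT 100.0 * count(phone) / nullif(count(*), 0)
        FROM dds.customers
    """,
    # Uniqueness: share of distinct tax IDs among all customer rows.
    "tax_id_uniqueness": """
        SELECT 100.0 * count(DISTINCT tax_id) / nullif(count(*), 0)
        FROM dds.customers
    """,
}

with psycopg2.connect("host=gp-master dbname=dwh user=dq_bot password=secret") as conn:
    with conn.cursor() as cur:
        for name, sql in QUERIES.items():
            cur.execute(sql)
            value = cur.fetchone()[0]
            # In a production pipeline the value would be written to an
            # indicator table and compared against an agreed threshold.
            print(f"{name}: {value:.2f}%")
```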
The third project involved implementing the Tableau BI analytics tool and building analytical reports. With this BI system, we provided business-relevant information in the form of interactive reports that enabled analysts and managers at various levels to make real-time decisions.
This project drew on 16 data sources from the corporate warehouse, with a planned data volume of over 50 TB at the start.
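Most report building happened interactively in Tableau itself, but parts of the rollout can be scripted. As a hedged illustration, the sketch below publishes a warehouse extract to Tableau Server with the tableauserverclient library so analysts can build reports on top of it; the server address, site, project ID, and file name are placeholders.

```python
# A minimal, illustrative sketch of publishing a warehouse extract to
# Tableau Server with the tableauserverclient library. The server URL,
# credentials, site, project ID, and .hyper file name are placeholders.
import tableauserverclient as TSC

tableau_auth = TSC.TableauAuth("bi_admin", "secret", site_id="bank")
server = TSC.Server("https://tableau.bank.local", use_server_version=True)

with server.auth.sign_in(tableau_auth):
    datasource = TSC.DatasourceItem(project_id="placeholder-project-id")
    published = server.datasources.publish(
        datasource,
        "clients_overview.hyper",               # extract built from the warehouse
        mode=TSC.Server.PublishMode.Overwrite,  # refresh the existing data source
    )
    print(f"Published data source: {published.name}")
```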

Technologies

  • Greenplum
  • PostgreSQL
  • Apache Airflow (Redis, PostgreSQL)
  • StreamSets (DC, Transformer)
  • Apache Superset
  • Apache Atlas
  • Liquibase
  • DBeaver
  • Tableau

The final results

Benefits for the client:
  • Development of a Data Lake populated with primary data.
  • Access to a unified data warehouse for the bank's business users.
  • Establishment of the client's "Golden Record", which includes over 250 attributes.
  • The ability for the bank to evaluate and manage data quality on its own.
  • Creation of a knowledge base describing the stored data and repository objects.
  • Support for management decision-making through BI reporting, along with the ability for bank staff to generate analytical reports on their own.
Project implementation period: 18 months.