How to improve data quality management and work independently with new tools

Challenge

Data analytics and data quality management

The client: An innovative commercial bank

The bank requested a service that would provide high-tech financial services for private clients and businesses. The bank has dozens of information systems and a corporate data warehouse. Over time, the client had become concerned about the quality of the data contained in the repository.

In addition, only a limited number of users had access to the existing repository and primary data sources. As a rule, these were technical specialists of the service organization (contractor). Business users had to go through intermediaries to obtain a specific data set, and each extra handoff added labor costs.

Problems faced by the client:

  • Business users couldn’t work directly with the required data set.
  • The quality of the data in the existing repository didn’t meet the requirements of business users.
  • Specialists didn’t know the location of certain information, and there was no documentation on how to work with information systems.
  • The client had to involve a third party to work with the data.
  • There was a limited number of reports and difficulty in creating new ones.

The client set several strategic goals:

  • To ensure data collection from various sources and provide business users access to a single data warehouse.
  • To measure the existing level of data quality.
  • To determine strategies and methods for improving the level of data quality.
  • To improve data management and introduce options for data analytics.
  • To formulate a knowledge base about stored data and repository objects.
  • To provide a set of tools and technologies for creating BI reporting and developing new analytical reports for several departments of the bank.

The data management tasks:

  • To prepare a data lake and fill it with business data from as many primary systems as possible.
  • To create a data and information system that enables the user to determine where the necessary business data is located.
  • To create a tool to measure and manage data quality.
  • To create a ‘golden record’ of the client.
  • To install and configure the necessary tool for BI reporting and start creating a database of reports.
  • To train users on how to use the new tools.

Solution

Master Data Management and data analytics system

The implementation consisted of 3 projects:

In the first project, BlitzBrain built a corporate warehouse based on the Greenplum MPP database and set up automatic collection of business-relevant data from primary sources.

Then, the BlitzBrain data specialists created a data and information system: a knowledge base describing the meaning of specific data, so that users could determine where that data was located.
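A knowledge base of this kind can be pictured as a searchable glossary that maps business terms to physical locations. The following is a minimal Python sketch of that idea, not the system BlitzBrain built; the entry fields, system names, table names, and the find_entries helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One glossary entry: a business term and where its data lives."""
    term: str         # business name of the data item
    system: str       # hypothetical source system name
    table: str        # hypothetical schema-qualified table
    column: str
    description: str


# Hypothetical entries; a real catalog would be loaded from a metadata store.
CATALOG = [
    CatalogEntry("customer birth date", "core_banking", "crm.customers", "birth_dt",
                 "Date of birth as stated in the identity document"),
    CatalogEntry("customer email", "online_banking", "web.profiles", "email",
                 "Contact email confirmed by the customer"),
    CatalogEntry("loan balance", "loan_system", "loans.accounts", "balance_amt",
                 "Outstanding principal at end of day"),
]


def find_entries(query, catalog=CATALOG):
    """Return entries whose term or description contains every word of the query."""
    words = query.lower().split()
    return [
        e for e in catalog
        if all(w in f"{e.term} {e.description}".lower() for w in words)
    ]


for entry in find_entries("customer email"):
    print(f"{entry.term}: {entry.system} -> {entry.table}.{entry.column}")
```

A business user searching for "customer email" would get back the system, table, and column to query, without asking a technical specialist.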

As part of the second project, BlitzBrain built an MDM system, a toolkit for managing and monitoring data quality. Its task was to resolve missing information, duplicates, and erroneous data. With this system in operation, it became possible to form the client’s golden record.

A ‘golden record’ is the most reliable, consistent, and complete view of each of the company's data objects (customer, product, counterparty, etc.). It contains all the attributes needed to describe the client. Employees can access this data whenever they need the relevant information.
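To make the idea concrete, here is a minimal Python sketch of how duplicate records from several systems might be merged into one golden record. It is an illustration under stated assumptions, not the MDM logic used in the project: the tax_id match key, the field names, the sample data, and the survivorship rule (the most recently updated non-empty value wins) are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def normalize_key(record):
    """Build a match key; here the (hypothetical) tax ID with non-digits removed."""
    return "".join(ch for ch in record["tax_id"] if ch.isdigit())


def build_golden_records(records):
    """Group records by match key and merge each group into one golden record.

    Survivorship rule (an assumption): for every attribute, the most recently
    updated non-empty value wins.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_key(rec)].append(rec)

    golden = {}
    for key, group in groups.items():
        merged = {}
        # Oldest first, so newer non-empty values overwrite older ones.
        for rec in sorted(group, key=lambda r: r["updated"]):
            for attr, value in rec.items():
                if attr in ("source", "updated"):
                    continue
                if value not in (None, ""):
                    merged[attr] = value
        merged["tax_id"] = key
        merged["sources"] = sorted({r["source"] for r in group})
        golden[key] = merged
    return golden


# Hypothetical records for the same customer held in two systems.
records = [
    {"source": "crm", "updated": date(2021, 3, 1), "tax_id": "12-345-678",
     "name": "A. Smith", "email": "", "phone": "555-0100"},
    {"source": "loans", "updated": date(2022, 6, 15), "tax_id": "12345678",
     "name": "Alex Smith", "email": "a.smith@example.com", "phone": None},
]

golden = build_golden_records(records)
print(golden["12345678"])
```

In this sketch the newer name and email come from the loans record, while the phone number survives from the older CRM record because the newer one is empty. Real MDM matching usually relies on fuzzier rules than a single exact key.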

Measuring and improving data quality in primary systems allows specialists to identify problem areas in data sources and eliminate them. BlitzBrain experts formulated and described a methodology for calculating a system of indicators, which was then implemented in code and calculated daily to monitor data quality.
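As an illustration of what such indicators might look like, the Python sketch below computes three common data quality measures over a small sample. The choice of indicators (completeness, uniqueness, validity), the field names, and the sample rows are assumptions for the example; the source does not describe the actual methodology.

```python
import re

# Deliberately simple email pattern, used only for this illustration.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def filled_values(rows, field):
    """Return the non-empty values of a field."""
    return [r[field] for r in rows if r.get(field) not in (None, "")]


def completeness(rows, field):
    """Share of rows in which the field is filled in."""
    return len(filled_values(rows, field)) / len(rows) if rows else 1.0


def uniqueness(rows, field):
    """Share of distinct values among the filled-in values of the field."""
    values = filled_values(rows, field)
    return len(set(values)) / len(values) if values else 1.0


def validity(rows, field, pattern):
    """Share of filled-in values that match the expected format."""
    values = filled_values(rows, field)
    valid = sum(1 for v in values if pattern.match(v))
    return valid / len(values) if values else 1.0


# Hypothetical sample: one missing email, one malformed email, one duplicate ID.
rows = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C2", "email": "not-an-email"},
    {"customer_id": "C4", "email": "d@example.com"},
]

report = {
    "email_completeness": completeness(rows, "email"),
    "customer_id_uniqueness": uniqueness(rows, "customer_id"),
    "email_validity": validity(rows, "email", EMAIL_RE),
}
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Run once a day against each source and stored with a date, figures like these would show whether quality in a given system is improving or degrading over time.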

The third project implemented the Tableau BI analytics tool and built analytical reports.

With the help of a business intelligence (BI) system, business-relevant information was provided in interactive reports that enabled analysts and managers at various levels to make business decisions in real time.

As part of the project, 16 data sources from the corporate data warehouse were used. The planned volume at the start of the project was more than 50 TB.

Project implementation period: 18 months.

Technology

Greenplum
PostgreSQL
Apache Airflow (Redis, PostgreSQL)
StreamSets
Apache Superset
Apache Atlas
Liquibase
DBeaver
Tableau

The final result

Benefits for the client:

  • Creation of a data lake consisting of primary data.
  • Access to a single data warehouse for the bank's business users.
  • Formation of the client’s golden record, consisting of more than 250 attributes.
  • Ability to measure and manage data quality independently through comprehensive master data management (MDM).
  • Creation of a knowledge base of stored data and storage objects.
  • Support of management decisions in the form of BI reporting, as well as the additional opportunity for bank employees to independently generate analytical reporting.

Contact us

Sales department
sales@blitz-brain.com