The volume of corporate data keeps growing. How can you avoid rising storage costs with big data solutions?

Challenge

The Client: one of Europe's largest providers of mobile services, online TV, and mobile payment solutions.

Project implementation period: 6 months.

The client used the Apache Hive storage system on the Hadoop platform to work with corporate data; while Hive itself is open source, the client's enterprise Hadoop distribution carried a paid license.

The client contacted us with a widespread problem: the volume of the company's data was constantly growing, and storing it drove up the cost of ownership for the following reasons:

  • New equipment had to be purchased to store the incoming data.
  • The client had to pay an annual license fee for Apache Hive (Hadoop).
  • The growing fleet of equipment drove up operating costs.

The goal of the project was to enable the client to accumulate new data and retain historical data without degrading storage performance or increasing the cost of ownership.

The tasks we set to achieve the goal of the project:

  • To create a new corporate data warehouse.
  • To connect the warehouse to the primary sources and to ensure the collection and transformation of data in accordance with business logic (see the sketch after this list).
  • To migrate the existing data and accumulate new primary data so that business users could work without interruption.
  • To introduce data management and analytics practices to improve overall data handling.
  • To train the client company's specialists to work with the new solution and hand over the necessary technical documentation for the project.
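
To give a feel for the collection-and-transformation stage, here is a minimal extract-transform-load sketch in Python. Everything in it is illustrative: the connection parameters, the payments table, and the "normalize amounts to cents" rule stand in for the client's real sources and business logic. Greenplum speaks the PostgreSQL wire protocol, so the standard psycopg2 driver can load data into it.

    # Illustrative ETL sketch: extract from a MySQL source, apply a
    # transformation, and load into the Greenplum warehouse.
    # All names and credentials below are hypothetical placeholders.
    import mysql.connector  # pip install mysql-connector-python
    import psycopg2
    from psycopg2.extras import execute_values

    # Extract: read rows from one of the primary sources (MySQL here).
    src = mysql.connector.connect(host="mysql-src", user="etl",
                                  password="***", database="billing")
    src_cur = src.cursor()
    src_cur.execute("SELECT id, msisdn, amount, created_at FROM payments")

    # Load target: the Greenplum master node (PostgreSQL-compatible).
    dst = psycopg2.connect(host="gp-master", dbname="dwh",
                           user="etl", password="***")
    dst_cur = dst.cursor()

    while True:
        batch = src_cur.fetchmany(10_000)
        if not batch:
            break
        # Transform: a stand-in business rule -- store amounts in cents.
        rows = [(r[0], r[1], int(r[2] * 100), r[3]) for r in batch]
        execute_values(
            dst_cur,
            "INSERT INTO stg.payments (id, msisdn, amount_cents, created_at) "
            "VALUES %s",
            rows)

    dst.commit()
    src.close()
    dst.close()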

Solution

At the project's start, the data volume was about 260 TB. By the end of the first year, it was projected to reach roughly 338 TB (260 TB × 1.3, i.e. 30% annual growth), which would have required expanding server capacity and purchasing additional Hive licenses.

Our experts suggested that the client replace the current software solution with an alternative: the Greenplum distributed database. Because Greenplum is open-source software, the proposed solution eliminated several of these problems while providing additional benefits.
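
Greenplum is a massively parallel (MPP) database built on PostgreSQL: data is spread across segment servers, so capacity grows by adding commodity hardware rather than buying per-terabyte licenses. As a hedged illustration (the table name, columns, and storage options are assumptions, and the exact WITH options vary between Greenplum versions), a warehouse table might be declared like this, with DISTRIBUTED BY telling Greenplum how to shard rows across segments:

    # Illustrative DDL for a Greenplum fact table, executed via psycopg2.
    # Table/column names and storage options are hypothetical examples.
    import psycopg2

    ddl = """
    CREATE TABLE dwh.payments (
        id           BIGINT,
        msisdn       TEXT,
        amount_cents BIGINT,
        created_at   TIMESTAMP
    )
    WITH (appendonly=true, orientation=column, compresstype=zlib)
    DISTRIBUTED BY (id);  -- hash-distribute rows across segments by id
    """

    with psycopg2.connect(host="gp-master", dbname="dwh",
                          user="etl", password="***") as conn:
        with conn.cursor() as cur:
            cur.execute(ddl)

Append-optimized, column-oriented, compressed tables are a common Greenplum choice for large historical data, since they keep storage compact as volumes grow.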

As the project approached its final stage, BlitzBrain data engineering specialists trained the client company's technical staff to work with the new solution and then handed over the necessary documentation. This reduced the client's employee onboarding costs and immersed current employees in the new technology.

Technology

Python
Oracle
Greenplum
MySQL
MS SQL
Docker
PostgreSQL

The final result

As a result of the project, the client was able to:

  • Get a new solution for accumulating and storing corporate data that does not fall short of the previous one in performance, fault tolerance, or security.
  • Ensure that business users have a smooth experience with corporate data and provide secure data management.
  • Reduce the cost of storage ownership thanks to the absence of mandatory license fees. With Hadoop, companies pay on average USD 1,000 to USD 2,000 per 1 TB of stored data; at the project's starting volume of about 260 TB, that amounts to roughly USD 260,000–520,000 in license fees, with data volume expected to grow by 30% annually.
  • Independently maintain the storage, as our specialists prepared a technical support team on the client’s side and handed over the necessary documentation for training new employees.

Contact us

Sales department
sales@blitz-brain.com
Marketing department

Ready to discuss a project?

Tell us about your project in any form that is convenient for you, whether it is a clearly defined specification or a concept description.
