Docubiz.fr Company Registry

docubiz.fr

About the Project

A B2B service that provides information on organizations drawn from French government information systems. It currently stores and updates data on approximately 50 million companies, including branches of globally renowned brands. Unique approaches to importing semi-structured data were developed for the project, along with a data quality analytics strategy that improves the reliability of the data.

Technologies

PostgreSQL,
Elasticsearch,
Docker,
Laravel,
Python,
Nuxt3,
D3,
AWS,
Jenkins,
RabbitMQ,
Redis,
Grafana,
Prometheus.

Credibility. Reliability. Convenience.

We researched and identified the main government sources of company information. The result is a single organization page that aggregates data from 12 sources. Each company page presents a set of information that includes directors, capitalization, staff size, and legal documents. The service is aimed primarily at assessing counterparty reliability and at providing a transparent organization history dating back to 1978. Planned development includes integrating additional services and offering subscription-based API access.

Data Quality and Currency

Data quality is the main priority of the company registry. To give users clean data, the import process includes multi-level checks and verification. When incoming data does not match the requirements and documentation, an analyst is notified, investigates the non-standard case, and adjusts the import strategy if necessary. In addition, a strategy for merging person records that differ only by typos in personal data has been developed and implemented.
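To make the idea of merging person records with typos more concrete, here is a minimal sketch. It is not the project's actual algorithm: the field names (`name`, `birth_date`), the rule that birth dates must match exactly, and the 0.9 similarity threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(name.lower().split())


def same_person(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Hypothetical rule: two records describe one person when the birth
    dates are equal and the normalized names are nearly identical."""
    if a["birth_date"] != b["birth_date"]:
        return False
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold
```

Under these assumptions, "Jean Dupont" and "jean  dupond" with the same birth date would be treated as one person, while the same name with a different birth date would not. A production system would likely combine more signals than this.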

>1 500 000 000

Mind-Boggling Data Volume

The project database contains over 1.5 billion records in total, including data on people, documents, companies, and their relationships.

>1 000 000 000

Change Tracking

A service has been developed to track changes in any company parameter and store the history of these changes throughout its existence. The total number of change transactions in the database exceeds 1 billion!
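One simple way to picture change tracking is a field-by-field comparison of two snapshots of a company record. The sketch below is illustrative only: the function name, the record layout, and the shape of the change entries are assumptions, not the service's real schema.

```python
from datetime import date


def diff_snapshots(company_id: str, old: dict, new: dict, seen_on: date) -> list[dict]:
    """Return one change record per field whose value differs between snapshots."""
    changes = []
    # Walk the union of keys so added and removed fields are also reported.
    for field in sorted(old.keys() | new.keys()):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes.append({
                "company_id": company_id,
                "field": field,
                "old": before,
                "new": after,
                "seen_on": seen_on.isoformat(),
            })
    return changes
```

Appending such entries to a history table on every import would preserve the full timeline of a company's parameters.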

Search Quality

Special attention was paid to search configuration. Algorithms were developed to match results as closely as possible to the search query. The service tolerates typos, recognizes specific inputs such as company IDs, and returns the most relevant results.
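As a rough illustration of handling both company IDs and misspelled names, the sketch below routes a raw query to one of two Elasticsearch query bodies. French company identifiers are numeric (a SIREN has 9 digits, a SIRET has 14), so a purely numeric input of that length is treated as an exact lookup. The index field names (`siren`, `siret`, `name`) are hypothetical, and the project's real search logic is not shown in the source.

```python
import re


def build_query(user_input: str) -> dict:
    """Build an Elasticsearch query body from raw user input."""
    # People often type identifiers with spaces, e.g. "552 100 554".
    compact = re.sub(r"\s+", "", user_input)
    if re.fullmatch(r"[0-9]{9}", compact):
        # 9 digits: looks like a SIREN, so do an exact lookup.
        return {"query": {"term": {"siren": compact}}}
    if re.fullmatch(r"[0-9]{14}", compact):
        # 14 digits: looks like a SIRET.
        return {"query": {"term": {"siret": compact}}}
    # Otherwise search by name and tolerate small typos.
    return {
        "query": {
            "match": {"name": {"query": user_input.strip(), "fuzziness": "AUTO"}}
        }
    }
```

The `term` and `match` queries and the `fuzziness` option are standard Elasticsearch query DSL. Everything else here is an assumption.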

Tracking Relationships Between Individuals and Companies

One of the service's main features is tracking relationships between millions of companies and individuals. The tool provides unique information that can be used for investigations, counterparty reliability assessment, marketing research, and many other purposes. A custom solution was developed to display large amounts of data efficiently with the D3.js library.
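For a sense of how relationship data might reach a D3.js visualization, the sketch below converts person-company rows into the nodes-and-links structure commonly used with D3 force layouts. The row format, the `p:` and `c:` id prefixes, and the attribute names are assumptions for illustration. The project's own display solution is not described in detail.

```python
def to_node_link(relations: list[tuple[str, str, str]]) -> dict:
    """Convert (person, company, role) rows into nodes and links."""
    nodes: dict[str, dict] = {}
    links = []
    for person, company, role in relations:
        person_id, company_id = f"p:{person}", f"c:{company}"
        # setdefault keeps one node per entity even if it appears in many rows.
        nodes.setdefault(person_id, {"id": person_id, "kind": "person", "label": person})
        nodes.setdefault(company_id, {"id": company_id, "kind": "company", "label": company})
        links.append({"source": person_id, "target": company_id, "role": role})
    return {"nodes": list(nodes.values()), "links": links}
```

Serialized as JSON, this output can be fed to a D3 force simulation in which `d3.forceLink().id(d => d.id)` resolves each link's `source` and `target` by node id.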

Mobile Devices

Special attention was paid to adapting content for mobile devices. All information is divided into cards, each covering a single criterion. SVG assets were used to optimize loading speed, which significantly improved the Google PageSpeed score.