Lecture "Data Quality Tools: concepts and practical lessons from a vast operational environment", by Gani Hamiti, 13 March 2019 at 6 p.m.

Download the slides by Isabelle Boydens (1 MB) and Gani Hamiti (5 MB).

As part of the course STIC-B-510 "Qualité de l'information et des documents numériques" (Quality of information and digital documents), taught in the STIC programme of the Université libre de Bruxelles, Gani Hamiti, Data Quality Analyst at Smals, will give a lecture entitled "Data Quality Tools: concepts and practical lessons from a vast operational environment".

The event will take place on Wednesday 13 March 2019, from 6 p.m. to 8 p.m., at the Université libre de Bruxelles (auditorium K.4.401, building K, Solbosch campus). The talk will be followed by a question-and-answer session and a reception.

The lecture is intended for a multidisciplinary audience. On the one hand, it will address the critical analysis and improvement of the quality of the databases that non-IT users may have to deal with. On the other hand, it will discuss several advanced features of "data quality tools", among the most sophisticated currently available and sometimes little known to the IT community, within a rigorous approach grounded in hands-on experience and concrete examples.

The lecture and the speaker will be briefly introduced by Isabelle Boydens, who teaches course STIC-B-510, Professor at the Université libre de Bruxelles and Data Quality Expert in the Research Department of Smals.



A common catchphrase thrown around at IT parties, guaranteed to part the crowd like a human Red Sea, is that "Data matures like wine, applications like fish"[efn_note]Initial analogy by James Governor, 2007; exact wording commonly attributed to Andy Todd, 2009.[/efn_note]. While feature creep and fast-paced technical obsolescence certainly threaten an application's long-term sustainability, one can easily forget that fine wine only improves with age if it is stored in proper conditions. Furthermore, over time, stagnation and disuse tend to join the ranks of data's worst enemies.

Faced with the ever-growing volume of data, the rising stakes attached to it and the considerable variety of problems it helps solve, data quality tools have come to play a major role in helping users both get a better grasp of what is happening in their data and tackle the problems they discover. In practice, they offer technical solutions aimed at making data "fitter" for its uses, following a cost-benefit approach. Such tools build on several decades of refined algorithms, optimizations and knowledge bases to deal efficiently with a wide array of challenges, such as defining, enforcing and measuring business rules, parsing and validating address or product data, or comparing and matching records within a single database or across several, possibly belonging to different core business activities.

During this evening, we will review the most typical features of industry-grade data quality tools and how they can help address the aforementioned issues and many more, using examples, statistics and facts drawn from a large-scale operational environment.

This presentation does not require prior familiarity with the technical side of data quality and its tools, although such familiarity may help in grasping the underlying challenges and deeper concepts. Furthermore, since this approach mainly targets existing issues in living databases, it complements the methodological and structural data quality approaches taught in the STIC-B-510 course "Qualité de l'information et des documents numériques" (Quality of information and digital documents), working hand in hand with them.



After graduating with a Master in Information and Communication Science and Technology (Université libre de Bruxelles), Gani Hamiti has been working in the data quality field at Smals. In this capacity, he has been involved in various empirical data migration projects and in widely generalizable application areas such as social fraud detection and the integration of accounting information systems. In parallel, a significant part of his work consists of raising awareness and sharing knowledge about data quality.