Privacy and data mining : evaluating the impact of data anonymization on classification algorithms

Hebert O. Silva; Tania Basso; Regina Moraes

Material

ARTICLE

Language

English

General note

This article was presented at the 13th European Dependable Computing Conference (EDCC), 2017

Acknowledgments: This work has been partially supported by the project EUBra-BIGSEA (www.eubra-bigsea.eu), funded by the Brazilian Ministry of Science, Technology and Innovation (Project 23614 - MCTI/RNP 3rd Coordinated Call) and by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement no. 690116. It is also supported by the project DEVASSES (www.devasses.eu), funded by the European Union’s FP7 for research, technological development and demonstration under grant agreement no. PIRSES-GA-2013-612569.

Abstract

Data anonymization is a technique used to increase the assurance that private data is not accessible to third parties. In data mining processes, anonymization can impact the results, since anonymized data may hinder the analysis performed by the algorithms commonly used in this context. The goal of this Practical Experience Report is to evaluate the impact of data anonymization on the accuracy and performance of data mining classifiers. This is done by comparing their execution on original and anonymized data. A sample of real data generated by a Brazilian city transportation system, associated with fictitious users, was anonymized at different stages, and classification algorithms such as ZeroR, KNN (k-Nearest Neighbor), and Naive Bayes were applied. Contrary to expectations, when the anonymization techniques were introduced in some classes, accuracy increased and performance improved, with reduced execution time. These results demonstrate that data anonymization techniques, when properly applied, can contribute to the effectiveness of data mining classifiers.

Access rights

Closed

Subjects

Data analysis

Data mining (Computing)

Part of event

Authorship

Silva, Hebert de Oliveira, 1989- Author

Basso, Tânia, 1981- Author

Moraes, Regina Lúcia de Oliveira, 1956- Author

European Dependable Computing Conference (13. : 2017 : Geneva, Switzerland)

Sites

DOI: https://doi.org/10.1109/EDCC.2017.17

Full text: https://ieeexplore.ieee.org/document/8123561

Sources

Proceedings of the 13th European Dependable Computing Conference. Piscataway, NJ : Institute of Electrical and Electronics Engineers, 2017. p. 111-116