Privacy and data mining : evaluating the impact of data anonymization on classification algorithms
ARTICLE
English
This article was presented at the 13th European Dependable Computing Conference (EDCC), 2017
Acknowledgments: This work has been partially supported by the project EUBra-BIGSEA (www.eubra-bigsea.eu), funded by the Brazilian Ministry of Science, Technology and Innovation (Project 23614 - MCTI/RNP 3rd Coordinated Call) and by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement no 690116. It is also supported by the project DEVASSES (www.devasses.eu), funded by the European Union’s FP7 for research, technological development and demonstration under grant agreement no PIRSES-GA-2013-612569.
Abstract: Data anonymization is a technique used to increase the assurance that private data is not accessible to third parties. In data mining processes, anonymization can impact the results, since anonymized data may hinder the analysis performed by algorithms commonly used in this context. The goal of this Practical Experience Report is to evaluate the impact of data anonymization on the accuracy and performance of data mining classifiers. This is done by comparing their execution on original and anonymized data. A sample of real data generated by a Brazilian city transportation system, associated with fictitious users, was anonymized at different stages, and classification algorithms such as ZeroR, KNN (k-Nearest Neighbor), and Naive Bayes were applied. Contrary to expectations, when the anonymization techniques were introduced in some classes, accuracy increased and performance improved, with reduced execution time. These results demonstrate that data anonymization techniques, when properly applied, can contribute to the effectiveness of data mining classifiers.
Closed
DOI: https://doi.org/10.1109/EDCC.2017.17
Full text: https://ieeexplore.ieee.org/document/8123561
Sources
Proceedings of the 13th European Dependable Computing Conference. Piscataway, NJ : Institute of Electrical and Electronics Engineers, 2017. p. 111-116