Elementos básicos de Análisis Inteligente de Datos

Authors

Jaramillo-Chuqui, Iván Fredy
Universidad Técnica Estatal de Quevedo
https://orcid.org/0000-0003-2743-1794
Villarroel-Molina, Ricardo
Universidad Técnica Estatal de Quevedo
https://orcid.org/0000-0002-6171-9815

Keywords:

Intelligence, Data, R Project

Synopsis

This book presents elementary concepts alongside short R Project scripts for intelligent data analysis. Connecting theory and practice is fundamental to understanding a discipline, so the purpose of this text is to apply specific procedures and functions to elementary tasks. The central idea originated in the course "Análisis Inteligente de Datos" (Intelligent Data Analysis), in which the instructor contributes fundamental elements based on concepts and practical exercises using R Project. Today, a great many data mining tools are available, and users with basic knowledge can take advantage of intuitive utilities implemented in powerful development environments. We have aimed the text at an audience more closely involved with programming and software; specifically, it is intended as a basic guide for students starting out in the field of data intelligence.


Forthcoming

19 December 2023

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

Available publication format: PDF
DOI: 10.55813/egaea.fp.2022.153

Available publication format: HTML
DOI: 10.55813/egaea.fp.2022.154

Available publication format: Publisher's Seal Certificate (Certificado Sello Editorial)
DOI: 10.55813/egaea.fp.2022.155