Performance Analysis of the C4.5 Algorithm on the Imbalanced Titanic Dataset Using Gain Ratio
Research
DOI:
https://doi.org/10.31004/jerkin.v4i2.4402
Keywords:
C4.5 Algorithm, Classification, Gain Ratio, Imbalanced Data, Titanic Dataset
Abstract
This study analyzes the performance of the C4.5 algorithm in classifying passenger survival status on the Titanic dataset, whose class distribution is imbalanced. The research follows a quantitative approach: data preprocessing; manual calculation of entropy, information gain, split information, and gain ratio in Microsoft Excel; and model implementation in RapidMiner. The dataset contains 800 passenger records, with the Survived attribute as the class label. The manual calculations show that the Gender attribute has the highest information gain, 0.955, which makes it the root node of the decision tree, whereas the other attributes (Pclass, Age Group, and Fare Group) contribute very little information. In the experiments, the C4.5 model reaches an accuracy of 62.50%; however, every test instance is predicted as non-survived, so precision and recall for the survived class are both 0%, and the accuracy merely reflects the share of non-survived passengers in the test data. The generated decision tree is also very shallow, with no significant branching. These findings show that class imbalance in the Titanic dataset strongly degrades the performance of C4.5 and indicate that imbalanced-data handling techniques are needed to improve the classification results.
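The split criteria named in the abstract can also be reproduced outside Excel. The following is a minimal Python sketch, assuming categorical (already grouped) attributes and a binary `Survived` label (1 = survived, 0 = non-survived). The eight rows are hypothetical toy data, not the study's 800 records, and the helper names are illustrative only. Gain ratio is computed as information gain divided by split information, the criterion C4.5 uses to rank candidate attributes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_by(rows, attr):
    """Group the Survived labels of `rows` by the value of attribute `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row["Survived"])
    return groups


def information_gain(rows, attr):
    """Entropy of the whole set minus the weighted entropy of each subset."""
    n = len(rows)
    labels = [row["Survived"] for row in rows]
    remainder = sum(len(g) / n * entropy(g) for g in split_by(rows, attr).values())
    return entropy(labels) - remainder


def split_information(rows, attr):
    """Entropy of the partition sizes themselves (ignores the class label)."""
    n = len(rows)
    return -sum(len(g) / n * log2(len(g) / n) for g in split_by(rows, attr).values())


def gain_ratio(rows, attr):
    """Information gain normalized by split information."""
    si = split_information(rows, attr)
    # Assumption for this sketch: a single-valued attribute (si == 0) scores 0.
    return information_gain(rows, attr) / si if si > 0 else 0.0


# Hypothetical toy data (not the study's dataset).
rows = [
    {"Gender": "female", "Pclass": 1, "Survived": 1},
    {"Gender": "female", "Pclass": 2, "Survived": 1},
    {"Gender": "female", "Pclass": 3, "Survived": 1},
    {"Gender": "female", "Pclass": 3, "Survived": 0},
    {"Gender": "male", "Pclass": 1, "Survived": 1},
    {"Gender": "male", "Pclass": 2, "Survived": 0},
    {"Gender": "male", "Pclass": 3, "Survived": 0},
    {"Gender": "male", "Pclass": 3, "Survived": 0},
]

for attr in ("Gender", "Pclass"):
    print(
        attr,
        round(information_gain(rows, attr), 4),
        round(split_information(rows, attr), 4),
        round(gain_ratio(rows, attr), 4),
    )
```

Dividing by split information penalizes attributes that fragment the data into many small subsets. For a two-valued attribute with an even split, such as Gender in the toy rows, split information is exactly 1, so gain ratio and information gain coincide.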
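The reported accuracy follows directly from the all-negative predictions. A minimal Python sketch, using hypothetical test labels (1 = survived, 0 = non-survived) in which 5 of 8 instances are non-survived, shows that a model which always predicts the majority class reaches 62.5% accuracy while precision and recall for the survived class are zero. Strictly, precision is 0/0 here; the helper `evaluate` (a hypothetical name) reports it as 0.0 by convention.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # With no positive predictions, precision is 0/0; report 0.0 by convention.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical test labels: 5 non-survived (0) and 3 survived (1).
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
# A degenerate model that predicts "non-survived" for every instance.
y_pred = [0] * len(y_true)

accuracy, precision, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
# prints: accuracy=62.50% precision=0.00% recall=0.00%
```

Because accuracy counts the majority class equally with the minority class, it stays high even though the model never identifies a survivor; class-specific precision and recall expose the failure.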
License
Copyright (c) 2025 Kuncoro Singgih Prasojo, Hasbi Firmansyah, Wahyu Asriyani, Ali Sofyan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.