Breast cancer is currently the most common cancer among women, affecting about 10% of women at some stage of their lives. Research studies report survival rates above 88%. Because follow-up depends on it, recurrence needs to be predicted as early as possible, and data mining methods can help reduce the number of cases that go undetected. Medical researchers apply these methods to clinical data sets to exploit patterns and establish relationships between variables. Data mining techniques draw on historical data to link the data set to the predictive model built from it. In this study, the models are evaluated by their sensitivity, specificity and accuracy. Algorithms such as artificial neural networks (ANN), decision trees (DT) and support vector machines (SVM) are used to predict breast cancer outcomes.
Breast cancer is a malignant tumor that develops in the cells of the breast and can spread to other parts of the body (Osmanović et al., 2019). It is a worldwide disease that mainly occurs in women between the ages of 25 and 50, and the number of cases is rising across the world. In the USA, about 90% of patients survive. Analyses have identified hormonal factors, lifestyle and various environmental factors among the causes of breast cancer. About 5-6% of patients carry a genetic mutation that runs in their family. Increasing age, obesity and postmenopausal hormonal imbalances are further risk factors (Zemouri et al., 2018). Notably, there is no preventive treatment for the disease. The only option is to detect it as early as possible, which also reduces the cost of treatment. Because symptoms are rarely apparent at an early stage, early detection is difficult. It is therefore indispensable to use mammograms and breast self-examination to detect irregularities early.

Several kinds of neural network can be used when developing machine learning models. Some networks treat the input variables as having a sequential dimension and are suited to sequence modeling; in these networks each data point is processed step by step, with an activation computed at every step. Alongside artificial neural networks, logistic regression and decision trees can also be used for model prediction (ieee.org, 2020).
Aim and Objectives
The number and size of medical databases are growing rapidly. However, most of these data are never analyzed for the valuable, hidden knowledge they contain. Advanced data mining techniques can uncover such hidden patterns and relationships. The aim of this project is to build predictive models from the breast cancer dataset using appropriate data mining techniques (Shetty, 2020). The following objectives support this aim:
• To implement the breast cancer data set in a predictive model.
• To select appropriate data mining techniques for developing predictive models.
• To discover hidden patterns and relationships using advanced data mining techniques.
The provided dataset is used to predict the recurrence rate of breast cancer. The dataset describes population characteristics and includes ten input variables. It was collected from women of different ages who had been diagnosed with breast cancer. Patients with recurrence can be followed up, and other records can be removed during data cleansing and data preparation. The variables are vital for identifying recurrence, so records with missing data can be omitted from the dataset, while missing values of continuous variables can be substituted using the EM method.

a. EM Method

The expectation-maximization (EM) method gives efficient estimates from incomplete data, that is, from datasets in which the observed values provide only indirect evidence about the missing ones (Shukla et al., 2018). Under a set of assumptions, it builds a predictive probability distribution for the missing data and estimates their mean values through statistical analysis. EM is a general algorithm that exploits the interdependence between the missing values and the unknown parameters (arxiv.org, 2020).

b. Data Mining Process

The DT, SVM and ANN algorithms, all machine learning techniques, are implemented to predict the recurrence of breast cancer among women. The DT algorithm is based on the ID3 algorithm. Each node of the tree is either a decision node or a leaf node. Every decision node consists of a split that evaluates a function of the attributes given in the dataset and branches on its outcome.

Google Colab is used to analyze the data. It is a freely available hosted notebook environment that gives access to many algorithms for the data mining process. It supports data classification and regression as well as data visualization.
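The EM substitution of missing continuous values described under "a. EM Method" can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes the continuous variables are jointly normal, the function name `em_impute`, the small `ridge` term and the toy array are hypothetical, and the study's own dataset is not reproduced here.

```python
import numpy as np


def em_impute(X, n_iter=100, tol=1e-6, ridge=1e-6):
    """Fill NaN entries of X by EM, assuming multivariate normal data."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n, d = X.shape

    # Starting point: replace each missing value with its column mean.
    mu = np.nanmean(X, axis=0)
    filled = np.where(missing, mu, X)
    sigma = np.cov(filled, rowvar=False, bias=True) + ridge * np.eye(d)

    for _ in range(n_iter):
        previous = filled.copy()
        extra_cov = np.zeros((d, d))

        # E-step: conditional mean and covariance of the missing entries
        # of each row, given the entries observed in that row.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the mean.
                filled[i] = mu
                extra_cov += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            gain = np.linalg.solve(s_oo, s_mo.T).T  # s_mo @ inv(s_oo)
            filled[i, m] = mu[m] + gain @ (X[i, o] - mu[o])
            extra_cov[np.ix_(m, m)] += sigma[np.ix_(m, m)] - gain @ s_mo.T

        # M-step: re-estimate the mean and covariance from the completed
        # data; the ridge term is only there for numerical stability.
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + extra_cov) / n + ridge * np.eye(d)

        if np.max(np.abs(filled - previous)) < tol:
            break

    return filled


# Toy example (hypothetical values): one missing entry in the second column.
toy = np.array([[1.0, 2.1],
                [2.0, 3.9],
                [3.0, 6.2],
                [4.0, np.nan],
                [5.0, 9.8]])
print(em_impute(toy))
```

Observed values are left untouched; only the NaN entries are replaced, using the relationship between variables that EM estimates from the rest of the data. Categorical variables would need a different model and are outside this sketch.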
A support vector machine (SVM), on the other hand, is implemented to classify the cases (Wang et al., 2019). It can be used for pattern recognition in cancer diagnosis. Its maximum-margin principle is rooted in statistical learning theory, and it can classify both linearly and nonlinearly separable data. The multilayer perceptron (MLP) maps a set of input data onto a set of outputs.

Figure 1: "MLP Neural Network" (Source: https://d1wqtxts1xzle7.cloudfront.net/35844306/using-three-machine-learning-techniques-for-predicting-breast-cancer-2157-7420.1000124.pdf)

Results and Discussion

Breast cancer recurrence, together with the relevant risk factors, is predicted using the data mining techniques, and the sources and limitations of the results are discussed. Sensitivity, specificity and accuracy are compared across the data mining techniques; for models such as the MLP and the decision tree, all of these measures are considered with respect to the breast cancer recurrence factors. The ANN tools and related techniques are run in the Google Colab toolkit, and the comparison also takes existing methodologies into account.

Figure 2: Libraries Code (Source: Google Colab)

The methodological approach has limitations as well as strengths, which generally depend on the type of application. The results give the accuracy, sensitivity and specificity of each data mining technique so that they can be compared.

Figure 3: Importing Data Code (Source: Google Colab)

The outcomes indicate that the SVM outperforms the MLP on all measures, including sensitivity and accuracy. The SVM is an effective predictor of breast cancer recurrence.
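The comparison of DT, SVM and MLP by sensitivity, specificity and accuracy can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code shown in the figures: the data are a synthetic stand-in with ten input variables and a binary recurrence label, the hyperparameters are arbitrary, and scikit-learn's decision tree is a CART implementation with an entropy criterion rather than ID3 itself. The printed scores say nothing about the study's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the study's dataset: ten input variables and a
# binary label (1 = recurrence). This is NOT the real data.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           weights=[0.7, 0.3], random_state=0)

# Hold out 30% of the records for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hyperparameters below are assumptions chosen for illustration only.
models = {
    # Entropy-based splits, in the spirit of ID3 (scikit-learn uses CART).
    "DT": DecisionTreeClassifier(criterion="entropy", random_state=0),
    # SVM and MLP are sensitive to feature scale, so standardize first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                      random_state=0)),
}


def evaluate(model):
    """Fit on the training split and score on the held-out split."""
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(
        y_test, predictions, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),  # recurrences correctly flagged
        "specificity": tn / (tn + fp),  # non-recurrences correctly cleared
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {key: round(float(value), 3) for key, value in scores.items()})
```

Reporting sensitivity and specificity alongside accuracy matters here because recurrence is usually the minority class, so a model can reach high accuracy while missing most recurrences.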
Figure 4: Data Processing Code (Source: Google Colab)

This project has several limitations. Cases can be lost to the prediction model because records with missing values are eliminated (Ferreira et al., 2018).

Figure 5: Training and Predicting Code for Model (Source: Google Colab)

Variables such as the S-phase fraction and the DNA index are not used in this project because they are not available in the dataset. These variables contribute significantly to model performance, and their absence is a likely reason for weaker performance of the predictive model. A further limitation is the missing values in the given dataset.

Figure 6: Predictive Regression Model (Source: Google Colab)

Figure 7: Splitting Data Code (Source: Google Colab)

The results obtained are based on the provided breast cancer dataset, on which the different data mining techniques are compared, with Google Colab acting as the data mining toolkit (arxiv.org, 2021).
In this study, the proposed method was developed by drawing on the relevant research. Supporting results were evaluated for a deep neural network, and the quality of the analysis was assessed with a view to better performance. As a further application, automated diagnosis could be introduced for breast cancer. Machine learning algorithms were employed to identify and analyze breast cancer, and such techniques make detection more accessible. Different algorithms can be applied to different breast cancer datasets. With a re-sampling method applied to the breast cancer data, data filtering can be introduced in the preprocessing phase, which has been seen to improve classifier performance. The study also confirms that breast cancer is recognized as one of the significant causes of death among women throughout the world.
References

Ferreira, C.A., Melo, T., Sousa, P., Meyer, M.I., Shakibapour, E., Costa, P. and Campilho, A., 2018, June. Classification of breast cancer histology images through transfer learning using a pre-trained inception resnet v2. In International Conference Image Analysis and Recognition (pp. 763-770). Springer, Cham.

Osmanović, A., Halilović, S., Ilah, L.A., Fojnica, A. and Gromilić, Z., 2019. Machine learning techniques for classification of breast cancer. In World Congress on Medical Physics and Biomedical Engineering 2018 (pp. 197-200). Springer, Singapore.

Shetty, S., 2020. Breast Cancer Analysis and Prognosis Using Machine Learning (Doctoral dissertation, Dublin, National College of Ireland).

Shukla, N., Hagenbuchner, M., Win, K.T. and Yang, J., 2018. Breast cancer data analysis for survivability studies and prediction. Computer Methods and Programs in Biomedicine, 155, pp. 199-208.

Wang, S., Zhou, Y., Tian, Y. and Takagi, T., 2019. Using stacking ensemble to Predict Survival in Breast Cancer based on microarray dataset. In Proceedings of the Symposium on Chemoinformatics 42th Symposium on Chemoinformatics, Tokyo (p. 1P06). Division of Chemical Information and Computer Sciences, The Chemical Society of Japan.

Zemouri, R., Omri, N., Devalland, C., Arnould, L., Morello, B., Zerhouni, N. and Fnaiech, F., 2018, March. Breast cancer diagnosis based on joint variable selection and constructive deep neural network. In 2018 IEEE 4th Middle East Conference on Biomedical Engineering (MECBME) (pp. 159-164). IEEE.
Online Articles

arxiv.org, 2021, OmiEmbed: reconstruct comprehensive phenotypic information from multi-omics data using multi-task deep learning, Available at: https://arxiv.org/pdf/2102.02669.pdf [Accessed on 14.03.2021]

ieee.org, 2020, Extending the Tsetlin Machine With Integer-Weighted Clauses for Increased Interpretability, Available at: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9316190 [Accessed on 14.03.2021]

arxiv.org, 2020, 2D Convolutional Neural Networks for 3D Digital Breast Tomosynthesis Classification, Available at: https://arxiv.org/pdf/2002.12314.pdf [Accessed on 14.03.2021]