
# Application of Soft Computing, Machine Learning, Deep Learning and Optimizations in Geoengineering and Geoscience

*Zhang, Wengang, Zhang, Yanmei, Gu, Xin, Wu, Chongzhi, Han, Liang*


This book summarizes the application of soft computing (SC) techniques, machine learning (ML) approaches, deep learning (DL) algorithms and optimization techniques in geoengineering (tunnelling, excavation, pipelines, etc.) and geoscience (geohazards, rock and soil properties, etc.). It features state-of-the-art studies on the use of SC, ML, DL and optimization in geoengineering and geoscience, compiled as highly focused chapters. Target audience: (1) Undergraduate and postgraduate students and research scholars: the applications of SC, ML, DL and optimization presented here can help students deepen their knowledge in this domain. (2) Industry personnel and practitioners: practitioners from different fields can implement standard and advanced SC, ML, DL and optimization methods to solve critical civil engineering problems.

- Year: 2021
- Edition: 1
- Publisher: Springer
- Language: English
- Pages: 152
- ISBN 10: 9811668345
- ISBN 13: 9789811668340
- File: PDF, 5.02 MB



## Authors and affiliations

- Wengang Zhang, School of Civil Engineering, Chongqing University, Chongqing, China
- Yanmei Zhang, College of Aerospace Engineering, Chongqing University, Chongqing, China
- Xin Gu, School of Civil Engineering, Chongqing University, Chongqing, China
- Chongzhi Wu, School of Civil Engineering, Chongqing University, Chongqing, China
- Liang Han, School of Civil Engineering, Chongqing University, Chongqing, China

ISBN 978-981-16-6834-0; ISBN 978-981-16-6835-7 (eBook)
https://doi.org/10.1007/978-981-16-6835-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

## Preface

The so-called Fourth Paradigm has developed rapidly during the past two decades, in which large quantities of observational data have become available to scientists and engineers. Big data is characterized by the rule of the five Vs: Volume, Variety, Value, Velocity and Veracity. The concept of big data naturally matches the features of geoengineering and geoscience. Large-scale, comprehensive, multidirectional and multifield geotechnical data analysis is becoming a trend. On the other hand, soft computing (SC), machine learning (ML), deep learning (DL) and optimization algorithms (OA) provide the ability to learn from data and deliver in-depth insight into geotechnical problems. Researchers use different SC, ML, DL and OA models to solve various problems associated with geoengineering and geoscience. Consequently, there is a need to extend geotechnical research toward big data by integrating SC, ML, DL and OA techniques. This book focuses on the state of the art and application of SC, ML, DL and OA algorithms in geoengineering and geoscience.
Various SC, ML, DL and OA approaches are first introduced concisely, mainly the easy-to-interpret multivariate adaptive regression splines (MARS) model, supervised learning, unsupervised learning, deep learning and optimization algorithms. Their representative applications in geoengineering and geoscience are then summarized via VOSviewer demonstration. The authors also provide their own thoughts learnt from these applications, as well as ongoing work and future recommendations. This book aims to give a comprehensive summary and provide fundamental guidelines for researchers and engineers in geoengineering, geoscience and similar research areas on how to integrate and apply SC, ML, DL and OA methods.

Chongqing, China
Wengang Zhang, Yanmei Zhang, Xin Gu, Chongzhi Wu, Liang Han

## Contents

- 1 Introduction
  - References
- 2 Soft Computing
  - 2.1 What is Soft Computing
  - 2.2 Components of Soft Computing
  - 2.3 Heuristics and Metaheuristics
  - 2.4 Hybrid Metaheuristics in Soft Computing
  - 2.5 A Data-Driven Nonparametric Explainable MARS Model
  - References
- 3 Machine Learning and Applications
  - 3.1 Supervised Learning
    - 3.1.1 ANN
    - 3.1.2 ELM
    - 3.1.3 DT
    - 3.1.4 LR
    - 3.1.5 GP
    - 3.1.6 NB
    - 3.1.7 SVM
    - 3.1.8 KNN
  - 3.2 Unsupervised Learning
    - 3.2.1 K-means
    - 3.2.2 PCA
  - 3.3 Semi-Supervised Learning
  - References
- 4 Deep Learning and Applications
  - 4.1 AE
  - 4.2 DBN
  - 4.3 CNN
  - 4.4 RNN
  - 4.5 LSTM
  - References
- 5 Optimization Algorithms and Applications
  - 5.1 Precise Algorithm
  - 5.2 Evolutionary Algorithm
    - 5.2.1 GA
    - 5.2.2 BO
    - 5.2.3 PSO
    - 5.2.4 DE
    - 5.2.5 ABC
    - 5.2.6 ACO
    - 5.2.7 CS
    - 5.2.8 FA
    - 5.2.9 GWO
    - 5.2.10 WOA
    - 5.2.11 SFLA
    - 5.2.12 CSO
  - References
- 6 Application of LSTM and Prophet Algorithm in Slope Displacement Prediction
  - 6.1 Introduction
  - 6.2 Methodology: Prophet Model
    - 6.2.1 Model Development
    - 6.2.2 The Prediction Process
    - 6.2.3 Evaluation of Modeling Performance
  - 6.3 Case Study
    - 6.3.1 Background
    - 6.3.2 Analysis of the Monitoring Data
  - 6.4 Data Processing and Results Analysis
    - 6.4.1 Displacement Decomposition
    - 6.4.2 Prediction of Trend Displacement
    - 6.4.3 Prediction of Periodic Displacement
  - 6.5 Discussion
  - 6.6 Summary of this Chapter
  - References
- 7 Prediction of Undrained Shear Strength Using XGBoost and RF Based on BO
  - 7.1 Introduction
  - 7.2 Methodology
    - 7.2.1 Bayesian Hyper-Parameter Optimization
    - 7.2.2 Sequential Model-Based Optimization (SMBO)
  - 7.3 Databases
    - 7.3.1 Correlation Analysis
    - 7.3.2 Removal of Outliers for USS
  - 7.4 Implementation Procedure
    - 7.4.1 K-fold CV
    - 7.4.2 Comparison Models
    - 7.4.3 Performance Measures
  - 7.5 Calculation Results
    - 7.5.1 Predictive Comparisons Among Different Models
    - 7.5.2 Fitting Performance of XGBoost and RF
    - 7.5.3 Features Importance Analysis
  - 7.6 Summary and Conclusions
  - References
- 8 Prediction for TBM Penetration Rate Using Four Hyperparameter Optimization Methods and RF Model
  - 8.1 Introduction
  - 8.2 Engineering Background and Data Introduction
    - 8.2.1 Project Description and Geological Survey
    - 8.2.2 Data Description
  - 8.3 Prediction and Sensitivity Analysis of Driving Speed
    - 8.3.1 Prediction Model
    - 8.3.2 Sensitivity Analysis
  - 8.4 Discussion
  - 8.5 Summary
  - References
- 9 What We Have Learnt from the Applications
- 10 Work Ongoing and Future Recommendations
  - References

## About the Authors

Dr. Wengang Zhang is currently a full professor in the School of Civil Engineering, Chongqing University, China. His research interests focus on impact assessment on the built environment induced by underground construction, as well as big data and machine learning in geotechnics and geoengineering. He is a member of the ISSMGE TC304 (Reliability), TC309 (Machine Learning) and TC219 (System Performance of Geotechnical Structures). He was selected as one of the World’s Top 2% Scientists 2020.

Dr. Yanmei Zhang is currently an associate professor in the College of Aerospace Engineering, Chongqing University, China. Her research interests focus on fatigue and failure assessment of engineering structures, fracture analyses of polymeric composites and applications of machine learning techniques in geoengineering. She is a member of the Chinese Society of Theoretical and Applied Mechanics.

Xin Gu is a Ph.D. candidate in the School of Civil Engineering, Chongqing University. He majors in geotechnical engineering, and his research interests involve random finite element analysis, slope stability analysis, spatial variability, Bayesian inference and machine learning.

Chongzhi Wu is currently a Ph.D. student at Chongqing University, China. His research interest involves the application of machine learning methods in geotechnical engineering, and he is interested in time series analysis.
He has experience in feature engineering, data mining and data visualization and has published several research papers about the application of ML methods in geotechnical engineering.

Liang Han is presently a Ph.D. candidate at Chongqing University, China. His research interest involves the statistical characterization of geomaterial parameters, site similarity identification and seismic performance evaluation of underground structures. He has experience in data analysis using the Bayesian framework and machine learning. He has published several academic papers about the data analysis of geomaterial parameters and the seismic performance of utility tunnels.

## Abbreviations

- ABC: Artificial bee colony
- ACO: Ant colony optimization
- AE: Automatic encoder
- AI: Artificial intelligence
- ANFIS: Adaptive neuro-fuzzy inference system
- ANN: Artificial neural network
- BC: Bayesian classifier
- BF: Bacterial foraging
- BM: Boosting methods
- BO: Bayesian optimization
- CG: Conjugate gradient
- CNN: Convolutional neural network
- CS: Cuckoo search
- CSO: Cat swarm optimization
- DBN: Deep belief network
- DE: Differential evolution
- DL: Deep learning
- DT: Decision tree
- ELM: Extreme learning machine
- FA: Firefly algorithm
- GA: Genetic algorithm
- GEP: Gene expression programming
- GP: Genetic programing
- GRNN: Generalized regression neural network
- GWO: Gray wolf optimizer
- INSAR: Interferometric synthetic aperture radar
- KNN: K-nearest neighbor
- LiDAR: Light detection and ranging
- LR: Linear regression
- LSTM: Long short-term memory
- M5: M5 model tree
- MA: Memetic algorithm
- MARS: Multivariate adaptive regression splines
- ML: Machine learning
- MLP: Multilayer perceptron
- NB: Naive Bayes
- OA: Optimization algorithm
- PCA: Principal component analysis
- PSO: Particle swarm optimization
- QGWO: Quantum gray wolf optimizer
- RBF: Radial basis functions
- RBM: Restricted Boltzmann machine
- RBMs: Restricted Boltzmann machines
- RF: Random forest
- RNN: Recurrent neural network
- SAR: Synthetic aperture radar
- SC: Soft computing
- SFL: Sugeno fuzzy logic
- SFLA: Shuffled frog leaping algorithm
- SGD: Stochastic gradient descent
- SVD: Singular value decomposition
- SVM: Support vector machine
- SVR: Support vector regression
- TBM: Tunnel boring machine
- TSP: Traveling salesman problem
- WOA: Whale optimization algorithm
- XGB: Extreme gradient boosting
- XGboost: Extreme gradient boosting

## Chapter 1 Introduction

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 W. Zhang et al., Application of Soft Computing, Machine Learning, Deep Learning and Optimizations in Geoengineering and Geoscience, https://doi.org/10.1007/978-981-16-6835-7_1

Before introducing the more commonly used technical terms artificial intelligence (AI), machine learning (ML), deep learning (DL) and optimization algorithm (OA), soft computing (SC) should be defined first, since the first three terms are closely related to each other. SC is the use of approximate calculations to provide imprecise but usable solutions to complex computational problems. The approach enables solutions for problems that may be either unsolvable or just rather time-consuming to solve with current hardware. SC is sometimes referred to as computational intelligence, in contrast with hard computing. It provides an approach to problem-solving using means other than computers. With the human mind as a role model, SC is tolerant of partial truths, uncertainty, imprecision and approximation, unlike traditional computing models. This tolerance allows researchers to approach some problems that traditional computing cannot process.

With the rapid development of computer performance, AI has gradually become a vital driver of the fourth industrial revolution. The concept of AI was born at the Dartmouth conference in 1956, where several computational scientists (e.g., John McCarthy, Marvin Minsky, Claude Shannon) set out to discuss the use of machines to mimic human learning and other aspects of intelligence. Since 2012, the rapid increase in data volume brought by better monitoring technology and devices (Dikshit et al. 2020; Gupta et al. 2020; Meng et al. 2021; Reichstein et al. 2019), the improvement of computing power and the emergence of a new class of ML algorithms, DL, have all contributed to the popularity of AI. AI covers many fields, such as expert systems, machine learning, evolutionary computing, computer vision and natural language processing. Although the three common terms AI, ML and DL are connected with each other, they are not the same; Fig. 1.1 highlights the differences (Dikshit et al. 2020; Reichstein et al. 2019) and also shows SC.

Fig. 1.1 Different aspects of AI, ML, DL and SC: (a) Relationship between SC, AI, ML and DL; (b) Relation of inclusion for AI, ML and DL

Fig. 1.2 Basic components of OA algorithms

Generally speaking, ML can be considered a method to realize AI, capable of capturing the internal patterns in data and then providing a rational decision as guidance. Inspired by the way that human brains process information, DL is proposed as a main branch of ML, which uses more complex multilayer neural network architectures (LeCun et al. 2015). Compared with other ML methods, DL requires less human guidance, but it requires enormous amounts of data to explore the complex, varied and inherent relationships hidden in data. OA denotes the optimization algorithms. The essence of most ML algorithms is to build a surrogate model and then optimize the objective function (or loss function) through OA, to obtain the optimal model with the best performance. Figure 1.2 plots the basic components of OA algorithms.
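The idea of fitting a model by minimizing a loss function with an optimizer can be illustrated with a minimal sketch. It is not taken from the book: the linear model, the mean squared error loss, plain gradient descent as the optimizer, the toy data and the names `mse_loss` and `fit_linear_gd` are all assumptions chosen for illustration.

```python
def mse_loss(w, b, xs, ys):
    """Objective (loss) function: mean squared error of the model y = w*x + b."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def fit_linear_gd(xs, ys, lr=0.05, steps=2000):
    """Minimize mse_loss with plain gradient descent (the optimizer in this sketch)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical, noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_linear_gd(xs, ys)
print(w, b, mse_loss(w, b, xs, ys))
```

Gradient descent is used here only because it is short; any other optimizer could be substituted for the loop inside `fit_linear_gd` while the model and the loss stay the same.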
Increasing attention is now focused on geohazards and on estimating the safety of geotechnical structures, because geohazards or structural failure pose a great threat to human life and property. Examples include the Nicoll Highway collapse, Singapore (Zhang et al. 2018); the rockburst in the Jinchuan 2nd Mine (Yi et al. 2010); the Shenzhen landslide (Luo et al. 2019; Zhan et al. 2021); the Bazimen landslide (Zhang et al. 2020b); and the Wenjia gully debris flow (Liu et al. 2021). To prevent or mitigate such disasters and ensure that geotechnical structures are sufficiently safe, accurate predictions of the occurrence of geohazards and reliability analysis of geotechnical structures are urgently needed. In recent years, with the rapid development of SC, AI, ML, DL and OA techniques, these methods have been widely adopted to assess the occurrence probability of geohazards and the failure or deformation of geotechnical structures with sufficient accuracy and efficiency (Darabi et al. 2012; Li et al. 2014; Liang et al. 2012; Luo et al. 2019; Mohammadi et al. 2015; Neaupane and Adhikari 2006; Shi et al. 2019; Solomatine and Xue 2004; Zhan et al. 2021). Although there have been some high-level overviews of ML for prediction in specific areas of geotechnical engineering such as slopes, tunnels and deep braced excavations (Sheil et al. 2020; Shreyas and Dey 2019; Zhang et al. 2020a), comprehensive investigations of recent research progress and applications of SC, ML, DL and OA in geotechnical engineering and geoscience remain quite limited. Therefore, this book first gives a brief introduction to typical SC, ML, DL and OA methods, mainly via VOSviewer, a software tool for constructing and visualizing bibliometric networks; then some representative applications of these algorithms in geotechnical engineering and geoscience are systematically summarized.
Furthermore, reflections from these applications, ongoing research, future potential and recommendations are also pointed out. This book aims to provide fundamental guidelines for researchers and engineers on how to integrate and apply SC, ML, DL and OA techniques in the domain of geoengineering and geoscience.

### References

- Darabi A, Ahangari K, Noorzad A, Arab A (2012) Subsidence estimation utilizing various approaches - A case study: Tehran No. 3 subway line. Tunn Undergr Sp Technol 31:117–127. https://doi.org/10.1016/j.tust.2012.04.012
- Dikshit A, Pradhan B, Alamri AM (2020) Pathways and challenges of the application of artificial intelligence to geohazards modelling. Gondwana Res
- Gupta R, Tanwar S, Tyagi S, Kumar N (2020) Machine learning models for secure data analytics: A taxonomy and threat model. Comput Commun 153:406–440
- LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539
- Li Z, Zhang B, Wang Y et al (2014) Water pipe condition assessment: A hierarchical beta process approach for sparse incident data. Mach Learn 95:11–26. https://doi.org/10.1007/s10994-013-5386-z
- Liang WJ, Zhuang DF, Jiang D et al (2012) Assessment of debris flow hazards using a Bayesian Network. Geomorphology 171–172:94–100. https://doi.org/10.1016/j.geomorph.2012.05.008
- Liu C, Yu Z, Zhao S (2021) A coupled SPH-DEM-FEM model for fluid-particle-structure interaction and a case study of Wenjia gully debris flow impact estimation. Landslides 18:2403–2425. https://doi.org/10.1007/s10346-021-01640-6
- Luo HY, Shen P, Zhang LM (2019) How does a cluster of buildings affect landslide mobility: a case study of the Shenzhen landslide. Landslides 16:2421–2431. https://doi.org/10.1007/s10346-019-01239-y
- Meng W, Li W, Zhou J (2021) Enhancing the security of blockchain-based software defined networking through trust-based traffic fusion and filtration. Inf Fusion 70:60–71. https://doi.org/10.1016/j.inffus.2020.12.006
- Mohammadi SD, Naseri F, Alipoor S (2015) Development of artificial neural networks and multiple regression models for the NATM tunnelling-induced settlement in Niayesh subway tunnel, Tehran. Bull Eng Geol Environ 74:827–843. https://doi.org/10.1007/s10064-014-0660-2
- Neaupane KM, Adhikari NR (2006) Prediction of tunneling-induced ground movement with the multi-layer perceptron. Tunn Undergr Sp Technol 21:151–159. https://doi.org/10.1016/j.tust.2005.07.001
- Reichstein M, Camps-Valls G, Stevens B et al (2019) Deep learning and process understanding for data-driven Earth system science. Nature 566:195–204. https://doi.org/10.1038/s41586-019-0912-1
- Sheil BB, Suryasentana SK, Mooney MA, Zhu H (2020) Machine learning to inform tunnelling operations: Recent advances and future trends. Proc Inst Civ Eng - Smart Infrastruct Constr 1–18. https://doi.org/10.1680/jsmic.20.00011
- Shi S, Zhao R, Li S et al (2019) Intelligent prediction of surrounding rock deformation of shallow buried highway tunnel and its engineering application. Tunn Undergr Sp Technol 90:1–11. https://doi.org/10.1016/j.tust.2019.04.013
- Shreyas SK, Dey A (2019) Application of soft computing techniques in tunnelling and underground excavations: state of the art and future prospects. Innov Infrastruct Solut 4:1–15. https://doi.org/10.1007/s41062-019-0234-z
- Solomatine DP, Xue Y (2004) M5 Model trees and neural networks: Application to flood forecasting in the upper reach of the Huai River in China. J Hydrol Eng 9:491–501. https://doi.org/10.1061/(asce)1084-0699(2004)9:6(491)
- Yi YL, Cao P, Pu CZ (2010) Multi-factorial comprehensive estimation for jinchans deep typical rockburst tendency. Sci Technol Rev 28:76–80
- Zhan LT, Guo XG, Sun QQ et al (2021) The 2015 Shenzhen catastrophic landslide in a construction waste dump: analyses of undrained strength and slope stability. Acta Geotech 16:1247–1263. https://doi.org/10.1007/s11440-020-01083-8
- Zhang W, Han L, Gu X et al (2020a) Tunneling and deep excavations in spatially variable soil and rock masses: A short review. Undergr Sp
- Zhang W, Tang L, Li H et al (2020b) Probabilistic stability analysis of Bazimen landslide with monitored rainfall data and water level fluctuations in Three Gorges Reservoir, China. Front Struct Civ Eng 14:1247–1261. https://doi.org/10.1007/s11709-020-0655-y
- Zhang W, Zhang R, Fu Y et al (2018) 2D and 3D numerical analysis on strut responses due to one-strut failure. Geomech Eng 15:965–972. https://doi.org/10.12989/gae.2018.15.4.965

## Chapter 2 Soft Computing

As indicated in the introduction and in the Fig. 2.1 plot, the definition of SC overlaps to some extent with ML, DL and OA. Its component fields of study include: fuzzy logic, machine learning, probabilistic reasoning, evolutionary computation, perceptron, genetic algorithms, differential algorithms, support vector machines, metaheuristics, swarm intelligence, ant colony optimization, particle optimization, Bayesian networks, artificial neural networks, expert systems, etc.

As a field of mathematical and computer study, SC has been around since the 1990s. The inspiration was the human mind’s ability to form real-world solutions to problems through approximation. It contrasts with possibility, an approach that is used when there is not enough information available to solve a problem. In contrast, SC is used where the problem is not adequately specified for the use of conventional math and computer techniques. It has numerous real-world applications in domestic, commercial and industrial situations.

### 2.1 What is Soft Computing

Prior to 1994, when Zadeh (1994) first defined “soft computing”, the concepts currently handled used to be referred to in an isolated way, whereby each was spoken of individually with an indication of the use of fuzzy methodologies.
Although the idea of establishing the area of soft computing dates back to 1990 (Zadeh 2001), it was in Zadeh (1994) that the definition of soft computing was established in the following terms: “Basically, soft computing is not a homogeneous body of concepts and techniques. Rather, it is a partnership of distinct methods that in one way or another conform to its guiding principle. At this juncture, the dominant aim of soft computing is to exploit the tolerance for imprecision and uncertainty to achieve tractability, robustness and low solution cost. The principal constituents of soft computing are fuzzy logic, neurocomputing, and probabilistic reasoning, with the latter subsuming genetic algorithms, belief networks, chaotic systems, and parts of learning theory. In the partnership of fuzzy logic, neurocomputing, and probabilistic reasoning, fuzzy logic is mainly concerned with imprecision and approximate reasoning; neurocomputing with learning and curve-fitting; and probabilistic reasoning with uncertainty and belief propagation”.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. W. Zhang et al., Application of Soft Computing, Machine Learning, Deep Learning and Optimizations in Geoengineering and Geoscience, https://doi.org/10.1007/978-981-16-6835-7_2

Fig. 2.1 VOSviewer plots for main SC methods: (a) network visualization, (b) density visualization

It is therefore clear that soft computing has no precise definition. It is instead defined by extension, by means of different concepts and techniques which attempt to overcome the difficulties arising in real problems, which occur in a world that is imprecise, uncertain and difficult to categorize. There have been various subsequent attempts to further hone this definition, with differing results, and among the possible alternative definitions, perhaps the most suitable is the one presented in Li et al.
(1998): “Every computing process that purposely includes imprecision into the calculation on one or more levels and allows this imprecision either to change (decrease) the granularity of the problem, or to “soften” the goal of optimalisation at some stage, is defined as to belonging to the field of soft computing”.

Soft computing is considered to be the antithesis of what we might call hard computing. It can therefore be seen as a series of techniques and methods that allow real practical situations to be dealt with in the same way as humans deal with them, i.e. on the basis of intelligence, common sense, consideration of analogies, approaches, etc. In this sense, soft computing is a family of problem-resolution methods headed by approximate reasoning and by functional and optimization approximation methods, including search methods. Soft computing is therefore the theoretical basis for the area of intelligent systems. The difference between the area of artificial intelligence and that of intelligent systems is that the first is based on hard computing and the second on soft computing.

2.2 Components of Soft Computing

From this viewpoint, on a second level, soft computing can then be expanded into other components which contribute to a definition by extension, such as the one given in the last section. From the beginning (Bonissone 2002), the components considered to be the most important on this second level are probabilistic reasoning, fuzzy logic and fuzzy sets, neural networks, and genetic algorithms (GA), which because of their interdisciplinary nature, applications and results immediately stood out over other methodologies such as the previously mentioned chaos theory, evidence theory, etc. The popularity of GA, together with their proven efficiency in a wide variety of areas and applications, their attempt to imitate natural creatures (e.g. plants, animals, humans) which are clearly soft (i.e.
flexible, adaptable, creative, intelligent, etc.), and especially the extensions and different versions, transform this fourth second-level ingredient into the well-known evolutionary algorithms (EA), which consequently comprise the fourth fundamental component of soft computing, as outlined in the following points:

(1) From the first level, and beginning with approximate reasoning methods, when we concentrate only on probabilistic models we encounter the Dempster-Shafer theory and Bayesian networks. However, when we consider probabilistic methods combined with fuzzy logic, and even with some other multivalued logics, we encounter what we could call hybrid probabilistic models, fundamentally probability theory models for fuzzy events, fuzzy event belief models, and fuzzy influence diagrams.
(2) When we look at the developments directly associated with fuzzy logic, fuzzy systems and in particular fuzzy controllers stand out. Then, arising from the combination of fuzzy logic with neural networks and EA are fuzzy logic-based hybrid systems, the foremost exponents of which are fuzzy neural systems, controllers adjusted by neural networks (neural fuzzy systems, which differ from the previously mentioned fuzzy neural systems), and fuzzy logic-based controllers which are created and adjusted with EA.
(3) Moving through the first level to the other large area covered by soft computing (functional approach/optimization methods), the first component which appears is that of neural networks and their different models. Arising from the interaction with fuzzy logic methodologies and EA methodologies are hybrid neural systems, and in particular fuzzy control of network parameters, and the formal generation and weight generation in neural networks.
(4) The fourth typical component of soft computing, and perhaps the newest, is that of EA. Associated with these are four large, important areas: evolutionary strategies, evolutionary programming, GA, and genetic programming. If we were to focus only on these last areas, we could consider that in this case the amalgam of methodologies and techniques associated with soft computing culminates in three important lines: fuzzy genetic systems, bioinspired systems, and applications for the fuzzy control of evolutionary parameters.

On further examination of this last component, some additional considerations are needed. Firstly, however broadly we interpret fuzzy genetic systems, bioinspired systems, and fuzzy control applications on evolutionary parameters, other important topics are missing from this description. Secondly, if we are referring in particular to bioinspired systems, it is clear that they are not only the product of fuzzy logic, neural networks or EA (with all the variants that we can consider for these three components) but that other extremely important methodologies are also involved in them.

Figure 2.1 provides the VOSviewer plots for the main SC methods. Figure 2.1a is the network visualization, in which the keyword co-occurrence network of SC methods was constructed with VOSviewer, a software tool for constructing and visualizing bibliometric networks. These networks may for instance include journals, researchers, or individual publications, and they can be constructed based on citation, bibliographic coupling, co-citation, or co-authorship relations. VOSviewer also offers text mining functionality that can be used to construct and visualize co-occurrence networks of important terms extracted from a body of scientific literature (https://www.vosviewer.com/).
The size of a node and its label represents the weight of the node: the bigger the node and label, the larger the weight. The distance between two nodes reflects the strength of the relation between them, and a shorter distance generally indicates a stronger relationship. A line between two keywords indicates that they have appeared together, and the thicker the line, the more often they co-occur. Nodes with the same color belong to the same cluster. VOSviewer divided the keywords of SC-related publications into 6 clusters: a green cluster for ANN, a red cluster for OA, a dark blue cluster for SVR, a purple cluster for SVM, a yellow cluster for CNN and RNN, and a light blue cluster for k-nearest neighbor. The link strength between two nodes refers to the frequency of co-occurrence and can be used as a quantitative index of the relationship between two nodes. The total link strength of a node is the sum of the link strengths of this node with all other nodes. The node “ANN” has thicker lines with “models”, “optimization”, “algorithm”, “machine learning”, “SVM”, “classification”, “rocks”, “earthquakes”, “landslides” and “droughts”.

Figure 2.1b is the density visualization. Each node in the keyword density plot has a color that depends on the density of items at that node; in other words, the color of a node depends on the number of items in its neighborhood. Keywords in the red area appear more frequently, whereas keywords in the green area appear less frequently. Density views are especially useful for understanding the overall structure of a map and for drawing attention to its most important areas. Fig. 2.1b therefore shows the research focuses of SC study at a glance: “ANN”, “models”, “algorithms”, “optimization”, “SVM” and “machine learning” turn out to be important and are the core keywords in SC studies.
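The link strength and total link strength described above can be illustrated with a small sketch. It assumes plain full counting (each publication in which two keywords appear together adds 1 to their link strength); the function names and the three toy keyword lists are hypothetical and are not taken from the book's dataset or from VOSviewer itself.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(keyword_lists):
    """Link strength per keyword pair: number of documents containing both."""
    links = Counter()
    for keywords in keyword_lists:
        # Each unordered pair is counted once per document.
        for a, b in combinations(sorted(set(keywords)), 2):
            links[(a, b)] += 1
    return links


def total_link_strength(links):
    """Sum of the link strengths of each keyword with all other keywords."""
    totals = Counter()
    for (a, b), strength in links.items():
        totals[a] += strength
        totals[b] += strength
    return totals


# Hypothetical keyword lists of three publications.
docs = [
    ["ANN", "SVM", "landslides"],
    ["ANN", "optimization"],
    ["ANN", "SVM", "optimization"],
]
links = cooccurrence_links(docs)
totals = total_link_strength(links)
print(links[("ANN", "SVM")])  # link strength of the pair
print(totals["ANN"])          # total link strength of the node
```

In this toy input "ANN" co-occurs with every other keyword, so it ends up with the largest total link strength, which is the kind of node that appears largest in a network view.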
2.3 Heuristics and Metaheuristics

As stated in Jiménez et al. (2003), since the fuzzy boom of the 1990s, methodologies based on fuzzy sets (i.e. soft computing) have become a permanent part of all areas of research, development and innovation. Their application has been extended to all areas of our daily life (health, banking, the home), and they are also studied at different educational levels. Similarly, there is no doubt that, thanks to the technological potential we currently have, computers can handle problems of tremendous complexity (both in comprehension and dimension) in a wide variety of new fields. As mentioned above, since the mid 1990s, GA (or EA from a general point of view) have proved to be extremely valuable for finding good solutions to specific problems in these fields. Thanks to their scientific attractiveness, the diversity of their applications and the considerable efficiency of their solutions in intelligent systems, they have been incorporated into the second level of soft computing components.

EA, however, are merely another class of heuristics, or metaheuristics, in the same way as Tabu Search, Simulated Annealing, Hill Climbing, Variable Neighbourhood Search, Estimation of Distribution Algorithms (EDA), Scatter Search, GRASP, Reactive Search and very many others. Generally speaking, all these heuristic algorithms (metaheuristics) provide solutions which are not ideal, but which largely satisfy the decision-maker or the user. When they act on the basis that satisfaction is better than optimization, they perfectly illustrate Zadeh’s famous sentence: “…in contrast to traditional hard computing, soft computing exploits the tolerance for imprecision, uncertainty, and partial truth to achieve tractability, robustness, low solution-cost, and better rapport with reality”.
Consequently, among the soft computing components, instead of EA (which can represent only one part of the search and optimization methods used), heuristic algorithms and even metaheuristics should be considered. There is usually controversy about the difference between metaheuristics and heuristics, and while it is not our intention here to enter into this debate, we are interested in offering a brief reflection on both concepts. The term heuristics comes from the Greek word “heuriskein”, the meaning of which is related to the concept of finding something. It is therefore clear that metaheuristics are more broad-brush than heuristics.

In the sections which follow, we will focus on the concept of metaheuristics, and will start by pointing out that, in the terms that we have defined, certain metaheuristics will always be better than others in terms of their performance when it comes to solving problems. In order to achieve the best performance, it is desirable for metaheuristics to have a series of “good properties” which include simplicity, independence, coherence, effectiveness, efficiency, adaptability, robustness, interactivity, diversity, and autonomy (Melian et al. 2003). In view of their definition and this series of desirable characteristics, it is both logical and obvious that EA are to be found among metaheuristics. They are therefore well placed, with the other second-level soft computing components, to facilitate the appearance of new theoretical and practical methodologies, outlines, and frameworks for a better understanding and handling of generalized imprecision in the real world.

2.4 Hybrid Metaheuristics in Soft Computing

In this section, we will consider the three main previously mentioned groups of metaheuristics. From these, we will then describe the new metaheuristics which have emerged, briefly dwelling on the less developed or less popular ones because they are more recent.

1. Evolutionary Metaheuristics.
These metaheuristics are by far the most popular and define mechanisms for developing an evolution in the search space of the sets of solutions in order to come close to the ideal solution with elements which will survive in successive generations of populations. In the context of soft computing, the hybridizations which take these metaheuristics as a reference are fundamental.
2. Relaxation metaheuristics. A real problem may be relaxed when it is simplified by eliminating, weakening or modifying one of its characteristic elements. Relaxation metaheuristics are strategies for relaxing the problem in heuristic design, and they are able to find solutions for problems which would otherwise have been very difficult to solve without this methodology. Examples of these are rounding up or down or adjustments in nature, as occurs when an imprecisely and linguistically-expressed quantity is associated to an exact numerical value. From this point of view, a real alternative is to make exact algorithms more flexible: introducing fuzzy stop criteria, which eventually leads to rule-based relaxation metaheuristics; admitting the vagueness of coefficients, justifying algorithms for resolving problems with fuzzy parameters; and relaxing the verification of restrictions, allowing certain violations in their fulfillment.
3. Search metaheuristics. Generally speaking, these are probably the most important metaheuristics, and their basic operation consists in establishing strategies for exploring the solution space of the problem and iterating the starting-point solutions. Although at first sight they might appear to be similar to evolutionary searches, they are not, since evolutionary searches base their operation on the evolution of a population of individuals in the search space. These metaheuristics are usually described by means of various metaphors, which classify them as bioinspired, sociological, based on nature, etc.
and this makes them extremely popular.

If, however, the search procedure is performed using various metaheuristics, there is always the possibility of cooperation between them (Cruz 2005), and therefore of generalizing everything described so far to the context of parallelism. This is beyond the scope of the present discussion, but it is worth reflecting on: with the proliferation of parallel computing, more powerful workstations and faster communication networks, parallel implementations of metaheuristics have emerged naturally and provide an interesting alternative for speeding up the search for solutions. Various strategies have correspondingly been proposed and applied, and these have proved to be very efficient for resolving large-scale problems and for finding better solutions than their sequential counterparts, either because the search space is divided or because the intensification and diversification of the search are improved. As a result, parallelism (and therefore multiple metaheuristics) is not only a way of reducing the execution times of individual metaheuristics, but also of improving their effectiveness and robustness.

2.5 A Data-Driven Nonparametric Explainable MARS Model

Multivariate Adaptive Regression Splines (MARS) was first proposed by Friedman (2007) as an adaptive procedure for regression that works well in high-dimensional analysis. It is a non-parametric statistical method based on a divide-and-conquer strategy in which the training data are partitioned into separate piece-wise linear segments (splines) of differing gradients (slopes). In general, the splines are connected smoothly by weighted summation to handle linear structures; in some special cases, these piece-wise curves, also known as basis functions (BFs), can also be multiplied together to obtain higher-order terms that fit nonlinear behaviors.
The piece-wise linear basis functions take the form

$$(x - t)_+ = \begin{cases} x - t, & \text{if } x > t, \\ 0, & \text{otherwise,} \end{cases} \qquad \text{and} \qquad (t - x)_+ = \begin{cases} t - x, & \text{if } x < t, \\ 0, & \text{otherwise,} \end{cases} \tag{2.1}$$

where the subscript “+” denotes the positive part. As shown in Fig. 2.2, each of the basis functions (BFs) $(x - 1)_+$ and $(1 - x)_+$ is piece-wise linear. The point where the two pieces meet is called the knot; in this case, the knot value is 1. The two functions together are called a reflected pair in the discussion below.

Fig. 2.2 The example of basis functions used by MARS

As the idea is to form reflected pairs for each input $X_j$ with knots at each observed value $x_{ij}$ of that input, the collection of basis functions is

$$C = \left\{ (X_j - t)_+,\ (t - X_j)_+ \right\}_{t \in \{x_{1j}, x_{2j}, x_{3j}, \ldots, x_{Nj}\},\ j = 1, 2, 3, 4, \ldots, N} \tag{2.2}$$

Forward stepwise linear regression is applied in the model-building process, but instead of directly using the original inputs $X$, the functions from the set $C$ and their interactions are used. The model therefore has the form

$$F(X) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(X), \tag{2.3}$$

where each $h_m(X)$ is a function in the set $C$, or a product of two or more such functions. The model starts with the constant function $h_0(X) = 1$. In each step of the forward phase, the goal is to obtain a new basis function pair, from all products of a function $h_m$ in the current model set $K$ with one of the reflected pairs in the set $C$, that produces the largest reduction in the training error. The winning product of each step is added to the model until the model reaches its preset maximum number of terms. Considering a current model with $M$ basis functions, the next pair to be added to the model has the form

$$\beta_{M+1}\, h_l(X) \cdot (X_j - t)_+ + \beta_{M+2}\, h_l(X) \cdot (t - X_j)_+, \quad h_l \in K, \tag{2.4}$$

with each $\beta$ estimated by minimizing the residual sum-of-squares, that is, by standard linear regression. Figure 2.3 shows an example illustrating how the MARS algorithm makes use of basis functions to fit the provided data patterns.
The MARS mathematical equation is

$$y = -5.0875 - 2.7678\,h_1(X) + 0.5540\,h_2(X) + 1.1900\,h_3(X), \tag{2.5}$$

in which $h_1(X) = (x - 17)_+$, $h_2(X) = (17 - x)_+$ and $h_3(X) = (x - 5)_+$. The knots are located at $x = 5$ and $x = 17$, which cut the $x$ range into three intervals in which different linear relationships are identified.

Fig. 2.3 Knots and basis functions for a simple MARS example

The forward modeling strategy in MARS is hierarchical, as a high-order interaction is only likely to exist if some of its lower-order “footprints” already exist. For example, a three-way product can only be added to the model when one of its two-way products is already in the model. This strategy avoids a search over an exponentially growing space of alternatives. Generally, the upper limit on the order of interactions is set to two in the MARS procedure, which keeps the model highly interpretable without too much negative impact on its adaptiveness in fitting linear structures and non-linear behaviors. When the reflected pair in the set $C$ is multiplied by the constant function $h_0(X) = 1$, a linear term is obtained; otherwise a nonlinear structure is formed. Each input is restricted to appear at most once in a term, to prevent sharp increases or decreases near the boundaries of the feature space. There is no need to preset the underlying functional relationships between dependent and independent variables. Instead, MARS allows the model itself to determine the best-suited linear and non-linear structures by adding, in each step, the knot and corresponding pair of BFs that give the maximum reduction in the sum-of-squares residual error. This process ends when the preset stopping condition is met and usually yields a deliberately overfitted model. Therefore, a backward deletion procedure is applied, which at each step removes the term that contributes least to reducing the training error, yielding an estimated best model $f_\lambda$ of each size $\lambda$.
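As an illustration of how the reflected pairs in Eq. (2.5) combine into a piece-wise linear curve, the sketch below evaluates that example model in plain Python. The coefficients and knots are those quoted in Eq. (2.5); the function names `hinge` and `mars_example` are hypothetical, and the code is only a sketch of the evaluation step, not of the MARS fitting procedure.

```python
def hinge(u):
    """Positive part (u)_+ = max(u, 0) used by the MARS basis functions."""
    return max(u, 0.0)


def mars_example(x):
    """Evaluate the example model of Eq. (2.5) at a scalar x."""
    h1 = hinge(x - 17.0)   # (x - 17)_+
    h2 = hinge(17.0 - x)   # (17 - x)_+
    h3 = hinge(x - 5.0)    # (x - 5)_+
    return -5.0875 - 2.7678 * h1 + 0.5540 * h2 + 1.1900 * h3


for x in (0.0, 5.0, 17.0, 20.0):
    print(x, mars_example(x))
```

Reading the slopes off the coefficients gives -0.5540 for x < 5, 1.1900 - 0.5540 = 0.6360 for 5 < x < 17, and 1.1900 - 2.7678 = -1.5778 for x > 17, which are the three linear regimes separated by the two knots.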
Generalized cross-validation (GCV) is used to choose $\lambda$; the criterion is defined as

$$\mathrm{GCV}(\lambda) = \frac{\sum_{i=1}^{N} \left( y_i - f_\lambda(x_i) \right)^2}{\left( 1 - M(\lambda)/N \right)^2}, \tag{2.6}$$

$$M(\lambda) = r + cK, \tag{2.7}$$

where $M(\lambda)$ represents the effective number of parameters in the model, $r$ is the number of terms in the model, $K$ is the number of knots that were selected in the forward process, and $c$ is the number of parameters used for determining one optimal knot value. Some mathematical and simulation results suggest that three parameters should be used for selecting a knot in a piecewise linear regression, so $c = 3$ in general. When the model is restricted to be additive, $c = 2$ is used.

Because of its high flexibility, MARS is usually viewed as a generalization of step-wise linear regression. It can also be treated as a modification of the CART method. Although the two might seem quite different, MARS can be turned into CART by applying the changes below.

• Rewrite the piecewise linear basis functions (Eq. 2.1) as the step functions

$$I(x - t > 0) \quad \text{and} \quad I(x - t \le 0). \tag{2.8}$$

• Replace the model term by the interaction when it is involved in a multiplication by a candidate term, so that it is not available for further interactions.

The first change results in a step function multiplied by a pair of reflected step functions in each step, which is equivalent to splitting a node at the step. The second corresponds to the fact that the same node can only be split once in a CART model, which is why CART cannot handle additive structures. By allowing further interactions of a model term even when it is involved in a multiplication by a candidate term, MARS forgoes the binary-tree representation of the CART model and thus gains the ability to fit additive structures much more effectively. Similar to CART, MARS not only works very well in regression, but can also handle classification problems efficiently.
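Returning to the GCV criterion of Eqs. (2.6) and (2.7), the following sketch computes it for given observations and model predictions. The function name, argument names and toy numbers are hypothetical; the formula is the one stated in the text, with c = 3 as the default and c = 2 for an additive model.

```python
def gcv(y_true, y_pred, n_terms, n_knots, c=3.0):
    """Generalized cross-validation score of Eqs. (2.6)-(2.7).

    n_terms is r, n_knots is K, and c is the cost charged per knot.
    """
    n = len(y_true)
    m_eff = n_terms + c * n_knots            # M(lambda) = r + c*K
    if m_eff >= n:
        raise ValueError("effective number of parameters must be below N")
    rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    return rss / (1.0 - m_eff / n) ** 2


# Toy numbers (hypothetical): 20 observations, each residual equal to 1.
y_true = [0.0] * 20
y_pred = [1.0] * 20
score_general = gcv(y_true, y_pred, n_terms=3, n_knots=2)          # c = 3
score_additive = gcv(y_true, y_pred, n_terms=3, n_knots=2, c=2.0)  # c = 2
print(score_general, score_additive)
```

For the same residual sum of squares, a larger effective number of parameters shrinks the denominator and raises the score, which is how the criterion penalizes the larger models during backward deletion.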
Generally, for binary classification problems, one can encode the output as 0/1 and then treat it as a regression problem. For multi-class problems, the indicator response approach can be used: the K response classes are coded via 0/1 indicator variables, and a multi-response MARS regression is then performed, in which a common set of basis functions is used for all response variables. The output is the class with the largest predicted value. However, potential masking problems exist in this approach, and a better way is to use the “optimal scoring” method (Hastie et al. 2017). There is also a hybrid of MARS called PolyMARS, designed by Kooperberg et al. (1997) specifically to handle classification problems. It uses the multiple logistic framework: it grows the model in a forward stagewise fashion like MARS, but at each stage a quadratic approximation of the polynomial log-likelihood is used to search for the next basis function pair. Once found, the enlarged model is fit by maximum likelihood, and the process is repeated until the preset goal is reached (Hastie et al. 2017).

As for the use of MARS, Fig. 2.4 provides the VOSviewer plots for network visualization and density visualization, which are read in the same way as those in Fig. 2.1. For brevity, these interpretations are omitted. Figure 2.4c provides the overlay visualization. The overlay visualization is identical to the network visualization except that items are colored differently, according to the color bar. Here the color bar in the lower right corner of the visualization represents the years of publication, with each color corresponding to a year.

Fig. 2.4 VOSviewer plots for MARS method: (a) network visualization, (b) density visualization, (c) overlay visualization

References

Bonissone P (2002) Hybrid soft computing for classification and prediction applications: Keynote address. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer, Berlin, Heidelberg, pp 352–353
Cruz CA (2005) Estrategias coordinadas paralelas basadas en soft-computing para la solución de problemas de optimización
Friedman JH (2007) Multivariate adaptive regression splines. Ann Stat 19:1–67. https://doi.org/10.1214/aos/1176347963
Hastie T, Tibshirani R, Tibshirani RJ (2017) Extended comparisons of best subset selection, forward stepwise selection, and the Lasso
Jiménez F, Gómez Skarmeta AF, Sánchez G, Cadenas JM (2003) Fuzzy sets based heuristics for optimization: Multi-objective evolutionary fuzzy modeling. In: Verdegay J-L (ed) Springer. Berlin Heidelberg, Berlin, Heidelberg, pp 221–234
Kooperberg C, Bose S, Stone CJ (1997) Polychotomous regression. J Am Stat Assoc 92:117–127. https://doi.org/10.1080/01621459.1997.10473608
Li X, Ruan D, Van Der Wal AJ (1998) Discussion on soft computing at FLINS’96. Int J Intell Syst 13:287–300. https://doi.org/10.1002/(sici)1098-111x(199802/03)13:2/3%3c287::aid-int10%3e3.0.co;2-4
Melian B, Moreno-Pérez J, Moreno-Vega J (2003) Metaheuristicas: Una visión global. Intel Artif Rev Iberoam Intel Artif 7(19):7–28. ISSN 1137-3601
Zadeh LA (1994) Soft computing and fuzzy logic. IEEE Softw 11:48–56. https://doi.org/10.1109/52.329401
Zadeh LA (2001) Applied soft computing - foreword. Appl Soft Comput 1:1–2. https://doi.org/10.1016/s1568-4946(01)00003-5

Chapter 3 Machine Learning and Applications

3.1 Supervised Learning

Supervised learning is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately. As input data is fed into the model, it adjusts its weights until the model has been fitted appropriately.
This occurs as part of the cross-validation process, which ensures that the model avoids overfitting or underfitting. Methods used in supervised learning include neural networks, naïve Bayes, linear regression, logistic regression, random forest, support vector machine (SVM), and more.

3.1.1 ANN

An artificial neural network (ANN) is a mathematical model which imitates the structure and function of biological neural networks (Agatonovic-Kustrin and Beresford 2000). It uses a large number of connected artificial neurons to perform calculations. The network is formed by many “neurons” connected to each other. Each neuron represents a specific output function, also called the activation function. Each connection between neurons in adjacent layers carries a weighting of the signal passing through it, called the weight coefficient, which is equivalent to the memory of the ANN. The output of the network is determined by the connection rules of the network, and an ANN can be understood as an approximation of a certain algorithm or function in nature. Figure 3.1 describes a simple neural network structure that includes an input layer, a hidden layer, and an output layer. The cells in the hidden layer are fully connected to the input layer, while the output layer is fully connected to the hidden layer. If such a network has more than one hidden layer, it can be called a multilayer neural network.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. W. Zhang et al., Application of Soft Computing, Machine Learning, Deep Learning and Optimizations in Geoengineering and Geoscience, https://doi.org/10.1007/978-981-16-6835-7_3

Fig. 3.1 Architecture of artificial neural network

In Fig. 3.1, $a_i^{(l)}$ represents the $i$-th neuron in layer $l$. The superscript “in” denotes the input layer, the superscript “h” denotes the hidden layer and the superscript “out” denotes the output layer.
The units $a_0^{(in)}$ and $a_0^{(h)}$ are the deviation (bias) values, defined as 1, so the input layer consists of the input values plus the deviation value.

Figure 3.2 provides the VOSviewer plots for some main supervised learning methods, in which the keyword co-occurrence network of ANN was constructed with the VOSviewer software. The size of a node and its label represents the weight of the node: the bigger the node and label, the larger the weight. The distance between two nodes reflects the strength of the relation between them, and a shorter distance generally indicates a stronger relationship. A line between two keywords indicates that they have appeared together, and the thicker the line, the more often they co-occur. Nodes with the same color belong to the same cluster. VOSviewer divided the keywords of ANN-related publications into 7 clusters.

Fig. 3.2 Visualization of ANN algorithms

3.1.2 ELM

Extreme learning machine (ELM) is a feedforward neural network training method proposed by Huang et al. (2006). This method randomly generates the connection weights between the input layer and the hidden layer and the thresholds of the hidden layer neurons. Only the number of hidden layer neurons needs to be set in the training process to obtain the unique optimal solution. Assume there are $N$ different samples $(x_j, t_j)$, where $x_j = [x_{j1}, x_{j2}, \cdots, x_{jn}]^T \in R^n$ and $t_j = [t_{j1}, t_{j2}, \cdots, t_{jm}]^T \in R^m$. The standard single hidden layer feedforward neural network with $M$ hidden layer neurons and activation function $g(x)$ can be expressed as

$$\sum_{i=1}^{M} \beta_i\, g(w_i \cdot x_j + b_i) = O_j, \quad j = 1, \cdots, N, \tag{3.1}$$

where $w_i = [w_{i1}, w_{i2}, \cdots, w_{in}]^T$ represents the connection weights between the input layer nodes and the hidden layer node $i$; $\beta_i = [\beta_{i1}, \beta_{i2}, \cdots, \beta_{im}]^T$ represents the connection weights between the hidden layer node $i$ and the output layer nodes; $b_i$ represents the deviation (bias) coefficient of hidden layer node $i$; $w_i \cdot x_j$ represents the inner product of $w_i$ and $x_j$; and $O_j = [O_{j1}, O_{j2}, \cdots, O_{jm}]^T$ represents the output values. A standard single hidden layer feedforward neural network with $M$ hidden layer neurons and activation function $g(x)$ can approximate the training samples with zero error when

$$\sum_{j=1}^{N} \left\| O_j - t_j \right\| = 0. \tag{3.2}$$

Therefore, $\beta_i$, $w_i$ and $b_i$ satisfy

$$\sum_{i=1}^{M} \beta_i\, g(w_i \cdot x_j + b_i) = y_j, \quad j = 1, \ldots, N, \tag{3.3}$$

where $y_j = t_j$ is the target of sample $j$. This can be expressed as $H\beta = Y$, where $H$ is the hidden layer output matrix of the neural network. Since the randomly selected input weights and hidden layer biases are fixed, network training is equivalent to finding the least-squares solution $\beta$ of

$$\min_{\beta} \left\| H\beta - Y \right\|. \tag{3.4}$$

The minimum-norm least-squares solution of the above linear system is $\beta = H^{+} Y$, where $H^{+}$ is the Moore–Penrose generalized inverse of the hidden layer output matrix $H$. The learning algorithm of the extreme learning machine has three main steps. Firstly, randomly set the connection weights $w$ between the input layer and the hidden layer and the biases $b$ of the hidden layer neurons. Secondly, calculate the output matrix $H$ of the hidden layer. Lastly, calculate the weights of the output layer, $\beta = H^{+} Y$. Figure 3.3 provides the visualization of ELM algorithms, as well as their applications in geoengineering and geoscience.

3.1.3 DT

Decision trees are a way to represent rules underlying data with hierarchical, sequential structures that recursively partition the data (Murthy 1998).
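Returning briefly to Section 3.1.2, the three ELM training steps listed there can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], and a toy sine-curve dataset. These choices and the names `elm_fit` and `elm_predict` are hypothetical and are not specified in the text.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation g(z), applied elementwise (an assumed choice)."""
    return 1.0 / (1.0 + np.exp(-z))


def elm_fit(X, Y, n_hidden, rng):
    """Train a single-hidden-layer network the ELM way.

    X has shape (N, n), Y has shape (N, m), n_hidden is M.
    """
    n_features = X.shape[1]
    # Step 1: draw input weights w and hidden biases b at random; they stay fixed.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden-layer output matrix H, shape (N, M).
    H = sigmoid(X @ W + b)
    # Step 3: output weights beta = H^+ Y via the Moore-Penrose pseudoinverse.
    beta = np.linalg.pinv(H) @ Y
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Network output: hidden-layer activations times the output weights."""
    return sigmoid(X @ W + b) @ beta


# Toy regression data (hypothetical): y = sin(2*pi*x) on [0, 1].
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
Y = np.sin(2.0 * np.pi * X)
W, b, beta = elm_fit(X, Y, n_hidden=15, rng=rng)
train_mse = float(np.mean((elm_predict(X, W, b, beta) - Y) ** 2))
print(train_mse)
```

Only the output weights are solved for; no iterative weight updates are involved, which is the point of the method. In practice the number of hidden neurons, and the distribution of the random weights, would be chosen by validation.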
Decision trees are a common type of machine learning algorithm, which makes decisions based on a tree structure. A flow chart for establishing a decision tree is shown in Fig. 3.4, where S represents the training sample set, A denotes the classification sample set, and N represents a classification leaf node. Decision tree algorithms include Iterative Dichotomiser 3 (ID3), C4.5, Classification And Regression Tree (CART), Supervised Learning In Quest (SLIQ), etc. (Quinlan 1993; Breiman et al. 2017). Among these algorithms, the best known and most commonly used is Quinlan's ID3 algorithm. C4.5 and CART are derived from the ID3 decision tree algorithm, and the SLIQ algorithm is in turn obtained by modifying the C4.5 decision tree classification algorithm. In the C4.5 algorithm, the tree is constructed according to a depth-first strategy, while the tree for SLIQ is constructed according to a breadth-first strategy (Niuniu 2010). The generation of a decision tree has three main parts, namely feature selection, decision tree generation and pruning. The most critical issue is feature selection, since different splitting criteria have a significant influence on the generalization error of the decision tree. Figure 3.5 provides the visualization of the DT algorithm.

Fig. 3.3 Visualization of ELM algorithms

3.1.4 LR

One of the oldest and simplest parametric statistical approaches is linear regression (Rashidi et al. 2019). This technique has been widely used in many studies (Wei et al. 2019; Elmousalami 2020; Ben Chaabene et al. 2020). A linear regression model finds the best-fitting straight line between the independent variables and the dependent variable, also known as the "least squares regression line", i.e., the line with the lowest sum of squared errors (Fig. 3.6). The advantage of this approach is its simplicity and transparency in identifying a linear relationship.
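The least squares regression line described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not a procedure taken from the cited studies; the function name `fit_line` and the sample values are assumptions made for the example.

```python
import numpy as np

def fit_line(x, y):
    """Return (slope, intercept) of the least squares regression line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with a column of ones for the intercept term.
    A = np.column_stack([x, np.ones_like(x)])
    # Solve min ||A p - y||^2 for p = (slope, intercept).
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope, intercept

# Hypothetical data that lie exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
slope, intercept = fit_line(x, y)

# Sum of squared errors of the fitted line (the quantity being minimized).
residual_sum = float(np.sum((y - (slope * x + intercept)) ** 2))
```

With noisy field data the residual sum would not vanish, and its size gives a first indication of how well a straight line describes the relationship.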
However, this method may not be applicable to complicated cases in which the relationships between the independent variables (the causes or features) and the dependent variable are nonlinear. Figure 3.7 provides the visualization of the LR algorithm.

Fig. 3.4 Flow chart of decision tree development

Fig. 3.5 Visualization of DT algorithm

Fig. 3.6 Linear regression

Fig. 3.7 Visualization of LR algorithm

3.1.5 GP

Genetic Programming (GP) is an automated method for creating computer programs starting from a high-level description of the problem to be solved (Oltean et al. 2009).

(1) Genetic programming (Wieczorek and Czech 2000): The coding method mainly adopts a tree structure, with special crossover and mutation operators based on that structure. It is used not only for problem planning and discrete optimization, but also for regression analysis and other fields. Koza (1992) and Angeline (1994) developed genetic programming to a great extent and systematically compiled books on GP. GP then began to be widely used in automatic programming and in numerical and combinatorial optimization problems.

(2) Coding and decoding: Genetic programming mainly uses syntax trees for coding instead of traditional linear coding. Figure 3.8 shows the tree coding of the expression (A1 + 10) ÷ (A2 − A3) + (20 − A3) × A5. The expression contains the variables A1, A2, A3, A5 and the constants 10 and 20. These variables and constants are located on the leaves of the tree and are called terminators (terminals). The symbols +, −, ×, ÷ are called operators. The terminators and operators together constitute the primitive set of genetic programming. The advanced form of genetic programming can be constructed in a multicomponent form.

Fig. 3.8 Examples of standard genetic programming coding methods
In this case, the coding of GP is carried out by connecting multiple subtrees to a root node through specific rules, as shown in Fig. 3.9. Each component is a subtree, and each subtree is a complete syntax tree. In GP, the solution of any problem can be encoded into a tree structure through grammatical analysis, and decoding is carried out in accordance with the same grammatical rules.

Fig. 3.9 Component-based genetic programming coding method

(3) Crossover and mutation: The basic operations of genetic programming include duplication (copy), crossover, mutation and so on. Generally, the greater the fitness of an individual, the greater the probability that it will be selected. The copy operation randomly selects an individual based on fitness and copies it to the next-generation population. The crossover operation generates two offspring individuals from two randomly selected parent individuals. Figure 3.10 shows two individuals undergoing crossover. In the mutation operation, a child individual is generated from a parent individual selected randomly based on fitness. Figure 3.11 shows an individual that has undergone mutation.

Fig. 3.10 Cross operation

Fig. 3.11 Mutation operation

(4) Fitness function and fitness: The concept of fitness, borrowed from nature, is used in genetic programming to measure how likely each individual in the population is to help find the optimal solution in the optimization calculation. Individuals with higher fitness have a higher probability of being passed on to the next generation, while individuals with lower fitness have a relatively small probability of being passed on. The function that measures fitness is called the fitness function.

(5) GP algorithm: The basic procedure of genetic programming can be roughly divided into three steps:
a. randomly generating the initial population;
b. repeating the following sub-steps on the program population until the termination criterion is met: each program individual is assigned a fitness value, and the three genetic operators (copy, crossover, mutation) are used to generate a new program population, with the individuals to be processed selected with a probability based on fitness (reselection is allowed);
c. returning the program individual identified by the result designation method as the result of the genetic programming run.

Figure 3.12 provides the visualization of the GP algorithm.

Fig. 3.12 Visualization of GP algorithm

3.1.6 NB

Naive Bayes (NB) is a classifier based on applying Bayes' theorem with strong (naive) independence assumptions between attributes. As a result of the naive design and simplified assumptions, the NB model works efficiently and requires only a relatively small data set to estimate its parameters (Bhargavi and Jyothi 2009). In general, the construction of an NB model contains five main steps: (i) collecting data, (ii) estimating the prior probability of each class, (iii) estimating the means of the classes, (iv) generating covariance matrices and finding the inverse and determinant for each class, and (v) forming the discriminant function for each class (Bhargavi and Jyothi 2009; Pham et al. 2017). Let $x = (x_1, x_2, \dots, x_n)$ be the input vector and $y = (y_1, y_2, \dots, y_K)$ be the vector of possible output classes. The classification process of the NB model can be expressed as follows:

$$\hat{y} = \arg\max_{k \in \{1, 2, \dots, K\}} \frac{P(y_k) \prod_{i=1}^{n} P(x_i \mid y_k)}{P(x)}, \tag{3.5}$$

where $\hat{y}$ is the predicted class label, $P(y_k)$ is the prior probability of $y_k$, and $P(x_i \mid y_k)$ is the conditional probability. Figure 3.13 provides the visualization of the NB algorithm.

Fig. 3.13 Visualization of NB algorithm
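As an illustration of Eq. (3.5), the following sketch estimates priors, class means and variances and then picks the class with the largest score. It assumes Gaussian class-conditional densities, so that under the independence assumption the covariance matrix is diagonal and its inverse and determinant reduce to per-feature variances. The function names and the toy data are hypothetical, not taken from the cited studies.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps variances strictly positive (assumption of this sketch).
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Return the class maximizing log P(y_k) + sum_i log P(x_i | y_k)."""
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]
    # Log of the Gaussian density, summed over features (independence assumption).
    log_lik = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :],
        axis=2,
    )
    # P(x) is identical for all classes, so it is dropped from the arg max.
    scores = np.log(priors)[None, :] + log_lik
    return classes[np.argmax(scores, axis=1)]

# Hypothetical two-class data with two features.
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                    [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

model = fit_gaussian_nb(X_train, y_train)
pred = predict_gaussian_nb(model, np.array([[1.0, 2.0], [5.0, 6.0]]))
```

Working with logarithms avoids numerical underflow when many conditional probabilities are multiplied, which is a common design choice rather than a requirement of Eq. (3.5).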
3.1.7 SVM

Support vector machine (SVM) is a supervised learning algorithm introduced by Vapnik (2000) that has gained increasing popularity for its superior performance in dealing with classification and regression problems. Some basic principles of the SVM model are presented in the following. Assume a training set containing l data points $(x_1, y_1), (x_2, y_2), \dots, (x_l, y_l)$, where $x_i \in R^n$ (i = 1, 2, …, l) is the vector of input variables and $y_i \in R$ (i = 1, 2, …, l) is the corresponding target value. According to Xue et al. (2014), the aim of support vector regression (SVR) is to solve the following optimization problem:

$$\text{Minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \tag{3.6}$$

$$\text{Subject to} \quad \begin{cases} y_i - \sum_{j=1}^{n} w_j x_{ji} - b \le \varepsilon + \xi_i \\ \sum_{j=1}^{n} w_j x_{ji} + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \ \xi_i^* \ge 0 \end{cases} \quad i = 1, \dots, l \tag{3.7}$$

where

$$f(x) = \sum_{j=1}^{n} w_j x_j + b \quad \text{with } w \in R^n, \ b \in R. \tag{3.8}$$

Some details about these three equations are as follows. In Eqs. (3.7) and (3.8), n is the number of input variables, $x_{ji}$ is the j-th input variable of the i-th data point, b represents the bias, and $w_j$ is the weight linking the j-th input to the output. The coefficient C controls the balance between the simplicity and the prediction accuracy of f(x). When the error between a predicted result and the observation is smaller than ε, it is tolerated and no penalty is applied, where ε is the width of the insensitive loss function shown in Fig. 3.14. Aside from ε, the two slack variables $\xi_i$ and $\xi_i^*$, which determine the degree to which samples are penalized, can also be found in Fig. 3.14. Several kernel functions are commonly used when constructing SVR models, including linear kernel functions, polynomial kernel functions, radial basis functions (RBF), etc. (Zhang et al. 2016). To obtain an ideal SVR model, the available data are often divided into a training set and a validation set.
The training set is first utilized to build a rough model, and the validation set is then adopted to adjust the hyperparameters to obtain an optimal model that gives accurate predictions with minimum errors. To avoid overfitting, a technique called k-fold cross-validation is widely used. For more details about SVM, readers can refer to Samui and Dixon (2012), Xue et al. (2014) and Zhang et al. (2016). Figure 3.15 provides the visualization of the SVM algorithm.

Fig. 3.14 ε-Insensitive loss function and slack variable ξ in SV-regression (after Samui and Dixon 2012)

Fig. 3.15 Visualization of SVM algorithm

3.1.8 KNN

KNN is the abbreviation for K-nearest neighbor, a basic supervised learning algorithm. KNN can be adopted to solve both classification and regression problems (Wu et al. 2008). Of the two, it is more widely used for classification, so the classification algorithm is mainly introduced here. The data set is first divided into training, validation and testing sets. The training set is utilized to preliminarily establish the KNN model, and the validation set is then used to obtain appropriate hyperparameters. Finally, the testing set is adopted to verify the established model. The key principle of KNN classification is to assign new data to the nearest class according to the Euclidean distance, Manhattan distance or Minkowski distance. The equations for these three distances are given in Eqs. (3.9), (3.10) and (3.11), respectively, where $x_i$ and $y_i$ are the i-th components of the two points being compared and the sums run over their k components. Figure 3.16 provides the visualization of the KNN algorithm.

Fig. 3.16 Visualization of KNN algorithm

$$D_{\text{Euclidean}} = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2} \tag{3.9}$$

$$D_{\text{Manhattan}} = \sum_{i=1}^{k} |x_i - y_i| \tag{3.10}$$

$$D_{\text{Minkowski}} = \left( \sum_{i=1}^{k} |x_i - y_i|^q \right)^{1/q} \tag{3.11}$$

3.2 Unsupervised Learning

Unsupervised learning uses ML algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden patterns or data groupings without the need for human intervention.
Its ability to discover similarities and differences in information makes it well suited to exploratory data analysis, cross-selling strategies, customer segmentation, and image and pattern recognition. It is also used to reduce the number of features in a model through dimensionality reduction; principal component analysis (PCA) and singular value decomposition (SVD) are two common approaches for this. Other algorithms used in unsupervised learning include neural networks, k-means clustering, probabilistic clustering methods, and more.

3.2.1 K-means

K-means is a typical unsupervised learning algorithm, which divides the samples into several disjoint clusters (Jain 2010). The clustering results have high intra-cluster similarity and low inter-cluster similarity. The idea of the k-means algorithm is relatively simple. Assuming that the data are to be divided into k categories, the steps are as follows. The algorithm first initializes k different centroids $\mu^{(1)}, \dots, \mu^{(k)}$, calculates the distance between each sample and each of the k centroids, and assigns each sample to the cluster with the closest centroid. It then recalculates the centroid of each cluster and reassigns each sample, repeating these two steps until the maximum number of iterations is reached or the adjustment is smaller than a threshold. Since the similarity between every sample and every centroid is calculated in each iteration, the convergence of the k-means algorithm is relatively slow on large-scale data sets.

3.2.2 PCA

In the field of data science, we are frequently faced with the problem of overfitting. Reducing the dimension of the original feature space and simplifying the structures hidden in the database is an efficient solution. There are two commonly used techniques for reducing the dimensionality, namely feature elimination and feature extraction.
Feature elimination drops unimportant variables that contribute little to the target. It is a simple method, and it preserves interpretability as far as possible. Feature extraction generates new independent variables that account for the correlation in the database, and each newly created variable is a combination of the original variables. Principal component analysis (PCA) is a widely used feature extraction technique. It combines the input variables in a specific way, and the most important independent principal components can then be used as predictor variables for further analysis. It has several strengths; however, model interpretability deteriorates because the original variables are not retained (Bakshi 1998). Figure 3.17 provides the VOSviewer plots for some main unsupervised learning methods. For brevity, the explanations of the node and label sizes and of the distances are omitted.

Fig. 3.17 Network visualization of (a) SVD, (b) PCA, (c) K-means, (d) probabilistic clustering

3.3 Semi-Supervised Learning

Semi-supervised learning offers a happy medium between supervised and unsupervised learning. During training, it uses a smaller labeled data set to guide classification and feature extraction from a larger, unlabeled data set. Semi-supervised learning can solve the problem of not having enough labeled data (or not being able to afford to label enough data) to train a supervised learning algorithm.

References

Agatonovic-Kustrin S, Beresford R (2000) Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J Pharm Biomed Anal 22:717–727
Angeline PJ (1994) Genetic programming: On the programming of computers by means of natural selection. Biosystems 33:69–73. https://doi.org/10.1016/0303-2647(94)90062-0
Bakshi BR (1998) Multiscale PCA with application to multivariate statistical process monitoring. AIChE J 44:1596–1610. https://doi.org/10.1002/aic.690440712
Ben Chaabene W, Flah M, Nehdi ML (2020) Machine learning prediction of mechanical properties of concrete: Critical review. Constr Build Mater 260. https://doi.org/10.1016/j.conbuildmat.2020.119889
Bhargavi P, Jyothi S (2009) Applying Naive Bayes data mining technique for classification of agricultural land soils. IJCSNS Int J Comput Sci Netw Secur 9:117–122
Breiman L, Friedman JH, Olshen RA, Stone CJ (2017) Classification and regression trees. Routledge
Elmousalami HH (2020) Artificial intelligence and parametric construction cost estimate modeling: state-of-the-art review. J Constr Eng Manag 146:03119008. https://doi.org/10.1061/(asce)co.1943-7862.0001678
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: Theory and applications. Neurocomputing 70:489–501. https://doi.org/10.1016/j.neucom.2005.12.126
Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31:651–666. https://doi.org/10.1016/j.patrec.2009.09.011
Koza JR (1992) The genetic programming paradigm: Genetically breeding populations of computer programs to solve problems. Dyn Genet Chaotic Program 203–321. https://doi.org/10.1109/TAI.1990.130444
Murthy SK (1998) Automatic construction of decision trees from data: A multi-disciplinary survey. Data Min Knowl Discov 2. https://doi.org/10.1023/A:1009744630224
Niuniu X (2010) Review of decision trees. 2010 3rd Int Conf Comput Sci Inf Technol 105–109
Oltean M, Groşan C, Dioşan L, Mihǎilǎ C (2009) Genetic programming with linear representation: A survey. Int J Artif Intell Tools 18
Pham BT, Tien Bui D, Pourghasemi HR et al (2017) Landslide susceptibility assessment in the Uttarakhand area (India) using GIS: a comparison study of prediction capability of naïve bayes, multilayer perceptron neural networks, and functional trees methods. Theor Appl Climatol 128:255–273. https://doi.org/10.1007/s00704-015-1702-9
Quinlan JR (1993) C4.5: Programs for machine learning
Rashidi HH, Tran NK, Betts EV et al (2019) Artificial intelligence and machine learning in pathology: The present landscape of supervised methods. Acad Pathol 6
Samui P, Dixon B (2012) Application of support vector machine and relevance vector machine to determine evaporative losses in reservoirs. Hydrol Process 26:1361–1369. https://doi.org/10.1002/hyp.8278
Vapnik VN (2000) The nature of statistical learning theory. Nat Stat Learn Theory. https://doi.org/10.1007/978-1-4757-3264-1
Wei W, Ramalho O, Malingre L et al (2019) Machine learning and statistical models for predicting indoor air quality. Indoor Air 29:704–726
Wieczorek W, Czech ZJ (2000) Grammars in genetic programming. Control Cybern 29:1018–1030
Wu X, Kumar V, Ross QJ et al (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14:1–37. https://doi.org/10.1007/s10115-007-0114-2
Xue XH, Yang XG, Chen X (2014) Application of a support vector machine for prediction of slope stability. Sci China Technol Sci 57:2379–2386. https://doi.org/10.1007/s11431-014-5699-6
Zhang Y, Dai M, Ju Z (2016) Preliminary discussion regarding SVM kernel function selection in the twofold rock slope prediction model. J Comput Civ Eng 30:04015031. https://doi.org/10.1061/(asce)cp.1943-5487.0000499

Chapter 4 Deep Learning and Applications

DL attempts to mimic the human brain, albeit far from matching its ability, enabling systems to cluster data and make predictions with remarkable accuracy. A deep neural network consists of multiple layers of interconnected nodes, each building upon the previous layer to refine and optimize the prediction or categorization. This progression of computations through the network is called forward propagation. The input and output layers of a deep neural network are called visible layers.
The input layer is where the deep learning model ingests the data for processing, while the output layer is where the final prediction or classification is made. Another process, called backpropagation, uses algorithms such as gradient descent to calculate the errors in the predictions and then adjusts the weights and biases of the function by moving backwards through the layers in order to train the model. Together, forward propagation and backpropagation allow a neural network to make predictions and correct for any errors accordingly. Over time, the algorithm gradually becomes more accurate. The above describes the simplest type of deep neural network in the simplest terms. In practice, DL models are considerably more complex, and there are different types of neural networks for specific problems or datasets, for example, AE, DBN, CNN, RNN and LSTM.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
W. Zhang et al., Application of Soft Computing, Machine Learning, Deep Learning and Optimizations in Geoengineering and Geoscience, https://doi.org/10.1007/978-981-16-6835-7_4

4.1 AE

The autoencoder (AE) is mainly used for dimensionality reduction and compression (Hinton and Zemel 1994). The strategy is to make the output equal to the input, that is, to reconstruct the data. An autoencoder includes an encoder and a decoder. The encoder receives an input and encodes it into a vector in a low-dimensional latent space, and the decoder is then responsible for decoding this vector to recover the original input. The encoder output therefore provides a lower-dimensional feature representation of the input, which achieves dimensionality reduction and compression. In addition, an autoencoder can be utilized to generate slightly different, or even better, versions of the input data, which can be used for training data enhancement, data denoising, etc.
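The encoder and decoder idea can be illustrated with a deliberately small sketch. It assumes a purely linear autoencoder (no activation functions) trained by plain gradient descent on synthetic data, which is simpler than the networks used in practice; the variable names, data, learning rate and iteration count are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3-D data lying close to a 1-D line (hypothetical example data).
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 0.5, -0.5]]) + 0.01 * rng.normal(size=(200, 3))

# Linear encoder (3 -> 1) and decoder (1 -> 3) with small random weights.
W_enc = 0.1 * rng.normal(size=(3, 1))
W_dec = 0.1 * rng.normal(size=(1, 3))

def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared reconstruction error of decoder(encoder(X))."""
    return float(np.mean(np.sum((X @ W_enc @ W_dec - X) ** 2, axis=1)))

initial_loss = reconstruction_loss(X, W_enc, W_dec)

learning_rate = 0.02
n = X.shape[0]
for _ in range(2000):
    code = X @ W_enc                 # encoder: low-dimensional representation
    residual = code @ W_dec - X      # decoder output minus the original input
    # Gradients of the mean squared reconstruction error.
    grad_dec = (2.0 / n) * code.T @ residual
    grad_enc = (2.0 / n) * X.T @ residual @ W_dec.T
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec

final_loss = reconstruction_loss(X, W_enc, W_dec)
code = X @ W_enc  # compressed 1-D features of the 3-D input
```

After training, `code` holds one number per sample instead of three, which is the dimensionality reduction described above. Practical autoencoders add nonlinear activations and several layers, usually through a deep learning framework.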
4.2 DBN

The restricted Boltzmann machine (RBM) is a stochastic neural network with generative ability, which can learn a probability distribution from its inputs (LeCun et al. 1989; Hinton 1989). Compared with other networks, its most distinctive feature is that it has only input and hidden layers and no output layer. In the forward pass, a corresponding feature representation is generated after an input is passed in, and in the backward pass the original input is reconstructed from this feature representation. This process is similar to an autoencoder, but it is realized in a single network. Multiple restricted Boltzmann machines can be stacked to form a deep belief network (DBN). Its structure is similar to that of fully connected layers, but the training method is different: a DBN is trained by training its network layers in pairs according to the training process of RBMs. Recently, DBN and RBM have been used less often due to the emergence of generative adversarial networks and variational autoencoders.

4.3 CNN

The convolutional neural network (CNN) is a kind of deep learning algorithm that has been widely used for video recognition, image classification, recommendation systems and self-driving problems (LeCun et al. 1998; Krizhevsky et al. 2017). Compared with traditional methods, CNN offers high prediction accuracy and working efficiency, and the pre-processing stage requires less effort. The core feature of CNN lies in its building block, which is composed of many layers that are interconnected using a set of shared weights and biases. Figure 4.1 shows a typical architecture of a CNN model, containing convolutional, pooling and fully connected layers.

4.4 RNN

The recurrent neural network (RNN) is a dynamic neural network that can capture time-series data (Graves et al. 2013). The basic structure of RNN is shown in Fig. 4.2. The advantage of this algorithm is that it is able to capture sequential information. To be specific, a former unit may have influence on the next unit.
Based on this structure, the information passes from one step of the sequence to the next. The input of RNN can be multiple and ordered, and the information of the previous hidden layer can be transferred to the next hidden layer of the neuron. That is to say, the RNN algorithm has a certain memory ability. Furthermore, RNN retains only the important information and discards the remaining unimportant information. Although the principles of the RNN algorithm are straightforward, it may be difficult to obtain an ideal RNN model when there is a long time lag between the target and its associated earlier events.

Fig. 4.1 Model structure of CNN

Fig. 4.2 Model structure of RNN (after Le et al. 2019)

4.5 LSTM

Long short-term memory (LSTM) was proposed to solve the problem of vanishing or exploding gradients faced by RNN (Hochreiter and Schmidhuber 1997). Compared with RNN, LSTM makes better use of historical information in sequential problems and is much more applicable to complex sequence learning problems. The main improvement of LSTM lies in the "memory cell", as illustrated in Fig. 4.3. The memory cell in LSTM contains three specially designed gates, namely the input gate, forget gate and output gate. The input gate controls how much new information should be passed to the cell state. The forget gate determines what information is retained in or removed from the cell state. The output gate calculates the output, and this process is generally carried out by a sigmoid function. In short, based on a combination of the current input and the previous output, the LSTM model predicts the target value at the next time step. This process is repeated until the required error or the maximum number of iterations is reached.

Fig. 4.3 Model structure of LSTM (Zhao et al. 2019)

References

Graves A, Mohamed AR, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: ICASSP, IEEE International conference on acoustics, speech and signal processing, proceedings, pp 6645–6649
Hinton GE (1989) Deterministic Boltzmann learning performs steepest descent in weight-space. Neural Comput 1:143–150. https://doi.org/10.1162/neco.1989.1.1.143
Hinton GE, Zemel RS (1994) Autoencoders, minimum description length and Helmholtz free energy. Adv Neural Inf Process Syst 6:3–10
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60:84–90. https://doi.org/10.1145/3065386
Le D-T, Lauw HW, Fang Y (2019) Correlation-sensitive next-basket recommendation. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), pp 2808–2814
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86:2278–2323. https://doi.org/10.1109/5.726791
LeCun Y, Galland CC, Hinton GE (1989) GEMINI: Gradient estimation through matrix inversion after noise injection
Zhao H, Chen Z, Jiang H et al (2019) Evaluation of three deep learning models for early crop classification using sentinel-1A imagery time series: a case study in Zhanjiang, China. Remote Sens 11:2673. https://doi.org/10.3390/RS11222673

Chapter 5 Optimization Algorithms and Applications

When tough decisions involve a very large number of factors, optimization helps to capture the key components in a mathematical model of the engineering situation, giving the confidence to make better decisions more quickly. Optimization problems can be divided into function optimization problems and combinatorial optimization problems.
Function optimization deals with continuous variables within a given interval, i.e., it solves continuous problems. The solution space of combinatorial optimization is discrete, i.e., it solves discrete problems. Typical combinatorial optimization problems include the traveling salesman problem (TSP), scheduling problem, knapsack problem, bin packing problem, graph coloring problem, clustering problem, etc. Optimization algorithms can also be divided into precise algorithms and approximation algorithms (heuristic algorithms). The former are generally rather complex, only suitable for solving small-scale problems, and not practical in geoengineering. The latter are constructed on the basis of intuition or experience and give a feasible solution for each instance of the combinatorial optimization problem at an acceptable time and space cost, although the deviation between the feasible solution and the optimal solution cannot be predicted. The metaheuristic algorithm is an improvement on the heuristic algorithm that combines random algorithms with local search algorithms. Metaheuristic algorithms can be further divided into individual-based algorithms, such as the Tabu search algorithm and simulated annealing, and population-based algorithms, such as the evolutionary algorithm and the novel swarm intelligence optimization mentioned above. Nowadays, swarm intelligence optimization is dominant in geoengineering and geoscience.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
W. Zhang et al., Application of Soft Computing, Machine Learning, Deep Learning and Optimizations in Geoengineering and Geoscience, https://doi.org/10.1007/978-981-16-6835-7_5

5.1 Precise Algorithm

As shown in Fig. 1.2, which outlines the basic components of OA, the precise algorithms consist of Stochastic Gradient Descent (SGD), Conjugate Gradient (CG) and Steepest Descent.
Figure 5.1 provides the VOSviewer plots for the use of SGD and CG in geoengineering and geosciences.

Fig. 5.1 VOSviewer plots for (a) SGD and (b) CG use in geoengineering and geosciences

5.2 Evolutionary Algorithm

Evolutionary computation is an approach to engineering and optimization in which solutions, instead of being constructed from first principles, are evolved through processes modeled after the elements of Darwinian evolution. Evolutionary computation is one of the principal methods in what is called "nature-inspired computing". The evolutionary algorithm is the main object of interest in evolutionary computation. There is a problem to be solved, and the solution is conceived to lie somewhere in a space of possible candidate solutions, the search space. Based on the literature survey, the common evolutionary algorithms include the following.

5.2.1 GA

The genetic algorithm (GA), proposed by Holland (1992), is one of the most representative strategies in evolutionary computing. It is a random global search and optimization method that imitates the evolution of natural organisms. This method acquires and accumulates knowledge about the search space during the search process, and adaptively controls the search process to obtain the optimal solution. It mainly adopts crossover and mutation operations to achieve global search, which effectively reduces the tendency of traditional algorithms to fall into local optimal solutions. The standard genetic algorithm consists of an encoding process, a fitness function and genetic operators. Figure 5.2 provides the network visualization of the GA algorithm.

Fig. 5.2 Network visualization of GA algorithm

5.2.2 BO

Snoek et al. (2012) proposed the Bayesian optimization (BO) algorithm for hyperparameter tuning. Owing to its excellent capabilities and time savings, it has been widely used in deep learning, system optimization, environmental monitoring, etc., providing a new direction for solving complex optimization problems. BO can be regarded as a process of finding the optimal parameters that make an unknown objective function reach its maximum or minimum value. This process involves the two core parts of the Bayesian optimization algorithm, namely the probabilistic surrogate model and the acquisition function. The BO algorithm uses a probabilistic surrogate model to fit the true objective function and, based on the acquisition function, actively selects the most promising point for the next evaluation. In this way, the optimal solution of the objective function can be obtained with a small number of evaluations, and unnecessary sampling is avoided. Figure 5.3 provides the network visualization of the BO algorithm.

Fig. 5.3 Network visualization of BO algorithm

5.2.3 PSO

Particle swarm optimization (PSO) is a global optimization algorithm proposed by Eberhart and Kennedy (1995). The main idea is to treat each solution of the optimization problem as a particle, with each particle moving in a swarm based on its own flight experience and that of the other particles. In order to measure the quality of each particle's solution, a fitness function is defined and used to search for the optimal solution over the whole space. Figure 5.4 provides the network visualization of the PSO algorithm.

Fig. 5.4 Network visualization of PSO algorithm

5.2.4 DE

Differential evolution (DE) is a stochastic, simple and efficient evolutionary algorithm introduced by Storn and Price (1997) to find global optima for numerous unconstrained test functions. Compared with other evolutionary algorithms, DE retains the population-based global search strategy and uses a simple differential mutation operation and one-on-one competition, so it reduces the complexity of the genetic operations (Huang and Chen 2013).
Meanwhile, the special memory capacity of DE allows it to dynamically track the current search state and adjust its search strategy, giving it strong global convergence and robustness. It is therefore suitable for solving some complex optimization problems (Huang and Chen 2013). The main procedure of DE contains three operators: mutation, crossover, and selection. In the iterative process, assume that the population of each generation G contains N individuals. The initial population is often generated randomly following a uniform distribution over the variable domain. The mutation operation generates a new individual by adding the weighted difference between two population individuals to a third one. The crossover operation is introduced to diversify the new population: according to the crossover strategy, the old and new individuals exchange part of their code to form a new individual. The selection operation is a greedy strategy that compares the cost function values of a candidate individual and the target individual and keeps the better one. Figure 5.5 provides the network visualization of DE algorithm.

Fig. 5.5 Network visualization of DE algorithm

5.2.5 ABC

ABC is a swarm-based metaheuristic algorithm for solving optimization problems. It is inspired by the intelligent foraging behavior of honeybee swarms (Karaboga 2005). In the ABC algorithm, there are three types of bees in the colony: employed bees (forager bees), onlooker bees (observer bees) and scouts. Each type of bee handles a specific task. Employed bees exploit and search for food sources, and each food source has only one employed bee. Employed bees share information with the onlooker bees in the hive so that each onlooker bee can choose a food source to forage. Scouts search for new food sources by exploring the environment surrounding the hive.
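The division of labour among employed bees, onlooker bees and scouts can be sketched in Python as follows. This is a minimal sketch of the commonly described ABC scheme, not a reference implementation: the neighbour-search formula, the quality measure 1/(1 + cost) (which assumes a non-negative cost), the abandonment threshold `limit`, and all parameter values are assumptions made for illustration.

```python
import random


def artificial_bee_colony(cost, bounds, n_sources=10, limit=20,
                          max_cycles=200, seed=0):
    """Minimise a non-negative `cost` over box `bounds` (ABC sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # One food source (candidate solution) per employed bee.
    sources = [[rng.uniform(lo, hi) for lo, hi in bounds]
               for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources  # cycles without improvement per source
    best = min(range(n_sources), key=costs.__getitem__)
    best_x, best_cost = sources[best][:], costs[best]

    def try_neighbour(i):
        # Perturb one coordinate of source i relative to another source k
        # and keep the candidate only if it is better (greedy choice).
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.randrange(dim)
        lo, hi = bounds[j]
        cand = sources[i][:]
        cand[j] += rng.uniform(-1.0, 1.0) * (cand[j] - sources[k][j])
        cand[j] = min(max(cand[j], lo), hi)
        c = cost(cand)
        if c < costs[i]:
            sources[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        # Employed bees: each exploits its own food source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlooker bees: pick sources with probability proportional
        # to their quality, then exploit them as well.
        quality = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):
            try_neighbour(rng.choices(range(n_sources), weights=quality)[0])
        # Remember the best source before any source is abandoned.
        i = min(range(n_sources), key=costs.__getitem__)
        if costs[i] < best_cost:
            best_x, best_cost = sources[i][:], costs[i]
        # Scouts: replace exhausted sources with random new ones
        # (simplification: every exhausted source is replaced).
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = [rng.uniform(lo, hi) for lo, hi in bounds]
                costs[i] = cost(sources[i])
                trials[i] = 0
    return best_x, best_cost


def sphere(x):
    return sum(v * v for v in x)


abc_x, abc_cost = artificial_bee_colony(sphere, [(-5.0, 5.0)] * 2)
print(abc_x, abc_cost)
```

In this sketch the employed and onlooker phases refine existing sources, while the scout phase injects new random sources once a source has failed to improve for more than `limit` attempts.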
The waggle dance is used to show the nectar content of the food source explored by a dancing bee, and the duration of the dance is proportional to the quality of the food source. Onlooker bees watch numerous dances before choosing a good food source according to its potential quality. When a bee, whether scout or onlooker, finds a food source, it becomes employed. When a food source is discarded, the employed bees associated with it randomly become scouts or onlookers again. Scouts thus perform exploration, while employed and onlooker bees perform exploitation (Garg 2014). ABC has three important control parameters: colony size, limit, and maximum cycle (Yusup et al. 2014). A more detailed calculation procedure for this algorithm is given in Karaboga and Akay (2009). Figure 5.6 provides the network visualization of ABC algorithm.

5.2.6 ACO

ACO is a population-based technique for solving optimization problems, proposed by Dorigo and Gambardella (1997). The algorithm is inspired by the foraging behavior of real ant colonies. When searching for food, the ants explore the environment surrounding their nest in a random manner. If an ant finds a food source, it evaluates it and carries some food back to the nest. During the return trip, the ant marks the path by laying some amount of pheromone along it. The pheromone carries information about the quantity and quality of the food source, and other ants are then attracted by the pheromone, which leads them to the food. The pheromone trails represent an indirect form of communication (also known as stigmergy) among ants, which allows them to find the shortest path between the nest and the food source (Nait Amar et al. 2018). Figure 5.7 provides the network visualization of ACO algorithm.

Fig.
5.6 Network visualization of ABC algorithm

5.2.7 CS

CS is a nature-inspired, population-based stochastic metaheuristic algorithm developed by Yang and Deb (2009). The main idea of the algorithm is inspired by the parasitic behavior of some cuckoo species in combination with the Lévy flight, a movement pattern used by birds to search for food. In nature, some cuckoos lay their eggs in the nests of other bird species; they are remarkably good at selecting suitable nests in which the host birds have recently laid eggs, and at removing existing eggs to increase the hatching probability of their own. The cuckoo search algorithm can be described based on three idealized conditions (Kanagaraj et al. 2014; Zhang et al. 2019). A more detailed construction procedure for this algorithm is given in Yang and Deb (2009). Figure 5.8 provides the network visualization of CS algorithm.

Fig. 5.7 Network visualization of ACO algorithm

Fig. 5.8 Network visualization of CS algorithm

Fig. 5.9 VOSviewer plots for FA algorithm use in geoengineering and geosciences

5.2.8 FA

FA is a novel heuristic bionic optimization algorithm (Yang 2009) inspired by the flickering behavior of fireflies; it uses points in the search space to simulate firefly individuals in nature. FA can deal with multi-modal functions more efficiently than other swarm algorithms (Łukasik and Zak 2009; Yang 2009; Gandomi et al. 2011; Yang et al. 2012). The search process is simulated as the attraction and movement of firefly individuals. FA mainly includes two basic elements: brightness and attractiveness (Huang et al. 2019). Brightness reflects the quality of the firefly's location and determines the direction in which the firefly moves. Attractiveness determines the distance the firefly can travel.
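The roles of brightness and attractiveness can be illustrated with a minimal Python sketch of a commonly used firefly update rule, in which each firefly moves toward every brighter one with an attractiveness that decays with distance. This is a sketch under assumptions rather than the implementation used in the cited studies: the parameters beta0, gamma, alpha and alpha_decay, the clipping to bounds, and the sphere cost function are illustrative choices.

```python
import math
import random


def firefly_algorithm(cost, bounds, n_fireflies=15, beta0=1.0, gamma=1.0,
                      alpha=0.2, alpha_decay=0.97, iterations=100, seed=0):
    """Minimise `cost` over box `bounds`; lower cost means brighter."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_fireflies)]
    costs = [cost(p) for p in pos]

    for _ in range(iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                # Brightness sets the direction: i moves only toward
                # fireflies that are currently brighter (lower cost).
                if costs[j] < costs[i]:
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    # Attractiveness sets the step size and decays
                    # with the squared distance between i and j.
                    beta = beta0 * math.exp(-gamma * r2)
                    moved = []
                    for d, (lo, hi) in enumerate(bounds):
                        step = (beta * (pos[j][d] - pos[i][d])
                                + alpha * (rng.random() - 0.5) * (hi - lo))
                        moved.append(min(max(pos[i][d] + step, lo), hi))
                    pos[i] = moved
                    costs[i] = cost(moved)
        alpha *= alpha_decay  # assumption: shrink the random term over time

    best = min(range(n_fireflies), key=costs.__getitem__)
    return pos[best], costs[best]


def sphere(x):
    return sum(v * v for v in x)


fa_x, fa_cost = firefly_algorithm(sphere, [(-5.0, 5.0)] * 2)
print(fa_x, fa_cost)
```

The brightest firefly never moves in this sketch, because no other firefly is brighter, so the best cost found cannot get worse during the run.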
The optimization goal is achieved by constantly updating the positions of the fireflies according to brightness and attractiveness. Figure 5.9 provides the VOSviewer plots for FA algorithm use in geoengineering and geosciences.

5.2.9 GWO

Inspired by the predation behavior of gray wolves, Mirjalili et al. (2014) proposed a new type of swarm intelligence optimization algorithm, the GWO algorithm. GWO achieves the goal of optimization by simulating the predation behavior of gray wolves, based on the cooperation mechanism of wolves. Figure 5.10 provides the details of GWO, including the demonstration of hunting behavior (i.e., tracking, chasing, and approaching the prey; pursuing, encircling, and harassing the prey until it stops moving; attacking the prey), 2D and 3D position vectors, position updating and the basic flowchart.

Fig. 5.10 GWO: (a) demonstration of hunting behavior, (b) 2D and 3D position vectors, (c) position updating and (d) basic flowchart

Fig. 5.11 The Whale optimization algorithm (after Mirjalili and Lewis (2016))

5.2.10 WOA

Mirjalili and Lewis (2016) introduced and developed the Whale Optimization Algorithm (WOA), a metaheuristic algorithm, and demonstrated its capability in balancing exploration and exploitation compared with state-of-the-art optimization algorithms. It mimics the bubble-net hunting technique of humpback whales (Aljarah et al. 2018). The WOA algorithm is composed of three phases: (1) encircling prey phase; (2) exploitation phase; (3) exploration phase (sea
