Contemporary Perspectives in Data Mining, Volume 3
reviewed by Wanli Xing & Henglv Zhao - October 19, 2018
Title: Contemporary Perspectives in Data Mining, Volume 3
Author(s): Kenneth D. Lawrence & Ronald K. Klimberg (Eds.)
Publisher: Information Age Publishing, Charlotte
ISBN: 1641130547, Pages: 168, Year: 2017
In recent years, data growth has been tremendous in both scientific and commercial databases due to great advances in data generation and collection technologies. The sheer scale of this data has far exceeded human sense-making capabilities, and human evaluation alone cannot identify the subtle patterns and complex or multi-dimensional relationships between the variables. Data mining is an analytic process that addresses this problem, extracting implicit, previously unknown, and potentially useful information, and ascertaining consistent patterns and/or systematic relationships between variables within raw data. Various fields today make use of data mining tools and techniques to predict future trends and behaviors, allowing individuals and organizations to make proactive, knowledge-driven decisions (Furnas, 2012; Tan, Steinbach, Karpatne, & Kumar, 2018).
Contemporary Perspectives in Data Mining collects 10 papers written by authors from a variety of backgrounds, each of which introduces applications of data mining methods and specific cases of their recent use. It offers thorough explanations of several data mining methods and compelling perspectives on data mining's potential in fields like marketing, finance, health care, and academic research. A clear picture emerges of how data mining can be employed to generate desirable analytic results and tackle specific issues, providing readers with an in-depth sense of the role data mining plays in several fields. Data researchers and business practitioners at every level will benefit from reading this volume.
Chapters in this book are organized into three sections: Predictive Analytics, Business Applications, and Topics in Data Mining.
The first section consists of three chapters that delineate the applications of several data mining methods in predictive analytics. In the first chapter, statistics scholars Mark T. Leung and Shaotao Pan demonstrate how the bootstrap aggregation (bagging) framework can be applied to artificial neural network (ANN) forecasting in order to improve predictions of product order quantity in supply chain planning. The authors compare the quantity forecasts produced by two designs of bagging ANN, an ANN without bagging, and other models such as conventional ARIMA and random walk models. They conclude that bagging improved ANN forecasting overall and sustained the accuracy of order forecasts over the longer term. In Chapter Two, data scientists Thomas Ott and Stephen Kudyba illustrate how a data management (or retrospective dashboard) platform and data mining capabilities can work together with an e-based visualized API to generate a comprehensive (retrospective and predictive) business intelligence application for customer churn. In Chapter Three, business scholars Kenneth Lawrence, Gary Kleinman, and Sheila Lawrence focus on developing a regression model to predict CEO compensation among U.S. corporate insurance companies based on several significant predictors such as companies' operating income, capital spending, and financial leverage. The three chapters as a group reveal the power of data mining to extract data from different sources and organize them for future prediction. Chapter One also emphasizes that an advanced or improved data mining method can produce more satisfactory results than methods without optimization. Readers will come away with a stronger sense of the process involved in extracting useful information from massive data sets.
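For readers unfamiliar with bagging, the idea described in Chapter One can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' method: a least-squares trend line stands in for the ANN base learner, the order history is invented, and the function names and parameters are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: every x is identical
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bagged_forecast(xs, ys, x_new, n_models=50, seed=0):
    """Average the predictions of base models fit on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw n observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(a + b * x_new)
    # Aggregation: the bagged forecast is the mean of the individual forecasts.
    return mean(predictions)


# Hypothetical order quantities for 12 past periods (invented data).
periods = list(range(1, 13))
orders = [102, 108, 111, 119, 118, 127, 131, 134, 142, 144, 151, 155]
forecast = bagged_forecast(periods, orders, x_new=13)
print(f"Bagged forecast for period 13: {forecast:.1f}")
```

Averaging over resampled fits tends to reduce the variance of an unstable base learner, which is the property the chapter exploits with neural networks.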
The Business Applications section comprises four chapters that highlight the importance of data mining applications in business. In Chapter Four, supply chain management scholars Denish Pai and Hengameh Hosseini introduce data envelopment analysis (DEA), a mathematical programming approach used to characterize the relationship among multiple inputs and outputs, estimating best practice and evaluating productive inefficiency. They then use DEA to analyze the operational and financial performance of 90 hospitals in Pennsylvania through a two-stage hospital production process. In Chapter Five, Will Greerer, Gregory Smith, David Hyland, and Mark Frolick examine the driving forces behind online grocery shopping, the varying business models in the industry, its benefits and challenges, and the role of data mining in addressing these challenges. In Chapter Six, business analytics professor B. D. McCullough offers a statistical example of subgroup analysis to demonstrate its pitfalls, such as frequent occurrences of type one and type two errors (in statistical hypothesis testing, a type one error is the rejection of a true null hypothesis, while a type two error is the failure to reject a false null hypothesis). McCullough then provides five rules that should govern subgroup analysis. In Chapter Seven, Nick Perrino, Gregory Smith, David Hyland, and Mark Frolick describe the importance of business intelligence to small and medium-sized businesses, the challenges they face in accessing it, and the authors' vision for how they should approach it. Like Chapter Five, Chapter Seven focuses less on data mining than the other chapters in the book.
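The type one error pitfall mentioned for Chapter Six can be made concrete with a short calculation. The sketch below is our own illustration rather than an example from the chapter: it assumes the subgroup tests are independent and each is run at the same significance level, and the function names are hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false positive across k independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / k


for k in (1, 5, 10, 20):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:2d} subgroups: P(at least one type one error) = {rate:.3f}")
```

Under these assumptions, testing ten subgroups at the 0.05 level gives roughly a 40% chance of at least one spurious finding, which is why unplanned subgroup comparisons deserve caution.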
Section Three, Topics in Data Mining, contains three chapters that cover topics in, respectively, health care, academic research, and education. In Chapter Eight, supply chain scholars Virginia Miori and Catherine Cardamone demonstrate the application of data mining and statistical techniques such as clustering, partitioning, ANOVA, stepwise regression, and stepwise logistic regression to determine the statistically significant drivers of treatment success in a Florida drug rehabilitation center. In Chapter Nine, statistics scholars Feng Yang, Xiya Zu, and Zhimin Huang introduce the extended H-index (EHI), a new method of evaluating researchers' impact that takes the quality of a paper's citations into consideration. They visualize the relationship among the EHI and four other evaluation index methods, and use exploratory factor analysis to demonstrate how each method evaluates ten economists. Results show a clear advantage of the EHI. In Chapter Ten, operations research analyst and consultant Ronald Klimberg, Richard Pollack, and Richard Herschel discuss the lost opportunities caused by insufficient emphasis on developing and improving problem-solving skills in the education of analytics professionals. They also propose that academic analytics programs offer analytics heuristics and analytics grand rounds courses in order to improve students' problem-solving skills. While it makes a compelling case, like Chapters Five and Seven, Chapter Ten does not fit into the volume as well as the other chapters because it has only a tangential relationship to data mining.
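As background for Chapter Nine, the conventional h-index that the EHI extends can be computed as follows. This sketch covers only the standard index (the largest h such that h papers have at least h citations each); it does not attempt the EHI's citation-quality weighting, which this review does not specify, and the citation counts are invented.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one researcher's papers.
print(h_index([10, 8, 5, 4, 3]))
```

Because the standard index counts every citation equally, two researchers with identical counts receive the same score regardless of where the citations come from, which is the gap a quality-aware index aims to address.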
Digressing chapters aside, Contemporary Perspectives in Data Mining offers many good cases that will improve readers' understanding of the capacities of data mining. The contributors offer detailed and clear explanations of how several fields can utilize data mining to obtain useful information and prediction models, as well as thought-provoking perspectives on the future of data mining.
In summary, this book contributes to the introduction of data mining research and practice: it illustrates the capacities of data mining in key fields in today's society and offers a very good starting point for practitioners in fields that have not yet explored the power of data mining. I strongly recommend it to all readers who are interested in data mining.
Furnas, A. (2012, April 13). Everything You Wanted to Know About Data Mining but Were Afraid to Ask. The Atlantic. Retrieved from: https://www.theatlantic.com/technology/archive/2012/04/everything-you-wanted-to-know-about-data-mining-but-were-afraid-to-ask/255388/