Data Mining Thesis


Data Mining

Evaluating Data Mining as a Strategic Technology

Data mining techniques make it possible to quickly gain insights from a diverse and often incompatible set of databases and data sets. Data mining is the process by which very large data sets are analyzed for trends, patterns, insights and intelligence that are not discernible from a cursory manual analysis of the data (Osei-Bryson, Rayward-Smith, 2009). The data sets involved are often not integrated with each other in a common database, which adds a level of abstraction to the analysis and makes its interpretation even more difficult (Buddhakulsomsiri, Zakarian, 2009). An exceptional level of insight can be gained by evaluating data mining as a strategic technology. The use of data mining for auto warranties (Buddhakulsomsiri, Zakarian, 2009), where a massive amount of data must be interpreted to complete government reporting requirements, is a case in point. The intent of this analysis is to evaluate data mining as a strategic technology.

The refinement of data mining from a technology into a platform on which solutions for analyzing, monitoring and defining are built continues at an accelerating pace (Osei-Bryson, Rayward-Smith, 2009). Economic uncertainty and the need companies have to compete using intelligence are among the primary factors driving its adoption and growth (Li, Wu, 2010). Global economic recessions tend to be catalysts for information technologies that have the potential to deliver inordinately large increases in insight and in competitive and market intelligence. The use of data mining is accelerating as companies across all industries seek a competitive advantage through analysis of their channels, customers, suppliers and internal processes.

Examples of data mining abound in industries that have collected an exceptionally large amount of information from customers. These include, but are not limited to, aerospace and defense (Cressionnie, 2008), auto manufacturing, including aftermarket auto warranty analysis and lifetime product quality of automobiles (Buddhakulsomsiri, Zakarian, 2009), customer relationship management (Sun, 2006), education (Velasquez, Gonzalez, 2010), healthcare (Li, Wu, 2010) and many others. Despite their diversity, these industries share a common need for greater insight into the interrelationships hidden in the structured and unstructured content of their organizations. All also share the need to use their data to understand how strategies in place today will yield results in the future (Kuhn, Ducasse, Girba, 2007). Data mining also requires an intensive level of data integration across databases and legacy, often standalone, systems, in addition to a redefinition of the most critical processes used for accumulating information in the first place (da Cunha, Agard, Kusiak, 2010). This intensive data, system and process integration can, however, yield significant insights and intelligence that could not be captured before.

The intent of this analysis is to evaluate the essentials of data mining, including its definitions, to assess data mining as a technology trend, to analyze how data mining and its many associated technologies are managed and used at Google, and to assess the future direction of data mining. Data mining is also leading to the development of text mining applications that take in massive amounts of unstructured text and create linguistic models from the data so that new insights can be found, including in the emerging field of customer sentiment analysis (Li, Wu, 2010). CRM-based implementations of data mining often include sentiment analysis, which provides insights into branding and perceptions of companies obtained through social networks (Sun, 2006). The future of data mining is going to include sentiment analysis and the ability to ascertain attitudinal data from the massive amounts of data being generated by social networks (Lai, Liu, 2009).

Defining Data Mining

Definitions of data mining vary significantly in scope and in the inclusion or exclusion of key concepts. The most common definition covers four types of relationships: classes, clusters, associations and sequential patterns (Han, Kamber, 2000). Definitions also vary in how much they rely on the level of insight and intelligence that these processes deliver, with the most recent concentrating on linguistic modeling that can determine sentiment and attitudinal scaling from the unstructured content of social networks (Li, Wu, 2010). The more mainstream definition of data mining, however, concentrates on the integration of disparate, often non-integrated systems so that a single system of record can be produced upon which analysis, queries and advanced extraction can be performed (Berry, 2004). Extract, Transform and Load (ETL) technologies and Online Analytical Processing (OLAP) are often used to create the reporting and analytical frameworks that organizations rely on to streamline the analysis, reporting and continual updating of databases in a data warehouse, which in turn is used for completing data mining tasks (Rutledge, 2009).
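The ETL step that feeds a single system of record can be illustrated with a minimal sketch. It assumes two hypothetical source extracts (a comma-delimited CRM file and a semicolon-delimited orders file, both invented here) and uses an in-memory SQLite database as a stand-in for a data warehouse. The table names, column names and sample rows are assumptions for illustration only, not anything described in the sources cited above.

```python
import csv
import io
import sqlite3

# Hypothetical source extracts with incompatible delimiters and column names.
crm_csv = "customer_id,full_name\n1,Ada Lopez\n2,Ben Okoro\n"
orders_csv = "CustID;Amount\n1;19.50\n1;5.25\n2;40.00\n"


def extract(text, delimiter):
    """Extract: parse a delimited text extract into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform_customers(rows):
    """Transform: normalize CRM rows to (id, name) tuples with typed ids."""
    return [(int(r["customer_id"]), r["full_name"]) for r in rows]


def transform_orders(rows):
    """Transform: normalize order rows to (customer_id, amount) tuples."""
    return [(int(r["CustID"]), float(r["Amount"])) for r in rows]


def load(conn, customers, orders):
    """Load: write both normalized sources into one shared schema."""
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(
    conn,
    transform_customers(extract(crm_csv, ",")),
    transform_orders(extract(orders_csv, ";")),
)

# Once integrated, one query can span what were two separate systems.
totals = conn.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.id ORDER BY c.id"
).fetchall()
print(totals)
```

The point of the sketch is the order of operations: each source is parsed and normalized separately, and only the normalized rows reach the shared schema that later analysis queries.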

While there are major differences in these definitions of data mining, they all share the common mission of unifying the analytical, transaction and customer-based databases that are prevalent throughout organizations. Data mining applications are used for determining patterns, relationships and the relative strength or weakness of causality in data sets, often with the aim of bringing greater intelligence to transaction-based records and databases (Maggioni, 2009). In many data mining systems the overarching objective is to gain greater insight into transactions so that more effective selling and CRM-based strategies (Sun, 2006) can be carried out. Definitions of data mining also vary in their reliance on the underlying technologies used to find relationships in the data. Traditionally, statistically based analytics applications were used to examine causality and the strength or weakness of interrelationships in the data (Cressionnie, 2008). There are also data mining applications that seek to create neural networks (Han, Kamber, 2000) that can interpolate the relationships between data elements and create causal-based models over time. Google uses data mining not only to determine how users are accessing its search engine, but also for the definition of personalization (Stamou, Ntoulas, 2009) and for the development of linguistic models through latent semantic indexing (Kuhn, Ducasse, Girba, 2007), which gives the search engine provider a better understanding of how to index the Internet.

Classes, clusters, associations and sequential patterns are the four types of relationships that data mining applications seek to discover and add insight to (Stamou, Ntoulas, 2009). Classes are, as the name suggests, stored data that provide segmentation-based insights, including the purchasing behavior of customers and their demographic characteristics. Classes are often used as segmentation criteria across all industries that rely on data mining.

Clusters are the second type of relationship that data mining applications look for when analyzing data sets and systems of record (Stamou, Ntoulas, 2009). Clusters are data items that are grouped through previously defined customer relationships and preferences, and as a result they are also used in the development of market segments. Clustering has also been used in linguistic modeling to determine customer audiences within segments, including the definition of consumer affinities for given channels of communication and methods of learning about new products (Sun, 2006). Data modeling in this regard has been instrumental in the development of entirely new approaches to managing communications and in the integration of social networking applications into the multichannel messaging strategies of companies.

The third type of relationship that data mining applications look to capture, validate and report on is associations. The classic connection of husbands and young fathers who purchase beer and diapers in the same grocery store run is an example of this type of relationship (Li, Wu, 2010).

The last type of relationship that data mining applications seek to find is sequential patterns, which are used for predicting the future behavior of a specific audience or customer segment, including the development of mass customization selections for build-to-order products and services (da Cunha, Agard, Kusiak, 2010). The use of sequential patterns for the development of cross-sell and up-sell selections in e-commerce systems is becoming more prevalent as this type of data mining gains adoption and integration into e-commerce platforms. The development of mass customization product strategies is also highly dependent on this ability to determine sequential associations between products. The use of linguistic modeling and latent semantic indexing within Google is another example of this approach to discovering and analyzing sequential patterns over time (Stamou, Ntoulas, 2009). The use of these linguistic models to determine specific personalization requirements for each search on Google is an example of data mining taken to a highly personalized level (Stamou, Ntoulas, 2009).
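The association relationship described above, typified by the beer and diapers example, is commonly quantified with support, confidence and lift. The following is a minimal sketch of those three standard measures, assuming a small set of market baskets invented purely for illustration; the function names are likewise hypothetical and do not come from the cited sources.

```python
# Hypothetical market-basket data, invented for illustration only.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's baseline frequency.

    A value above 1 suggests the items co-occur more often than
    they would if purchased independently.
    """
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


print("support(beer, diapers):", support({"beer", "diapers"}, transactions))
print("confidence(beer -> diapers):", confidence({"beer"}, {"diapers"}, transactions))
print("lift(beer -> diapers):", lift({"beer"}, {"diapers"}, transactions))
```

In this toy data the rule "beer implies diapers" has a lift slightly above 1, which is the kind of signal an association-mining application would flag for further validation. Production systems typically rely on algorithms such as Apriori to search candidate item sets efficiently, which this sketch does not attempt.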

The foundation of all data mining definitions also includes five major elements that illustrate the major process steps required for data mining applications to be successful (Li, Wu, 2010). The first stage is the extract, transform and load (ETL) of data into the data warehouse systems (Stamou, Ntoulas, 2009) so that the data can be quickly queried and used to create models for continual analysis of data sets. The second…


How to Cite "Data Mining" Thesis in a Bibliography:

APA Style

Data Mining.  (2010, February 8).  Retrieved September 30, 2020, from

MLA Format

"Data Mining."  8 February 2010.  Web.  30 September 2020. <>.

Chicago Style

"Data Mining."  February 8, 2010.  Accessed September 30, 2020.