Text Mining Term Paper

Pages: 10 (3299 words)  ·  Bibliography Sources: 10  ·  File: .docx  ·  Level: Doctorate  ·  Topic: Education - Computers

¶ … Mining

Text mining rests on the idea that the terms used in an unstructured text message or file are related to one another. The relationship may extend to other, similar files, and once established it can give businesses and researchers information that changes the way they operate or enhances their knowledge.

The definition of text mining is very broad. In simple terms, 'text mining' refers to the process of retrieving information from text. It can also go deeper: a pattern is to be established in the textual data, which requires not only finding the relevant text but also a method for making the information useful. Many definitions of text mining assume a single need: to extract high-value information from a database or other unstructured text field and to use that text in arriving at some conclusion. Text analysis is also connected to data mining. The two differ in that data mining works on a structured database from which the information is sought, whereas the principal challenge in text mining is to achieve the same result with unstructured data. (Trujillo, 2010)

Text is the most widely used medium and data type. It is the means by which people exchange information, through e-mail, chat, digital libraries, and reports and books available on the internet and other communication media. The data are generated not only by users of the net but also by journals, research material and other valuable sources such as statistical reports and government work. These databases grow at an astronomical rate and are distributed on a global scale. (Mitra; Acharya, 2003) Text mining has thus become an important tool set for many operations across the information spectrum.

Technical Details:

The method of text mining is complex, with many steps and determinants that must all work for it to succeed. Text mining begins as an algorithm that extracts the facts available in a textual source and converts them into a representation that can be used to create "hypotheses that are further explored by traditional data mining and data analysis methods." (Maimon; Rokach, 2005) In text parsing, problems arise with hyponymy, the relation between a general term and its more specific instances. A contributor may be tagged generally as 'Human' and more specifically by position, such as 'corporate executive'. Features of this kind may be incidental in one setting yet vital in another. General information is normally ignored because the program's spans and tokens do not capture it, even though the same information may be vital when viewed from a different perspective. (Srivastava; Sahami, 2009)

To this end, the major operation in text mining is tagging. A component of a text mining program can tag the document using statistical tagging or semantic tagging, and this is the basis for arriving at any new information. Managers need to find information from new angles, and it is often found in customer responses, which need not be structured. The work therefore rests on a task-oriented preprocessing approach that creates a structured document from an unstructured one. Another method, called 'Text Mining and Information Extraction,' is used to summarize a document. In either case, the text mining operations build on tagging and thus create entities and relationships. (Maimon; Rokach, 2005) Much research is underway to create better algorithms. One study showed the possibilities of "implementation of information extraction and categorization in the text mining." (Mustafa; Akbar; Sultan, 2009)

The aim of text mining is to provide a method for knowledge management, analysis and decision-making. Numerous methods and functions go into parsing text, and a text mining algorithm combines many of them. Mining activities include extracting information after a comprehensive search, categorizing the results, and then summarizing the extracted data set so that it can be used for monitoring and for answering questions as the need arises. The fundamental requirement of a text mining operation is to obtain an associative distribution for words and terms and to find a common significance that can be used for research or business forecasting. (Mustafa; Akbar; Sultan, 2009)
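One simple way to picture an "associative distribution for words and terms" is a table of how often two terms appear in the same document. The sketch below is only an illustration under that assumption; the function name `cooccurrence_counts` and the whitespace tokenization are choices made here, not something the cited authors specify.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered pair of terms shares.

    Assumption: terms are lowercase, whitespace-separated words. A real
    system would tokenize, stem and filter stop words first.
    """
    counts = Counter()
    for doc in documents:
        # Deduplicate and sort so each pair has one canonical (a, b) order.
        terms = sorted(set(doc.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


if __name__ == "__main__":
    docs = ["text mining tools", "text mining", "data mining"]
    for pair, n in cooccurrence_counts(docs).most_common(3):
        print(pair, n)
```

Pairs with high counts are candidates for the kind of term association the paragraph describes; raw counts would normally be normalized before being used for forecasting or research.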

The most important part is information extraction. Words or feature terms are identified within a textual file and then processed through a 'layered model of the Text Mining Application.' (Mustafa; Akbar; Sultan, 2009) Text mining and data mining have the same analytical functions but differ in that text mining uses natural language (NL) and information retrieval (IR) techniques. (Maimon; Rokach, 2005) The procedures that go into text mining are numerous and deserve a separate discussion, because the mining method is common to all algorithms in data mining and text mining.


The processes differ slightly between data mining and text mining because text mining is intended for unordered data, which changes the basis of the search. A typical step is stemming, that is, identifying the root of a word. Stemming techniques are of two types, inflectional and derivational. The stem is a very useful concept because roots abstract away singular, plural and other variations and help reduce the data to its bare essentials. The size of the dictionary is thus kept to a minimum, and stems and tokens help maintain the accuracy of the information extracted. Together these two concepts lead to faster and shorter algorithms for deriving data from free text. Documents are then classified according to their threads or common contents, and this grouping, together with the use of identical roots or stems and tokens for related words, helps in finding a feature. (Weiss, 2005)

Derivational stemming concerns new words created from an existing root. Although this is interesting, the practical application lies with inflectional stemming, for which the algorithm used is Porter's algorithm. Here parsing is done on the elements of the language and its grammar, such as plural, singular, present and past forms and other grammatical features. (Mustafa; Akbar; Sultan, 2009) Data mining provides a method of extracting data, but not all data comes in structured form. The methods devised so far have therefore helped parse only some text. No algorithm can anticipate all human communication, because of its complexity. Text mining thus faces many problems that data mining does not, owing to differences between languages, in how a language is used, and in expression from one individual to another. Words may mean different things in different contexts. (Mitra; Acharya, 2003)
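The effect of inflectional stemming on dictionary size can be shown with a deliberately small suffix stripper. This is a sketch only and is not Porter's algorithm, which applies several ordered rule phases with conditions on the stem; the rule table and the `min_stem` threshold below are assumptions made for illustration.

```python
# Illustrative suffix rules, tried in order. NOT Porter's algorithm.
SUFFIX_RULES = [
    ("sses", "ss"),  # classes -> class
    ("ies", "y"),    # studies -> study
    ("ing", ""),     # extracting -> extract
    ("ed", ""),      # extracted -> extract
    ("ss", "ss"),    # class -> class (protects the double s)
    ("s", ""),       # documents -> document
]


def inflectional_stem(word, min_stem=3):
    """Strip one inflectional suffix, keeping stems of at least min_stem letters."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            if len(stem) >= min_stem:
                return stem
    return word


if __name__ == "__main__":
    words = ["document", "documents", "extracted", "extracting", "extracts"]
    print(sorted({inflectional_stem(w) for w in words}))
```

Five surface forms collapse to two dictionary entries here, which is the reduction the text attributes to stemming. The rules are crude (for example, "mining" becomes "min"), which is why production systems use a complete stemmer.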

Categorization pinpoints the category of the domain in use; combined with a token, it allocates the text to the best-fitting category, and this is done using hash tables, a data structure for fast key lookup. (Mustafa; Akbar; Sultan, 2009) These procedures are specific to text mining because it works with unstructured data using a domain dictionary, whose set of terms has to be exhaustive for the mining to be effective. Text data is often stored in compressed form, and in the future accessing it will be a problem because searching will require decompression algorithms as well. Text databases are compressed using Lempel-Ziv type algorithms, and similar algorithms are used in data mining and text mining to retrieve the content efficiently. The greatest source of text is the web, so mining is largely related to the web as well. (Mitra; Acharya, 2003)
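A hash-table lookup against a domain dictionary can be sketched with a Python `dict`, which is a hash table. The dictionary contents, the category names and the majority-vote rule below are hypothetical and chosen only to illustrate the idea; the cited work does not specify them.

```python
from collections import Counter

# Hypothetical domain dictionary mapping a token (or stem) to a category.
DOMAIN_DICTIONARY = {
    "invoice": "finance",
    "payment": "finance",
    "refund": "finance",
    "login": "support",
    "password": "support",
    "crash": "support",
}


def categorize(tokens, dictionary=DOMAIN_DICTIONARY, default="uncategorized"):
    """Assign the category that the most dictionary tokens vote for."""
    # Each membership test and lookup is an average O(1) hash-table operation.
    votes = Counter(dictionary[t] for t in tokens if t in dictionary)
    if not votes:
        return default
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(categorize("my payment failed and the refund is missing".split()))
```

Tokens missing from the dictionary contribute nothing, which illustrates why the text says the dictionary's term set must be exhaustive for the mining to be effective.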

One proposed method of text mining is 'DISCOTEX (Discovery from Text EXtraction),' which used an information extraction system and a 'standard rule induction module.' By extracting information it is possible to create a well-structured, searchable database that makes online text more easily accessible. Another algorithm worth mentioning is APRIORI, a standard association rule mining algorithm; the two combined have been claimed to find interesting patterns in book descriptions. (Daelemans; Plessis; Snyman, Teck, 2005)
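The core of Apriori, finding itemsets that occur in at least a minimum number of records, can be sketched as follows. This is a minimal illustration, not the DISCOTEX implementation: it covers only the frequent-itemset stage (association rules are derived from these sets in a later step), and the keyword sets in the usage example are made up.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


if __name__ == "__main__":
    # Hypothetical keyword sets extracted from four book descriptions.
    records = [
        {"mining", "text", "data"},
        {"mining", "text"},
        {"mining", "data"},
        {"text", "web"},
    ]
    for itemset, count in apriori(records, min_support=2).items():
        print(sorted(itemset), count)
```

The prune step is what gives the algorithm its name: a set can only be frequent if all of its subsets are, so most larger candidates are discarded without being counted.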

Not only single words but also strings can be mined. The analysis of similarities between whole strings also falls within the ambit of text mining. The overall aim of the exercise is information integration, which is achieved when an optimal correspondence can be established between variables such that some factor can be associated…
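Similarity between whole strings can be measured in many ways, and the source does not name one. As an assumption for illustration, the sketch below uses Levenshtein edit distance (the minimum number of single-character insertions, deletions and substitutions) and a normalized score derived from it; both function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed row by row."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(similarity("text mining", "text minning"))
```

A score of this kind could be used to decide whether two field values from different sources refer to the same entity, which is one reading of the information integration goal described above.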

How to Cite "Text Mining" Term Paper in a Bibliography:

APA Style

Text Mining.  (2011, May 4).  Retrieved February 25, 2021, from https://www.essaytown.com/subjects/paper/text-mining/377122

MLA Format

"Text Mining."  4 May 2011.  Web.  25 February 2021. <https://www.essaytown.com/subjects/paper/text-mining/377122>.

Chicago Style

"Text Mining."  Essaytown.com.  May 4, 2011.  Accessed February 25, 2021.