Study on Text Mining of English Materials for Tourism Purposes
DOI:
https://doi.org/10.9734/bpi/niebm/v10/15881D
Keywords:
Data mining, English style analysis, statistical analysis, text mining, tourism
Abstract
Knowledge of tourism has become increasingly important, and reading materials in English, which can be regarded as a common world language, have become essential. Reading such texts is easier when the characteristics of English in this field are well understood. In this paper, several English books on tourism are investigated and compared with journalism in terms of metrical linguistics. Specifically, C++ software is used to examine the frequency characteristics of character and word appearance, and these characteristics are approximated by an exponential function. In addition, the difficulty level of each material is assessed from the percentages of Japanese junior high school required vocabulary and American basic vocabulary it contains, and its K-characteristic is calculated. The results clearly show that, in terms of character appearance, English materials for tourism have a tendency similar to that of literary writing. Furthermore, the K-characteristic values of the tourism materials are high, and older books with a higher degree of specialization are more difficult to read than journalism. These results will be useful for identifying the genre of a given text as tourism. To improve the reliability of this identification, further analysis results need to be accumulated.
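The measurements named in the abstract can be illustrated with a minimal sketch. The authors' C++ software is not shown in the source, so the Python below is only an illustration under stated assumptions: all function names are hypothetical, the exponential approximation is assumed to have the form y = c * exp(-b * rank) fitted to rank-ordered relative frequencies, the K-characteristic is assumed to be Yule's characteristic K, and vocabulary coverage is assumed to be counted per token rather than per distinct word.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word tokens (letters, with an optional internal apostrophe)."""
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))


def char_frequencies(text):
    """Count alphabetic characters, ignoring case."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def fit_exponential(counts, top_n=50):
    """Fit y = c * exp(-b * rank) to the top_n relative frequencies.

    Assumption: the fit is an ordinary least-squares line through
    (rank, log y); the source does not state its fitting procedure.
    Returns the pair (c, b).
    """
    total = sum(counts.values())
    ranked = [n / total for _, n in counts.most_common(top_n)]
    if len(ranked) < 2:
        raise ValueError("need at least two distinct items to fit")
    xs = list(range(1, len(ranked) + 1))
    ys = [math.log(y) for y in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def yules_k(counts):
    """Yule's K = 10^4 * (sum of squared frequencies - N) / N^2.

    Assumption: the abstract's "K-characteristic" refers to this measure.
    """
    n = sum(counts.values())
    sum_sq = sum(f * f for f in counts.values())
    return 1e4 * (sum_sq - n) / (n * n)


def coverage(counts, vocabulary):
    """Share of tokens whose word appears in a reference vocabulary set.

    Assumption: coverage is token-based; the reference list itself
    (e.g. a required-vocabulary list) must be supplied by the caller.
    """
    total = sum(counts.values())
    covered = sum(n for word, n in counts.items() if word in vocabulary)
    return covered / total
```

Under these assumptions, a text would be passed through `word_frequencies` or `char_frequencies` first, and the resulting counts handed to `fit_exponential`, `yules_k`, and `coverage`. Comparing the fitted coefficients and K values across texts corresponds to the kind of genre comparison the abstract describes, although the actual word lists and thresholds used in the study are not given in the source.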