Study on Difficulty-Level Classification for English Writings
DOI:
https://doi.org/10.9734/bpi/crlle/v4/3409E
Keywords:
Accuracy, difficulty-level, F-measure, machine learning
Abstract
E-books have recently gained in popularity, and as their number grows, categorizing all of them manually becomes increasingly time-consuming. If English texts can be classified by difficulty level, a foreign-language book suited to the reader's level of English proficiency can be recommended. This study therefore aims to classify English texts by difficulty level using machine learning. Eleven types of attributes are extracted from English text data, and the texts are learned and classified using leave-one-out cross-validation. To improve accuracy, a further experiment varies the size of the text data and applies an attribute selection method. As a result, accuracy improves to 77.04% and F-measure to 63.96%. Erroneous identification resulting from the impact of columns between sentences is also noted.
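The evaluation procedure named in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the nearest-centroid classifier, the two toy features, the labels, and the macro-averaged form of the F-measure are all assumptions chosen to keep the example short. The part that corresponds to the abstract is the loop itself, in which each text is held out once, the classifier is trained on the remaining texts, and accuracy and F-measure are computed over the held-out predictions.

```python
from collections import Counter
from math import dist


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x.

    Stand-in classifier (assumption); the abstract does not name the learner.
    """
    sums, counts = {}, Counter(train_y)
    for features, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: dist(centroids[lab], x))


def leave_one_out(X, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and collect predictions."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging scheme is an assumption)."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical toy data: [mean sentence length, mean word length] per text.
# The study uses eleven attribute types; these two are illustrative only.
X = [[8.0, 3.9], [9.0, 4.1], [10.0, 4.0], [21.0, 5.2], [23.0, 5.5], [22.0, 5.4]]
y = ["easy", "easy", "easy", "hard", "hard", "hard"]

y_pred = leave_one_out(X, y)
print(f"accuracy  = {accuracy(y, y_pred):.2f}")
print(f"F-measure = {macro_f_measure(y, y_pred):.2f}")
```

Leave-one-out is a natural choice when the number of labeled texts is small, since every text is used for testing exactly once while all the others remain available for training.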