This dissertation is accessible only to the Illinois State University community.

Document Type

Thesis-ISU Access Only

Degree Name

Master of Science (MS)


School of Information Technology: Information Systems

First Advisor

Yongning Tang


The exponential growth of learning materials and their rapid distribution among e-learners via the Internet have made it nearly infeasible to manually review and categorize each document. In addition, the ability to classify objects into groups is important in many applications, such as text retrieval, query search, and learning recommender systems. This emerging need has re-emphasized the importance of automatic text classification systems, which classify semi-structured and unstructured text documents into predefined labeled groups. Specifically, if we can categorize learning resources based on their content (and not just on subject topic or user declaration), recommender systems and search engines can automatically build repositories of relevant documents and present users with the most relevant ones based on their queries and/or preferences.

Text classification poses many challenges for learning systems, which now handle huge numbers of texts of highly variable length, structure, and content. The features used to represent documents must be carefully selected to capture the important semantics of the text, so that classification performance is acceptable while the computational cost stays within a practical range. The features must also be useful across a wide range of class definitions.
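To make the feature-selection trade-off concrete, here is a minimal sketch that weights terms with plain TF-IDF (idf = log(N / df)) and keeps only the top `k` terms for each label, so the feature vector is bounded by the number of labels times `k`. It is an illustration under stated assumptions, not the thesis's TF-IDF variant: the whitespace tokenizer, the summing of document weights within a label, and the helper names `tfidf_per_document` and `select_features_per_label` are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (assumption: no stemming or stop-word removal)
    return text.lower().split()


def tfidf_per_document(docs: Sequence[str]) -> List[Dict[str, float]]:
    """Return one {term: tf * idf} dict per document, with idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights


def select_features_per_label(docs: Sequence[str], labels: Sequence[str], k: int) -> Dict[str, List[str]]:
    """Keep the k terms with the highest summed TF-IDF weight within each label (hypothetical scheme)."""
    label_scores: Dict[str, Counter] = defaultdict(Counter)
    for weights, label in zip(tfidf_per_document(docs), labels):
        label_scores[label].update(weights)  # sum weights over the label's documents
    selected = {}
    for label, scores in label_scores.items():
        # Highest weight first; ties broken alphabetically for reproducibility
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        selected[label] = [term for term, _ in ranked[:k]]
    return selected


docs = ["cell membrane", "cell protein", "cell dna", "loop function"]
labels = ["biology", "biology", "biology", "programming"]
print(select_features_per_label(docs, labels, k=3))
# {'biology': ['dna', 'membrane', 'protein'], 'programming': ['function', 'loop']}
```

In this toy run, `cell` occurs in every biology document, so its IDF is low and it ranks below rarer terms even though it characterizes the label well; this is one reason label-aware variants of TF-IDF are often considered for classification.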

In this study, we investigate the applicability of text mining algorithms for categorizing various text-based educational resources into curriculum-defined learning objectives. To do this, we use a variant of the Term Frequency-Inverse Document Frequency (TF-IDF) feature selection method together with a majority-voting classification system composed of five classical classifiers. Three knowledge domains with 65 learning objectives in total are used in the experiments to evaluate the system's performance. We also study how varying the number of features per label affects system performance.
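The majority-voting step can be illustrated with a small ensemble sketch. The five base classifiers are not named in this abstract, so here they are hypothetical keyword rules standing in for trained models; the tie-breaking rule (the first label to receive a vote wins) and all function names are likewise assumptions.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# A base classifier is any callable mapping a document to a label (assumption)
Classifier = Callable[[str], Hashable]


def majority_vote(votes: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label that was voted first."""
    if not votes:
        raise ValueError("majority_vote needs at least one vote")
    # most_common orders equal counts by first occurrence (Python 3.7+)
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(classifiers: Sequence[Classifier], doc: str) -> Hashable:
    # Each base classifier casts exactly one vote for a learning objective
    return majority_vote([clf(doc) for clf in classifiers])


def keyword_clf(keyword: str, hit: str, miss: str) -> Classifier:
    # Hypothetical stand-in for a trained model: votes `hit` if the keyword occurs
    return lambda doc: hit if keyword in doc.lower() else miss


classifiers = [
    keyword_clf("cell", "biology", "programming"),
    keyword_clf("protein", "biology", "programming"),
    keyword_clf("loop", "programming", "biology"),
    keyword_clf("function", "programming", "biology"),
    keyword_clf("dna", "biology", "programming"),
]
print(ensemble_predict(classifiers, "The cell membrane contains protein"))  # biology
```

With five voters and only two candidate labels a tie cannot occur, but with many candidate labels (this study uses 65 objectives) a split such as 2-2-1 is possible, so an explicit tie-breaking rule is needed.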

The dimensionality of the feature vector rises rapidly as the system is extended, which adds computational burden and tends to limit the system's capacity to scale up. To address this, we propose a hierarchical multi-tier classification architecture that can outperform a single-layer, single-node classification system in terms of computational cost and scalability. A simple version of this scheme is implemented and analyzed, and we show experimentally that this architecture needs fewer features per label than the single-node classification system. Despite other advantages, such as easier scalability and lower maintenance cost, this multi-layer architecture may incur a higher initial setup cost.
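A two-tier version of the hierarchical architecture can be sketched as a root node that routes a document to a knowledge domain and one leaf node per domain that picks the learning objective. This is an illustrative assumption about the structure, not the thesis's implementation: the class `HierarchicalClassifier`, the keyword rules standing in for trained classifiers, and the domain and objective names are all hypothetical.

```python
from typing import Callable, Dict, Hashable, Tuple

# A "classifier" here is any callable mapping a document to a label (assumption)
Classifier = Callable[[str], Hashable]


class HierarchicalClassifier:
    """Two-tier sketch: a root node picks a domain, a per-domain leaf picks the objective."""

    def __init__(self, domain_clf: Classifier, objective_clfs: Dict[Hashable, Classifier]):
        self.domain_clf = domain_clf          # tier 1: routes to a knowledge domain
        self.objective_clfs = objective_clfs  # tier 2: one classifier per domain

    def predict(self, doc: str) -> Tuple[Hashable, Hashable]:
        domain = self.domain_clf(doc)
        return domain, self.objective_clfs[domain](doc)

    def add_domain(self, domain: Hashable, clf: Classifier) -> None:
        # Extending the system adds one leaf and leaves existing leaves untouched;
        # a real root classifier would also need retraining to route to the new domain.
        self.objective_clfs[domain] = clf


def route_domain(doc: str) -> str:
    # Hypothetical root: a keyword rule stands in for a trained domain classifier
    return "biology" if "cell" in doc.lower() else "programming"


tree = HierarchicalClassifier(
    route_domain,
    {
        "biology": lambda d: "cell-structure" if "membrane" in d.lower() else "genetics",
        "programming": lambda d: "iteration" if "loop" in d.lower() else "functions",
    },
)
print(tree.predict("A for loop repeats a block"))  # ('programming', 'iteration')
```

The scalability argument shows up in the feature budget: with `k` features per label, a single node covering all 65 objectives can need up to 65·`k` features, whereas each leaf only carries `k` features per objective of its own domain and the root only needs features that separate the three domains.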


Imported from ProQuest SoufizadehBalaneji_ilstu_0092N_11002.pdf

