MultinomialNB Word Analysis

Multi-Word Ambiguity Resolution

Project Description

In this project, I used the Naive Bayes algorithm to address the multi-word ambiguity problem. In many languages, the same word can have multiple meanings, creating a significant challenge in natural language processing (NLP). For instance, the word 'sağ' can refer to direction or health. To resolve such ambiguities, I developed a specialized model.

Dataset Preparation

For this project, I prepared two different datasets containing words like 'sağ' and 'dil'. In these datasets, I classified the sentences based on the context of these words, determining whether 'sağ' referred to direction or health. The data was organized in Excel files, and using this data, I developed a model to resolve the ambiguity.

Model Setup

After preparing the dataset, I converted the sentences into numerical data by vectorizing them with the TF-IDF (Term Frequency-Inverse Document Frequency) method. This method evaluates the importance of each word in the sentence. Then, I used the Naive Bayes algorithm to train the model on the training dataset.

Model Training and Evaluation

Using the training data, I created a Naive Bayes classifier model, which was then tested on the test data. The results were quite successful; the model made accurate predictions to resolve the ambiguity in the given sentences. The accuracy on the test data was 73.6%, which indicates that the model is highly effective at resolving word ambiguity in sentences. The F1 score was calculated to be 0.693, an important metric showing the model's success in correct classification.

Conclusion

This project is an important step as an example of a system developed using the Naive Bayes algorithm to resolve multi-word ambiguity in language. The datasets I created for the words 'sağ' and 'dil' show how such problems can be addressed and what methods can be used for correct classification. The model’s accuracy and success demonstrate its potential application to more complex language problems in the future.

Technologies Used

Python

Scikit-learn

KNN

Pandas

Spotify API

Git

View Code