Wine Quality Classification using Self-Organizing Maps (SOM)


Introduction

This project aims to classify the quality of wines using Self-Organizing Maps (SOM), an unsupervised machine learning algorithm. The dataset used for this analysis is the "Red Wine Quality" dataset sourced from the UCI Machine Learning Repository. The dataset contains various physicochemical properties of red wines, along with their quality ratings provided by experts.

Importance

Wine quality classification is essential for both wine producers and consumers. Accurately assessing wine quality based on its physicochemical attributes is crucial for production optimization, quality control, and informed consumer choices. Self-Organizing Maps offer an effective method for visualizing and clustering complex data, making them suitable for wine quality classification.

Data Description

The "Red Wine Quality" dataset consists of 1,599 red wines, each described by 11 physicochemical attributes and an expert quality rating on a discrete scale from 0 to 10. The columns are:
  1. Fixed acidity
  2. Volatile acidity
  3. Citric acid
  4. Residual sugar
  5. Chlorides
  6. Free sulfur dioxide
  7. Total sulfur dioxide
  8. Density
  9. pH
  10. Sulphates
  11. Alcohol
  12. Quality (score between 0 and 10)
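
A minimal loading sketch follows. It assumes the UCI file `winequality-red.csv`, which uses semicolons as separators and the header names listed in `FEATURES`. Some mirrors use commas, which is why `sep` is a parameter. The document does not state how the "quality" and "not quality" classes used later are defined, so the `threshold=7` split in `binarize_quality` is a hypothetical choice and not necessarily the project's.

```python
import csv
import numpy as np

# Header names as they appear in the UCI winequality-red.csv file (assumed; check your copy).
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]

def load_wine_csv(lines, sep=";"):
    """Parse CSV lines (an open file or a list of strings) into features X and scores y."""
    reader = csv.DictReader(lines, delimiter=sep)  # strips the quotes around header names
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in FEATURES])
        y.append(int(row["quality"]))
    return np.array(X), np.array(y)

def binarize_quality(y, threshold=7):
    """Hypothetical labeling: 1 = 'quality' (score >= threshold), 0 = 'not quality'."""
    return (y >= threshold).astype(int)
```

Typical use would be `with open("winequality-red.csv", newline="") as f: X, y = load_wine_csv(f)`, which yields `X` with shape (1599, 11).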

Project Steps

Features Correlation

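Analyzing the pairwise correlations among the physicochemical features, and between each feature and the quality rating, shows which attributes tend to vary together and which are most strongly related to quality.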
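
The correlation analysis can be sketched with NumPy, assuming `X` is an (n, 11) feature array and `y` the quality scores. The document does not name the correlation measure; Pearson correlation (`np.corrcoef`) is assumed here, and `feature_correlations` and `top_quality_correlates` are hypothetical helper names.

```python
import numpy as np

def feature_correlations(X, y):
    """Pearson correlation matrix of the features with quality appended as the last column."""
    data = np.column_stack([X, y])
    corr = np.corrcoef(data, rowvar=False)  # rowvar=False: columns are variables
    # Last column without its final entry = correlation of each feature with quality
    return corr, corr[:-1, -1]

def top_quality_correlates(names, corr_with_quality, k=3):
    """The k features with the largest absolute correlation with quality."""
    order = np.argsort(-np.abs(corr_with_quality))[:k]
    return [(names[i], float(corr_with_quality[i])) for i in order]
```

Ranking by absolute value keeps strong negative correlations in view, since a feature that falls as quality rises is as informative as one that rises with it.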

Expected Outcome

The expected outcome of this project is a wine quality classification model based on Self-Organizing Maps. This model will classify wines into quality categories based on their physicochemical attributes. In addition, the SOM visualization will provide a visual representation of the clusters and relationships within the dataset, aiding in the interpretation of wine quality patterns.
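
To illustrate how a SOM organizes the wines, here is a minimal NumPy sketch of a classic online (Kohonen) SOM. It is not necessarily the implementation used in this project. The grid size, iteration count, exponential decay schedules, and the helper names `standardize`, `train_som`, and `bmu_of` are assumptions. Features are z-scored first so that large-valued attributes such as total sulfur dioxide do not dominate the Euclidean distances. Quality labels are not used during training.

```python
import numpy as np

def standardize(X):
    """Z-score each column; constant columns are left centered but unscaled."""
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std

def train_som(X, rows=8, cols=8, n_iter=5000, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training; all hyperparameters here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # Initialize every node's weight vector from a randomly chosen sample
    W = X[rng.integers(0, n, size=rows * cols)].astype(float).reshape(rows, cols, d)
    # Grid coordinates of each node, shape (rows, cols, 2)
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * np.exp(-3.0 * frac)        # decaying learning rate
        sigma = sigma0 * np.exp(-3.0 * frac)  # shrinking neighborhood radius
        x = X[rng.integers(0, n)]
        # Best-matching unit (BMU): node whose weights are closest to x
        dist = np.linalg.norm(W - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dist), dist.shape)
        # Gaussian neighborhood on the grid, centered on the BMU
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Pull the BMU and its neighbors toward x
        W += lr * h[..., None] * (x - W)
    return W

def bmu_of(W, x):
    """Grid position (row, col) of the node closest to sample x."""
    dist = np.linalg.norm(W - x, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dist), dist.shape))
```

The shrinking neighborhood first orders the map globally and then lets individual nodes specialize, so similar wines end up on nearby nodes. This is what makes the map usable for visualizing clusters.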

Original Dataset Statistics
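
Per-feature summary statistics for the original dataset can be computed as below. This is a minimal sketch that assumes `X` is a NumPy array with one column per attribute. `describe` is a hypothetical helper, similar in spirit to pandas `DataFrame.describe()`, and like pandas it reports the sample standard deviation (`ddof=1`).

```python
import numpy as np

def describe(X, names):
    """Count, mean, sample std, min, median and max for each named column of X."""
    stats = {}
    for j, name in enumerate(names):
        col = X[:, j]
        stats[name] = {
            "count": int(col.size),
            "mean": float(col.mean()),
            "std": float(col.std(ddof=1)),  # sample standard deviation
            "min": float(col.min()),
            "median": float(np.median(col)),
            "max": float(col.max()),
        }
    return stats
```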

Analysis of Physicochemical Features of "Not Quality" Wines at Location (1,6) on the Self-Organizing Map

The "not quality" wine samples mapped to location (1,6) on the Self-Organizing Map differ noticeably from the original dataset. Compared with the dataset as a whole, these samples show higher levels of fixed acidity, citric acid, residual sugar, and chlorides, and lower levels of volatile acidity and alcohol. This suggests that, within this region of the map, higher levels of certain components, such as citric acid and residual sugar, together with lower volatile acidity and alcohol, are associated with wines being classified as not of high quality. Because this profile describes a single map location, it indicates association rather than causation. It nonetheless shows how individual physicochemical attributes relate to wine quality and points to attributes worth investigating further in winemaking.
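
A comparison like the one above can be sketched as follows. Samples are assigned to nodes using the same scaling the SOM was trained on (`X_scaled`), while means are reported on the raw values (`X_raw`) so they stay in physical units. The helper names, the weight array layout `W` of shape (rows, cols, n_features), and the use of `0` for "not quality" are assumptions. The document also does not say whether (1,6) is 0- or 1-based, so the index passed as `node` must follow whatever convention the map plot used.

```python
import numpy as np

def node_members(W, X_scaled, node):
    """Boolean mask of samples whose best-matching unit is node = (row, col)."""
    # Distances from every sample to every node: shape (n, rows, cols)
    d = np.linalg.norm(X_scaled[:, None, None, :] - W[None], axis=-1)
    flat = d.reshape(len(X_scaled), -1).argmin(axis=1)
    r, c = np.unravel_index(flat, W.shape[:2])
    return (r == node[0]) & (c == node[1])

def compare_node_to_dataset(W, X_scaled, X_raw, y, node, names, label=0):
    """(node mean, dataset mean, difference) per feature for class `label` at `node`."""
    mask = node_members(W, X_scaled, node) & (y == label)
    if not mask.any():
        return {}  # no samples of this class landed on the node
    node_mean = X_raw[mask].mean(axis=0)
    overall = X_raw.mean(axis=0)
    return {
        name: (float(node_mean[j]), float(overall[j]), float(node_mean[j] - overall[j]))
        for j, name in enumerate(names)
    }
```

A positive difference means the node's samples sit above the dataset average for that attribute. With only a handful of samples on a node, such differences can be noisy.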

Conclusions
By leveraging Self-Organizing Maps, this project will contribute to the wine industry by providing a data-driven approach to wine quality classification. The SOM-based model can assist wine producers in quality control processes, enabling them to optimize production and maintain consistent quality standards. Additionally, consumers can benefit from the model by making more informed decisions when selecting wines based on predicted quality.
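
Because a SOM is unsupervised, predicting quality requires one extra step. A common approach, assumed here because the document does not describe its own procedure, is to label each node with the majority class of the training wines mapped to it. A new wine then takes the label of its nearest labeled node. `label_nodes` and `predict` are hypothetical helpers operating on a weight array `W` of shape (rows, cols, n_features) and on features scaled the same way as during training.

```python
import numpy as np
from collections import Counter

def label_nodes(W, X, y):
    """Majority class of the training samples mapped to each node (-1 if a node is empty)."""
    rows, cols, _ = W.shape
    d = np.linalg.norm(X[:, None, None, :] - W[None], axis=-1).reshape(len(X), -1)
    bmus = d.argmin(axis=1)
    labels = np.full(rows * cols, -1)
    for node in range(rows * cols):
        members = y[bmus == node]
        if members.size:
            labels[node] = Counter(members.tolist()).most_common(1)[0][0]
    return labels.reshape(rows, cols)

def predict(W, node_labels, X_new):
    """Label of the nearest labeled node for each new sample (needs >= 1 labeled node)."""
    labeled = node_labels.reshape(-1) >= 0
    flat_W = W.reshape(-1, W.shape[-1])[labeled]
    flat_labels = node_labels.reshape(-1)[labeled]
    d = np.linalg.norm(X_new[:, None, :] - flat_W[None], axis=-1)
    return flat_labels[d.argmin(axis=1)]
```

Skipping empty nodes avoids assigning an arbitrary class to regions of the map that no training wine reached. To estimate how well such a model would serve producers or consumers, the node labels should be fit on a training split and accuracy measured on held-out wines.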