Apple Ripeness Classification: Multi-Stage Detection with Haralick Texture Features

Python · Computer Vision · Texture Analysis · Machine Learning · Streamlit
Apple Classification Dashboard

Project Overview

The Apple Ripeness Classification project was my thesis work, aimed at bringing objective measurement to what is typically a subjective process in agriculture. The system uses Haralick texture features and the K-Nearest Neighbors (KNN) algorithm to classify apples into five ripeness levels (20%, 40%, 60%, 80%, and 100%).

By analyzing subtle changes in apple skin texture, the system achieves 96% accuracy in distinguishing between ripeness stages, providing a consistent classification method that reduces the subjectivity of human visual assessment.

The project uses a balanced dataset of 500 apple images (100 per ripeness level) as the basis for the machine learning model. The results show that texture analysis can deliver high-accuracy classification without relying on more resource-intensive deep learning approaches.
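To make the idea of Haralick texture features concrete, here is a minimal from-scratch sketch of a gray-level co-occurrence matrix (GLCM) for one distance d and angle θ, plus four common Haralick-style statistics. The helper names (`quantize`, `glcm`, `haralick_features`), the choice of 8 gray levels, the four statistics shown, and the random test patch are all assumptions for illustration. The thesis pipeline may use a different feature set, and scikit-image's `graycomatrix` and `graycoprops` provide a library route to similar measurements.

```python
import numpy as np

def quantize(gray, levels=8):
    """Map an 8-bit grayscale image onto `levels` integer gray levels."""
    return (gray.astype(np.int64) * levels) // 256

def glcm(image, d=1, theta_deg=45, levels=8):
    """Normalized, symmetric co-occurrence matrix for a single offset."""
    # (row, col) offsets; rows grow downward, so 45 degrees is "up and right".
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    dr, dc = offsets[theta_deg]
    rows, cols = image.shape
    # Keep only reference pixels whose neighbor falls inside the image.
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    ref = image[r0:r1, c0:c1]
    nbr = image[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    counts += counts.T  # count each pixel pair in both directions
    return counts / counts.sum()

def haralick_features(p):
    """Contrast, angular second moment, homogeneity, and entropy of a GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    asm = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return np.array([contrast, asm, homogeneity, entropy])

# Synthetic stand-in for a grayscale apple patch (not real data).
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
features = haralick_features(glcm(quantize(patch), d=1, theta_deg=45))
print(features)
```

Each image then becomes one short feature vector, which is what the KNN classifier compares when it looks for the nearest training samples.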

At the heart of the project is an interactive Streamlit dashboard that not only delivers real-time classification results but also visualizes the key texture features differentiating each ripeness stage. This makes the complex machine learning concepts accessible to agricultural professionals who may lack technical expertise in computer vision.

Key Features

  • High-Precision Classification: 96% accuracy in distinguishing between five distinct ripeness levels using texture-based analysis.
  • Interactive Parameter Exploration: Dynamic controls to adjust and optimize distance (d), angle (θ), and k values for the KNN algorithm.
  • Advanced Visualization Suite: Confusion matrix visualization showing classification performance across all 5 ripeness levels.
  • Comprehensive Performance Metrics: Detailed analytics including precision, recall, and F1-scores for each ripeness category.
  • PCA-based Nearest Neighbor Visualization: Intuitive graphical representation showing how KNN makes classification decisions.
  • Cross-validation Visualization: Radar charts and parallel coordinates plots for robust model validation.
  • Responsive Dashboard Design: User-friendly interface that works seamlessly across different devices.
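
The confusion matrix and per-class metrics listed above can be produced with scikit-learn along the following lines. This is only a sketch under stated assumptions: the feature matrix is synthetic (random clusters standing in for real texture features), and the 80/20 stratified split and the `StandardScaler` step are illustrative choices that the project description does not confirm. Only k=3 comes from the reported optimal parameters.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

LEVELS = [20, 40, 60, 80, 100]  # ripeness classes, in percent

# Synthetic stand-in: 100 samples per level, 4 features each (not real data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=i, scale=0.4, size=(100, 4)) for i in range(5)])
y = np.repeat(LEVELS, 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling is an assumption here; KNN is distance-based, so features on
# different scales would otherwise contribute unevenly.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=LEVELS)
print(cm)
print(classification_report(y_test, y_pred, labels=LEVELS, zero_division=0))
```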

Technologies Used

Python
Streamlit
Scikit-learn
Plotly
NumPy
Pandas
Scikit-image
PCA
K-fold CV
Pickle

Challenges & Solutions

Performance Bottlenecks

The initial dashboard design recalculated all features and classifications whenever a parameter was changed, causing significant performance lag that disrupted the user experience.

Solution:

Implemented caching that stores intermediate results and recalculates them only when the parameters they depend on change. This optimization reduced response time from several seconds to near-instantaneous, creating a smooth interactive experience even when rapidly exploring different parameter combinations.
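
The caching idea can be illustrated with the standard library. In this sketch, `extract_features` and `classify` are hypothetical stand-ins for the dashboard's real steps, and `functools.lru_cache` is used only so the example runs anywhere. In a Streamlit app the analogous decorator is `st.cache_data`, though the description above does not say which mechanism the project uses.

```python
from functools import lru_cache

calls = {"extract": 0}  # counts how often the expensive step really runs

@lru_cache(maxsize=None)
def extract_features(d, theta_deg):
    """Hypothetical expensive step: depends on d and theta only, not on k."""
    calls["extract"] += 1
    return (d * 0.1, theta_deg * 0.01)  # placeholder values

def classify(d, theta_deg, k):
    """Hypothetical cheap step: reuses cached features for any k."""
    features = extract_features(d, theta_deg)
    return f"features={features}, k={k}"

classify(1, 45, 3)  # features computed
classify(1, 45, 5)  # only k changed: cached features are reused
classify(2, 45, 3)  # d changed: features recomputed
print(calls["extract"])  # 2
```

Keying the cache on exactly the parameters each step depends on is what lets a change to k skip feature extraction entirely.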

Visualizing Complex ML Results

Creating intuitive visualizations that could effectively communicate complex machine learning concepts and results to non-technical stakeholders presented a significant challenge.

Solution:

Developed a suite of interactive Plotly charts that allow users to explore the data and results from multiple perspectives. The visualizations include tooltips, color-coding, and interactive elements that make complex concepts like classification boundaries and nearest neighbors more accessible and intuitive.

Responsive Dashboard Design

Creating a dashboard that maintained usefulness and readability across different screen sizes proved challenging due to the complexity of the visualizations and control panels.

Solution:

Utilized Streamlit's column system with dynamic width adjustments based on screen size. Implemented prioritized display logic that ensures critical visualizations are always visible while less essential elements adapt or reorganize based on available space, creating a consistent experience across devices.

Making KNN Intuitive

The K-Nearest Neighbors algorithm, while conceptually simple, operates in a high-dimensional space that is difficult to visualize and understand for users without a machine learning background.

Solution:

Developed a PCA-based visualization that projects the high-dimensional feature space onto the two directions of greatest variance, so that the relative distances between samples are approximately preserved. This allowed users to visually understand how the algorithm makes classification decisions based on proximity to training examples.
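
A rough sketch of such a projection, assuming scikit-learn and synthetic feature vectors in place of the real texture features: the neighbors are found in the full feature space, and PCA is used only to place the query and its neighbors on a 2D plot. The six-dimensional features and the query point are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 20 samples per ripeness level, 6 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=i, scale=0.5, size=(20, 6)) for i in range(5)])
y = np.repeat([20, 40, 60, 80, 100], 20)

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Hypothetical new sample; its neighbors are found in the original space.
query = rng.normal(loc=2, scale=0.5, size=(1, 6))
distances, indices = knn.kneighbors(query)

# Project everything to 2D purely for display.
pca = PCA(n_components=2).fit(X)
X_2d = pca.transform(X)
query_2d = pca.transform(query)
neighbors_2d = X_2d[indices[0]]  # 2D coordinates of the k nearest neighbors

print("variance kept by 2 components:", pca.explained_variance_ratio_.sum())
print("neighbor labels:", y[indices[0]], "prediction:", knn.predict(query))
```

The explained variance ratio is worth showing next to the plot: when it is low, points that look close in 2D may not be close in the original feature space.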

Impact & Results

96% Classification Accuracy
5 Ripeness Levels
500 Image Samples

The Apple Ripeness Classification system achieved an impressive 96% accuracy with optimal parameters (d=1, θ=45°, k=3), successfully distinguishing between five distinct ripeness levels (20%, 40%, 60%, 80%, and 100%). Particularly notable was the near-perfect classification for certain ripeness stages, especially the 20% and 60% ripeness categories.

Cross-validation testing confirmed the model's robustness, showing minimal variation across different data splits. This consistency indicates that the classification approach is stable and that the reported accuracy is not simply the result of one fortunate split of the data.
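
A check of this kind can be sketched with scikit-learn's stratified k-fold utilities. The data below is synthetic and the choice of 5 folds is an assumption, since the fold count is not stated here. A small standard deviation across folds is what "minimal variation" would look like in practice.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 100 samples per ripeness level, 4 features each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=i, scale=0.4, size=(100, 4)) for i in range(5)])
y = np.repeat([20, 40, 60, 80, 100], 100)

# Stratification keeps all five ripeness levels balanced in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=3), X, y, cv=cv, scoring="accuracy"
)

print("per-fold accuracy:", scores)
print("mean:", scores.mean(), "std:", scores.std())
```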

Beyond the technical achievements, this project has potential applications in agricultural sorting systems. By providing objective, consistent ripeness assessment, such a system could help reduce food waste through more precise harvest timing and distribution planning based on ripeness levels.

Perhaps most importantly, this project successfully bridges the gap between advanced machine learning techniques and practical agricultural applications by creating an interactive tool that makes complex classification concepts accessible to non-technical stakeholders. This accessibility is crucial for the adoption of such technologies in traditional industries.

Future Improvements

The following enhancements are planned for the next iterations of this project:

  • Incorporating color features alongside texture analysis for potentially higher classification accuracy
  • Adapting the system for mobile devices to enable in-field use by farmers during harvest
  • Expanding the classification methodology to other fruits beyond apples
  • Optimizing the algorithm for real-time classification in industrial sorting and packing settings
  • Adding explanatory tooltips and contextual documentation to make the dashboard more self-explanatory
