Wednesday, August 27, 2025

Just Own Research: Classification of Data using K-Nearest Neighbor Algorithm ~ Machine Learning

 

As shown in the graph above, this test determines whether the test point lies nearer to the Normal or the Anomalous cluster. In a factory setting, anomalous readings could trigger alerts or maintenance actions.

Explanation
  1. Synthetic Training Data:
    • Normal Data: 50 points centered at (30°C, 0.5°C/min), representing typical equipment operation (e.g., a motor running normally in a factory).
    • Anomalous Data: 50 points centered at (60°C, 1.0°C/min), indicating overheating or equipment stress.
    • Realism: These values are based on typical industrial temperature sensor ranges (e.g., thermocouples monitoring motors or HVAC systems), where normal temperatures are 20–40°C and anomalous ones exceed 50°C.
  2. Realistic Test Point:
    • Test Point: [42, 0.9] (42°C, 0.9°C/min).
    • Why Realistic:
      • 42°C is just above normal operating temperatures (e.g., a motor typically at 30°C), suggesting early signs of overheating but not yet in the critical range (above 50°C).
      • 0.9°C/min is a moderate rate of change, plausible for equipment under stress (e.g., a cooling system failing), making it an edge case that tests k-NN’s ability to distinguish between normal and anomalous conditions.
      • These values align with real-world factory scenarios where temperature sensors detect gradual failures.
  3. k-NN Classification:
    • Uses Euclidean distance to find the 3 nearest neighbors to the test point in the 2D space (temperature, rate of change).
    • Classifies the test point as “Normal” or “Anomalous” based on majority voting.
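    • The test point [42, 0.9] is likely classified as “Normal” because the unscaled Euclidean distance is dominated by temperature, and 42°C is nearer to the normal center (30°C) than to the anomalous one (60°C). Its rate of change of 0.9°C/min, however, is close to the anomalous center of 1.0°C/min, making it a realistic and interesting case.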
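To get a feel for how the Euclidean distance behaves on these two features, here is a minimal sketch that measures the test point against the two cluster centers only (k-NN itself compares against individual training points, so this is an intuition aid, not the classifier). The `feature_scale` values are an assumption: they reuse the standard deviations from the data generation (5°C and 0.1°C/min) to show what happens if each feature is rescaled before measuring distance. The helper name `euclidean` is hypothetical.

```python
import math

# Values taken from the post: the test point and the two cluster centers.
test_point = (42.0, 0.9)
normal_center = (30.0, 0.5)
anomalous_center = (60.0, 1.0)

# Assumption: rescale each feature by the spread used to generate the data.
feature_scale = (5.0, 0.1)

def euclidean(p, q, scale=(1.0, 1.0)):
    """Euclidean distance after dividing each feature difference by its scale."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(p, q, scale)))

raw_normal = euclidean(test_point, normal_center)
raw_anomalous = euclidean(test_point, anomalous_center)
scaled_normal = euclidean(test_point, normal_center, feature_scale)
scaled_anomalous = euclidean(test_point, anomalous_center, feature_scale)

print(f"raw:    normal={raw_normal:.2f}, anomalous={raw_anomalous:.2f}")
print(f"scaled: normal={scaled_normal:.2f}, anomalous={scaled_anomalous:.2f}")
```

On raw values the temperature gap decides almost everything, so the point sits nearer the normal center. With the assumed rescaling, the rate of change carries comparable weight and the point sits nearer the anomalous center. Whether to scale features is a design choice that depends on how much each sensor signal should count.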

Python Code by Grok

import numpy as np
import matplotlib.pyplot as plt
from collections import Counter

# Generate synthetic temperature sensor data
np.random.seed(42)
# Normal data: 50 points (temperature in °C, rate of change in °C/min)
normal_data = np.random.normal(loc=[30, 0.5], scale=[5, 0.1], size=(50, 2))  # Centered at (30°C, 0.5°C/min)
# Anomalous data: 50 points (high temperature, higher rate of change)
anomalous_data = np.random.normal(loc=[60, 1.0], scale=[5, 0.1], size=(50, 2))  # Centered at (60°C, 1.0°C/min)
# Combine data and labels
X = np.vstack((normal_data, anomalous_data))
y = np.array([0] * 50 + [1] * 50)  # 0 = Normal, 1 = Anomalous

# Realistic test sensor reading (temperature, rate of change)
test_point = np.array([42, 0.9])  # Realistic: slightly high temp, moderate rate of change

# Euclidean distance function
def euclidean_distance(point1, point2):
    return np.sqrt(np.sum((point1 - point2) ** 2))

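# k-NN classifier: majority vote among the k nearest training points
def knn_predict(X_train, y_train, test_point, k=3):
    # Pair each training point's distance to the test point with its label
    distances = [(euclidean_distance(test_point, x), label) for x, label in zip(X_train, y_train)]
    distances.sort(key=lambda pair: pair[0])  # Sort by distance only
    k_nearest_labels = [label for _, label in distances[:k]]
    return Counter(k_nearest_labels).most_common(1)[0][0]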

# Classify the test point
k = 3
predicted_class = knn_predict(X, y, test_point, k)
class_name = "Normal" if predicted_class == 0 else "Anomalous"
print(f"Test temperature reading {test_point} is classified as: {class_name}")

# Plot 1: Temperature sensor data before classification
plt.figure(figsize=(8, 6))
plt.scatter(normal_data[:, 0], normal_data[:, 1], c='blue', label='Normal', alpha=0.6)
plt.scatter(anomalous_data[:, 0], anomalous_data[:, 1], c='red', label='Anomalous', alpha=0.6)
plt.xlabel('Temperature (°C)')
plt.ylabel('Rate of Change (°C/min)')
plt.title('Temperature Sensor Data Before Classification')
plt.legend()
plt.grid(True)
plt.show()

# Plot 2: Temperature sensor data with classification result
plt.figure(figsize=(8, 6))
plt.scatter(normal_data[:, 0], normal_data[:, 1], c='blue', label='Normal', alpha=0.6)
plt.scatter(anomalous_data[:, 0], anomalous_data[:, 1], c='red', label='Anomalous', alpha=0.6)
plt.scatter(test_point[0], test_point[1], c='green', marker='*', s=200, label=f'Test Point ({class_name})')
plt.xlabel('Temperature (°C)')
plt.ylabel('Rate of Change (°C/min)')
plt.title('Temperature Sensor Data Classification with k-NN (k=3)')
plt.legend()
plt.grid(True)
plt.show()
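The loop-based classifier above can also be written with NumPy array operations. The following is a sketch of one possible vectorized variant, not part of the original listing. The function name `knn_predict_vectorized` and the five-point toy dataset are hypothetical, chosen so the result can be checked by hand. It assumes the labels are non-negative integers, as `np.bincount` requires.

```python
import numpy as np

def knn_predict_vectorized(X_train, y_train, test_point, k=3):
    """Return the majority label and the distances of the k nearest points."""
    # Distance from the test point to every training row in one call
    distances = np.linalg.norm(X_train - test_point, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count votes per integer label and pick the most frequent one
    votes = np.bincount(y_train[nearest])
    return int(np.argmax(votes)), distances[nearest]

# Hypothetical toy data: (temperature in °C, rate of change in °C/min)
X_toy = np.array([[30, 0.5], [35, 0.6], [40, 0.7], [55, 1.0], [60, 1.1]])
y_toy = np.array([0, 0, 0, 1, 1])  # 0 = Normal, 1 = Anomalous

label, nearest_distances = knn_predict_vectorized(X_toy, y_toy, np.array([42, 0.9]), k=3)
print("Normal" if label == 0 else "Anomalous", nearest_distances)
```

In this toy set the three nearest points are all labeled Normal, so the vote is unanimous. Returning the neighbor distances alongside the label makes it easier to see how close the decision was.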
