Tuesday, April 28, 2026

Kaggle Data: Plot(With Respect) of Breast Cancer Wisconsin Data Set

Scatter Plot: Mean Radius vs Mean Texture of Breast Cancer Wisconsin Data Set


This scatter plot visualizes the relationship between two important features from the Breast Cancer Wisconsin (Diagnostic) dataset: Mean Radius and Mean Texture of cell nuclei.
  • X-axis (Mean Radius): The average radius of the cell nuclei in the sample, a measure of their size.
  • Y-axis (Mean Texture): The average texture of the cell nuclei, where texture is the standard deviation of gray-scale values. Higher values correspond to a rougher, more varied appearance under the microscope.
Each point on the plot represents one breast mass sample. The color indicates the diagnosis:
  • Red points = Malignant (cancerous)
  • Blue points = Benign (non-cancerous)
Key Observation: The plot shows two groups that are largely, though not completely, separated. Malignant tumors (red) tend to appear in the upper-right region, indicating they generally have a larger radius and higher texture values. In contrast, benign tumors (blue) mostly cluster in the lower-left area, with a smaller radius and smoother texture. The groups overlap where they meet, so neither feature separates the classes perfectly on its own. Even so, this separation suggests that Mean Radius and Mean Texture carry useful information for distinguishing between cancerous and non-cancerous breast tumors.
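One way to check the visual impression numerically is to compare the average of each feature per diagnosis. The sketch below is a minimal, standard-library version. The helper names `load_rows` and `class_means` are hypothetical, and it assumes the CSV has the `diagnosis`, `radius_mean`, and `texture_mean` columns used in the plot.

```python
import csv
from collections import defaultdict
from statistics import mean


def load_rows(csv_path):
    """Read a CSV file into a list of dicts, one per sample."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def class_means(rows, features=("radius_mean", "texture_mean")):
    """Return {diagnosis: {feature: mean value}} for the given rows."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for feature in features:
            # CSV values arrive as strings, so convert before averaging.
            grouped[row["diagnosis"]][feature].append(float(row[feature]))
    return {
        label: {feature: mean(values) for feature, values in per_feature.items()}
        for label, per_feature in grouped.items()
    }
```

With the file extracted by the script in this post, `class_means(load_rows('breast_cancer_data/data.csv'))` would return one entry for 'M' and one for 'B'. If the plot reads as described, the 'M' averages should be the higher ones for both features.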


Python Code for the Plot
import os

# ==================== YOUR KAGGLE CREDENTIALS ====================
# Set these before importing kaggle: the package may try to
# authenticate as soon as it is imported.
os.environ['KAGGLE_USERNAME'] = "xxxxxxxx"
os.environ['KAGGLE_KEY'] = "KGAT_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

import zipfile

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate
api = KaggleApi()
api.authenticate()

# Download and extract dataset
api.dataset_download_files("uciml/breast-cancer-wisconsin-data", path='.', unzip=False)

with zipfile.ZipFile('breast-cancer-wisconsin-data.zip', 'r') as zip_ref:
    zip_ref.extractall('breast_cancer_data')

# Load data
df = pd.read_csv('breast_cancer_data/data.csv')

# ============================
# Clean Scatter Plot with Correct Legend
# ============================

plt.figure(figsize=(10, 8))

ax = sns.scatterplot(
    data=df,
    x='radius_mean',
    y='texture_mean',
    hue='diagnosis',
    palette={'M': 'red', 'B': 'blue'},
    alpha=0.8,
    s=75
)

plt.title("Mean Radius vs Mean Texture\nBreast Cancer Wisconsin (Diagnostic) Dataset",
          fontsize=14, fontweight='bold', pad=20)

plt.xlabel("Mean Radius", fontsize=12)
plt.ylabel("Mean Texture", fontsize=12)

# === Fix Legend Properly ===
handles, labels = ax.get_legend_handles_labels()

# Map correct readable labels; any label not in the map is kept as-is
label_map = {'M': 'Malignant (M)', 'B': 'Benign (B)'}
new_labels = [label_map.get(label, label) for label in labels]

ax.legend(handles, new_labels, title="Diagnosis", loc='upper left', fontsize=10)

plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
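Hard-coding credentials in a script makes it easy to leak them when sharing the code. As a rough sketch, the check below reports whether credentials are already available from the environment or from a `kaggle.json` file, so the two `os.environ` lines can be left out. The function name `kaggle_credentials_source` is hypothetical, and the `~/.kaggle/kaggle.json` location is assumed to be the default one used by the Kaggle client.

```python
import os
from pathlib import Path


def kaggle_credentials_source(env=None, home=None):
    """Return 'env', 'file', or None depending on where credentials are found."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Environment variables, as set in the script above.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return "env"

    # Assumed default location of the downloaded API token file.
    if (home / ".kaggle" / "kaggle.json").is_file():
        return "file"

    return None
```

Calling `kaggle_credentials_source()` before `api.authenticate()` gives an early, readable failure point if it returns `None`.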
