Scatter Plot: Mean Radius vs Mean Texture of Breast Cancer Wisconsin Data Set
This scatter plot visualizes the relationship between two important features from the Breast Cancer Wisconsin (Diagnostic) dataset: Mean Radius and Mean Texture of cell nuclei.
Python Code for the Plot
import os
import zipfile

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# ==================== YOUR KAGGLE CREDENTIALS ====================
# Set these before importing the Kaggle client, because some versions of the
# kaggle package try to authenticate as soon as the package is imported.
os.environ['KAGGLE_USERNAME'] = "xxxxxxxx"
os.environ['KAGGLE_KEY'] = "KGAT_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate
api = KaggleApi()
api.authenticate()

# Download and extract dataset
api.dataset_download_files("uciml/breast-cancer-wisconsin-data", path='.', unzip=False)
with zipfile.ZipFile('breast-cancer-wisconsin-data.zip', 'r') as zip_ref:
    zip_ref.extractall('breast_cancer_data')

# Load data
df = pd.read_csv('breast_cancer_data/data.csv')

# ============================
# Clean Scatter Plot with Correct Legend
# ============================
plt.figure(figsize=(10, 8))
ax = sns.scatterplot(
    data=df,
    x='radius_mean',
    y='texture_mean',
    hue='diagnosis',
    palette={'M': 'red', 'B': 'blue'},
    alpha=0.8,
    s=75,
)
plt.title("Mean Radius vs Mean Texture\nBreast Cancer Wisconsin (Diagnostic) Dataset",
          fontsize=14, fontweight='bold', pad=20)
plt.xlabel("Mean Radius", fontsize=12)
plt.ylabel("Mean Texture", fontsize=12)

# === Fix Legend Properly ===
handles, labels = ax.get_legend_handles_labels()

# Map the raw codes to readable labels; any label that is not a diagnosis
# code (for example a legend title entry) is kept unchanged.
label_map = {'M': 'Malignant (M)', 'B': 'Benign (B)'}
new_labels = [label_map.get(label, label) for label in labels]
ax.legend(handles, new_labels, title="Diagnosis", loc='upper left', fontsize=10)

plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
- X-axis (Mean Radius): The average size of the cell nuclei, measured as the mean distance from the center of a nucleus to points on its perimeter.
- Y-axis (Mean Texture): How much the gray levels vary within the cell nuclei (the standard deviation of gray-scale values). A higher Mean Texture corresponds to a rougher, more varied appearance under the microscope.
- Red points = Malignant (cancerous)
- Blue points = Benign (non-cancerous)
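To make the two axes more concrete, here is a minimal sketch of how a radius and a texture value could be computed for a single nucleus. It follows the dataset's published feature descriptions (radius is the mean distance from the center to perimeter points, texture is the standard deviation of gray-scale values). The function names, the toy coordinates, the pixel intensities, and the use of the population standard deviation are all assumptions made for illustration. They are not taken from the dataset or from the original feature-extraction software.

```python
import math
import statistics


def nucleus_radius(center, perimeter_points):
    """Mean distance from the center to each perimeter point (hypothetical helper)."""
    cx, cy = center
    return statistics.fmean(math.hypot(x - cx, y - cy) for x, y in perimeter_points)


def nucleus_texture(gray_values):
    """Standard deviation of gray-scale intensities (population form assumed here)."""
    return statistics.pstdev(gray_values)


# Toy, made-up inputs: four perimeter points around the origin and two
# small sets of gray-scale pixel intensities.
perimeter = [(3, 0), (0, 3), (-3, 0), (0, -3)]
smooth_pixels = [100, 100, 100, 100]   # uniform appearance
rough_pixels = [60, 140, 60, 140]      # strongly varying appearance

radius = nucleus_radius((0, 0), perimeter)
texture_smooth = nucleus_texture(smooth_pixels)
texture_rough = nucleus_texture(rough_pixels)

print(f"radius = {radius:.1f}")
print(f"texture (smooth) = {texture_smooth:.1f}")
print(f"texture (rough)  = {texture_rough:.1f}")
```

The uniform pixel set gives a texture of zero, while the strongly varying set gives a much larger value. This is why points higher on the Y-axis correspond to nuclei that look rougher. In the dataset itself, the radius_mean and texture_mean columns average such per-nucleus values over the nuclei in one image.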