Session 14 — AI Ethics for NLP

Goal: recognize and measure bias, apply fairness metrics, explain predictions (explainability), and manage privacy and data leakage in NLP projects.

Learning Outcomes: (1) Compute per-group fairness metrics; (2) Check per-group calibration; (3) Explain the model with coefficients/SHAP; (4) Identify potential privacy violations and reduce data leakage.

1) Core Concepts

  • Data bias: imbalanced representation or labels that lead to unfair model behavior.
  • Fairness metrics: Demographic Parity (selection rate), Equal Opportunity (TPR parity), Equalized Odds (TPR & FPR parity).
  • Calibration: probability scores that are consistent across groups (simple Brier/ECE checks; a small ECE sketch follows this list).
  • Explainability: feature coefficients (linear models), SHAP/LIME for local explanations.
  • Privacy: data minimization, k-anonymity, detection of data leakage & train/test contamination.
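
The calibration point above can be made concrete with a small expected calibration error (ECE) helper. This is a sketch of one common binning formulation (the 10-bin choice is arbitrary), not a scikit-learn function:

import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    # ECE: frequency-weighted mean |observed positive rate - mean predicted probability| per bin
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        m = (y_prob >= lo) & (y_prob < hi) if hi < 1.0 else (y_prob >= lo) & (y_prob <= hi)
        if m.sum() == 0:
            continue
        conf = y_prob[m].mean()   # average predicted probability in this bin
        acc  = y_true[m].mean()   # observed positive rate in this bin
        ece += (m.sum() / len(y_prob)) * abs(acc - conf)
    return ece

# Per-group usage once proba, y_test, g_test exist (see part B below), e.g.:
# for g in ['A', 'B']:
#     m = (g_test.values == g)
#     print(g, round(expected_calibration_error(y_test.values[m], proba[m]), 3))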

2) Google Colab Practice — Fairness & Explainability Audit

We reuse the model from Sessions 7/10 (LogReg on TF–IDF). Add a simple sensitive attribute for simulation, or load it from meta_labels.csv if available.

A. Setup & Data

!pip -q install pandas numpy scikit-learn matplotlib shap

import numpy as np, pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, brier_score_loss, roc_auc_score
import matplotlib.pyplot as plt

# 1) Load labeled data
try:
    df = pd.read_csv('logreg_dataset_sessi7.csv')  # columns: text, y
except Exception:
    base = pd.read_csv('corpus_sessi3_variants.csv')['v2_stop_stemID'].dropna().astype(str).tolist()
    POS = {"bagus","mantap","menyenangkan","cepat","baik","excellent","impressive","friendly","tajam","bersih"}
    NEG = {"buruk","lambat","telat","downtime","lemah","weak","late","dented","smelled","failed","delay"}
    def weak_label(t):
        # Weak labeling: positive if more POS than NEG keywords, negative if the reverse, else drop
        w = set(t.split()); p = len(w & POS); n = len(w & NEG)
        if p > n: return 1
        if n > p: return 0
        return None
    df = pd.DataFrame({'text': base, 'y': [weak_label(t) for t in base]}).dropna()

# 2) Add a sensitive attribute (simulated, or loaded from metadata)
try:
    meta = pd.read_csv('meta_labels.csv')  # columns: id (optional), group
    df = df.reset_index(drop=True)
    df['group'] = meta['group'].reindex(df.index).fillna('A')  # align lengths, fill missing with 'A'
except Exception:
    # Heuristic: group A if the text contains certain keywords, else B (simulation only)
    KEWA = {"ibu","mbak","sista"}
    KEWB = {"bapak","mas","bro"}
    def assign_group(t):
        w = set(t.split())
        if len(w & KEWA) > 0: return 'A'  # example: female-coded terms
        if len(w & KEWB) > 0: return 'B'  # example: male-coded terms
        return np.random.choice(['A','B'])
    df['group'] = df['text'].apply(assign_group)

print(df['group'].value_counts())

# 3) Split & Vectorize
X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    df['text'], df['y'].astype(int), df['group'], test_size=0.3, stratify=df[['y','group']], random_state=42)

vec = TfidfVectorizer(ngram_range=(1,2), min_df=2, max_df=0.95, sublinear_tf=True, norm='l2')
Xtr = vec.fit_transform(X_train)
Xte = vec.transform(X_test)

clf = LogisticRegression(max_iter=300, class_weight='balanced')
clf.fit(Xtr, y_train)
proba = clf.predict_proba(Xte)[:,1]
yhat  = (proba>=0.5).astype(int)

print('Overall Test Report')
print(classification_report(y_test, yhat, digits=3))

B. Per-Group Fairness Metrics

import numpy as np
from sklearn.metrics import confusion_matrix

def group_metrics(y_true, y_prob, y_pred, group):
    rows = []
    for g in sorted(np.unique(group)):
        m = group == g
        yt, yp, yb = y_true[m], y_prob[m], y_pred[m]     # yp = probabilities, yb = binary predictions
        tn, fp, fn, tp = confusion_matrix(yt, yb, labels=[0, 1]).ravel()
        sel_rate = (yb == 1).mean()                      # Demographic parity proxy
        tpr = tp / (tp + fn + 1e-12)                     # Equal opportunity
        fpr = fp / (fp + tn + 1e-12)
        brier = brier_score_loss(yt, yp)                 # Calibration (lower is better)
        auc = roc_auc_score(yt, yp) if len(np.unique(yt)) > 1 else np.nan
        rows.append([g, sel_rate, tpr, fpr, brier, auc, tp, fp, fn, tn, len(yt)])
    import pandas as pd
    return pd.DataFrame(rows, columns=['group','selection_rate','TPR','FPR','Brier','ROC_AUC','TP','FP','FN','TN','N'])

per_group = group_metrics(y_test.values, proba, yhat, g_test.values)
print(per_group)

# Disparities (max-min gap across groups)
dp_diff  = per_group['selection_rate'].max() - per_group['selection_rate'].min()
eo_diff  = per_group['TPR'].max() - per_group['TPR'].min()
eo2_diff = (per_group['FPR'].max() - per_group['FPR'].min())
print(f'ΔDemographicParity={dp_diff:.3f}  ΔTPR={eo_diff:.3f}  ΔFPR={eo2_diff:.3f}')
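
Besides absolute gaps, the ratio of selection rates is often compared against the informal "four-fifths" rule. A minimal sketch; the 0.8 cutoff is a common convention, not part of this course's material:

# Disparate impact ratio: min/max selection rate across groups (four-fifths rule of thumb)
di_ratio = per_group['selection_rate'].min() / (per_group['selection_rate'].max() + 1e-12)
print(f'Disparate impact ratio = {di_ratio:.3f}  (values below ~0.8 are commonly flagged for review)')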

C. Per-Group Threshold Tuning (Equal Opportunity)

# Find a per-group threshold that approaches the target TPR
import numpy as np

def find_threshold_for_tpr(y_true, y_prob, target_tpr=0.8):
    thr_list = np.linspace(0.1, 0.9, 41)
    best = 0.5; best_gap = 1.0
    for t in thr_list:
        yb = (y_prob >= t).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_true, yb, labels=[0, 1]).ravel()
        tpr = tp / (tp + fn + 1e-12)
        if abs(tpr - target_tpr) < best_gap:
            best_gap = abs(tpr - target_tpr); best = t
    return best

# Per-group thresholds, then recombine the predictions
mask_A = (g_test.values == 'A')
thr_A = find_threshold_for_tpr(y_test.values[mask_A],  proba[mask_A],  target_tpr=0.8)
thr_B = find_threshold_for_tpr(y_test.values[~mask_A], proba[~mask_A], target_tpr=0.8)
print(f'Threshold A={thr_A:.2f}  Threshold B={thr_B:.2f}')

yhat_adj = np.where(mask_A, (proba >= thr_A).astype(int), (proba >= thr_B).astype(int))
per_group_adj = group_metrics(y_test.values, proba, yhat_adj, g_test.values)
print('\nAfter per-group thresholds:')
print(per_group_adj)
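
To quantify the effect of the adjustment, the same disparity gaps from part B can be recomputed on per_group_adj:

# Disparity gaps after per-group thresholds (compare with the part B values)
dp_diff_adj  = per_group_adj['selection_rate'].max() - per_group_adj['selection_rate'].min()
tpr_diff_adj = per_group_adj['TPR'].max() - per_group_adj['TPR'].min()
fpr_diff_adj = per_group_adj['FPR'].max() - per_group_adj['FPR'].min()
print(f'After: ΔDemographicParity={dp_diff_adj:.3f}  ΔTPR={tpr_diff_adj:.3f}  ΔFPR={fpr_diff_adj:.3f}')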

D. Explainability: Coefficients & SHAP

# Feature coefficients (global explanation)
feat = vec.get_feature_names_out()
coef = clf.coef_.ravel()
ix_pos = coef.argsort()[::-1][:20]
ix_neg = coef.argsort()[:20]
print('Top + features:', [(feat[i], round(float(coef[i]), 3)) for i in ix_pos])
print('Top - features:', [(feat[i], round(float(coef[i]), 3)) for i in ix_neg])

# SHAP (local explanations for 20 examples)
import shap
import numpy as np

# Note: older SHAP releases used feature_dependence="independent";
# recent releases name this argument feature_perturbation="interventional".
explainer = shap.LinearExplainer(clf, Xtr, feature_perturbation="interventional")
shap_values = explainer.shap_values(Xte[:20])

# Densify if SHAP returns a sparse matrix, then average the absolute values per feature
sv = shap_values.toarray() if hasattr(shap_values, 'toarray') else np.asarray(shap_values)
abs_mean = np.abs(sv).mean(axis=0)
ix = abs_mean.argsort()[::-1][:20]
print('Top SHAP features:', [(feat[i], round(float(abs_mean[i]), 4)) for i in ix])
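
To inspect a single prediction, the same SHAP values can be ranked for one document. A minimal sketch using the dense sv array computed above (index 0 is an arbitrary choice):

# Local explanation: top contributing features for one test document (index 0, chosen arbitrarily)
i = 0
contrib = sv[i]
top = np.abs(contrib).argsort()[::-1][:10]
print('Text:', X_test.iloc[i][:120])
for j in top:
    print(f'{feat[j]:<25s} {contrib[j]:+.4f}')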

E. Leakage Checks & Privacy Quick Wins

# 1) Check for train↔test duplicates (a simple leakage signal)
tr_set = set(map(str, X_train))
leak = sum(1 for t in X_test if str(t) in tr_set)
print('Train→test duplicates (leakage indicator):', leak)

# 2) Simple PII masking (email/phone) before vectorization
import re
PII_EMAIL = re.compile(r"[\w\.-]+@[\w\.-]+")
PII_PHONE = re.compile(r"\b\+?\d[\d\-\s]{7,}\b")

def mask_pii(text):
    text = PII_EMAIL.sub('<EMAIL>', text)
    text = PII_PHONE.sub('<PHONE>', text)
    return text

print('PII masking example:', mask_pii('contact me at agus@kampus.id or +62-812-1234-5678'))
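
The concepts list also mentions k-anonymity. A minimal sketch that checks whether every combination of quasi-identifier columns occurs at least k times; here only 'group' is available, so in practice you would pass the real quasi-identifiers, and k=5 is an arbitrary choice:

# Quick k-anonymity check: every quasi-identifier combination should occur at least k times
def k_anonymity_report(df_meta, quasi_cols, k=5):
    counts = df_meta.groupby(quasi_cols).size()
    return counts.min(), counts[counts < k]

min_count, violations = k_anonymity_report(df[['group']], ['group'], k=5)
print('Smallest cell size:', min_count)
print('Combinations below k:', len(violations))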

F. Save Audit Artifacts

import joblib
joblib.dump({'per_group_before':per_group, 'per_group_after':per_group_adj}, 'fairness_report_sessi14.joblib')
joblib.dump({'vec':vec,'clf':clf}, 'tfidf_logreg_sessi14.joblib')
print('Saved: fairness_report_sessi14.joblib, tfidf_logreg_sessi14.joblib')

3) Case Studies & Analysis

Case               | Ethical Issue                                     | Mitigation                                 | Notes
Comment moderation | Higher false-positive rate for certain groups     | Per-group thresholds + periodic audits     | Monitor ΔTPR/ΔFPR
Ticket screening   | Demographic parity bias (skewed selection rates)  | Reweighting/resampling + threshold tuning  | Evaluate service impact (reweighting sketch below)
Sentiment analysis | Spurious features (regional slang terms)          | Curate domain stopwords, SHAP audit        | Retrain after the fixes
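
The reweighting mitigation listed in the table can be sketched by weighting each (group, label) cell inversely to its frequency and passing the result as sample_weight. A minimal sketch, not a tuned recipe:

# Reweighting sketch: each (group, label) cell gets a weight inversely proportional to its frequency
import pandas as pd

train_meta = pd.DataFrame({'y': y_train.values, 'group': g_train.values})
cell_freq = train_meta.groupby(['group', 'y']).size() / len(train_meta)
weights = train_meta.apply(lambda r: 1.0 / cell_freq[(r['group'], r['y'])], axis=1)

clf_rw = LogisticRegression(max_iter=300)
clf_rw.fit(Xtr, y_train, sample_weight=weights)
per_group_rw = group_metrics(y_test.values, clf_rw.predict_proba(Xte)[:, 1],
                             clf_rw.predict(Xte), g_test.values)
print(per_group_rw)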

4) Ethics Lab Checklist

5) Mini Assignment (Graded)

  1. Build a fairness report (per-group table + Δparity) for your TF–IDF LogReg model.
  2. Tune per-group thresholds to balance TPR (target 0.8) and report the impact on FPR & AP (average precision); see the sketch after this list.
  3. Use SHAP on 20 examples to find the 5 most influential features; audit the 5 largest errors and propose data fixes.
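
For task 2, per-group average precision (AP) can be computed with sklearn's average_precision_score. Note that AP is derived from the probabilities, so per-group thresholding changes FPR but leaves AP unchanged; a minimal sketch:

# Per-group average precision (threshold-independent, computed from probabilities)
from sklearn.metrics import average_precision_score

for g in sorted(set(g_test.values)):
    m = (g_test.values == g)
    print(g, 'AP =', round(average_precision_score(y_test.values[m], proba[m]), 3))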