Evaluation Methodology:
- Goal: Remove the target concepts while preserving the model’s general and related knowledge, with minimal weight change.
- Primary Metric — Erasing-Retention-Robustness (ERR Score)
- Per-concept averages
\[ \overline{A_{fgt}} = \frac{1}{N_{img}N_{c_{tar}}}\sum_{i=1}^{N_{img}}\sum_{c=1}^{N_{c_{tar}}} A_{fgt}^{(i,c)} \] Average erasing accuracy: how well the model suppresses each target concept.
\[ \overline{A_{ret}} = \frac{1}{N_{img}N_{c_{ret}}}\sum_{i=1}^{N_{img}}\sum_{c=1}^{N_{c_{ret}}} A_{ret}^{(i,c)} \] Average retention accuracy: how well unrelated knowledge is preserved.
\[ \overline{A_{adj}} = \frac{1}{N_{img}N_{c_{adj}}}\sum_{i=1}^{N_{img}}\sum_{c=1}^{N_{c_{adj}}} A_{adj}^{(i,c)} \] Average adjacent-concept accuracy: how well concepts correlated with the targets stay intact.
\[ \overline{A_{ind}} = \frac{1}{N_{img}N_{c_{tar}}}\sum_{i=1}^{N_{img}}\sum_{c=1}^{N_{c_{tar}}} A_{ind}^{(i,c)} \] Average engineered-prompt robustness: resistance to prompts intentionally designed to resurface the targets.
\[ \overline{A_{adv}} = \frac{1}{N_{img}N_{c_{tar}}}\sum_{i=1}^{N_{img}}\sum_{c=1}^{N_{c_{tar}}} A_{adv}^{(i,c)} \] Average adversarial robustness: resistance to adversarial prompts that attempt to reveal the targets.
- Final ERR Score (harmonic mean)
\[ \text{ERR} \;=\; \operatorname{HM}\bigl( \overline{A_{fgt}},\, \overline{A_{ret}},\, \overline{A_{adj}},\, \overline{A_{ind}},\, \overline{A_{adv}} \bigr). \]
- Interpretation: Higher \(\overline{A_{fgt}}\) ⇒ better erasing; higher \(\overline{A_{ret}}\) and \(\overline{A_{adj}}\) ⇒ better retention; higher \(\overline{A_{ind}}\) and \(\overline{A_{adv}}\) ⇒ stronger robustness.
- Tie-breaker — Weight-Change Ratio
\[ \frac{1}{N_{c}}\, \sum_{i=1}^{N_{c}} \lVert\theta_{\text{orig}}^{\,i}-\theta_{\text{un}}^{\,i}\rVert \;\Big/\; \text{Total parameters}, \] where \(\theta_{\text{orig}}\) and \(\theta_{\text{un}}\) are the original and unlearned model weights.
- Notation: \(N_{img}\): images per concept; \(N_{c_{tar}}\): target concepts; \(N_{c_{ret}}\): retention-test concepts; \(N_{c_{adj}}\): adjacent/correlated concepts.
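
As a rough illustration of the definitions above, the following Python sketch computes the per-category averages, the ERR Score, and the weight-change ratio. The function names and the nested-list layout `scores[i][c]` are hypothetical, not taken from the source. The sketch also assumes that the norm in the tie-breaker is the L2 norm and that the sum runs over \(N_{c}\) pairs of flattened weight vectors; the source does not spell out either point.

```python
import math
from statistics import fmean, harmonic_mean


def mean_accuracy(scores):
    """Average scores[i][c] over all images i and concepts c."""
    return fmean([s for per_image in scores for s in per_image])


def err_score(a_fgt, a_ret, a_adj, a_ind, a_adv):
    """Harmonic mean of the five averaged accuracies."""
    return harmonic_mean([a_fgt, a_ret, a_adj, a_ind, a_adv])


def weight_change_ratio(orig_weights, unlearned_weights, total_params):
    """Mean L2 distance over paired weight vectors, divided by parameter count.

    Assumption: each pair holds the flattened original and unlearned weights.
    """
    norms = [math.dist(o, u) for o, u in zip(orig_weights, unlearned_weights)]
    return fmean(norms) / total_params


# Toy inputs only; none of these values come from the benchmark.
a_fgt = mean_accuracy([[1.0, 0.0], [1.0, 1.0]])  # 2 images x 2 target concepts
print(err_score(a_fgt, 0.9, 0.8, 0.7, 0.6))
print(weight_change_ratio([[0.0, 0.0]], [[3.0, 4.0]], total_params=2))
```

Because the harmonic mean is pulled toward its smallest argument, a method cannot offset weak robustness or retention with strong erasing alone.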
Baselines
The following table reports the ERR Score and L2 Norm of the existing unlearning methods ESD (Gandikota et al., 2023), CA (Kumari et al., 2023), and FMN (Zhang et al., 2023), alongside FADE, across object, animal, style, scene, and action concepts.
| Concept | Type | ESD ERR Score | ESD L2 Norm | CA ERR Score | CA L2 Norm | FMN ERR Score | FMN L2 Norm | FADE ERR Score | FADE L2 Norm |
|---|---|---|---|---|---|---|---|---|---|
| Barberton Daisy | Object | 0.507 | 0.465 | 0.493 | 0.456 | 0.492 | 0.473 | 0.612 | 0.432 |
| Apple Fruit | Object | 0.472 | 0.493 | 0.417 | 0.455 | 0.494 | 0.472 | 0.584 | 0.441 |
| Golf Ball | Object | 0.431 | 0.529 | 0.502 | 0.468 | 0.474 | 0.487 | 0.598 | 0.436 |
| Blue Jay | Animal | 0.509 | 0.464 | 0.493 | 0.475 | 0.461 | 0.477 | 0.607 | 0.429 |
| Welsh Springer Spaniel | Animal | 0.489 | 0.479 | 0.501 | 0.472 | 0.472 | 0.489 | 0.591 | 0.443 |
| Van Gogh | Style | 0.432 | 0.528 | 0.524 | 0.453 | 0.496 | 0.475 | 0.625 | 0.427 |
| Doodle | Style | 0.476 | 0.491 | 0.498 | 0.471 | 0.469 | 0.329 | 0.603 | 0.439 |
| Neon | Style | 0.487 | 0.481 | 0.492 | 0.473 | 0.464 | 0.464 | 0.594 | 0.442 |
| Monet | Style | 0.511 | 0.463 | 0.524 | 0.453 | 0.498 | 0.468 | 0.619 | 0.431 |
| Sketch | Style | 0.484 | 0.483 | 0.509 | 0.464 | 0.481 | 0.482 | 0.606 | 0.438 |
| Wedding | Scene | 0.479 | 0.487 | 0.466 | 0.503 | 0.472 | 0.411 | 0.587 | 0.446 |
| Sunset | Scene | 0.397 | 0.459 | 0.416 | 0.444 | 0.431 | 0.473 | 0.563 | 0.452 |
| Rainfall | Scene | 0.443 | 0.519 | 0.417 | 0.543 | 0.467 | 0.494 | 0.578 | 0.449 |
| Aurora Borealis | Scene | 0.487 | 0.481 | 0.382 | 0.575 | 0.442 | 0.518 | 0.601 | 0.437 |
| Scenery | Scene | 0.497 | 0.473 | 0.428 | 0.531 | 0.401 | 0.558 | 0.615 | 0.433 |
| Sleeping | Action | 0.482 | 0.485 | 0.475 | 0.487 | 0.402 | 0.509 | 0.589 | 0.441 |
| Walking | Action | 0.513 | 0.462 | 0.485 | 0.479 | 0.367 | 0.591 | 0.623 | 0.429 |
| Eating | Action | 0.463 | 0.501 | 0.473 | 0.495 | 0.413 | 0.546 | 0.608 | 0.436 |
| Dancing | Action | 0.432 | 0.528 | 0.451 | 0.441 | 0.433 | 0.516 | 0.597 | 0.439 |
| Jumping | Action | 0.496 | 0.474 | 0.417 | 0.443 | 0.402 | 0.498 | 0.616 | 0.434 |
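
To show how the two metrics might be combined when comparing methods on one concept, the sketch below ranks methods by ERR Score (higher first) and uses the L2 Norm only to break ties (lower first). The helper `rank_methods` is hypothetical. Treating a lower L2 Norm as better is an assumption drawn from the stated goal of minimal weight change. The input values are copied from the Van Gogh row of the table.

```python
def rank_methods(results):
    """Rank methods by ERR Score (descending), then L2 Norm (ascending).

    results maps a method name to an (err_score, l2_norm) pair.
    Assumption: a smaller L2 Norm wins a tie on ERR Score.
    """
    return sorted(results, key=lambda m: (-results[m][0], results[m][1]))


# (ERR Score, L2 Norm) pairs from the Van Gogh row of the table above.
van_gogh = {
    "ESD": (0.432, 0.528),
    "CA": (0.524, 0.453),
    "FMN": (0.496, 0.475),
    "FADE": (0.625, 0.427),
}
print(rank_methods(van_gogh))  # ['FADE', 'CA', 'FMN', 'ESD']
```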