Genμ Challenge (U&ME Workshop @ ICCV'2025)

Evaluation Methodology:

Goal: Remove the target concepts while preserving the model’s general and related knowledge, with minimal weight change.
Primary Metric — Erasing-Retention-Robustness (ERR Score)
- Per-concept averages
  \[ \overline{A_{fgt}} = \frac{1}{N_{img}N_{c_{tar}}}\sum_{i=1}^{N_{img}}\sum_{c=1}^{N_{c_{tar}}} A_{fgt}^{(i,c)} \] — average erasing accuracy: how well the model suppresses each target concept.
  
  \[ \overline{A_{ret}} = \frac{1}{N_{img}N_{c_{ret}}}\sum_{i=1}^{N_{img}}\sum_{c=1}^{N_{c_{ret}}} A_{ret}^{(i,c)} \] — average retention accuracy: how well unrelated knowledge is preserved.
  
  \[ \overline{A_{adj}} = \frac{1}{N_{img}N_{c_{adj}}}\sum_{i=1}^{N_{img}}\sum_{c=1}^{N_{c_{adj}}} A_{adj}^{(i,c)} \] — average adjacent‐concept accuracy: how well concepts correlated with the targets stay intact.
  
  \[ \overline{A_{ind}} = \frac{1}{N_{img}N_{c_{tar}}}\sum_{i=1}^{N_{img}}\sum_{c=1}^{N_{c_{tar}}} A_{ind}^{(i,c)} \] — average engineered-prompt robustness: resistance to prompts intentionally designed to resurface the targets.
  
  \[ \overline{A_{adv}} = \frac{1}{N_{img}N_{c_{tar}}}\sum_{i=1}^{N_{img}}\sum_{c=1}^{N_{c_{tar}}} A_{adv}^{(i,c)} \] — average adversarial robustness: resistance to adversarial prompts that attempt to reveal the targets.
- Final ERR Score (harmonic mean)
  \[ \text{ERR} \;=\; \operatorname{HM}\bigl( \overline{A_{fgt}},\, \overline{A_{ret}},\, \overline{A_{adj}},\, \overline{A_{ind}},\, \overline{A_{adv}} \bigr). \]
- Interpretation: Higher \(\overline{A_{fgt}}\) ⇒ better erasing; higher \(\overline{A_{ret}}\) and \(\overline{A_{adj}}\) ⇒ better retention; higher \(\overline{A_{ind}}\) and \(\overline{A_{adv}}\) ⇒ stronger robustness.
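The averaging and harmonic-mean steps above can be sketched in a few lines of Python. This is a minimal illustration, not the challenge's official scoring code: the function names `mean_accuracy` and `err_score` are hypothetical, and the sketch assumes each accuracy is supplied as a rectangular nested list indexed by image and then by concept.

```python
from statistics import harmonic_mean


def mean_accuracy(scores):
    """Average scores[i][c] over all images i and concepts c.

    Assumes a rectangular nested list, so the flat mean equals
    1 / (N_img * N_c) times the double sum in the formulas above.
    """
    flat = [s for per_image in scores for s in per_image]
    return sum(flat) / len(flat)


def err_score(a_fgt, a_ret, a_adj, a_ind, a_adv):
    """Harmonic mean of the five averaged accuracies (hypothetical helper)."""
    return harmonic_mean([a_fgt, a_ret, a_adj, a_ind, a_adv])


# Toy values for illustration only; these are not challenge results.
a_fgt = mean_accuracy([[1.0, 0.0], [1.0, 1.0]])  # 0.75
balanced = err_score(0.5, 0.5, 0.5, 0.5, 0.5)    # 0.5
lopsided = err_score(1.0, 1.0, 1.0, 1.0, 0.25)   # 0.625
```

Because the harmonic mean is dominated by its smallest argument, one weak component pulls the score down sharply: in the toy example, four perfect accuracies and one of 0.25 give 0.625, whereas an arithmetic mean would give 0.85.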
Tie-breaker — Weight-Change Ratio
\[ \frac{1}{N_{c}}\, \sum_{i=1}^{N_{c}} \lVert\theta_{\text{orig}}^{\,i}-\theta_{\text{un}}^{\,i}\rVert \;\Big/\; \text{Total parameters}, \] where \(\theta_{\text{orig}}\) and \(\theta_{\text{un}}\) are the original and unlearned model weights.
Notation: \(N_{img}\) — images per concept | \(N_{c_{tar}}\) — target concepts | \(N_{c_{ret}}\) — retention-test concepts | \(N_{c_{adj}}\) — adjacent/correlated concepts.
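The weight-change ratio used as the tie-breaker can be sketched as follows. The text does not say which norm is meant or what the index \(i\) ranges over, so this sketch makes two explicit assumptions: the norm is the L2 (Euclidean) norm, and \(i\) runs over \(N_{c}\) pairs of flattened weight vectors. The function names are hypothetical.

```python
import math


def l2_distance(theta_orig, theta_un):
    """Euclidean distance between two flattened weight vectors.

    Assumption: the norm in the tie-breaker formula is the L2 norm.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_orig, theta_un)))


def weight_change_ratio(orig_list, un_list, total_params):
    """Mean L2 distance over N_c (original, unlearned) pairs, divided by
    the total parameter count.

    Assumption: orig_list[i] and un_list[i] are the i-th pair of
    flattened weight vectors in the formula.
    """
    n_c = len(orig_list)
    mean_dist = sum(l2_distance(o, u) for o, u in zip(orig_list, un_list)) / n_c
    return mean_dist / total_params


# Toy example: two pairs of 2-parameter "models".
orig = [[0.0, 0.0], [1.0, 1.0]]
unlearned = [[3.0, 4.0], [1.0, 1.0]]  # distances: 5.0 and 0.0
ratio = weight_change_ratio(orig, unlearned, total_params=2)  # 2.5 / 2 = 1.25
```

A smaller ratio means the unlearned weights stay closer to the original ones, which matches the stated goal of minimal weight change.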

Baselines

The following table presents baseline scores for existing unlearning methods, ESD (Gandikota et al., 2023), CA (Kumari et al., 2023), and FMN (Zhang et al., 2023), alongside FADE, across various concepts, evaluated using the ERR Score and L2 Norm metrics.

Concept	Type	ESD ERR Score	ESD L2 Norm	CA ERR Score	CA L2 Norm	FMN ERR Score	FMN L2 Norm	FADE ERR Score	FADE L2 Norm
Barberton Daisy	Object	0.507	0.465	0.493	0.456	0.492	0.473	0.612	0.432
Apple Fruit	Object	0.472	0.493	0.417	0.455	0.494	0.472	0.584	0.441
Golf Ball	Object	0.431	0.529	0.502	0.468	0.474	0.487	0.598	0.436
Blue Jay	Animal	0.509	0.464	0.493	0.475	0.461	0.477	0.607	0.429
Welsh Springer Spaniel	Animal	0.489	0.479	0.501	0.472	0.472	0.489	0.591	0.443
Van Gogh	Style	0.432	0.528	0.524	0.453	0.496	0.475	0.625	0.427
Doodle	Style	0.476	0.491	0.498	0.471	0.469	0.329	0.603	0.439
Neon	Style	0.487	0.481	0.492	0.473	0.464	0.464	0.594	0.442
Monet	Style	0.511	0.463	0.524	0.453	0.498	0.468	0.619	0.431
Sketch	Style	0.484	0.483	0.509	0.464	0.481	0.482	0.606	0.438
Wedding	Scene	0.479	0.487	0.466	0.503	0.472	0.411	0.587	0.446
Sunset	Scene	0.397	0.459	0.416	0.444	0.431	0.473	0.563	0.452
Rainfall	Scene	0.443	0.519	0.417	0.543	0.467	0.494	0.578	0.449
Aurora Borealis	Scene	0.487	0.481	0.382	0.575	0.442	0.518	0.601	0.437
Scenery	Scene	0.497	0.473	0.428	0.531	0.401	0.558	0.615	0.433
Sleeping	Action	0.482	0.485	0.475	0.487	0.402	0.509	0.589	0.441
Walking	Action	0.513	0.462	0.485	0.479	0.367	0.591	0.623	0.429
Eating	Action	0.463	0.501	0.473	0.495	0.413	0.546	0.608	0.436
Dancing	Action	0.432	0.528	0.451	0.441	0.433	0.516	0.597	0.439
Jumping	Action	0.496	0.474	0.417	0.443	0.402	0.498	0.616	0.434