feat(examples): add richer evaluation metrics to advanced-pytorch#6713

Open
SalimELMARDI wants to merge 3 commits into flwrlabs:main from SalimELMARDI:feat/advanced-pytorch-rich-metrics

Conversation


@SalimELMARDI SalimELMARDI commented Mar 7, 2026

Issue

Description

The advanced-pytorch example currently reports only basic evaluation metrics: loss and accuracy on the server side, and eval_loss and eval_acc on the client side. This limits visibility into ranking quality and class-level behavior during federated runs.

Related issues/PRs

Supersedes the earlier quickstart-focused PR #6638 (moved to advanced-pytorch following @chongshenng's feedback).

Proposal

Explanation

This PR extends evaluation reporting in examples/advanced-pytorch while keeping existing metric keys backward-compatible.

Changes:

  • Updated examples/advanced-pytorch/pytorch_example/task.py:
    • Extended test(...) to compute:
      • top-1 accuracy (existing behavior)
      • top-3 accuracy
      • per-class top-1 accuracy for Fashion-MNIST (class_accuracy_0 ... class_accuracy_9)
    • Uses torch.bincount-based accumulation for per-class stats.
  • Updated examples/advanced-pytorch/pytorch_example/client_app.py:
    • Kept existing eval_loss and eval_acc
    • Added eval_acc_top3
    • Added eval_acc_class_0 ... eval_acc_class_9
  • Updated examples/advanced-pytorch/pytorch_example/server_app.py:
    • Kept existing loss and accuracy
    • Added accuracy_top3
    • Added accuracy_class_0 ... accuracy_class_9
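The metric computation described above could look roughly like the sketch below. This is an illustration of the approach (torch.topk for top-3, torch.bincount for per-class stats), not the PR's actual code; the function name evaluation_metrics and the exact metric keys are assumptions for the sake of the example.

```python
import torch

NUM_CLASSES = 10  # Fashion-MNIST has 10 classes

def evaluation_metrics(logits: torch.Tensor, labels: torch.Tensor) -> dict:
    """Sketch of extended evaluation metrics (top-1, top-3, per-class top-1)."""
    # Top-1 accuracy (existing behavior)
    preds = logits.argmax(dim=1)
    top1 = (preds == labels).float().mean().item()

    # Top-3 accuracy: correct if the label is among the 3 highest logits
    top3_preds = logits.topk(3, dim=1).indices
    top3 = (top3_preds == labels.unsqueeze(1)).any(dim=1).float().mean().item()

    # Per-class top-1 accuracy via torch.bincount-based accumulation:
    # count correct predictions and total samples per class, then divide
    correct_per_class = torch.bincount(labels[preds == labels], minlength=NUM_CLASSES)
    total_per_class = torch.bincount(labels, minlength=NUM_CLASSES)
    per_class = correct_per_class / total_per_class.clamp(min=1)

    metrics = {"accuracy": top1, "accuracy_top3": top3}
    metrics.update(
        {f"accuracy_class_{i}": per_class[i].item() for i in range(NUM_CLASSES)}
    )
    return metrics
```

In a real federated run, test(...) would also accumulate these counts across batches before dividing, so that per-class accuracy is exact rather than batch-averaged.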

Validation:

  • Ran a 1-round simulation locally:
    • flwr run .\examples\advanced-pytorch --stream --run-config "num-server-rounds=1 fraction-train=0.2 fraction-evaluate=0.2"
  • Confirmed new client and server metrics are present in logs.

Checklist

  • Implement proposed change
  • Write tests
  • Update documentation
  • Make CI checks pass
  • Ping maintainers on Slack (channel #contributions)

Any other comments?

No API-breaking changes. Existing metric keys were preserved for compatibility.

Copilot AI left a comment


Pull request overview

This PR enhances the examples/advanced-pytorch evaluation reporting so federated runs expose richer quality signals (top-k and per-class accuracy) while keeping existing metric keys intact (accuracy/loss server-side, eval_acc/eval_loss client-side).

Changes:

  • Extend test(...) to compute top-3 accuracy and per-class top-1 accuracy (Fashion-MNIST, 10 classes).
  • Emit the new metrics from both ClientApp.evaluate and server-side centralized evaluation.
  • Introduce a shared NUM_CLASSES constant to drive per-class metric generation.
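Since the client and server apps emit parallel per-class key families (eval_acc_class_i vs. accuracy_class_i), a shared NUM_CLASSES constant can drive key generation in both places. A hypothetical helper sketching that idea (per_class_keys is my name, not the PR's):

```python
NUM_CLASSES = 10  # Fashion-MNIST has 10 classes

def per_class_keys(prefix: str) -> list[str]:
    """Build the per-class metric key family for a given prefix."""
    return [f"{prefix}_class_{i}" for i in range(NUM_CLASSES)]

# Client-side keys: per_class_keys("eval_acc")  -> eval_acc_class_0 ... eval_acc_class_9
# Server-side keys: per_class_keys("accuracy") -> accuracy_class_0 ... accuracy_class_9
```

Deriving both key sets from one constant keeps client and server metric names in sync if the number of classes ever changes.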

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • examples/advanced-pytorch/pytorch_example/task.py: Computes top-3 and per-class accuracies and returns a metrics dict from test(...).
  • examples/advanced-pytorch/pytorch_example/client_app.py: Adds eval_acc_top3 and eval_acc_class_{i} metrics while preserving existing keys.
  • examples/advanced-pytorch/pytorch_example/server_app.py: Adds accuracy_top3 and accuracy_class_{i} metrics while preserving existing keys.


@github-actions github-actions bot added the Contributor label (used to determine what PRs (mainly) come from external contributors) Mar 7, 2026