ConversationRelevancyMetric Removed in DeepEval 3.7.x, Need Official Replacement & Clarification #2321

### Issue Summary
In DeepEval version 3.2.1, `ConversationRelevancyMetric` works correctly for multi-turn conversational evaluation.  
However, in DeepEval 3.7.x and above, this metric is no longer available, and the documentation does not provide any explanation or migration path.

I attempted to switch to `TurnRelevancyMetric`, but it produces **incorrect and inflated scores**, especially for **negative feedback evaluation datasets**.

### What I Expected
- Clear information on whether `ConversationRelevancyMetric` was deprecated or replaced.
- A recommended metric for multi-turn conversation relevancy (task coverage + relevance).
- Comparable scoring behavior between the multi-turn and windowed-turn metrics.
- A migration guide or official note in the documentation.

### What Actually Happened
1. ImportError:
```python
ImportError: cannot import name 'ConversationRelevancyMetric' from 'deepeval.metrics'
```

2. `TurnRelevancyMetric` gives **very high scores (0.90+)** for negative samples, even when responses are clearly irrelevant.

3. No documentation explains:

   * why the metric was removed,
   * why `TurnRelevancyMetric` behaves differently,
   * how to replicate `ConversationRelevancyMetric` behavior in newer versions.
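
Until the removal is documented, a version-tolerant lookup can keep one evaluation script working on both 3.2.1 and 3.7.x. The following is only a sketch: `resolve_metric` is a hypothetical helper, not part of DeepEval, and it assumes the two class names mentioned in this issue are the only candidates worth trying.

```python
import importlib
from typing import Optional, Sequence, Tuple


def resolve_metric(
    candidates: Sequence[str] = (
        "ConversationRelevancyMetric",  # available in 3.2.1
        "TurnRelevancyMetric",          # available in 3.7.x
    ),
    module_name: str = "deepeval.metrics",
) -> Optional[Tuple[str, type]]:
    """Return (name, class) for the first candidate the module exposes.

    Returns None when the module cannot be imported or none of the
    candidate names exist in it.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    for name in candidates:
        cls = getattr(module, name, None)
        if cls is not None:
            return name, cls
    return None
```

Logging the returned name next to each score makes it clear which metric produced a given result when comparing runs across versions.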

### Minimal Reproduction

```python
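# Assumes an LLM judge is configured for DeepEval (for example via OPENAI_API_KEY).
from deepeval.metrics import TurnRelevancyMetric
from deepeval.test_case import ConversationalTestCase, Turn

# A two-turn conversation in which the assistant reply ignores the question.
turns = [
    Turn(role="user", content="What is X?"),
    Turn(role="assistant", content="This is unrelated and incorrect."),
]

test_case = ConversationalTestCase(turns=turns)
metric = TurnRelevancyMetric()
metric.measure(test_case)  # populates metric.score and metric.reason

print(metric.score)
print(metric.reason)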
```

This produces a relevance score that is unexpectedly high for irrelevant answers.

### Additional Context

* Positive feedback datasets → Both metrics behave similarly.
* **Negative feedback datasets → TurnRelevancyMetric produces inflated scores**, while ConversationRelevancyMetric (3.2.1) gives more realistic values.
* I need guidance from maintainers on which metric should be used for multi-turn conversation evaluation going forward.
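
To make the positive/negative comparison above reproducible, the per-sample scores from each metric can be summarized with a small helper. This is a sketch under assumptions: `inflation_report` is a hypothetical function, the `"positive"`/`"negative"` labels are my own dataset convention, and the 0.5 threshold is an arbitrary cutoff rather than anything DeepEval defines.

```python
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple, Union


def inflation_report(
    samples: Iterable[Tuple[str, float]],
    threshold: float = 0.5,
) -> Dict[str, Union[float, int, None]]:
    """Summarize scores for (label, score) pairs.

    label must be "positive" or "negative"; a negative sample whose score
    is at or above the threshold is counted as inflated.
    """
    by_label = {"positive": [], "negative": []}
    for label, score in samples:
        by_label[label].append(score)

    positives = by_label["positive"]
    negatives = by_label["negative"]
    mean_positive: Optional[float] = mean(positives) if positives else None
    mean_negative: Optional[float] = mean(negatives) if negatives else None
    return {
        "mean_positive": mean_positive,
        "mean_negative": mean_negative,
        "inflated_negatives": sum(s >= threshold for s in negatives),
        "negative_count": len(negatives),
    }
```

Running it once on the 3.2.1 scores and once on the 3.7.x scores for the same dataset gives two reports that can be attached to this issue side by side.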

### Key Questions

- Why does TurnRelevancyMetric score negative feedback so highly?
- Is there an official replacement for `ConversationRelevancyMetric` in versions 3.7.x and above?
- Do you recommend implementing a custom metric to replicate the older behavior?
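
For the custom-metric option, the shape I have in mind is a sliding-window score: judge each assistant turn against the few turns before it, then report the fraction judged relevant. The sketch below is framework-independent and makes several assumptions: the `(role, content)` tuples stand in for DeepEval's `Turn`, `judge` is a hypothetical callable (in practice an LLM call), and I have not confirmed that this matches how `ConversationRelevancyMetric` computed its score in 3.2.1.

```python
from typing import Callable, List, Sequence, Tuple

# (role, content); a plain stand-in, not DeepEval's Turn class.
SimpleTurn = Tuple[str, str]
Judge = Callable[[Sequence[SimpleTurn], str], bool]


def windowed_relevancy(
    turns: Sequence[SimpleTurn],
    judge: Judge,
    window_size: int = 3,
) -> float:
    """Fraction of assistant turns the judge marks as relevant.

    Each assistant turn is judged against at most `window_size`
    immediately preceding turns.
    """
    verdicts: List[bool] = []
    for i, (role, content) in enumerate(turns):
        if role != "assistant":
            continue
        context = turns[max(0, i - window_size):i]
        verdicts.append(bool(judge(context, content)))
    if not verdicts:
        raise ValueError("conversation contains no assistant turns")
    return sum(verdicts) / len(verdicts)
```

With a strict judge, the irrelevant reply in the reproduction above would score 0.0 under this scheme, which is the behavior I expected. If maintainers recommend this route, guidance on wiring it into DeepEval's custom metric interface would be appreciated.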
