Skip to content

Conversation

@nnethercott
Copy link
Contributor

@nnethercott nnethercott commented Jun 9, 2025

Pull Request

Small bug fix & improvement.

The Document model stores twice as many fields needed through Document.__doc and doesn't properly implement __getattr__.

An example of an assert that fails :

doc = Document({"foo": "bar"})
assert getattr(doc, "foo") == "bar" # passes
assert doc.__getattr__("foo") == "bar" # fails, current implementation gives 'foo' again

Duplicated data

Currently dict(Document) gives this:

  {
    "_Document__doc": {
      "id": 5,
      "title": "Hard Times",
      "genres": [
        "Classics",
        "Fiction",
        "Victorian",
        "Literature"
      ],
      "publisher": "Penguin Classics",
      "language": "English",
      "author": "Charles Dickens",
      "description": "Hard Times is a novel of social protest which attacks utilitarianism, Bentham's theory which only considered the practical aspects of happiness and ignored emotional, spiritual and moral values. It is set in Coketown, a fictious industrial city in northern England in the mid-1800s.",
      "format": "Hardcover",
      "rating": 3
    },
    "id": 5,
    "title": "Hard Times",
    "genres": [
      "Classics",
      "Fiction",
      "Victorian",
      "Literature"
    ],
    "publisher": "Penguin Classics",
    "language": "English",
    "author": "Charles Dickens",
    "description": "Hard Times is a novel of social protest which attacks utilitarianism, Bentham's theory which only considered the practical aspects of happiness and ignored emotional, spiritual and moral values. It is set in Coketown, a fictious industrial city in northern England in the mid-1800s.",
    "format": "Hardcover",
    "rating": 3
  }

This PR removes the __doc field so that it now looks like:

  {
    "id": 5,
    "title": "Hard Times",
    "genres": [
      "Classics",
      "Fiction",
      "Victorian",
      "Literature"
    ],
    "publisher": "Penguin Classics",
    "language": "English",
    "author": "Charles Dickens",
    "description": "Hard Times is a novel of social protest which attacks utilitarianism, Bentham's theory which only considered the practical aspects of happiness and ignored emotional, spiritual and moral values. It is set in Coketown, a fictious industrial city in northern England in the mid-1800s.",
    "format": "Hardcover",
    "rating": 3
  }

What does this PR do?

  • Fixes the __getattr__ implementation for Document
  • Fixes erroneous test

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

  • Bug Fixes

    • Improved attribute access in documents to return actual values instead of attribute names.
    • Iterating over documents now correctly yields key-value pairs from the document data.
  • Tests

    • Added a new test to verify correct document initialization and conversion.
    • Updated existing tests to reflect improved attribute access and iteration behavior.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jun 9, 2025

Warning

Rate limit exceeded

@brunoocasali has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 23 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 0fdd5b8 and 4cb403c.

📒 Files selected for processing (2)
  • meilisearch/models/document.py (1 hunks)
  • tests/models/test_document.py (1 hunks)

Walkthrough

The Document class in the Meilisearch models was refactored to store its data directly in the instance's __dict__ instead of a private dictionary. Attribute access now returns actual values or raises errors if missing. Iteration yields key-value pairs from the instance dictionary. Tests were added and updated to verify initialization, attribute access, and iteration behavior.

Changes

File(s) Change Summary
meilisearch/models/document.py Refactored Document to use instance __dict__ for data storage, updated __getattr__ to return values or raise AttributeError, and adjusted iteration to yield key-value pairs.
tests/models/test_document.py Added test for Document initialization, corrected attribute access test to check returned values, and updated iteration test to verify key-value pairs from the instance dictionary.

Poem

A hop and a skip, the Document’s new trick—
No more secrets in a private dict!
Now attributes shine, clear and direct,
With tests to ensure all works as we expect.
In fields of code, a bunny leaps,
Where data flows and logic keeps! 🐇

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Member

@brunoocasali brunoocasali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @nnethercott thanks for your PR!

I'm not sure if I got the idea behind this part:

This PR removes the __doc field so that it now looks like:

So it means this introduces a breaking-change?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pyproject.toml (1)

117-117: Consider improving exception handling instead of disabling the warning.

While disabling the "raise-missing-from" warning is acceptable, consider whether the exception handling in the Document class could be improved to follow Python best practices for exception chaining.

tests/models/test_document.py (1)

28-28: LGTM: Verifies correct iteration behavior.

The updated assertion correctly checks that iterating over a Document yields key-value pairs from the internal dictionary.

Minor nitpick: There's a typo in the test data - "fiels2" should probably be "field2". If this is intentional for testing purposes, consider adding a comment to clarify.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01483a3 and 2030e1b.

📒 Files selected for processing (3)
  • meilisearch/models/document.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/models/test_document.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/models/test_document.py (2)
meilisearch/models/document.py (1)
  • Document (4-15)
tests/models/test_index.py (1)
  • test_getattr (8-10)
🪛 Ruff (0.11.9)
meilisearch/models/document.py

12-12: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (3)
meilisearch/models/document.py (1)

6-6: LGTM: Eliminates duplicate data storage.

This change correctly addresses the duplicate data storage issue by updating the instance's __dict__ directly instead of maintaining a separate __doc attribute.

tests/models/test_document.py (2)

9-12: LGTM: Good test coverage for the refactored initialization.

This new test properly verifies that the Document class correctly initializes from a dictionary and can be converted back to a dictionary, ensuring the refactoring works as expected.


17-17: LGTM: Fixes the erroneous assertion.

This change correctly fixes the test to assert the actual attribute value ("test 1") instead of the attribute name ("field1"), aligning with the corrected __getattr__ implementation.

Comment on lines 8 to 12
try:
return self.__dict__[attr]
except Exception as _:
raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve type annotation and exception handling.

The __getattr__ method fixes the core issue but has room for improvement:

  1. Return type should be Any since attributes can be of any type
  2. Exception handling should be more specific and follow best practices
-    def __getattr__(self, attr: str) -> str:
+    def __getattr__(self, attr: str) -> Any:
         try:
             return self.__dict__[attr]
-        except Exception as _:
-            raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")
+        except KeyError:
+            raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}") from None

This addresses the static analysis hint about exception chaining and makes the exception handling more precise.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __getattr__(self, attr: str) -> str:
if attr in self.__doc.keys():
return attr
raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")
try:
return self.__dict__[attr]
except Exception as _:
raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}")
def __getattr__(self, attr: str) -> Any:
try:
return self.__dict__[attr]
except KeyError:
raise AttributeError(f"{self.__class__.__name__} object has no attribute {attr}") from None
🧰 Tools
🪛 Ruff (0.11.9)

12-12: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In meilisearch/models/document.py around lines 8 to 12, update the __getattr__
method to have a return type of Any instead of str to reflect that attributes
can be any type. Also, replace the broad except Exception clause with a more
specific KeyError exception to catch only missing dictionary keys. When raising
the AttributeError, use "from None" to suppress exception chaining and follow
best practices for clearer error handling.

@nnethercott
Copy link
Contributor Author

nnethercott commented Jun 9, 2025

So it means this introduces a breaking-change?

Kind of a grey area tbh.

__doc was a private attribute before so users would get an AttributeError if they did a document.__doc. The issue though is that this contract wasn't really respected by the previous Document.__iter__ implementation since iter(document) returns the contents of __doc along with the other fields :(

Since this PR changes the dict representation I guess it's safer to label it as a breaking-change !

@brunoocasali
Copy link
Member

brunoocasali commented Jun 9, 2025

Thanks a lot for your answer @nnethercott 🙏

Can you create a small migration guide, so I can inform other users within the release changelog? You can update the PR description and I copy from there.

@nnethercott
Copy link
Contributor Author

@brunoocasali sure thing !

🛠 migration guide

what changed

  • The internal __doc field in Document has been removed to avoid data duplication.
  • Document.__getattr__ now correctly returns attribute values rather than the attribute name itself.

impact

  • Code relying on internal access to document._Document__doc (e.g. through iterators) will break.
  • Behavior of doc.getattr("field") is now consistent and correct.

what you need to do:

  • If you were manually accessing document._Document__doc, switch to using direct attribute access or casting with dict(doc).

example updated usage

doc = index.get_document("foo")
doc_dict = dict(doc) 
doc_dict.pop("_Document__doc")  # ❌ Not needed anymore

@nnethercott nnethercott requested a review from brunoocasali June 10, 2025 08:05
@brunoocasali brunoocasali enabled auto-merge June 20, 2025 00:40
@brunoocasali brunoocasali disabled auto-merge June 20, 2025 00:45
@brunoocasali brunoocasali force-pushed the improve-document-serde branch from 0fdd5b8 to 4cb403c Compare June 20, 2025 00:46
@brunoocasali brunoocasali enabled auto-merge June 20, 2025 00:46
@brunoocasali brunoocasali added enhancement New feature or request breaking-change The related changes are breaking for the users labels Jun 20, 2025
@brunoocasali brunoocasali added this pull request to the merge queue Jun 20, 2025
Merged via the queue into meilisearch:main with commit 2b436e9 Jun 20, 2025
10 of 11 checks passed
@nnethercott nnethercott deleted the improve-document-serde branch June 20, 2025 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change The related changes are breaking for the users enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants