
[GOOD FIRST ISSUE] Refactor DataSet class: Move large methods into smaller helper functions #183

@raphael-intugle

Description

name: Good First Issue
about: A beginner-friendly task perfect for first-time contributors
title: '[GOOD FIRST ISSUE] Refactor DataSet class: Move large methods into smaller helper functions'
labels: 'good first issue'
assignees: ''


Welcome! 👋

This is a beginner-friendly issue perfect for first-time contributors to the Intugle project. We've designed this task to help you get familiar with our codebase while making a meaningful contribution.

Task Description

The DataSet class in src/intugle/analysis/models.py contains several large methods (e.g., profile_columns, identify_keys, generate_glossary) that perform multiple steps in a single function. For better readability, maintainability, and testability, we want to refactor these methods by breaking them down into smaller, well-named helper functions.

Your task:

  • Identify at least one large method in the DataSet class (e.g., profile_columns, identify_keys, or generate_glossary).
  • Refactor it by extracting logical blocks into private helper methods (e.g., _collect_column_profiles or _build_column_profiles_df).
  • Replace the in-method code with calls to these new helper methods.
  • Ensure the main method remains concise and easy to read.

Why This Matters

  • Improves code readability and maintainability for all contributors.
  • Makes it easier to test and debug smaller, focused functions.
  • Helps new contributors understand the codebase faster.

What You'll Learn

  • How to refactor large Python methods into smaller, reusable functions
  • Best practices for code organization and readability

Step-by-Step Guide

Prerequisites

  • Python 3.10+ installed
  • Git basics (clone, commit, push, pull request)
  • Read our CONTRIBUTING.md guide

Setup Instructions

  1. Fork and clone the repository

    git clone https://github.com/YOUR_USERNAME/data-tools.git
    cd data-tools
  2. Create a virtual environment

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies

    pip install -e ".[dev]"
  4. Create a new branch

    git checkout -b fix/issue-NUMBER-refactor-dataset-methods

Implementation Steps

  1. Open src/intugle/analysis/models.py and locate the DataSet class.
  2. Choose one of the larger methods (e.g., profile_columns, identify_keys, or generate_glossary).
  3. Identify logical blocks within the method that can be separated (e.g., collecting column data, building DataFrames, updating attributes).
  4. Move these blocks into private helper methods (prefix with _), placing them as methods of the DataSet class.
  5. Replace the original code in the main method with calls to these new helper methods.
  6. Ensure all existing tests pass and that the refactored code behaves identically.
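Applied to a toy example, the steps above look roughly like the sketch below. LegacyDataSet, RefactoredDataSet, the column_profiles attribute and the profile fields are hypothetical stand-ins, not the real DataSet API; the point is the shape of the change, plus a final check (step 6) that old and new code produce the same result.

```python
from typing import Any

# Hypothetical data layout: column name -> list of raw values.
Column = list[Any]
Profile = dict[str, Any]


class LegacyDataSet:
    """Hypothetical 'before' version: one method does every step."""

    def __init__(self, data: dict[str, Column]) -> None:
        self.data = data
        self.column_profiles: list[Profile] = []

    def profile_columns(self) -> "LegacyDataSet":
        profiles = []
        for name, values in self.data.items():
            non_null = [v for v in values if v is not None]
            profiles.append({
                "column": name,
                "count": len(values),
                "null_count": len(values) - len(non_null),
                "distinct_count": len(set(non_null)),
            })
        profiles.sort(key=lambda p: p["column"])
        self.column_profiles = profiles
        return self


class RefactoredDataSet(LegacyDataSet):
    """Hypothetical 'after' version: same behaviour, split into helpers."""

    def profile_columns(self) -> "RefactoredDataSet":
        profiles = self._collect_column_profiles()
        self.column_profiles = sorted(profiles, key=lambda p: p["column"])
        return self

    def _collect_column_profiles(self) -> list[Profile]:
        # One profile per column; the per-column work lives in its own helper.
        return [self._profile_column(name, values) for name, values in self.data.items()]

    @staticmethod
    def _profile_column(name: str, values: Column) -> Profile:
        non_null = [v for v in values if v is not None]
        return {
            "column": name,
            "count": len(values),
            "null_count": len(values) - len(non_null),
            "distinct_count": len(set(non_null)),
        }


def behaves_identically(data: dict[str, Column]) -> bool:
    """Characterization check: old and new code must produce equal profiles."""
    old = LegacyDataSet(data).profile_columns().column_profiles
    new = RefactoredDataSet(data).profile_columns().column_profiles
    return old == new
```

A throwaway comparison like behaves_identically is handy while you work; once the existing test suite covers the behaviour, the legacy copy can be deleted.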

Files to Modify

  • File: src/intugle/analysis/models.py
    • Change: Refactor at least one large method in the DataSet class by extracting helper functions.
    • Line(s): Approximate locations only, since line numbers drift as the file changes: profile_columns (around lines 120-150), identify_keys (around lines 180-220), or generate_glossary (around lines 260-300).

Testing Your Changes

# Run tests
pytest tests/

# Or run a specific test file
pytest tests/test_analysis_models.py
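One payoff of extracting helpers is that each one can be tested on its own, without building a full DataSet. The sketch below assumes a hypothetical static helper named _null_ratio; in a real test you would import DataSet from intugle.analysis.models instead of defining a stand-in class. The test functions use plain assert, which pytest collects from test_*.py files without any extra imports.

```python
# Hypothetical test module, e.g. added to tests/test_analysis_models.py


class DataSet:
    """Minimal stand-in for illustration only."""

    @staticmethod
    def _null_ratio(values: list) -> float:
        """Hypothetical helper extracted from a larger profiling method."""
        if not values:
            return 0.0
        # True counts as 1, so this sums the number of None entries.
        return sum(v is None for v in values) / len(values)


def test_null_ratio_empty_column():
    assert DataSet._null_ratio([]) == 0.0


def test_null_ratio_counts_none_only():
    # 0 and "" are real values, not nulls.
    assert DataSet._null_ratio([None, 0, "", None]) == 0.5


def test_null_ratio_no_nulls():
    assert DataSet._null_ratio([1, 2, 3]) == 0.0
```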

Submitting Your Work

  1. Commit your changes

    git add .
    git commit -m "Refactor DataSet method(s) into smaller helper functions"
  2. Push to your fork

    git push origin fix/issue-NUMBER-refactor-dataset-methods
  3. Create a Pull Request

    • Go to the original repository
    • Click "Pull Requests" → "New Pull Request"
    • Select your branch
    • Fill out the PR template
    • Reference this issue with "Fixes #ISSUE_NUMBER"

Example Code

# Before
class DataSet:
    def profile_columns(self) -> "DataSet":
        ...  # long method with multiple steps

# After
class DataSet:
    def profile_columns(self) -> "DataSet":
        self._collect_column_profiles()
        return self

    def _collect_column_profiles(self) -> None:
        ...  # code moved from profile_columns

Expected Outcome

  • The chosen method in DataSet is now concise and calls one or more private helper methods.
  • The helper methods are well-named and encapsulate logical sub-tasks.
  • All tests pass and there is no change in functionality.

Definition of Done

  • Code changes implemented
  • Tests added/updated
  • Tests passing locally
  • Code follows project style guidelines
  • No new linter warnings
  • Documentation updated (if needed)
  • Pull request submitted

Need Help?

Don't hesitate to ask questions! We're here to help you succeed.

Skills You'll Use

  • Python basics
  • Git and GitHub
  • Testing with pytest (optional)
  • Other: Refactoring, code organization

Thank you for contributing to Intugle!

Tips for Success:

  • Take your time and read through everything carefully
  • Don't be afraid to ask questions
  • Test your changes before submitting
  • Have fun! 🎉
