Lowercase example seems needlessly complex #3

@Justin-Fisher

In intro_to_working_with_text, the lowercase example splits each line into separate words, lowercases each word with .lower(), and then rejoins the words into a new string.

data['Story'] = data['Story'].apply(lambda x: " ".join(x.lower() for x in x.split()))

You could instead get a very similar result by applying .lower() to each whole string, which would be simpler and more novice-friendly.

data['Story'] = data['Story'].apply(lambda x: x.lower())

Technically, these are not perfectly equivalent. The split-and-rejoin method converts every splitting point (i.e., every contiguous sequence of whitespace characters) into a single space, whereas the all-at-once approach leaves each splitting point unchanged. For example, split-and-rejoin will convert the quadruple space in "A    B" to a single space in "a b", whereas my all-at-once suggestion would leave it as a quadruple space, yielding "a    b". Similarly, newline characters get converted to single spaces by the split-and-rejoin approach and are preserved as newline characters by my proposal. In some circumstances, one of these outputs may be preferable to the other, but it's odd to fold this whitespace-rewriting functionality into something that purports just to be a lowercase example.
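To make the difference concrete, here is a minimal sketch on plain Python strings, so it runs without pandas. The function names `lower_split_rejoin` and `lower_whole` and the sample string are made up for illustration and do not appear in the notebook.

```python
def lower_split_rejoin(text: str) -> str:
    # Mirrors the notebook's lambda: split on runs of whitespace,
    # lowercase each word, then rejoin with single spaces.
    return " ".join(word.lower() for word in text.split())


def lower_whole(text: str) -> str:
    # Mirrors the proposed lambda: lowercase the string as-is,
    # leaving all whitespace untouched.
    return text.lower()


# Hypothetical sample with a quadruple space and a newline.
sample = "A    B\nC"

print(repr(lower_split_rejoin(sample)))  # 'a b c'
print(repr(lower_whole(sample)))         # 'a    b\nc'
```

Both functions lowercase every letter. Only the first one also rewrites the whitespace.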
