Description
In intro_to_working_with_text, the lowercase example splits each line into separate words, lowercases each word with .lower(), and then rejoins the words into a new string.
data['Story'] = data['Story'].apply(lambda x: " ".join(x.lower() for x in x.split()))
You could instead get a very similar result by applying .lower() to the whole string, which would be simpler and more novice-friendly.
data['Story'] = data['Story'].apply(lambda x: x.lower())
Technically, these are not perfectly equivalent. The split-and-rejoin method converts every splitting point (i.e., every contiguous run of whitespace characters) into a single space, whereas the all-at-once approach leaves each splitting point unchanged. For example, split-and-rejoin converts the quadruple space in "A    B" to a single space, giving "a b", whereas my all-at-once suggestion leaves it as a quadruple space, yielding "a    b". Similarly, newline characters are converted to single spaces by the split-and-rejoin approach and are preserved as newline characters by my proposal. In some circumstances one of these outputs may be preferable to the other, but it's odd to fold this whitespace-rewriting behavior into something that purports to be just a lowercase example.
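To make the difference concrete, here is a minimal sketch that applies both approaches to a plain Python string instead of a DataFrame column. The function names `lower_split_rejoin` and `lower_whole` and the sample string are made up for illustration; they are not part of the tutorial.

```python
def lower_split_rejoin(text):
    # Mirrors the tutorial's lambda: split on runs of whitespace,
    # lowercase each word, then rejoin with single spaces.
    return " ".join(word.lower() for word in text.split())


def lower_whole(text):
    # Mirrors the proposed lambda: lowercase the string as-is,
    # leaving all whitespace untouched.
    return text.lower()


# Hypothetical sample with a quadruple space and a newline.
sample = "A    B\nC"

print(repr(lower_split_rejoin(sample)))  # 'a b c'
print(repr(lower_whole(sample)))         # 'a    b\nc'
```

If whitespace normalization is actually wanted in the lesson, it could be shown as its own step after lowercasing, so that each example does only one thing.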