Commit 0844a68

frequency
1 parent 6b2cd8e commit 0844a68

6 files changed: +367 -2 lines changed

notebooks/_toc.yml

Lines changed: 4 additions & 2 deletions
@@ -9,10 +9,12 @@ parts:
 - file: exos/beginners/EXO-countdown-nb
 # import, argparse
 - file: exos/beginners/EXO-guess-nb
-# slicing conditional expression return/break
+# slicing, conditional expression, return/break
 - file: exos/beginners/EXO-palindrom-nb
-#
+# sorted
 - file: exos/beginners/EXO-anagrams-nb
+# counter, regexp
+- file: exos/beginners/EXO-frequency-nb
 - caption: exercices basiques
 chapters:
 # str, ints, binary computations
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.4
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

# word frequency

+++

## the task

the goal is to write a program that reads a text file, and computes the frequency
of the words found in that file

here we will display the `n` most frequent words, together with their number of occurrences;
for example with this file (which is provided)

```bash
python frequency.py -n 5 frequency-sample.txt
the 65
he 62
a 52
to 52
it 45
```

+++

## a very handy tool

to do this, we will use a class from the standard library,
`collections.Counter`; practice finding its documentation on `docs.python.org`, and look for a method that will help us do what is asked here

````{admonition} one method in particular
:class: dropdown

in particular, the `most_common()` method
````
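
to make the hint concrete, here is a minimal sketch of `Counter` and `most_common()` on a tiny hand-made word list; the sentence and the variable names are made up for illustration and are not taken from the exercise files

```python
from collections import Counter

# a tiny word list standing in for the contents of a file
words = "the cat and the dog and the bird".split()

# Counter maps each distinct word to its number of occurrences
counter = Counter(words)

# most_common(n) returns the n most frequent items
# as a list of (item, count) pairs, highest count first
top2 = counter.most_common(2)
print(top2)   # → [('the', 3), ('and', 2)]
```

each pair unpacks naturally in a `for word, occurrences in ...` loop, which is all that is needed to print one line per word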

+++

## solutions

### v0: very crude (works rather poorly)

to show how `Counter` is used, we chop the text into words crudely with
`split()`, and we make no attempt to handle punctuation, which therefore stays attached to the words

````{admonition} to see v0
:class: dropdown

```{literalinclude} frequency_v0.py
```
````
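
to see why this version works rather poorly, here is a small sketch of what `split()` does to punctuated text; the sentence is invented for the occasion

```python
from collections import Counter

sentence = "Hello, hello world. Hello world!"

# split() only cuts on whitespace, so punctuation stays glued to the words
tokens = sentence.lower().split()
print(tokens)   # → ['hello,', 'hello', 'world.', 'hello', 'world!']

counter = Counter(tokens)

# "hello" occurs three times in the sentence, but one occurrence
# is hidden inside the distinct token "hello,"
print(counter["hello"], counter["hello,"])   # → 2 1
```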

+++

### v1: a bit like the palindrome exercise

here we replace every punctuation character with a space before
cutting the text into pieces with `split()`; we show 3 ways to obtain these punctuation
characters (and the best one is to import what we wrote for the
palindrome)

````{admonition} v1
:class: dropdown

```{literalinclude} frequency_v1.py
```
````
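
the idea can be sketched as follows; the code of `palindrom_v2` is not shown here, so the helper `unicode_punctuation_sketch()` below is a hypothetical stand-in, built on the assumption that "punctuation" means every character whose Unicode category starts with `P`

```python
import sys
import unicodedata
from string import punctuation as ascii_punctuation

def unicode_punctuation_sketch():
    # hypothetical helper (an assumption, not the course's own code):
    # collect every character whose Unicode category starts with 'P'
    return "".join(
        chr(code) for code in range(sys.maxunicode + 1)
        if unicodedata.category(chr(code)).startswith("P")
    )

PUNCT = unicode_punctuation_sketch()

# string.punctuation is ASCII only, so it misses the curly apostrophe
print("’" in ascii_punctuation, "’" in PUNCT)   # → False True

text = "“I’m game,” he said."
# one replace() pass over the text per punctuation character
for char in PUNCT:
    text = text.replace(char, " ")
words = text.lower().split()
print(words)   # → ['i', 'm', 'game', 'he', 'said']
```

note that the contraction is split in two, since the apostrophe is itself a punctuation character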

+++

### v2: with regular expressions

finally we show how one would do it in practice; it is less readable because it relies on
regular expressions, a topic that is admittedly a bit dry at first, but
as you can see it gives a much cleaner solution, and
an overall more efficient one too

````{admonition} v2
:class: dropdown

```{literalinclude} frequency_v2.py
```
````
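
as a minimal illustration of the regular expression approach, here is `re.findall()` with the `\w+` pattern on a made-up sentence

```python
import re
from collections import Counter

text = "“Blue,” she said. The word blue came to mind."

# \w+ matches maximal runs of word characters (letters, digits, underscore),
# so punctuation and whitespace both act as separators
words = re.findall(r"\w+", text.lower())
print(words)
# → ['blue', 'she', 'said', 'the', 'word', 'blue', 'came', 'to', 'mind']

counter = Counter(words)
print(counter.most_common(1))   # → [('blue', 2)]

# the apostrophe is not a word character, so contractions get split
pieces = re.findall(r"\w+", "can’t")
print(pieces)   # → ['can', 't']
```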
Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,102 @@
1+
Chapter One
2+
3+
The house stood on a slight rise just on the edge of the village. It stood on its own and looked out over a broad spread of West Country farmland. Not a remarkable house by any means—it was about thirty years old, squattish, squarish, made of brick, and had four windows set in the front of a size and proportion which more or less exactly failed to please the eye.
4+
5+
The only person for whom the house was in any way special was Arthur Dent, and that was only because it happened to be the one he lived in. He had lived in it for about three years, ever since he had moved out of London because it made him nervous and irritable. He was about thirty as well, tall, dark-haired and never quite at ease with himself. The thing that used to worry him most was the fact that people always used to ask him what he was looking so worried about. He worked in local radio which he always used to tell his friends was a lot more interesting than they probably thought. It was, too—most of his friends worked in advertising.
6+
7+
On Wednesday night it had rained very heavily, the lane was wet and muddy, but the Thursday morning sun was bright and clear as it shone on Arthur Dent’s house for what was to be the last time.
8+
9+
It hadn’t properly registered yet with Arthur that the council wanted to knock it down and build a bypass instead.
10+
11+
12+
At eight o’clock on Thursday morning Arthur didn’t feel very good. He woke up blearily, got up, wandered blearily round his room, opened a window, saw a bulldozer, found his slippers, and stomped off to the bathroom to wash.
13+
14+
Toothpaste on the brush—so. Scrub.
15+
16+
Shaving mirror—pointing at the ceiling. He adjusted it. For a moment it reflected a second bulldozer through the bathroom window. Properly adjusted, it reflected Arthur Dent’s bristles. He shaved them off, washed, dried and stomped off to the kitchen to find something pleasant to put in his mouth.
17+
18+
Kettle, plug, fridge, milk, coffee. Yawn.
19+
20+
The word bulldozer wandered through his mind for a moment in search of something to connect with.
21+
22+
The bulldozer outside the kitchen window was quite a big one.
23+
24+
He stared at it.
25+
26+
'Yellow,' he thought, and stomped off back to his bedroom to get dressed.
27+
28+
Passing the bathroom he stopped to drink a large glass of water, and another. He began to suspect that he was hung over. Why was he hung over? Had he been drinking the night before? He supposed that he must have been. He caught a glint in the shaving mirror. “Yellow,” he thought, and stomped on to the bedroom.
29+
30+
He stood and thought. The pub, he thought. Oh dear, the pub. He vaguely remembered being angry, angry about something that seemed important. He’d been telling people about it, telling people about it at great length, he rather suspected: his clearest visual recollection was of glazed looks on other people’s faces. Something about a new bypass he’d just found out about. It had been in the pipeline for months only no one seemed to have known about it. Ridiculous. He took a swig of water. It would sort itself out, he’ d decided, no one wanted a bypass, the council didn’t have a leg to stand on. It would sort itself out.
31+
32+
God, what a terrible hangover it had earned him though. He looked at himself in the wardrobe mirror. He stuck out his tongue. 'Yellow,' he thought. The word yellow wandered through his mind in search of something to connect with.
33+
34+
Fifteen seconds later he was out of the house and lying in front of a big yellow bulldozer that was advancing up his garden path.
35+
36+
37+
Mr. L. Prosser was, as they say, only human. In other words he was a carbon-based bipedal life form descended from an ape. More specifically he was forty, fat and shabby and worked for the local council. Curiously enough, though he didn’t know it, he was also a direct male-line descendant of Genghis Khan, though intervening generations and racial mixing had so juggled his genes that he had no discernible Mongoloid characteristics, and the only vestiges left in Mr. L. Prosser of his mighty ancestry were a pronounced stoutness about the tum and a predilection for little fur hats.
38+
39+
He was by no means a great warrior; in fact he was a nervous, worried man. Today he was particularly nervous and worried because something had gone seriously wrong with his job, which was to see that Arthur Dent’s house got cleared out of the way before the day was out.
40+
41+
“Come off it, Mr. Dent,” he said, “you can’t win, you know. You can’t lie in front of the bulldozer indefinitely.” He tried to make his eyes blaze fiercely but they just wouldn’t do it.
42+
43+
Arthur lay in the mud and squelched at him.
44+
45+
“I’m game,” he said, “we’ll see who rusts first.”
46+
47+
“I’m afraid you’re going to have to accept it,” said Mr. Prosser, gripping his fur hat and rolling it round the top of his head; “this bypass has got to be built and it’s going to be built!”
48+
49+
“First I’ve heard of it,” said Arthur, “why’s it got to be built?”
50+
51+
Mr. Prosser shook his finger at him for a bit, then stopped and put it away again.
52+
53+
“What do you mean, why’s it got to be built?” he said. “It’s a bypass. You’ve got to build bypasses.”
54+
55+
Bypasses are devices that allow some people to dash from point A to point B very fast while other people dash from point B to point A very fast. People living at point C, being a point directly in between, are often given to wonder what’s so great about point A that so many people from point B are so keen to get there, and what’s so great about point B that so many people from point A are so keen to get there. They often wish that people would just once and for all work out where the hell they wanted to be.
56+
57+
Mr. Prosser wanted to be at point D. Point D wasn’t anywhere in particular, it was just any convenient point a very long way from points A, B and C. He would have a nice little cottage at point D, with axes over the door, and spend a pleasant amount of time at point E, which would be the nearest pub to point D. His wife of course wanted climbing roses, but he wanted axes. He didn’t know why—he just liked axes. He flushed hotly under the derisive grins of the bulldozer drivers.
58+
59+
He shifted his weight from foot to foot, but it was equally uncomfortable on each. Obviously somebody had been appallingly incompetent and he hoped to God it wasn’t him.
60+
61+
Mr. Prosser said, “You were quite entitled to make any suggestions or protests at the appropriate time, you know.”
62+
63+
“Appropriate time?” hooted Arthur. “Appropriate time? The first I knew about it was when a workman arrived at my home yesterday. I asked him if he’d come to clean the windows and he said no, he’d come to demolish the house. He didn’t tell me straight away of course. Oh no. First he wiped a couple of windows and charged me a fiver. Then he told me.”
64+
65+
“But Mr. Dent, the plans have been available in the local planning office for the last nine months.”
66+
67+
“Oh yes, well, as soon as I heard I went straight round to see them, yesterday afternoon. You hadn’t exactly gone out of your way to call attention to them, had you? I mean, like actually telling anybody or anything.”
68+
69+
“But the plans were on display . . .”
70+
71+
“On display? I eventually had to go down to the cellar to find them.”
72+
73+
“That’s the display department.”
74+
75+
“With a flashlight.”
76+
77+
“Ah, well, the lights had probably gone.”
78+
79+
“So had the stairs.”
80+
81+
“But look, you found the notice, didn’t you?”
82+
83+
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”
84+
85+
A cloud passed overhead. It cast a shadow over Arthur Dent as he lay propped up on his elbow in the cold mud. It cast a shadow over Arthur Dent’s house. Mr. Prosser frowned at it.
86+
87+
“It’s not as if it’ s a particularly nice house,” he said.
88+
89+
“I’m sorry, but I happen to like it.”
90+
91+
“You’ ll like the bypass.”
92+
93+
“Oh, shut up,” said Arthur Dent. “Shut up and go away, and take your bloody bypass with you. You haven’t got a leg to stand on and you know it.”
94+
95+
Mr. Prosser’s mouth opened and closed a couple of times while his mind was for a moment filled with inexplicable but terribly attractive visions of Arthur Dent’ s house being consumed with fire and Arthur himself running screaming from the blazing ruin with at least three hefty spears protruding from his back. Mr. Prosser was often bothered with visions like these and they made him feel very nervous. He stuttered for a moment and then pulled himself together.
96+
97+
“Mr. Dent,” he said.
98+
99+
“Hello? Yes?” said Arthur.
100+
101+
“Some factual information for you. Have you any idea how much damage that bulldozer would suffer if I just let it roll straight over you?”
102+
Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
"""
a program that counts the occurrences of words in a text file
- the file name is passed on the command line
- by default we show the 3 most frequent words; this number can also
  be changed on the command line
"""

# in this v0 we do nothing about punctuation
# so the result is a little "rough"
# since "Hello," will not be counted as "hello"

from collections import Counter
from argparse import ArgumentParser

def compute_counter(filename) -> Counter:
    try:
        # reading the whole file in one go
        # keeps the code simple
        with open(filename) as reader:
            # split() cuts on whitespace only
            mots = reader.read().lower().split()
            counter = Counter(mots)
            return counter
    except FileNotFoundError:
        print(f"OOPS the file {filename} does not exist")
        return Counter()


def main():
    parser = ArgumentParser()
    # by default we show the 3 most frequent words
    parser.add_argument(
        "-n", "--number", default=3, type=int,
        help="the number of words to show")
    parser.add_argument("filename")
    args = parser.parse_args()

    filename = args.filename
    n = args.number

    counter = compute_counter(filename)
    if len(counter) == 0:
        print("empty")
    else:
        for word, occurrences in counter.most_common(n):
            print(word, occurrences)


if __name__ == '__main__':
    main()
Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,66 @@
"""
a program that counts the occurrences of words in a text file
- the file name is passed on the command line
- by default we show the 3 most frequent words; this number can also
  be changed on the command line
"""

# v1: a bit better since we deal with punctuation
# but clearly suboptimal in terms of complexity: one pass over the text
# per punctuation character (and there are close to a thousand of them)

from collections import Counter
from argparse import ArgumentParser

# several options to build a string
# that contains the punctuation characters

# option 0: type in by hand the punctuation characters found
# in our text; suboptimal, but as a last resort..
# note the use of """ as the delimiter
# PUNCTUATION = """,.;'"“”"""

# option 1: does not work well because it is ASCII only
# from string import punctuation

# option 2: see the palindrome exercise
from palindrom_v2 import unicode_punctuation
PUNCTUATION = unicode_punctuation()

def compute_counter(filename) -> Counter:
    try:
        with open(filename) as reader:
            text = reader.read().lower()
            # replace each punctuation character with a space
            for char in PUNCTUATION:
                text = text.replace(char, " ")
            mots = text.split()
            counter = Counter(mots)
            return counter
    except FileNotFoundError:
        print(f"OOPS the file {filename} does not exist")
        return Counter()


def main():
    parser = ArgumentParser()
    # by default we show the 3 most frequent words
    parser.add_argument(
        "-n", "--number", default=3, type=int,
        help="the number of words to show")
    parser.add_argument("filename")
    args = parser.parse_args()

    filename = args.filename
    n = args.number

    counter = compute_counter(filename)
    if len(counter) == 0:
        print("empty")
    else:
        for word, occurrences in counter.most_common(n):
            print(word, occurrences)


if __name__ == '__main__':
    main()
Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,52 @@
"""
a program that counts the occurrences of words in a text file
- the file name is passed on the command line
- by default we show the 3 most frequent words; this number can also
  be changed on the command line
"""

# v2: this version uses regular expressions

from collections import Counter
from argparse import ArgumentParser

import re

def compute_counter(filename) -> Counter:
    try:
        # reading the whole file in one go
        # keeps the code simple
        with open(filename) as reader:
            text = reader.read().lower()
            words_stream = re.findall(r'\w+', text)
            counter = Counter(words_stream)
            return counter
            # these 4 lines could have been shortened into a single one:
            # return Counter(re.findall(r'\w+', reader.read().lower()))
    except FileNotFoundError:
        print(f"OOPS the file {filename} does not exist")
        return Counter()


def main():
    parser = ArgumentParser()
    # by default we show the 3 most frequent words
    parser.add_argument(
        "-n", "--number", default=3, type=int,
        help="the number of words to show")
    parser.add_argument("filename")
    args = parser.parse_args()

    filename = args.filename
    n = args.number

    counter = compute_counter(filename)
    if len(counter) == 0:
        print("empty")
    else:
        for word, occurrences in counter.most_common(n):
            print(word, occurrences)


if __name__ == '__main__':
    main()
