Description
def get_average_sentence_length(text):
    sentence_lengths = []  # initialise a list to save the number of words per sentence (it isn't necessary)
    words_total = 0  # initialise an integer to save the number of words in total
    text_r = text.replace(".", "|")  # we substitute all the punctuation marks that delimit sentences with the same character
    text_r = text_r.replace("?", "|")
    text_r = text_r.replace("!", "|")
    sentences_of_text = text_r.split("|")  # we split the text into sentences
    for sentence in sentences_of_text:
        words_in_sentence = sentence.split()  # we split the sentences into words
        sentence_lengths.append(len(words_in_sentence))  # we calculate the length of the sentence and we add it to a list
        words_total += len(words_in_sentence)  # we add the length of the sentence to an integer
    total_sentences = len(sentences_of_text) - 1  # the .split() creates an empty item, so we subtract 1 from the total of valid sentences
    return words_total / total_sentences
Great job with this function! It returns the exact numbers we want, and it handles the small quirk where split("|") leaves one extra item (empty or whitespace-only) after the final punctuation mark. Right now we get around this by always subtracting 1 from the total number of sentences. However, that only works when there is exactly one such extra item. If the text does not end with a punctuation mark there is none, and if punctuation marks appear back to back (for example "..." or "?!") there are several, so the average will be slightly off in those cases. So, I just wanted to quickly suggest a slightly different way to handle this.
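To make the issue concrete, here is a small sketch that runs the same replace-and-split steps on three sample texts. The helper name split_into_pieces and the sample strings are made up for illustration; they are not part of the submitted code.

```python
def split_into_pieces(text):
    # Hypothetical helper: the same replace-and-split steps as the function above.
    for mark in ".?!":
        text = text.replace(mark, "|")
    return text.split("|")


# Ends with one punctuation mark: exactly one empty item, so subtracting 1 works.
print(split_into_pieces("Hello world. How are you?"))
# ['Hello world', ' How are you', '']

# No final punctuation mark: no empty item, so subtracting 1 undercounts.
print(split_into_pieces("Hello world. How are you"))
# ['Hello world', ' How are you']

# Back-to-back punctuation: four empty items, so subtracting 1 overcounts.
print(split_into_pieces("Wait... What?!"))
# ['Wait', '', '', ' What', '', '']
```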
One way to handle this is an if statement inside the for loop that filters the non-sentences out of our results. Here is an example implementation:
for sentence in sentences_of_text:
    if sentence.strip():  # skip items that are empty or contain only white space
        words_in_sentence = sentence.split()  # we split the sentences into words
        sentence_lengths.append(len(words_in_sentence))  # we calculate the length of the sentence and we add it to a list
        words_total += len(words_in_sentence)  # we add the length of the sentence to an integer
total_sentences = len(sentence_lengths)  # count only the items that passed the check
The if sentence.strip() line tests whether the sentence contains anything other than white space. If the sentence variable is empty or contains only white space, strip() returns the empty string ''. An empty string is falsy in Python, so the if statement treats it as False (it is not the same as None, although None is falsy too). Thus, all non-sentences are skipped and left out of our calculations. Note that this approach works for any number of non-sentences, and as long as the sentence total counts only the items that pass the check, we no longer have to subtract 1.
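Putting the pieces together, the whole function might look roughly like the sketch below. It is only one possible arrangement: it drops the separate words_total counter in favour of summing the list, and returning 0.0 for a text with no sentences is my own assumption, since the original does not say what should happen in that case.

```python
def get_average_sentence_length(text):
    # Replace every sentence-ending mark with the same delimiter.
    for mark in ".?!":
        text = text.replace(mark, "|")

    sentence_lengths = []
    for sentence in text.split("|"):
        # An empty or whitespace-only item strips down to '', which is falsy.
        if sentence.strip():
            sentence_lengths.append(len(sentence.split()))

    # Assumption: a text with no sentences gets an average of 0.0
    # instead of raising ZeroDivisionError.
    if not sentence_lengths:
        return 0.0

    # Only real sentences were appended, so no "- 1" correction is needed.
    return sum(sentence_lengths) / len(sentence_lengths)


print(get_average_sentence_length("Hello world. How are you?"))  # 2.5
print(get_average_sentence_length("Hello world. How are you"))   # 2.5
print(get_average_sentence_length("Wait... What?!"))             # 1.0
```

All three sample texts now give the expected average, whether or not they end with a punctuation mark.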
In any case, this is just a suggestion; for the purposes of this project the first approach was perfectly acceptable and worked great! So, overall, very nice job!