diff --git a/.gitignore b/.gitignore new file mode 100644 index 00000000..a3f545f0 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +**en/ \ No newline at end of file diff --git a/README.md b/README.md index e326eb60..9a9834a7 100644 --- a/README.md +++ b/README.md @@ -1,86 +1,50 @@ -# Backend Engineering Challenge +# Backend Engineering Challenge -- Moving Average Calculator +## Overview -Welcome to our Engineering Challenge repository 🖖 +This Python script calculates the moving average delivery time based on events read from a JSON file. The moving average is computed over a specified time window. -If you found this repository it probably means that you are participating in our recruitment process. Thank you for your time and energy. If that's not the case please take a look at our [openings](https://unbabel.com/careers/) and apply! +## Requirements -Please fork this repo before you start working on the challenge, read it careful and take your time and think about the solution. Also, please fork this repository because we will evaluate the code on the fork. +* Python 3.x +* Dependencies (install via `pip install -r requirements.txt`) -This is an opportunity for us both to work together and get to know each other in a more technical way. If you have any questions please open and issue and we'll reach out to help. +## Installation -Good luck! - -## Challenge Scenario - -At Unbabel we deal with a lot of translation data. One of the metrics we use for our clients' SLAs is the delivery time of a translation. - -In the context of this problem, and to keep things simple, our translation flow is going to be modeled as only one event. - -### *translation_delivered* - -Example: - -```json -{ - "timestamp": "2018-12-26 18:12:19.903159", - "translation_id": "5aa5b2f39f7254a75aa4", - "source_language": "en", - "target_language": "fr", - "client_name": "airliberty", - "event_name": "translation_delivered", - "duration": 20, - "nr_words": 100 -} +1. 
Clone the repository: +```bash +git clone https://github.com/thisIsMailson/moving-average-calculator.git +cd moving-average-calculator ``` -## Challenge Objective - -Your mission is to build a simple command line application that parses a stream of events and produces an aggregated output. In this case, we're interested in calculating, for every minute, a moving average of the translation delivery time for the last X minutes. - -If we want to count, for each minute, the moving average delivery time of all translations for the past 10 minutes we would call your application like (feel free to name it anything you like!). - - unbabel_cli --input_file events.json --window_size 10 - -The input file format would be something like: - - {"timestamp": "2018-12-26 18:11:08.509654","translation_id": "5aa5b2f39f7254a75aa5","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 20} - {"timestamp": "2018-12-26 18:15:19.903159","translation_id": "5aa5b2f39f7254a75aa4","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 31} - {"timestamp": "2018-12-26 18:23:19.903159","translation_id": "5aa5b2f39f7254a75bb3","source_language": "en","target_language": "fr","client_name": "taxi-eats","event_name": "translation_delivered","nr_words": 100, "duration": 54} - -Assume that the lines in the input are ordered by the `timestamp` key, from lower (oldest) to higher values, just like in the example input above. - -The output file would be something in the following format. - +2. 
Install dependencies +```bash +pip install -r requirements.txt ``` -{"date": "2018-12-26 18:11:00", "average_delivery_time": 0} -{"date": "2018-12-26 18:12:00", "average_delivery_time": 20} -{"date": "2018-12-26 18:13:00", "average_delivery_time": 20} -{"date": "2018-12-26 18:14:00", "average_delivery_time": 20} -{"date": "2018-12-26 18:15:00", "average_delivery_time": 20} -{"date": "2018-12-26 18:16:00", "average_delivery_time": 25.5} -{"date": "2018-12-26 18:17:00", "average_delivery_time": 25.5} -{"date": "2018-12-26 18:18:00", "average_delivery_time": 25.5} -{"date": "2018-12-26 18:19:00", "average_delivery_time": 25.5} -{"date": "2018-12-26 18:20:00", "average_delivery_time": 25.5} -{"date": "2018-12-26 18:21:00", "average_delivery_time": 25.5} -{"date": "2018-12-26 18:22:00", "average_delivery_time": 31} -{"date": "2018-12-26 18:23:00", "average_delivery_time": 31} -{"date": "2018-12-26 18:24:00", "average_delivery_time": 42.5} + +# Usage +The code to calculate the moving average of an event resides inside the **main.py** file. +To calculate the moving average delivery time, run the script with the input JSON file and window size. Example: +```bash +python3 main.py --input_file=input.json --window_size=10 ``` +* input_file: Path to the input JSON file. +* window_size: Size of the time window for the moving average. -#### Notes +The results are saved to `output.json` in the current working directory. -Before jumping right into implementation we advise you to think about the solution first. We will evaluate, not only if your solution works but also the following aspects: +# Running Tests +The unit tests reside inside the **events_test.py** file. +```bash +python -m unittest events_test.py +``` -+ Simple and easy to read code. Remember that [simple is not easy](https://www.infoq.com/presentations/Simple-Made-Easy) -+ Comment your code. 
The easier it is to understand the complex parts, the faster and more positive the feedback will be -+ Consider the optimizations you can do, given the order of the input lines -+ Include a README.md that briefly describes how to build and run your code, as well as how to **test it** -+ Be consistent in your code. +## Sample Data -Feel free to, in your solution, include some your considerations while doing this challenge. We want you to solve this challenge in the language you feel most comfortable with. Our machines run Python (3.7.x or higher) or Go (1.16.x or higher). If you are thinking of using any other programming language please reach out to us first 🙏. +For testing purposes, you can use the provided sample JSON file events.json. -Also, if you have any problem please **open an issue**. +# File Structure -Good luck and may the force be with you + * main.py: Main script for calculating the moving average. + * events_test.py: Test cases for the script. + * events.json: Sample input data for testing. 
diff --git a/__pycache__/events_test.cpython-37.pyc b/__pycache__/events_test.cpython-37.pyc new file mode 100644 index 00000000..dc460656 Binary files /dev/null and b/__pycache__/events_test.cpython-37.pyc differ diff --git a/__pycache__/events_tests.cpython-37.pyc b/__pycache__/events_tests.cpython-37.pyc new file mode 100644 index 00000000..4bd88fc1 Binary files /dev/null and b/__pycache__/events_tests.cpython-37.pyc differ diff --git a/__pycache__/main.cpython-37.pyc b/__pycache__/main.cpython-37.pyc new file mode 100644 index 00000000..d1ae70eb Binary files /dev/null and b/__pycache__/main.cpython-37.pyc differ diff --git a/events.json b/events.json new file mode 100644 index 00000000..881e5706 --- /dev/null +++ b/events.json @@ -0,0 +1,3 @@ +{"timestamp": "2018-12-26 18:11:08.509654","translation_id": "5aa5b2f39f7254a75aa5","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 20} +{"timestamp": "2018-12-26 18:15:19.903159","translation_id": "5aa5b2f39f7254a75aa4","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 31} +{"timestamp": "2018-12-26 18:23:19.903159","translation_id": "5aa5b2f39f7254a75bb3","source_language": "en","target_language": "fr","client_name": "taxi-eats","event_name": "translation_delivered","nr_words": 100, "duration": 54} \ No newline at end of file diff --git a/events_test.py b/events_test.py new file mode 100644 index 00000000..b9459869 --- /dev/null +++ b/events_test.py @@ -0,0 +1,34 @@ +import unittest +from unittest.mock import patch +from main import calculate_moving_average, save_to_file +import json +import os +class EventsTests(unittest.TestCase): + @patch('builtins.print') # Mock the print function to capture output + def test_calculate_moving_average(self, mock_print): + # Prepare test data + input_file = 'test_input.json' + window_size = 5 + + # 
Mocking events for testing + test_events = [ + {"timestamp": "2022-01-01 12:00:00.000", "duration": 10}, + {"timestamp": "2022-01-01 12:05:00.000", "duration": 20}, + {"timestamp": "2022-01-01 12:07:00.000", "duration": 30}, + ] + + with patch('builtins.open', create=True) as mock_open: + # Mocking the file read to return test_events + mock_open.return_value.__enter__.return_value.read.return_value = json.dumps(test_events) + + # Function to test + calculate_moving_average(input_file, window_size) + + # Assertions based on the expected output + mock_print.assert_called_with({"date": "2022-01-01 12:00:00", "average_delivery_time": 10.0}) + mock_print.assert_called_with({"date": "2022-01-01 12:05:00", "average_delivery_time": 20.0}) + mock_print.assert_called_with({"date": "2022-01-01 12:07:00", "average_delivery_time": 30.0}) + + +if __name__ == '__main__': + unittest.main() diff --git a/main.py b/main.py new file mode 100644 index 00000000..ab95c716 --- /dev/null +++ b/main.py @@ -0,0 +1,112 @@ +from typing import List, Dict, Union +import argparse +import json +from datetime import datetime, timedelta + +def read_events_from_file(input_file: str) -> List[Dict]: + """ + Read events from a JSON file and return a list of events. + + Parameters: + input_file (str): Path to the input JSON file. + + Returns: + list: List of event dictionaries. + """ + events = [] + if input_file: + with open(input_file, 'r') as file: + for line in file: + yield json.loads(line) + return events + +def remove_old_events(event_queue: List[tuple], timestamp: datetime, window_size: int) -> None: + """ + Remove events outside the current time window from the event queue. + + Parameters: + event_queue (list): List of tuples containing (timestamp, duration). + timestamp (datetime): Current event timestamp. + window_size (int): Size of the time window for moving average. 
+ """ + while event_queue and timestamp - event_queue[0][0] > timedelta(minutes=window_size): + event_queue.pop(0) + +def filter_events_within_window(event_queue: List[tuple], window_start_time: datetime, current_time: datetime) -> List[tuple]: + """ + Filter events within the current time window. + + Parameters: + event_queue (list): List of tuples containing (timestamp, duration). + window_start_time (datetime): Start time of the current window. + current_time (datetime): Current time. + + Returns: + list: List of tuples containing (timestamp, duration) within the window. + """ + + return [(time, duration) for time, duration in event_queue if window_start_time <= time <= current_time] + +def calculate_moving_average(input_file: str, window_size: int) -> None: + """ + Calculate moving average delivery time. + + Parameters: + input_file (str): Path to the input JSON file. + window_size (int): Size of the time window for moving average. + """ + event_queue: List[tuple] = [] + average_delivery_times: List[Dict[str, Union[str, float]]] = [] + + events = read_events_from_file(input_file) + + for event in events: + timestamp = datetime.strptime(event['timestamp'], '%Y-%m-%d %H:%M:%S.%f') + timestamp = timestamp.replace(second=0, microsecond=0) + duration = event['duration'] + event_queue.append((timestamp, duration)) + + remove_old_events(event_queue, timestamp, window_size) + + current_time = timestamp + window_start_time = current_time - timedelta(minutes=window_size) + + while event_queue and event_queue[0][0] <= current_time: + # Filter events within the current time window [current_minute - window_size, current_minute] + events_within_window = filter_events_within_window(event_queue, window_start_time, current_time) + + if events_within_window: + moving_average = round(sum(duration for _, duration in events_within_window) / len(events_within_window), 2) + else: + moving_average = 0 + + average_delivery_times.append({"date": current_time.strftime('%Y-%m-%d 
%H:%M:%S'), "average_delivery_time": moving_average}) + + current_time -= timedelta(minutes=1) + window_start_time = current_time - timedelta(minutes=window_size) + + save_to_file(average_delivery_times) + +def save_to_file(average_time: List[Dict[str, Union[str, float]]], output_file: str = 'output.json') -> None: + """ + Save moving average delivery times to file. + + Parameters: + average_time (list): List of dictionaries containing date and average delivery time. + output_file (str): Path to the output JSON file. Default is 'output.json'. + """ + average_time.sort(key=lambda x: x["date"]) + + with open(output_file, 'w') as file: + json.dump(average_time, file, indent=2) + + print(f"Moving average delivery times saved to {output_file}") + + +if __name__ == '__main__': + parser = argparse.ArgumentParser(description='Calculate moving average delivery time.') + parser.add_argument('--input_file', type=str, help='Path to the input JSON file') + parser.add_argument('--window_size', type=int, help='Size of the time window for moving average') + args = parser.parse_args() + input_file, window_size = args.input_file, args.window_size + calculate_moving_average(input_file=input_file, window_size=window_size) diff --git a/output.json b/output.json new file mode 100644 index 00000000..79b86a1b --- /dev/null +++ b/output.json @@ -0,0 +1,62 @@ +[ + { + "date": "2018-12-26 18:11:00", + "average_delivery_time": 20.0 + }, + { + "date": "2018-12-26 18:11:00", + "average_delivery_time": 20.0 + }, + { + "date": "2018-12-26 18:12:00", + "average_delivery_time": 20.0 + }, + { + "date": "2018-12-26 18:13:00", + "average_delivery_time": 20.0 + }, + { + "date": "2018-12-26 18:14:00", + "average_delivery_time": 20.0 + }, + { + "date": "2018-12-26 18:15:00", + "average_delivery_time": 25.5 + }, + { + "date": "2018-12-26 18:15:00", + "average_delivery_time": 31.0 + }, + { + "date": "2018-12-26 18:16:00", + "average_delivery_time": 31.0 + }, + { + "date": "2018-12-26 18:17:00", + 
"average_delivery_time": 31.0 + }, + { + "date": "2018-12-26 18:18:00", + "average_delivery_time": 31.0 + }, + { + "date": "2018-12-26 18:19:00", + "average_delivery_time": 31.0 + }, + { + "date": "2018-12-26 18:20:00", + "average_delivery_time": 31.0 + }, + { + "date": "2018-12-26 18:21:00", + "average_delivery_time": 31.0 + }, + { + "date": "2018-12-26 18:22:00", + "average_delivery_time": 31.0 + }, + { + "date": "2018-12-26 18:23:00", + "average_delivery_time": 42.5 + } +] \ No newline at end of file diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 00000000..6dbcc278 --- /dev/null +++ b/requirements.txt @@ -0,0 +1,4 @@ +argparse +json +typing +unittest \ No newline at end of file diff --git a/test_input.json b/test_input.json new file mode 100644 index 00000000..881e5706 --- /dev/null +++ b/test_input.json @@ -0,0 +1,3 @@ +{"timestamp": "2018-12-26 18:11:08.509654","translation_id": "5aa5b2f39f7254a75aa5","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 20} +{"timestamp": "2018-12-26 18:15:19.903159","translation_id": "5aa5b2f39f7254a75aa4","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 31} +{"timestamp": "2018-12-26 18:23:19.903159","translation_id": "5aa5b2f39f7254a75bb3","source_language": "en","target_language": "fr","client_name": "taxi-eats","event_name": "translation_delivered","nr_words": 100, "duration": 54} \ No newline at end of file diff --git a/test_output.json b/test_output.json new file mode 100644 index 00000000..98c62f05 --- /dev/null +++ b/test_output.json @@ -0,0 +1,6 @@ +[ + { + "date": "2022-01-01 12:00:00", + "average_delivery_time": 15.0 + } +] \ No newline at end of file