This repository contains the source code for assignment 1 of the COMP90024 Cluster and Cloud Computing course at the University of Melbourne.
Submission Details:
- Student name: Matthias Bachfischer
- Student ID: 1133751
data/ -- datasets used for testing
doc/ -- documentation and implementation notes
output/ -- output from previous submission runs on Spartan
playground/ -- scripts used for Twitter API communication
slurm/ -- SLURM scripts for submission to the Spartan queue
tweetanalyzer/ -- helper and utility functions
To submit a job to the Spartan cluster, run the command sbatch path_to_slurm_script, replacing path_to_slurm_script with the path to the SLURM script that you want to run.
Identify the top 10 most commonly used hashtags and the number of times each appears. Hashtags are matched case-insensitively, e.g. #covid19 and #COVID19 count as the same hashtag. A hashtag should follow the Twitter rules: it contains no spaces and no punctuation other than the underscore (_), so a valid hashtag string is whatever follows a # up to the next space or punctuation character.
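The hashtag rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code in tweetanalyzer/: the function name top_hashtags and the assumption that the input is a list of plain tweet text strings are hypothetical. The pattern \w+ is used as an approximation of "letters, digits and underscore", so a # in the middle of a word (e.g. a#b) would also be picked up.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by one or more word characters
# (letters, digits, underscore); any space or other punctuation ends it.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def top_hashtags(texts, n=10):
    """Return the n most common hashtags as (hashtag, count) pairs.

    Hashtags are lower-cased before counting, so #covid19 and #COVID19
    are treated as the same hashtag.
    """
    counts = Counter()
    for text in texts:
        for tag in HASHTAG_PATTERN.findall(text):
            counts["#" + tag.lower()] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        "#covid19 is here",
        "Stay safe #COVID19! #stay_home",
        "no tags in this one",
    ]
    for hashtag, count in top_hashtags(sample):
        print(hashtag, count)
```

Lower-casing before counting keeps the comparison simple; if the original casing needs to be reported, the first spelling seen could be stored alongside the count.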
Identify the languages used in the provided tweets and the number of tweets written in each language.
Documentation: https://developer.twitter.com/en/docs/twitter-for-websites/twitter-for-websites-supported-languages/overview
Standard for language code: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
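Counting languages could look roughly like the sketch below. It assumes each tweet has already been parsed into a Python dict with the language code stored under a "lang" key; both the key name and the function name count_languages are assumptions for illustration, and the actual dataset layout may nest this field differently.

```python
from collections import Counter


def count_languages(tweets):
    """Return (language_code, count) pairs, most frequent first.

    Assumption: each tweet is a dict whose "lang" value is a language
    code such as "en". Tweets without a language code are skipped.
    """
    counts = Counter()
    for tweet in tweets:
        code = tweet.get("lang")
        if code:
            counts[code] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = [{"lang": "en"}, {"lang": "es"}, {"lang": "en"}, {}]
    for code, count in count_languages(sample):
        print(code, count)
```

The codes can then be mapped to language names using the two references listed above.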
Cloud
This partition is best suited for general-purpose single-node jobs. Multiple node jobs will work, but communication between nodes will be comparatively slow.
Physical
Each node is connected by high-speed 25Gb networking with 1.15 µsec latency, making this partition suited to multi-node jobs (e.g. those using OpenMPI).