
Users-Activity-Tracking

Microservice Application with Kafka for Activity Tracking

Scenario: User Activity Tracking with Asynchronous Analytics Processing

Consider a social media platform where user activities (likes, comments, posts) need to be tracked for various purposes like:

  1. User Feed Personalization: Track user interactions to personalize their newsfeed content.
  2. Trend Analysis: Analyze user activities to identify trending topics and hashtags.
  3. Content Moderation: Identify and flag potentially harmful content based on user activities.

Microservices:

  • Activity Service (Producer):
    • Receives and processes user activity events (likes, comments, posts).
    • Publishes these events to a Kafka topic named "user_activities" with details like user ID, timestamp, activity type, and content (e.g., post ID, comment text).
  • Kafka Queue:
    • Acts as a buffer between the activity service and analytics services.
    • Stores activity events reliably and enables asynchronous processing.
  • Analytics Services (Consumers):
    • Multiple independent consumers can subscribe to the "user_activities" topic.
    • Each consumer processes events for their specific purpose:
      • Feed Personalization Service: Analyzes user activity to generate personalized recommendations for the user's feed.
      • Trend Analysis Service: Aggregates and analyzes activity data to identify trending topics and hashtags in real-time.
      • Content Moderation Service: Monitors activity events for potentially harmful content and flags them for human review.

Benefits of Using Kafka Queue:

  • Scalability: Kafka can handle high volumes of user activities without impacting the main application or analytics processing.
  • Asynchronous Processing: User interactions are captured and persisted without blocking the user experience. Analytics services can process events at their own pace.
  • Decoupling: The activity service is independent of the analytics services' processing requirements and complexity.
  • Real-Time Analytics: Certain analytics, like trending topics, can be partially processed as events arrive, enabling faster insights.
  • Flexibility: Different analytics services can be added or removed without impacting the overall workflow.

Start the Application

Kafka Setup

Docker Compose is used to set up Kafka and Zookeeper locally. Run the following command in the directory containing the compose file to start the cluster:

docker compose up -d

This starts three Kafka brokers and one Zookeeper instance in the background.

If a NodeAlreadyExistsException occurs (usually due to leftover state from a previous run), run the following command:

docker compose rm -svf

This command will remove the existing containers and volumes.

To stop only one Kafka broker, run the following command:

docker compose stop kafka1

Activity Service

The Activity Service is a Python application that simulates user activity events and publishes them to the Kafka "user_activities" topic.

To start the activity service, run the following command:

python3 Activity-service/service.py

This will start the activity service and begin publishing user activity events to Kafka.
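
The exact logic lives in Activity-service/service.py; as a rough sketch of the pattern, a producer built with the kafka-python client might look like the following (the broker address, event schema, and send rate are illustrative assumptions, not taken from the repository):

```python
# Sketch of an activity producer using kafka-python.
# Assumes an unauthenticated broker at localhost:9092; the real
# Activity-service/service.py may use different settings.
import json
import random
import time
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

ACTIVITY_TYPES = ["like", "comment", "post"]

while True:
    event = {
        "user_id": random.randint(1, 1000),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "activity_type": random.choice(ACTIVITY_TYPES),
        "content": {"post_id": random.randint(1, 10000)},
    }
    # Keying by user ID sends all of a user's events to the same
    # partition, preserving per-user ordering.
    producer.send(
        "user_activities",
        key=str(event["user_id"]).encode("utf-8"),
        value=event,
    )
    producer.flush()
    time.sleep(1)
```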

Analytics Services

The analytics services are Python applications that consume user activity events from the Kafka "user_activities" topic and perform specific analytics tasks.

To start the feed personalization service, run the following command:

python3 Analytics-services/service.py

This will start the feed personalization service and begin processing user activity events for feed personalization.
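
As a sketch of the consumer side, a kafka-python consumer in a consumer group might look like this (the group name and processing logic are placeholders; the real service may differ):

```python
# Sketch of an analytics consumer using kafka-python.
# Consumers in the same group share the topic's partitions, so the
# service can be scaled out by starting more instances.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "user_activities",
    bootstrap_servers="localhost:9092",
    group_id="feed-personalization",  # illustrative group name
    auto_offset_reset="earliest",     # start from the beginning if no offset
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Placeholder for real feed-personalization logic.
    print(f"user={event['user_id']} activity={event['activity_type']}")
```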

Authentication and Authorization

Authentication

Kafka supports multiple authentication mechanisms to secure access to the cluster. Common authentication methods include:

  • PLAINTEXT: No authentication and no encryption; suitable only for trusted networks.
  • SASL_PLAINTEXT: SASL authentication (e.g., username/password with the PLAIN mechanism) over an unencrypted connection.
  • SSL: Authentication via TLS client certificates, with encrypted communication.
  • SASL_SSL: SASL authentication over a TLS-encrypted connection; generally the preferred option.
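
On the client side, a kafka-python producer could authenticate over SASL_SSL roughly as follows (the certificate path, username, and password are placeholders, not values from this repository):

```python
# Sketch of a SASL_SSL client configuration with kafka-python.
# Credentials and the CA certificate path are placeholders.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",            # matches the brokers' enabled mechanism
    sasl_plain_username="client",
    sasl_plain_password="client-secret",
    ssl_cafile="ca-cert",              # CA certificate trusted by the brokers
)
```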

Authorization

Kafka provides fine-grained access control through ACLs (Access Control Lists) to restrict which users or clients can perform specific operations. ACLs can be configured to control access to topics, consumer groups, and administrative operations.
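
For example, Kafka's bundled kafka-acls.sh tool can grant a principal read access to the topic and a consumer group (the principal and group names below are illustrative):

kafka-acls.sh --bootstrap-server localhost:9092 --add --allow-principal User:analytics --operation Read --topic user_activities --group feed-personalization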

Channel Encryption

Kafka supports SSL/TLS encryption for securing data in transit between clients and brokers. By enabling SSL encryption, traffic between producers, consumers, and brokers is protected from eavesdropping and tampering on the network.

Non-functional requirements:

1. Scalability

Scalability is the property of a system to handle a growing amount of work by adding resources to the system. Kafka is designed to be highly scalable, allowing it to handle large volumes of data and high throughput. Some key scalability features of Kafka include:

  • Horizontal Scalability: Kafka is designed to scale horizontally by adding new brokers to the cluster, which increases throughput capacity and storage. Brokers can be added or removed without downtime.
  • Partitioning: Topics can be divided into multiple partitions, allowing parallelism in produce and consume operations. Partitions can be distributed across brokers to balance load.
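
As a sketch, a topic with explicit partition and replica counts could be created with kafka-python's admin client (the counts below are illustrative; the application otherwise relies on auto.create.topics.enable):

```python
# Sketch: creating a partitioned, replicated topic with kafka-python.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    # Six partitions allow up to six consumers in one group to work in
    # parallel; three replicas match the three-broker cluster.
    NewTopic(name="user_activities", num_partitions=6, replication_factor=3),
])
admin.close()
```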

2. Availability

Availability is the property of a system to remain operational and accessible even in the face of failures. Kafka is designed to be highly available by providing features such as:

  • Replication: Kafka supports data replication across brokers to ensure availability and durability. Replicas are distributed across different brokers to tolerate failures. If a broker fails, leadership for its partitions moves to another broker. In our case, we run three Kafka brokers to ensure high availability.

3. Reliability

  • Data Replication: Kafka supports data replication across brokers to ensure availability and durability. Replicas are distributed across different brokers to tolerate failures.
  • Fault Tolerance: Even if a broker or a group of brokers fails, Kafka can continue operating due to replication and leader re-election for partitions.

4. Performance

  • High Throughput: Kafka is built to handle thousands of messages per second with minimal overhead for both producing and consuming messages due to its design.
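
On the producer side, throughput is typically tuned through batching and compression; a kafka-python sketch with illustrative values:

```python
# Sketch: throughput-oriented producer settings (values are illustrative).
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    batch_size=32768,         # batch up to 32 KB of messages per partition
    linger_ms=50,             # wait up to 50 ms for a batch to fill
    compression_type="gzip",  # compress whole batches on the wire
)
```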

5. Durability

  • Data Retention: Kafka retains messages for a configurable period, allowing consumers to replay events if needed.
  • Replication Factor: To increase durability, we can increase the replication factor. In our setup, it is set with the following property:
    DEFAULT_REPLICATION_FACTOR=3

6. Consistency

  • Partition-Level Guarantee: Kafka ensures consistency at the partition level, making sure messages are read in the order they were written.
  • Exactly Once Semantics: With proper configuration, Kafka can provide exactly-once delivery semantics.
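
Full exactly-once delivery requires idempotent or transactional producers together with careful offset handling on the consumer side. As a more modest sketch, a kafka-python producer can be configured for stronger (at-least-once) delivery guarantees like this (values are illustrative):

```python
# Sketch: stronger delivery guarantees with kafka-python.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",  # wait until all in-sync replicas acknowledge the write
    retries=5,   # retry transient failures (at-least-once delivery)
)
```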

7. Maintainability

  • Monitoring and Logging: Kafka provides tools for monitoring cluster performance and state, including detailed metrics and logging.
  • Administrative Tools: Kafka includes various tools for managing the cluster, such as partition reassignment, replica management, and cluster state monitoring.

8. Security

  • Authentication: Kafka supports various authentication mechanisms, including Kerberos and SSL, ensuring only authorized users and applications can access the cluster.
  • Authorization: Permissions can be configured to control who can produce or consume messages on specific topics.
  • Encryption: Kafka supports encryption of data in transit via SSL.

Kafka Configuration

Here’s a summary of the configuration syntax:

  • process.roles: The role of the server in KRaft mode: broker, controller, or both.

  • node.id: A unique identifier for each server instance. If we want 6 nodes, each one gets a number from 1 to 6.

  • controller.quorum.voters: The IDs and addresses of the controller servers. These should be provided to each server.

  • listeners: The endpoints (host and port) on which the server listens for incoming connections.

  • inter.broker.listener.name: The name of the listener on which the brokers communicate with each other.

  • advertised.listeners: The endpoints through which clients connect to the brokers. Controllers should not be advertised to clients.

  • controller.listener.names: The listeners that the server will use for controller connections.

  • listener.security.protocol.map: The security protocol to use for each listener. The listener names must match those defined above.

  • log.dirs: Where the server stores its log data, including the metadata.properties file. Upon formatting, Kafka writes the cluster ID into this file, which allows controllers to identify brokers as needed.

Kafka Broker Configuration:

| Configuration Name | Value | Description |
| --- | --- | --- |
| advertised.listeners | INTERNAL://kafka1:29092,EXTERNAL://localhost:9092 | The listeners that the broker advertises to clients. |
| auto.create.topics.enable | true | Enable automatic topic creation. |
| broker.id | 1 | The unique identifier of the broker. |
| controller.listener.names | CONTROLLER | The listener name used by the controller. |
| default.replication.factor | 3 | The default replication factor for topics. |
| delete.topic.enable | true | Enable topic deletion by the client. |
| inter.broker.listener.name | INTERNAL | The listener name used for communication between brokers. |
| listener.security.protocol.map | CONTROLLER:SASL_PLAINTEXT,INTERNAL:PLAINTEXT,EXTERNAL:SASL_SSL | The security protocol used by each listener. |
| listeners | INTERNAL://kafka1:29092,CONTROLLER://kafka1:29093,EXTERNAL://0.0.0.0:9092 | The listeners that the broker listens on. |
| log.retention.bytes | -1 | The maximum size of the log before deletion (-1 means unlimited). |
| log.retention.check.interval.ms | 300000 | The interval (ms) at which log segments are checked for deletion. |
| log.retention.hours | 168 | The number of hours to keep a log segment before deleting it. |
| sasl.enabled.mechanisms | [PLAIN] | The SASL mechanisms enabled for authentication. |
| sasl.mechanism.controller.protocol | PLAIN | The SASL mechanism used by the controller. |
| security.inter.broker.protocol | PLAINTEXT | The security protocol used for communication between brokers. |

Generate certificates for SSL

Run the following command to generate SSL certificates for Kafka:

bash gen-ssl-certs.sh -k server ca-cert broker_kafka_ kafka

broker_kafka_ is the prefix for the broker certificates.

Build the Docker Image for the Analytics Service

In the root of the project, run the following command to build the Docker image for the analytics service:

docker build -t analytics-service-test -f ./Dockerfile ../../../Users-Activity-Tracking/

This command will build the Docker image with the name analytics-service-test using the Dockerfile located in the root of the project. To run the Docker container, use the following command:

docker run -d analytics-service-test
