Back to README

Analytics Pipeline

Scheduled Tasks

Recurring tasks are managed by Sidekiq via the sidekiq-cron plugin. See config/schedule.yml.
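As a rough illustration of the sidekiq-cron schedule layout, the sketch below parses a small YAML schedule and checks that each entry names a cron expression and a worker class. The job names, worker classes, queue names, and cron expressions are hypothetical and are not taken from config/schedule.yml.

```ruby
require 'yaml'

# Hypothetical schedule entries in the sidekiq-cron YAML layout.
# Job names, worker classes, queues, and cron expressions are illustrative only.
SCHEDULE_YAML = <<~YAML
  example_daily_update:
    cron: "0 4 * * *"
    class: "ExampleDailyUpdateWorker"
    queue: default
  example_frequent_update:
    cron: "*/15 * * * *"
    class: "ExampleFrequentUpdateWorker"
    queue: short_update
YAML

schedule = YAML.safe_load(SCHEDULE_YAML)

# Return the names of entries that lack a cron expression or a worker class.
def invalid_entries(schedule)
  schedule.reject { |_name, opts| opts['cron'] && opts['class'] }.keys
end

schedule.each do |name, opts|
  puts "#{name}: #{opts['class']} at '#{opts['cron']}' on queue #{opts['queue']}"
end
```

In an application with the sidekiq-cron gem loaded, a hash like this is typically registered with `Sidekiq::Cron::Job.load_from_hash`; how the Dashboard itself loads its schedule is not shown here.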

Key Importers

Imported data is persisted in the database.

Extra Data Importers

Imported data is persisted in the database.

Category Sources Importers

Imported data is persisted in the database.

Revision Fetchers

Since the data-rearchitecture deployment, the Dashboard no longer stores data for each individual wiki revision in the database. Instead, it collects data for revisions within a certain timeframe and stores only the aggregate statistics for each time period (what we call timeslices).

  • RevisionDataManager: Fetches revisions and corresponding scores (it invokes the RevisionScoreImporter).
  • RevisionScoreImporter: Fetches revision scoring data from Lift Wing and reference-counter APIs.
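The timeslice idea can be sketched as follows: revisions are grouped by the time window they fall into, and only per-window totals are kept. This is a minimal illustration, not the Dashboard's implementation; the `Revision` struct, its field names, the one-day window, and the statistic names are all assumptions.

```ruby
require 'time'

# Hypothetical revision record; the field names are assumptions for illustration.
Revision = Struct.new(:timestamp, :characters, :references_added, keyword_init: true)

TIMESLICE_SECONDS = 24 * 60 * 60 # assumed one-day timeslices

# Start of the timeslice (in UTC) that contains the given time.
def timeslice_start(time)
  Time.at((time.to_i / TIMESLICE_SECONDS) * TIMESLICE_SECONDS).utc
end

# Collapse individual revisions into per-timeslice aggregate statistics.
def aggregate_by_timeslice(revisions)
  revisions.group_by { |rev| timeslice_start(rev.timestamp) }
           .transform_values do |revs|
    {
      revision_count: revs.size,
      character_sum: revs.sum(&:characters),
      references_count: revs.sum(&:references_added)
    }
  end
end

revisions = [
  Revision.new(timestamp: Time.utc(2024, 5, 1, 9, 0), characters: 120, references_added: 1),
  Revision.new(timestamp: Time.utc(2024, 5, 1, 22, 30), characters: 80, references_added: 0),
  Revision.new(timestamp: Time.utc(2024, 5, 2, 1, 15), characters: 40, references_added: 2)
]

timeslices = aggregate_by_timeslice(revisions)
timeslices.each { |start, stats| puts "#{start.iso8601}: #{stats}" }
```

Only the aggregated hash would need to be persisted in this sketch; the individual `Revision` objects can be discarded once their timeslice has been processed.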

APIs

  • WMFLabsTools: Source for all revision and article data except view counts.
  • MediaWiki API: Data about uploads to Wikimedia Commons, revision metadata, and user information. Nearly all data from WMFLabsTools could instead be pulled directly from MediaWiki. We use MediaWiki directly when up-to-date data matters, because the replica database used by WMFLabsTools may have replication lag.
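To make the "fetch directly from MediaWiki" case concrete, here is a minimal sketch of a MediaWiki Action API query for a page's recent revisions. The helper names `revisions_query_uri` and `fetch_revisions` are hypothetical, and the choice of `rvprop` fields is an assumption; the Dashboard's own API wrapper may look quite different.

```ruby
require 'uri'
require 'net/http'
require 'json'

# Build a MediaWiki Action API URL asking for recent revisions of one page.
# The selected rvprop fields are an illustrative choice.
def revisions_query_uri(wiki_host, title, limit: 5)
  params = {
    action: 'query',
    prop: 'revisions',
    titles: title,
    rvprop: 'ids|timestamp|user',
    rvlimit: limit,
    format: 'json'
  }
  URI::HTTPS.build(host: wiki_host, path: '/w/api.php',
                   query: URI.encode_www_form(params))
end

# Perform the request and parse the JSON body. This makes a live network call,
# so it is defined here but not invoked. Wikimedia asks API clients to send a
# descriptive User-Agent, hence the required keyword argument.
def fetch_revisions(uri, user_agent:)
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = user_agent
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end

uri = revisions_query_uri('en.wikipedia.org', 'Ruby (programming language)')
puts uri
```

Because the request goes to the live wiki rather than a database replica, the response reflects the current revision history without replication lag.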