@baiqiushi
Collaborator

Data Tools

Data Tools is a new module with three components that prepare data for the TwitterMap application.

Twitter Ingestion Server

Twitter Ingestion Server is a daemon service that ingests real-time tweets from the Twitter Filter Stream API into local gzip files, which are rotated daily.
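
As a rough illustration of daily rotation into gzip files, here is a minimal sketch. It is not the code in this PR: the class name `DailyGzipWriter`, the `prefix_YYYY-MM-DD.gz` file naming, and the choice to append on restart are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: writes one tweet per line into a gzip file that rotates daily. */
final class DailyGzipWriter implements AutoCloseable {
    private final Path dir;
    private final String prefix;
    private LocalDate currentDate;
    private Writer out;

    DailyGzipWriter(Path dir, String prefix) {
        this.dir = dir;
        this.prefix = prefix;
    }

    /** File name for a given day, e.g. Tweet_2020-12-29.gz (the naming is an assumption). */
    static String fileNameFor(String prefix, LocalDate date) {
        return prefix + "_" + date + ".gz";
    }

    /** Writes one JSON line, switching to a new file when the date changes. */
    synchronized void write(String tweetJson, LocalDate today) throws IOException {
        if (out == null || !today.equals(currentDate)) {
            close(); // closing the previous stream writes its gzip trailer
            Files.createDirectories(dir);
            Path file = dir.resolve(fileNameFor(prefix, today));
            // APPEND adds a new gzip member if the file already exists (e.g. after a restart)
            out = new OutputStreamWriter(
                    new GZIPOutputStream(Files.newOutputStream(
                            file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)),
                    StandardCharsets.UTF_8);
            currentDate = today;
        }
        out.write(tweetJson);
        out.write('\n');
    }

    /** Closes the current file, if any, so the gzip stream is properly finished. */
    @Override
    public synchronized void close() throws IOException {
        if (out != null) {
            out.close();
            out = null;
        }
    }
}
```

Closing the old stream before opening the next file matters here: a gzip stream that is never closed has no trailer, and readers typically report it as truncated.
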
The ingestion server also runs a lightweight HTTP server with three endpoints:

  • /stats - HTTP GET endpoint that returns current ingestion status information in JSON format.
  • /proxy - WebSocket endpoint that pushes real-time tweets to every connected client.
  • / - HTTP GET endpoint that returns an index.html as an example page demonstrating the usage of the above two endpoints.
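
To make the /stats endpoint concrete, the sketch below serves a small JSON document using the JDK's built-in `com.sun.net.httpserver.HttpServer`. This is only an illustration: the PR may use a different HTTP library, and the class name `StatsEndpointSketch` and the JSON fields `tweetCount` and `startTime` are invented for the example rather than taken from the actual response.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a /stats endpoint that reports ingestion status as JSON. */
final class StatsEndpointSketch {
    /** Counter that an ingestion thread would increment per tweet (assumed design). */
    static final AtomicLong tweetCount = new AtomicLong();

    /** Builds the JSON body; the field names are assumptions for illustration. */
    static String statsJson(long count, long startMillis) {
        return "{\"tweetCount\":" + count + ",\"startTime\":" + startMillis + "}";
    }

    /** Starts an HTTP server on the given port (0 picks a free port). */
    static HttpServer start(int port) throws IOException {
        long startMillis = System.currentTimeMillis();
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/stats", exchange -> {
            if (!"GET".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                exchange.close();
                return;
            }
            byte[] body = statsJson(tweetCount.get(), startMillis)
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

The JDK server has no WebSocket support, so the /proxy endpoint would need a separate library and is left out of this sketch.
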

Twitter GeoTagger

Twitter GeoTagger is a Java program that geotags tweet JSON with {stateID, stateName, countyID, countyName, cityID, cityName}.
It has two modes:

    1. in API mode, it provides a function tagOneTweet that can be called from other programs;
    2. in process mode, it provides a main function that can be started as a JVM process to geotag tweets in a shell pipeline.
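
The two modes can be pictured with the sketch below: a per-tweet tagging function, and a loop that reads one tweet per line and writes the tagged result, as a shell pipeline would require. The real signature of tagOneTweet is not shown in this description, so `tagOneTweetStub`, the `geo_tag` field name, and the placeholder values are assumptions; the stub does no spatial lookup at all.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.Reader;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of GeoTagger's API mode and process mode. */
final class GeoTagPipelineSketch {
    /**
     * Stand-in for tagOneTweet: splices a fixed, made-up geo_tag object into the
     * tweet JSON. A real tagger would look the tweet's location up in boundary data.
     */
    static String tagOneTweetStub(String tweetJson) {
        String tag = "\"geo_tag\":{\"stateID\":0,\"stateName\":\"ExampleState\","
                + "\"countyID\":0,\"countyName\":\"ExampleCounty\","
                + "\"cityID\":0,\"cityName\":\"ExampleCity\"}";
        String trimmed = tweetJson.trim();
        int end = trimmed.lastIndexOf('}');
        if (end < 0) {
            return tweetJson; // not a JSON object; pass it through unchanged
        }
        String head = trimmed.substring(0, end);
        String sep = head.trim().endsWith("{") ? "" : ","; // no comma for an empty object
        return head + sep + tag + "}";
    }

    /** Process mode: reads one tweet per line, writes one tagged tweet per line. */
    static void run(Reader in, PrintStream out, UnaryOperator<String> tagger)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines between tweets
            }
            out.println(tagger.apply(line));
        }
        out.flush();
    }
}
```

A main function for process mode would then only need to call `run` with a reader over `System.in` and `System.out` as the output, while other programs could call the tagging function directly.
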

AsterixDB Ingestion Server

TBD.

@sadeemsaleh
Contributor

NOT ready to be merged

@codecov-io

codecov-io commented Dec 29, 2020

Codecov Report

Merging #807 (6bd3e50) into master (9caf3d0) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #807   +/-   ##
=======================================
  Coverage   63.91%   63.91%           
=======================================
  Files          75       75           
  Lines        4076     4076           
  Branches      355      355           
=======================================
  Hits         2605     2605           
  Misses       1471     1471           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9caf3d0...6bd3e50.

@codecov-commenter

codecov-commenter commented Dec 24, 2021

Codecov Report

Merging #807 (0538333) into master (9caf3d0) will not change coverage.
The diff coverage is n/a.

❗ Current head 0538333 differs from pull request most recent head 3786af4. Consider uploading reports for the commit 3786af4 to get more accurate results

@@           Coverage Diff           @@
##           master     #807   +/-   ##
=======================================
  Coverage   63.91%   63.91%           
=======================================
  Files          75       75           
  Lines        4076     4076           
  Branches      355      355           
=======================================
  Hits         2605     2605           
  Misses       1471     1471           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9caf3d0...3786af4.

…on output file rotation in TwitterIngestionServer; (2) add a parameter for switching between the general Twitter and TwitterMap output formats in AsterixDBIngestionDriver; (3) fix the unexpected end-of-file issue for output gzip files in TwitterIngestionServer;
…does not wait for the WebsocketClient to long live waiting for tweets from the Proxy server; (2) fix the bug in AsterixDBAdapterForTwitterMap: the schema should be initialized in the constructor;
…nsafe issue in AsterixDBAdapterForTwitterMap and AsterixDBAdapterForTwitter.