Merged
101 commits
361e776
0.3.10
miballe Feb 22, 2023
7264400
Created AccountInfo. Reduced UserKey.
miballe Feb 22, 2023
d5951bc
Added useragent config
miballe Feb 22, 2023
dacb56e
0.3.12
miballe Feb 22, 2023
fe0ddc4
Adapted for CI
miballe Feb 22, 2023
ab64ee6
machine cleanup
miballe Mar 10, 2023
8c6b8ff
Implemented AccountInfo
miballe Apr 10, 2023
e602f7b
Alphaless for accinfo and multiple fixes.
miballe Aug 28, 2024
8e319da
Added a supported python version
miballe Aug 29, 2024
8631d83
Updated TZ conversion before deprecation
miballe Aug 29, 2024
ea7ccf1
Updated type check in init
miballe Aug 30, 2024
f912d5f
Increased requirement to Python >=3.10
miballe Aug 30, 2024
176c98d
Chgs forced by Node 20 - GitHub Actions
miballe Aug 30, 2024
491bf0c
Fixed UTC conversion failed in GH test
miballe Aug 30, 2024
bfd309f
Inc version and testing secrets
miballe Aug 30, 2024
f166bf3
Added test skips to avoid throttling
miballe Sep 10, 2024
b68fffc
Updated setup action version to 4
miballe Sep 10, 2024
98c8a28
Added missing test skips
miballe Sep 10, 2024
be19b16
Updated upload artifact action to v4
miballe Sep 10, 2024
9c7849b
Updated actions version to 4
miballe Sep 10, 2024
6c8bbe5
Updated for Trusted Publisher
miballe Sep 10, 2024
d550ecb
Removed Publish section to fix dist
miballe Sep 10, 2024
4ee02f7
Added support to the shard parameter
miballe Sep 14, 2024
222aa3f
Fixed issue when extract job fails
miballe Sep 14, 2024
dd9ce8f
Merge branch 'dev' of github.com:dowjones/factiva-analytics-python in…
miballe Sep 14, 2024
9fcaa04
Exposed SnapshotFiles
miballe Sep 24, 2024
42d64de
Fixed display character for Samples print
miballe Sep 24, 2024
81ab9bb
Patial Refactor between Taxonomies and Companies
miballe Sep 24, 2024
61b562d
Added all_fields param
miballe Sep 26, 2024
fc30b5b
Forced the use of endpoints v3.0
miballe Oct 10, 2024
c465ad4
Added generic avro file read method
miballe Oct 12, 2024
cbb3ce8
Fixed conflict with read_file definition.
miballe Oct 12, 2024
d14957d
Fixed API Version to be used always
miballe Oct 15, 2024
52da6da
Disabled extracted bytes, no longer returned.
miballe Oct 15, 2024
53ae425
Adjusted post/get for Endpoint v3
miballe Oct 15, 2024
6915f76
Optimised code logic
miballe Oct 18, 2024
353f513
New setup categories and prerequisites
miballe Oct 18, 2024
cc3afaa
Minor print improvement + prepare results dwld
miballe Nov 5, 2024
d9fe4ed
Removed some alpha prefixes
miballe Nov 6, 2024
bb1b7eb
Allow override api version
miballe Nov 6, 2024
db2d533
shortened job_id print property
miballe Nov 6, 2024
a1a123f
Explain constructor first param is short Job ID
miballe Nov 6, 2024
959d108
Validated the response shape
miballe Nov 6, 2024
7e78a3d
Implemented results for v2 and v3 automatically
miballe Nov 6, 2024
d011d35
Added language_code to the allowed TS dimensions
miballe Nov 6, 2024
0034160
Updated group_dimension to use only one value
miballe Nov 6, 2024
ebcca41
Updated readthedocs build config to Python 3.12
miballe Nov 6, 2024
3bb2b3e
Fixed Python version to 3.12
miballe Nov 6, 2024
3dc724a
Attempt 001 fixing build readthedocs yaml file
miballe Nov 6, 2024
4dbe4f7
Attempt 002 fixing build readthedocs yaml file
miballe Nov 6, 2024
cd83632
Attempt 003 fixing build readthedocs yaml file
miballe Nov 6, 2024
52cb02d
Attempt 004 fixing build readthedocs yaml file
miballe Nov 6, 2024
f6259d6
Updated paths before factiva import. Version bump.
miballe Nov 6, 2024
790de6a
Added installation options
miballe Nov 6, 2024
da0781b
Enabled editable install according to PEP660
miballe Nov 6, 2024
07c6f5c
Defined dev optional installation packages
miballe Nov 6, 2024
37352c2
Remove the docs announcement.
miballe Nov 6, 2024
e021e0d
Updated to variables as of Nov 2024
miballe Nov 6, 2024
0d8f9fa
Adapted for AccountInfo class. Removed unused attribs.
miballe Nov 6, 2024
df63303
Masked long job IDs.
miballe Nov 6, 2024
8bd78bd
Docs for AccountInfo
miballe Nov 7, 2024
d5c8edf
Auth Module
miballe Nov 7, 2024
5be9429
Updated import order
miballe Nov 7, 2024
73a6287
Auth docs, AccInfo docstrings, index.
miballe Nov 7, 2024
45794c5
Version bump to 0.3.15
miballe Nov 7, 2024
bbd383a
Spaced test requests. Adjusted tests for changes in classes.
miballe Nov 7, 2024
74d299a
Doc Rev 7 Nov 2024
miballe Nov 7, 2024
a512023
Removed test run output redirect to github_step_summary
miballe Nov 7, 2024
8d2f88c
Disabled find_packages to list them explictly.
miballe Nov 7, 2024
f7fb476
Updated fstring to use double quotes
miballe Nov 8, 2024
e6c832b
Updated TestPyPi action to Trusted Publishing
miballe Nov 15, 2024
10f45fc
Enabled id-toke write
miballe Nov 15, 2024
c045c7e
Fixed PyPi package name
miballe Nov 15, 2024
6805b1b
Version bump for test publishing
miballe Nov 15, 2024
6c138df
set verbose and skip existing to true
miballe Nov 15, 2024
597098f
Removed PyPI publication cateogry to fix pub error
miballe Nov 15, 2024
d2cc062
Removed another mismatched classification PyPI
miballe Nov 15, 2024
93f16b7
And again, removed another category PyPI
miballe Nov 15, 2024
668adb0
Removed General News category
miballe Nov 15, 2024
05f8ad4
Removed Office Business Financial Compliance
miballe Nov 15, 2024
69d5314
Sync topics with PyPI official list
miballe Nov 15, 2024
98fec1b
Readme doc updated with latest classes.
miballe Nov 18, 2024
8951343
Delayed pooling times by 5s
miballe Nov 29, 2024
066bd6e
Fixed a broken function reference
miballe Nov 29, 2024
99337b5
Adjusted the logic for failed jobs
miballe Nov 29, 2024
b32f55e
Removed alpha prefixes
miballe Dec 18, 2024
091c99c
Merge branch 'dev' of github.com:dowjones/factiva-analytics-python in…
miballe Dec 18, 2024
b94b5c6
set file_format as class property
miballe Dec 18, 2024
2a2376c
Renamed ArticleRetrieval -> ArticleFetcher
miballe Feb 12, 2025
d2b1766
Added pending task
miballe Feb 12, 2025
26eb411
v0.4
miballe Feb 12, 2025
be5d609
Completed sample output
miballe Feb 12, 2025
8dd5768
ArticleFetcher better parsing and HTML styles
miballe Apr 15, 2025
097ca02
Removed alpha from streams path
miballe Apr 15, 2025
fc7b3a8
Handle empty analytics jobs
miballe May 13, 2025
b1012dd
Handle print custom list classes
miballe May 13, 2025
d7ce427
Updated Time Series cross-version error code
miballe Oct 9, 2025
3371b2d
Merge pull request #3 from dowjones/releaseprep
miballe Oct 9, 2025
b65ac4a
Fixed group_dimensions and refactored most classes
miballe Dec 18, 2025
f8d9acb
Version bump to publish to dev
miballe Dec 18, 2025
2aa841c
Merge pull request #5 from dowjones/releaseprep
miballe Dec 18, 2025
131 changes: 86 additions & 45 deletions .github/workflows/dev_test_publish.yml
@@ -20,69 +20,110 @@ env:

jobs:

test:
name: Test
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
ref: 'dev'
# test:
# name: Test 🧪
# runs-on: ubuntu-latest

# steps:
# - name: Checkout code
# uses: actions/checkout@v4
# with:
# ref: 'dev'

- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.9'
# - name: Set up Python
# uses: actions/setup-python@v5
# with:
# python-version: '3.10.9'

- name: Install latest PIP
run: |
python -m pip install --upgrade pip >> $GITHUB_STEP_SUMMARY
# - name: Install latest PIP
# run: |
# python -m pip install --upgrade pip

- name: Install Dependencies
run: |
python -m pip install pytest pytest-cov >> $GITHUB_STEP_SUMMARY
# - name: Install Dependencies
# run: |
# python -m pip install pytest pytest-cov

- name: Setup factiva-analytics <DEV> (this repo)
run: |
python -m pip install -e . >> $GITHUB_STEP_SUMMARY
# - name: Setup factiva-analytics <DEV> (this repo)
# run: |
# python -m pip install .

- name: pytest
run: pytest test/ >> $GITHUB_STEP_SUMMARY
# - name: pytest
# run: pytest test/

build:
name: Build and Publish
name: Build 📦
runs-on: ubuntu-latest
needs: [test]
# needs: [test]
permissions:
id-token: write
steps:
- name : Checkout code
uses : actions/checkout@v3
uses : actions/checkout@v4
with:
ref: 'dev'

- name: Set up Python
uses: actions/setup-python@v3
uses: actions/setup-python@v5
with:
python-version: '3.9'
python-version: '3.10.9'

- name: Install latest pip, setuptools, twine + wheel
# changes

- name: Install pypa/build
run: |
python -m pip install --upgrade pip setuptools wheel >> $GITHUB_STEP_SUMMARY

- name: Build wheels
python -m pip install --upgrade build
- name: Build a binary build and a source tarball
run: |
python setup.py bdist_wheel >> $GITHUB_STEP_SUMMARY
python setup.py sdist >> $GITHUB_STEP_SUMMARY

- name: Upload Artifact
uses: actions/upload-artifact@v3
python -m build
- name: Store the distribution packages
uses: actions/upload-artifact@v4
with:
name: Wheel_library
name: python-package-distributions
path: dist/

# - name: Install latest pip, setuptools, twine + wheel
# run: |
# python -m pip install --upgrade pip setuptools wheel >> $GITHUB_STEP_SUMMARY

- name: Publish package to TestPyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
user: __token__
password: ${{ secrets.TEST_PYPI_API_TOKEN }}
repository_url: https://test.pypi.org/legacy/
skip_existing: true
# - name: Build wheels
# run: |
# python setup.py bdist_wheel >> $GITHUB_STEP_SUMMARY
# python setup.py sdist >> $GITHUB_STEP_SUMMARY


publish-to-testpypi:
name: Publish 📦 to TestPyPI
needs:
- build
runs-on: ubuntu-latest

environment:
name: testpypi
url: https://test.pypi.org/p/factiva-analytics

permissions:
id-token: write

steps:
- name: Download all the dists
uses: actions/download-artifact@v4
with:
name: python-package-distributions
path: dist/
- name: Publish 📦 to TestPyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
repository-url: https://test.pypi.org/legacy/
verbose: true
skip-existing: true

# - name: GitHub Repo Artifact Upload
# uses: actions/upload-artifact@v4
# with:
# name: Wheel_library
# path: dist/

# - name: Publish package to TestPyPI
# uses: pypa/gh-action-pypi-publish@release/v1
# with:
# repository-url: https://test.pypi.org/legacy/
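The build job above runs `python -m build` to produce a wheel and a source tarball in `dist/`, and the publish job downloads that folder before uploading it. A minimal sketch of a pre-upload sanity check, assuming the usual `.whl` and `.tar.gz` suffixes. The helper names `summarize_dist` and `is_publishable` and the example file names are hypothetical and are not part of this repository:

```python
from pathlib import PurePosixPath


def summarize_dist(filenames):
    """Group distribution file names into wheels, sdists and anything else."""
    summary = {"wheel": [], "sdist": [], "other": []}
    for name in filenames:
        base = PurePosixPath(name).name
        if base.endswith(".whl"):
            summary["wheel"].append(base)
        elif base.endswith(".tar.gz"):
            summary["sdist"].append(base)
        else:
            summary["other"].append(base)
    return summary


def is_publishable(filenames):
    """True when there is at least one wheel, one sdist and no stray files."""
    summary = summarize_dist(filenames)
    return bool(summary["wheel"]) and bool(summary["sdist"]) and not summary["other"]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example = [
        "dist/factiva_analytics-0.0.0-py3-none-any.whl",
        "dist/factiva_analytics-0.0.0.tar.gz",
    ]
    print(summarize_dist(example))
    print(is_publishable(example))
```

A check like this could run between the download and publish steps, so that an incomplete `dist/` folder fails early instead of at upload time.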
10 changes: 5 additions & 5 deletions .github/workflows/main_test_publish.yml
@@ -25,12 +25,12 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.9'
python-version: '3.10.9'

- name: Install latest PIP
run: |
@@ -53,12 +53,12 @@ jobs:
needs: [test]
steps:
- name : Checkout code
uses : actions/checkout@v3
uses : actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.9'
python-version: '3.10.9'

- name: Install latest pip, setuptools, twine + wheel
run: |
@@ -80,4 +80,4 @@ jobs:
with:
user: __token__
password: ${{ secrets.PYPI_API_TOKEN }}
skip_existing: false
skip-existing: false
13 changes: 5 additions & 8 deletions .readthedocs.yaml
@@ -2,26 +2,23 @@
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
os: ubuntu-22.04
tools:
python: "3.9"
python: "3.12"

# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/source/conf.py
configuration: docs/source/conf.py

# Optionally build your docs in additional formats such as PDF
formats:
- pdf
- pdf
- epub

# Optionally set the version of Python and requirements required to build your docs
python:
install:
- requirements: docs/requirements.txt
- method: pip
path: .

121 changes: 75 additions & 46 deletions README.rst
@@ -1,6 +1,7 @@
Dow Jones Factiva Analytics Python Library
##########################################
.. image:: https://github.com/dowjones/factiva-analytics-python/actions/workflows/main_test_publish.yml/badge.svg
.. image:: https://readthedocs.org/projects/factiva-analytics-python/badge/?version=latest&style=plastic

This library simplifies integration with the Factiva Analytics API services that deliver premium news content.

@@ -10,6 +11,7 @@ The following services are currently implemented.
* **Snapshots**: Runs each step (snapshot creation, monitoring, download and local exploration) individually, or the whole process within a single method.
* **Streams**: In addition to creating streams and getting their details, contains the methods to easily implement a stream listener and push the content to other locations appropriate for high-availability setups.
* **Taxonomy**: Operations that return taxonomies applied to classify news content.
* **ArticleFetcher**: Gets an article's content by its unique identifier (AN), for display purposes only.

Installation
============
@@ -23,68 +25,95 @@ Using Library services
======================
Most Factiva Analytics services are implemented in this library. There may be a delay (commonly weeks) when new features are released and their operations are implemented in this package.

Creating a User Instance and Getting its statistics
---------------------------------------------------
Create `UserKey` instance and retrieve a summary of the account statistics.
Getting Account Information
---------------------------
Create an `AccountInfo` instance that contains a summary of the account's basic information and usage statistics.

.. code-block:: python

from factiva.analytics import UserKey
u = UserKey(
key='abcd1234abcd1234abcd1234abcd1234', # Not needed if the ENV variable FACTIVA_USERKEY is set
stats=True) # Connects to the API and pulls the latest account status
from factiva.analytics import AccountInfo
u = AccountInfo(
user_key='abcd1234abcd1234abcd1234abcd1234' # Not needed if the ENV variable FACTIVA_USERKEY is set
)
print(u)

.. code-block::

<class 'factiva.core.userkey.UserKey'>
|-key = ****************************1234
|-cloud_token = **Not Fetched**
|-account_name = AccName1234
|-account_type = account_with_contract_limits
|-active_products = DNA
|-max_allowed_concurrent_extractions = 5
|-max_allowed_extracted_documents = 200,000
|-max_allowed_extractions = 3
|-currently_running_extractions = 0
|-total_downloaded_bytes = 7,253,890
|-total_extracted_documents = 2,515
|-total_extractions = 1
|-total_stream_instances = 4
|-total_stream_subscriptions = 1
|-enabled_company_identifiers = [{'id': 4, 'name': 'isin'}, {'id': 3, 'name': 'cusip'}, {'id': 1, 'name': 'sedol'}, {'id': 5, 'name': 'ticker_exchange'}]
|-remaining_documents = 197,485
|-remaining_extractions = 2

Snapshots
---------
<'factiva.analytics.AccountInfo'>
├─user_key: <'factiva.analytics.UserKey'>
│ ├─key: ****************************1234
│ └─cloud_token: **********************YKB12sJrkHXX
├─account_name: AccName1234
├─account_type: account_with_contract_limits
├─active_product: DNA
├─max_allowed_extracted_documents: 8,000,000
├─max_allowed_extractions: 20
├─currently_running_extractions: 0
├─total_extracted_documents: 5,493,078
├─total_extractions: 4
├─total_stream_instances: 0
├─total_stream_subscriptions: 0
├─extractions_list: <NotLoaded>
├─streams_list: <NotLoaded>
├─enabled_company_identifiers:
│ ├─[1]: sedol
│ ├─[3]: cusip
│ ├─[4]: isin
│ └─[5]: ticker_exchange
├─remaining_documents: 2,506,922
└─remaining_extractions: 16
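
In the sample output above, the remaining figures are consistent with subtracting usage from the allowance. A minimal sketch of that arithmetic, assuming this is how the values relate (the library may compute them differently). The function name `remaining` and the clamp at zero are illustrative choices, not taken from the library:

```python
def remaining(max_allowed: int, used: int) -> int:
    """Return the unused part of an allowance, never below zero."""
    return max(max_allowed - used, 0)


# Figures taken from the sample AccountInfo output.
remaining_documents = remaining(8_000_000, 5_493_078)
remaining_extractions = remaining(20, 4)

print(f"{remaining_documents:,}")   # 2,506,922
print(remaining_extractions)        # 16
```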


Snapshot Explain
----------------
Creates an API request that tests the query and returns the number of matching items in the archive.

.. code-block:: python

from factiva.analytics import SnapshotExplain
my_query = "publication_datetime >= '2023-01-01 00:00:00' AND UPPER(source_code) = 'DJDN'"
my_explain = SnapshotExplain(
user_key='abcd1234abcd1234abcd1234abcd1234', # Not needed if the ENV variable FACTIVA_USERKEY is set
query=my_query)
my_explain.process_job() # This operation can take several seconds to complete
print(my_explain)

.. code-block::

<'factiva.analytics.SnapshotExplain'>
├─user_key: <'factiva.analytics.UserKey'>
│ ├─key: ****************************1234
│ └─cloud_token: **********************YKB12sJrkHXX
├─query: <'factiva.analytics.SnapshotExplainQuery'>
│ ├─where: publication_datetime >= '2023-01-01 00:00:00' AND UPPER(source_code) = 'DJDN'
│ ├─includes: <NotSet>
│ ├─excludes: <NotSet>
│ ├─include_lists: <NotSet>
│ └─exclude_lists: <NotSet>
├─job_response: <'factiva.analytics.SnapshotExplainJobResponse'>
│ ├─job_id: 3ee35a80-0406-4f2b-a999-3e4eb5aa94d8
│ ├─job_link: https://api.dowjones...8/_explain
│ ├─job_state: JOB_STATE_DONE
│ ├─volume_estimate: 2,482,057
│ └─errors: <NoErrors>
└─samples: <NotRetrieved>
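
Both sample outputs print the user key with only its last characters visible. A minimal sketch of that kind of masking, assuming all but the last four characters are replaced by asterisks. `mask_secret` is a hypothetical helper, not the library's own function:

```python
def mask_secret(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide all but the last `visible` characters of a secret."""
    if visible <= 0:
        return fill * len(value)
    if len(value) <= visible:
        # Too short to reveal anything safely: mask everything.
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


print(mask_secret("abcd1234abcd1234abcd1234abcd1234"))
```

Masking the whole value when it is shorter than the visible window is a conservative choice, so that a short secret is never printed in full.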


Snapshot Extraction
-------------------
Creating a new snapshot and downloading it to a local repository requires just a few lines of code.

.. code-block:: python

from factiva.analytics import Snapshot
my_query = "publication_datetime >= '2020-01-01 00:00:00' AND LOWER(language_code) = 'en'"
my_snapshot = Snapshot(
from factiva.analytics import SnapshotExtraction
my_query = "publication_datetime >= '2023-01-01 00:00:00' AND UPPER(source_code) = 'DJDN'"
my_snapshot = SnapshotExtraction(
user_key='abcd1234abcd1234abcd1234abcd1234', # Can be omitted if it exists as an env variable
query=my_query)
my_snapshot.process_extract() # This operation can take several minutes to complete
my_snapshot.process_job() # This operation can take several minutes to complete

After the process completes, the output files are stored in a subfolder named after the Extraction Job ID.
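
A call such as `process_job()` has to wait until the job reaches a final state. The waiting step can be sketched as a polling loop, under stated assumptions: `get_state` is a hypothetical callable standing in for the API status request, `JOB_STATE_DONE` is the state shown in the sample output, and `JOB_STATE_FAILED`, the intermediate state names and the 5-second default interval are illustrative guesses rather than documented values:

```python
import time
from typing import Callable

DONE_STATE = "JOB_STATE_DONE"      # appears in the sample output
FAILED_STATE = "JOB_STATE_FAILED"  # assumed name, for illustration only


def wait_for_job(get_state: Callable[[], str],
                 interval: float = 5.0,
                 max_checks: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state() until the job is done, fails, or checks run out."""
    for _ in range(max_checks):
        state = get_state()
        if state == DONE_STATE:
            return state
        if state == FAILED_STATE:
            raise RuntimeError("job failed")
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job not finished after {max_checks} checks")


if __name__ == "__main__":
    # Simulated job (hypothetical state names) that finishes on the third check.
    states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", DONE_STATE])
    print(wait_for_job(lambda: next(states), sleep=lambda _s: None))
```

Passing `sleep` as a parameter keeps the loop testable without real delays, and the `max_checks` bound prevents a stuck job from blocking the caller forever.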

In the previous code a new snapshot is created using my_query as selection criteria and user_key for user authentication. After the job is being validated internally, a Snapshot Id is obtained along with the list of files to download. Files are automatically downloaded to a folder named equal to the snapshot ID, and contents are loaded as a Pandas DataFrame to the variable news_articles. This process may take several minutes, but automates the extraction process significantly.

Streams
-------
Create a stream instance and get the details needed to configure the stream client and listen to the content as it is delivered.

.. code-block:: python

from factiva.analytics import Stream

stream_query = Stream(
user_key='abcd1234abcd1234abcd1234abcd1234', # Can be omitted if it exists as an env variable
user_key_stats=True,
query="publication_datetime >= '2021-04-01 00:00:00' AND LOWER(language_code)='en' AND UPPER(source_code) = 'DJDN'",
)

print(stream_query.create())
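
The comments in the examples above note that `user_key` can be left out when the `FACTIVA_USERKEY` environment variable is set. A minimal sketch of that fallback pattern. `resolve_user_key` is a hypothetical helper, and the library's actual lookup and validation may differ:

```python
import os
from typing import Mapping, Optional


def resolve_user_key(user_key: Optional[str] = None,
                     env: Optional[Mapping[str, str]] = None) -> str:
    """Return the explicit key if given, else the FACTIVA_USERKEY variable."""
    source = os.environ if env is None else env
    key = user_key or source.get("FACTIVA_USERKEY", "")
    if not key:
        raise ValueError("no user key passed and FACTIVA_USERKEY is not set")
    return key


if __name__ == "__main__":
    # An explicit argument wins over the environment.
    print(resolve_user_key("abcd1234abcd1234abcd1234abcd1234"))
```

Accepting the environment as an optional mapping is only a testing convenience. By default the function reads `os.environ`.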
4 changes: 4 additions & 0 deletions docs/source/concepts/articlefetch.rst
@@ -0,0 +1,4 @@
Article Fetch
=============

ArticleFetch operations tutorial
4 changes: 0 additions & 4 deletions docs/source/concepts/articleretrieval.rst

This file was deleted.
