elessarelfstone/advarchs

Advarchs: Data retrieval from remote archives

Overview

Advarchs is a simple tool for retrieving data from archive files hosted on the web. It is especially useful when you work with remote data stored in compressed spreadsheets or similar formats.

Getting Started

Say you need to perform some data analytics on an Excel spreadsheet that is refreshed every month and published as a RAR archive. You can fetch that file and load it into a pandas DataFrame as follows:

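import os
import tempfile

import pandas as pd
from advarchs import webfilename, extract_web_archive

TEMP_DIR = tempfile.gettempdir()

url = "http://www.site.com/archive.rar"
# Name the local copy after the archive file referenced by the URL
arch_file_name = webfilename(url)
arch_path = os.path.join(TEMP_DIR, arch_file_name)
# Fetch the archive into arch_path, extract it, and keep only .xlsx files
xlsx_files = extract_web_archive(url, arch_path, ffilter=['xlsx'])
for xlsx_f in xlsx_files:
    xlsx = pd.ExcelFile(xlsx_f)
    df = xlsx.parse(xlsx.sheet_names[0])  # first sheet as a DataFrame

...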
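The two steps above (naming the local file after the URL, then extracting only files with matching extensions) can be approximated with the standard library to show the idea. This is a sketch under stated assumptions, not the advarchs implementation: `filename_from_url` and `extract_filtered_zip` are hypothetical helpers, the sketch handles ZIP archives only (advarchs lists p7zip as a requirement and also handles RAR), and it assumes the file name is simply the last segment of the URL path.

```python
import os
import posixpath
import tempfile
import zipfile
from urllib.parse import unquote, urlparse


def filename_from_url(url):
    """Return the last path segment of a URL, e.g. 'archive.rar'.

    Hypothetical helper: a rough stand-in for a function like webfilename;
    the real advarchs function may behave differently.
    """
    path = unquote(urlparse(url).path)  # drops query string and fragment
    return posixpath.basename(path)


def extract_filtered_zip(arch_path, dest_dir, extensions):
    """Extract only archive members whose extension is in `extensions`.

    Hypothetical helper for ZIP files only; returns the extracted paths.
    Extension matching is case-insensitive and ignores a leading dot.
    """
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    extracted = []
    with zipfile.ZipFile(arch_path) as zf:
        for member in zf.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            ext = os.path.splitext(member)[1].lower().lstrip(".")
            if ext in wanted:
                # extract() creates missing parent directories under dest_dir
                extracted.append(zf.extract(member, dest_dir))
    return extracted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a small local ZIP in place of a downloaded archive
        arch_path = os.path.join(tmp, filename_from_url("http://www.site.com/archive.zip"))
        with zipfile.ZipFile(arch_path, "w") as zf:
            zf.writestr("report.xlsx", b"placeholder")
            zf.writestr("readme.txt", "notes")
        files = extract_filtered_zip(arch_path, os.path.join(tmp, "out"), ["xlsx"])
        print([os.path.basename(f) for f in files])  # ['report.xlsx']
```

Filtering on the member name before extracting avoids writing unrelated files to disk, which matters when an archive bundles large attachments alongside the spreadsheet you need.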

Requirements

  • Python 3.5+
  • p7zip

Special note

On CentOS, and on Ubuntu 16.04 and earlier, the following package is also needed:

  • unrar
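Because advarchs depends on these external programs, it can help to confirm they are on your PATH before running the example. A minimal check with `shutil.which` follows; `missing_tools` is a hypothetical helper, and the executable names `7z` and `unrar` are assumptions (some p7zip packages install `7za` or `7zr` instead, and advarchs may look for different names).

```python
import shutil


def missing_tools(names):
    """Return the command-line tools from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them for your distribution
    missing = missing_tools(["7z", "unrar"])
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All external tools found.")
```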

Installation

pip install advarchs

Contributing

See CONTRIBUTING

Code of Conduct

This project adheres to the Contributor Covenant 1.2. By participating, you are expected to follow this Code of Conduct in all your interactions with the project.

License

Apache-2.0