
Commit c74be0d

WEB: Clean up Ecosystem page
1 parent 61d9976 commit c74be0d

File tree: 1 file changed (+6 −157 lines)


web/pandas/community/ecosystem.md

Lines changed: 6 additions & 157 deletions
@@ -149,20 +149,6 @@ or MATLAB, modified in a GUI, or embedded in apps and dashboards. Plotly
 is free for unlimited sharing, and has cloud, offline, or on-premise
 accounts for private use.
 
-### [Lux](https://github.com/lux-org/lux)
-
-Lux is a Python library that facilitates fast and easy experimentation with data by automating the visual data exploration process. To use Lux, simply add an extra import alongside pandas:
-
-```python
-import lux
-import pandas as pd
-
-df = pd.read_csv("data.csv")
-df  # discover interesting insights!
-```
-
-By printing out a dataframe, Lux automatically [recommends a set of visualizations](https://raw.githubusercontent.com/lux-org/lux-resources/master/readme_img/demohighlight.gif) that highlights interesting trends and patterns in the dataframe. Users can leverage any existing pandas commands without modifying their code, while being able to visualize their pandas data structures (e.g., DataFrame, Series, Index) at the same time. Lux also offers a [powerful, intuitive language](https://lux-api.readthedocs.io/en/latest/source/guide/vis.html) that allow users to create Altair, matplotlib, or Vega-Lite visualizations without having to think at the level of code.
-
 ### [D-Tale](https://github.com/man-group/dtale)
 
 D-Tale is a lightweight web client for visualizing pandas data structures. It
@@ -384,92 +370,14 @@ Use `pandas_gbq.read_gbq` and `pandas_gbq.to_gbq`, instead.
 
 ### [ArcticDB](https://github.com/man-group/ArcticDB)
 
-ArcticDB is a serverless DataFrame database engine designed for the Python Data Science ecosystem. ArcticDB enables you to store, retrieve, and process pandas DataFrames at scale. It is a storage engine designed for object storage and also supports local-disk storage using LMDB. ArcticDB requires zero additional infrastructure beyond a running Python environment and access to object storage and can be installed in seconds. Please find full documentation [here](https://docs.arcticdb.io/latest/).
-
-#### ArcticDB Terminology
-
-ArcticDB is structured to provide a scalable and efficient way to manage and retrieve DataFrames, organized into several key components:
-
-- `Object Store` Collections of libraries. Used to separate logical environments from each other. Analogous to a database server.
-- `Library` Contains multiple symbols which are grouped in a certain way (different users, markets, etc). Analogous to a database.
-- `Symbol` Atomic unit of data storage. Identified by a string name. Data stored under a symbol strongly resembles a pandas DataFrame. Analogous to tables.
-- `Version` Every modifying action (write, append, update) performed on a symbol creates a new version of that object.
-
-#### Installation
-
-To install, simply run:
-
-```console
-pip install arcticdb
-```
-
-To get started, we can import ArcticDB and instantiate it:
-
-```python
-import arcticdb as adb
-import numpy as np
-import pandas as pd
-# this will set up the storage using the local file system
-arctic = adb.Arctic("lmdb://arcticdb_test")
-```
-
-> **Note:** ArcticDB supports any S3 API compatible storage, including AWS. ArcticDB also supports Azure Blob storage.
-> ArcticDB also supports LMDB for local/file based storage - to use LMDB, pass an LMDB path as the URI: `adb.Arctic('lmdb://path/to/desired/database')`.
-
-#### Library Setup
-
-ArcticDB is geared towards storing many (potentially millions) of tables. Individual tables (DataFrames) are called symbols and are stored in collections called libraries. A single library can store many symbols. Libraries must first be initialized prior to use:
-
-```python
-lib = arctic.get_library('sample', create_if_missing=True)
-```
-
-#### Writing Data to ArcticDB
-
-Now we have a library set up, we can get to reading and writing data. ArcticDB has a set of simple functions for DataFrame storage. Let's write a DataFrame to storage.
-
-```python
-df = pd.DataFrame(
-    {
-        "a": list("abc"),
-        "b": list(range(1, 4)),
-        "c": np.arange(3, 6).astype("u1"),
-        "d": np.arange(4.0, 7.0, dtype="float64"),
-        "e": [True, False, True],
-        "f": pd.date_range("20130101", periods=3)
-    }
-)
-
-df
-df.dtypes
-```
-
-Write to ArcticDB.
-
-```python
-write_record = lib.write("test", df)
-```
-
-> **Note:** When writing pandas DataFrames, ArcticDB supports the following index types:
->
-> - `pandas.Index` containing int64 (or the corresponding dedicated types Int64Index, UInt64Index)
-> - `RangeIndex`
-> - `DatetimeIndex`
-> - `MultiIndex` composed of above supported types
->
-> The "row" concept in `head`/`tail` refers to the row number ('iloc'), not the value in the `pandas.Index` ('loc').
+ArcticDB is a serverless DataFrame database engine designed for the Python Data Science ecosystem.
+ArcticDB enables you to store, retrieve, and process pandas DataFrames at scale.
+It is a storage engine designed for object storage and also supports local-disk storage using LMDB.
+ArcticDB requires zero additional infrastructure beyond a running Python environment and access
+to object storage and can be installed in seconds.
 
-#### Reading Data from ArcticDB
+Please find full documentation [here](https://docs.arcticdb.io/latest/).
 
-Read the data back from storage:
-
-```python
-read_record = lib.read("test")
-read_record.data
-df.dtypes
-```
-
-ArcticDB also supports appending, updating, and querying data from storage to a pandas DataFrame. Please find more information [here](https://docs.arcticdb.io/latest/api/processing/#arcticdb.QueryBuilder).
 
 ### [Hugging Face](https://huggingface.co/datasets)
 
@@ -522,35 +430,6 @@ def process_data():
 process_data()
 ```
 
-
-### [Cylon](https://cylondata.org/)
-
-Cylon is a fast, scalable, distributed memory parallel runtime with a pandas
-like Python DataFrame API. ”Core Cylon” is implemented with C++ using Apache
-Arrow format to represent the data in-memory. Cylon DataFrame API implements
-most of the core operators of pandas such as merge, filter, join, concat,
-group-by, drop_duplicates, etc. These operators are designed to work across
-thousands of cores to scale applications. It can interoperate with pandas
-DataFrame by reading data from pandas or converting data to pandas so users
-can selectively scale parts of their pandas DataFrame applications.
-
-```python
-from pycylon import read_csv, DataFrame, CylonEnv
-from pycylon.net import MPIConfig
-
-# Initialize Cylon distributed environment
-config: MPIConfig = MPIConfig()
-env: CylonEnv = CylonEnv(config=config, distributed=True)
-
-df1: DataFrame = read_csv('/tmp/csv1.csv')
-df2: DataFrame = read_csv('/tmp/csv2.csv')
-
-# Using 1000s of cores across the cluster to compute the join
-df3: Table = df1.join(other=df2, on=[0], algorithm="hash", env=env)
-
-print(df3)
-```
-
 ### [Dask](https://docs.dask.org)
 
 Dask is a flexible parallel computing library for analytics. Dask
@@ -590,36 +469,6 @@ import modin.pandas as pd
 df = pd.read_csv("big.csv") # use all your cores!
 ```
 
-### [Pandarallel](https://github.com/nalepae/pandarallel)
-
-Pandarallel provides a simple way to parallelize your pandas operations on all your CPUs by changing only one line of code.
-It also displays progress bars.
-
-```python
-from pandarallel import pandarallel
-
-pandarallel.initialize(progress_bar=True)
-
-# df.apply(func)
-df.parallel_apply(func)
-```
-
-### [Vaex](https://vaex.io/docs/)
-
-Increasingly, packages are being built on top of pandas to address
-specific needs in data preparation, analysis and visualization. Vaex is
-a python library for Out-of-Core DataFrames (similar to Pandas), to
-visualize and explore big tabular datasets. It can calculate statistics
-such as mean, sum, count, standard deviation etc, on an N-dimensional
-grid up to a billion (10^9) objects/rows per second. Visualization is
-done using histograms, density plots and 3d volume rendering, allowing
-interactive exploration of big data. Vaex uses memory mapping, zero
-memory copy policy and lazy computations for best performance (no memory
-wasted).
-
-- ``vaex.from_pandas``
-- ``vaex.to_pandas_df``
-
 ### [Hail Query](https://hail.is/)
 
 An out-of-core, preemptible-safe, distributed, dataframe library serving
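For reference, the ArcticDB quick-start removed by this commit condenses to the short sketch below. The pandas portion runs as-is; the `arcticdb` calls (taken verbatim from the deleted text) are left as comments since they require `pip install arcticdb` and writable local LMDB storage.

```python
import numpy as np
import pandas as pd

# The mixed-dtype DataFrame used in the removed ArcticDB walk-through.
df = pd.DataFrame(
    {
        "a": list("abc"),
        "b": list(range(1, 4)),
        "c": np.arange(3, 6).astype("u1"),
        "d": np.arange(4.0, 7.0, dtype="float64"),
        "e": [True, False, True],
        "f": pd.date_range("20130101", periods=3),
    }
)

# With arcticdb installed, the removed example round-tripped it through
# local LMDB storage like so:
# import arcticdb as adb
# arctic = adb.Arctic("lmdb://arcticdb_test")
# lib = arctic.get_library("sample", create_if_missing=True)
# lib.write("test", df)
# assert lib.read("test").data.equals(df)

print(df.dtypes)
```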

0 commit comments
