Commit 476aed9

Convert README from reStructuredText to Markdown
Replace README.rst with README.md for better GitHub and PyPI rendering. Markdown is now the standard format for Python project documentation and provides better readability and tooling support.

The conversion maintains all existing content while using cleaner, more widely-adopted syntax:

- Headers use # instead of underlines
- Code blocks use triple backticks
- Links use [text](url) format
- Images use ![alt](url) format

Both GitHub and PyPI render Markdown beautifully in 2026.
1 parent 339d727 commit 476aed9

2 files changed: 147 additions and 93 deletions

README.md

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
# scikit-sos

scikit-sos is a Python module for Stochastic Outlier Selection (SOS). It is compatible with scikit-learn. SOS is an unsupervised outlier selection algorithm. It uses the concept of affinity to compute an outlier probability for each data point.

![SOS](https://github.com/jeroenjanssens/scikit-sos/raw/master/doc/sos.png)

For more information about SOS, see the technical report: J.H.M. Janssens, F. Huszar, E.O. Postma, and H.J. van den Herik. [Stochastic Outlier Selection](https://github.com/jeroenjanssens/sos/blob/master/doc/sos-ticc-tr-2012-001.pdf?raw=true). Technical Report TiCC TR 2012-001, Tilburg University, Tilburg, the Netherlands, 2012.
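
To make the affinity idea concrete, here is a minimal sketch of how an outlier probability can be derived from affinities. It is not the package's implementation: it uses only the standard library and a single fixed bandwidth `sigma` (an assumption made for brevity), whereas SOS as described in the technical report tunes a separate variance for each point from the perplexity parameter. The function name `outlier_probabilities` and the toy points are hypothetical.

```python
import math


def outlier_probabilities(points, sigma=1.0):
    """Return one outlier probability per point (fixed-bandwidth sketch)."""
    n = len(points)
    probabilities = [1.0] * n
    for i in range(n):
        # Affinity of point i to every other point; none to itself.
        affinities = []
        for j in range(n):
            if i == j:
                affinities.append(0.0)
            else:
                squared_distance = math.dist(points[i], points[j]) ** 2
                affinities.append(math.exp(-squared_distance / (2 * sigma**2)))
        total = sum(affinities)
        for j in range(n):
            # Binding probability: chance that point i picks j as a neighbour.
            binding = affinities[j] / total
            # Point j is an outlier only if no point picks it.
            probabilities[j] *= 1.0 - binding
    return probabilities


# Four clustered points and one isolated point (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = outlier_probabilities(points)
```

The isolated point at (8, 8) receives almost no binding probability from the others, so its outlier probability comes out close to 1, while the four clustered points score much lower.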

## Install

Using pip:

```bash
pip install scikit-sos
```

Using uv (recommended for fast installation):

```bash
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install scikit-sos
uv pip install scikit-sos
```

## Development

This project uses modern Python tooling:

- **uv** for fast package management
- **ruff** for linting and formatting
- **mypy** for type checking
- **pytest** for testing

To set up a development environment:

```bash
# Clone repository
git clone https://github.com/jeroenjanssens/scikit-sos.git
cd scikit-sos

# Create virtual environment and install with dev dependencies
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
```

Run tests:

```bash
pytest
```

Run linting:

```bash
ruff check .
```

Run formatting:

```bash
ruff format .
```

Run type checking:

```bash
mypy sksos
```

### Type Hints

This package includes full type hints for better IDE support:

```python
from sksos import SOS
import numpy as np
from numpy.typing import NDArray

# Type hints work automatically
detector: SOS = SOS(perplexity=20)
data: NDArray = np.array([[1, 2], [3, 4]])
scores: NDArray = detector.predict(data)
```
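
The `perplexity` argument in the example above is the main tuning knob of SOS. In the technical report it acts as a smooth measure of the effective number of neighbours: each point's variance is chosen so that its distribution over neighbours has that perplexity. As a rough illustration of the quantity itself (not of the package's internals; the `perplexity` helper below is hypothetical), perplexity is the exponential of the Shannon entropy:

```python
import math


def perplexity(probabilities):
    """Perplexity of a discrete distribution: exp of its entropy in nats."""
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return math.exp(entropy)


# Four equally likely neighbours behave like four effective neighbours.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# One dominant neighbour gives an effective count only slightly above one.
peaked = perplexity([0.97, 0.01, 0.01, 0.01])
```

A larger perplexity therefore asks each point to spread its affinity over more neighbours, which generally smooths the resulting outlier probabilities.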

## Usage

```python
>>> import pandas as pd
>>> from sksos import SOS
>>> iris = pd.read_csv("http://bit.ly/iris-csv")
>>> X = iris.drop("Name", axis=1).values
>>> detector = SOS()
>>> iris["score"] = detector.predict(X)
>>> iris.sort_values("score", ascending=False).head(10)
     SepalLength  SepalWidth  PetalLength  PetalWidth             Name     score
41           4.5         2.3          1.3         0.3      Iris-setosa  0.981898
106          4.9         2.5          4.5         1.7   Iris-virginica  0.964381
22           4.6         3.6          1.0         0.2      Iris-setosa  0.957945
134          6.1         2.6          5.6         1.4   Iris-virginica  0.897970
24           4.8         3.4          1.9         0.2      Iris-setosa  0.871733
114          5.8         2.8          5.1         2.4   Iris-virginica  0.831610
62           6.0         2.2          4.0         1.0  Iris-versicolor  0.821141
108          6.7         2.5          5.8         1.8   Iris-virginica  0.819842
44           5.1         3.8          1.9         0.4      Iris-setosa  0.773301
100          6.3         3.3          6.0         2.5   Iris-virginica  0.765657
```

## Command Line Interface

This module also includes a command-line tool called `sos`. To illustrate, we apply SOS with a perplexity of 10 to the Iris dataset:

```bash
$ curl -sL http://bit.ly/iris-csv |
> tail -n +2 | cut -d, -f1-4 |
> sos -p 10 |
> sort -nr | head
0.98189840
0.96438132
0.95794492
0.89797043
0.87173299
0.83161045
0.82114072
0.81984209
0.77330148
0.76565738
```
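
The preprocessing half of the pipeline above (`tail -n +2` drops the header row, `cut -d, -f1-4` keeps the first four comma-separated fields) can be mirrored in Python when the data is already in memory. The following is a small sketch using only the standard library; the helper name `numeric_columns` is hypothetical, and the two sample rows are taken from the Iris output shown in the Usage section.

```python
import csv
import io


def numeric_columns(csv_text, n_columns=4):
    """Skip the header row and return the first n_columns of each row as floats."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows)  # drop the header, like `tail -n +2`
    # Keep only the leading numeric fields, like `cut -d, -f1-4`.
    return [[float(value) for value in row[:n_columns]] for row in rows if row]


sample = (
    "SepalLength,SepalWidth,PetalLength,PetalWidth,Name\n"
    "4.5,2.3,1.3,0.3,Iris-setosa\n"
    "4.9,2.5,4.5,1.7,Iris-virginica\n"
)
features = numeric_columns(sample)
```

The resulting list of rows contains only the four measurements per flower, which is the shape of input the `sos` command receives in the pipeline.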

Adding a threshold causes SOS to output 0s and 1s instead of outlier probabilities. With the threshold set to 0.8, 8 of the 150 data points are selected as outliers:

```bash
$ curl -sL http://bit.ly/iris-csv |
> tail -n +2 | cut -d, -f1-4 |
> sos -p 10 -t 0.8 |
> paste -sd+ | bc
8
```
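
The effect of the threshold can be sketched in a few lines of Python. This assumes a point is selected when its probability is strictly greater than the threshold (whether the tool uses `>` or `>=` is not stated here), and `select_outliers` is a hypothetical helper rather than part of the package. The ten probabilities are the ones printed by the earlier `sos -p 10` run:

```python
def select_outliers(probabilities, threshold):
    """Map each outlier probability to 1 (selected) or 0 (not selected)."""
    # Assumption: strictly-greater-than comparison against the threshold.
    return [1 if p > threshold else 0 for p in probabilities]


# The ten highest outlier probabilities from the perplexity-10 run.
top_ten = [
    0.98189840, 0.96438132, 0.95794492, 0.89797043, 0.87173299,
    0.83161045, 0.82114072, 0.81984209, 0.77330148, 0.76565738,
]
flags = select_outliers(top_ten, 0.8)
selected = sum(flags)
```

Only eight of the ten highest probabilities exceed 0.8, which matches the count reported by the command above, since every remaining point scores lower still.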

## License

All software in this repository is distributed under the terms of the BSD Simplified License. The full license is in the LICENSE file.

README.rst

Lines changed: 0 additions & 93 deletions
This file was deleted.
