Commit 7887bf7

Merge branch 'master' into batch_processing
Signed-off-by: Xin Wang <xin.wang@fmr.com>
2 parents 8d993ae + c6a6829 commit 7887bf7

File tree

14 files changed

+4551
-3
lines changed

CONTRIBUTING.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
# Contributing to Seq2Pat
2+
3+
Thank you for contributing to Seq2Pat! This guide will help you get started and know what to expect. All contributions and project spaces are subject to our [Code of Conduct](https://github.com/fidelity/.github/blob/main/CODE_OF_CONDUCT.md).
4+
5+
We welcome all types of contributions, including:
6+
7+
* Code contributions
8+
* Bug reports
9+
* Responsibly disclosed security concerns
10+
* Documentation fixes
11+
* Feature requests and user stories (although we can't guarantee we'll get to all requests, it's helpful to know where we can improve)
12+
13+
If you end up using our library in a project, give us a star on GitHub!
14+
15+
Please note that we periodically fork upstream repos to stage contributions from Fidelity. We do not accept contributions against these forked repos, and request you make contributions against upstream projects directly.
16+
17+
If you have any questions, please contact [opensource@fmr.com](mailto:opensource@fmr.com).
18+
19+
## How to report a bug
20+
21+
Please [open an issue](https://github.com/fidelity/seq2pat/issues) **unless** you are making a significant security disclosure.
22+
23+
When reporting a bug, please start from a fresh pull of the default branch and document how you encountered the issue. Reports that lack sufficient detail or that we can't reproduce may be closed without action.
24+
25+
While bugs can be frustrating, we ask participants to contribute positively and professionally to the discourse. We commit to taking the contents of each report seriously, but abusive behavior will not be tolerated.
26+
27+
## How to disclose security concerns responsibly
28+
29+
Please follow the instructions in our [security policy](https://github.com/fidelity/.github/blob/main/SECURITY.md) (also visible in the Security tab on the project's repo).
30+
31+
## How to contribute documentation fixes
32+
33+
Minor documentation fixes can be submitted directly as a pull request without filing an issue in advance. More significant changes (e.g., refactoring to support a new documentation format, major reorganizations of content, etc.) should first be discussed in an issue to ensure everyone's time is used effectively.
34+
35+
When opening a PR or issue with a documentation change, please add a `documentation` label.
36+
37+
## How to request features or submit a user story
38+
39+
To request a feature, please open an issue and tag it as `feature enhancement`. If you already have an implementation, please [link the pull request to the issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword).
40+
41+
Please include as much information and context as you can. Understanding how the feature solves a specific problem will help us prioritize the request. Please understand that we will not be able to provide an implementation timeline on all requests, although requests that include an implementation are more likely to land sooner.
42+
43+
If you won't do the work yourself, please also add a `good first issue` or `help wanted` label. These are special issue tags which are intended to help new and existing contributors get involved in a meaningful and accessible way.
44+
45+
* `good first issue` - Small changes that are suitable for a beginner
46+
* `help wanted` - More involved changes

This will help match your request with others who are looking for a way to get involved.
47+
48+
## Code contributions
49+
50+
Code contributions are welcome in all of our projects as long as you follow a few rules:
51+
* With any piece of code, please adhere to PEP-8 standards.
52+
* If you're fixing an issue with an existing piece of code, please make sure all the tests pass, and there is no change in functionality.
53+
* If you want to add a new feature, please open up an issue first.
54+
* When adding a new feature, make sure you have relevant test coverage.
55+
* Any changes to the public API should conform to the current standards, be properly documented, typed, and be intuitive.
56+
* Your contribution must be received under the project's open source license.
57+
* You must have permission to make the contribution. We strongly recommend including a Signed-off-by line to indicate your adherence to the [Developer Certificate of Origin](https://developercertificate.org/).
58+
* All code contributions must be made via PR, and all checks must pass before merging.
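
As an illustration of the Signed-off-by recommendation above, here is a minimal sketch of a check that a commit message carries a sign-off line of the form `Signed-off-by: Name <email>` (the line that `git commit -s` appends). The function name `has_sign_off`, the regular expression, and the sample messages are assumptions made for this example; they are not part of the project's tooling.

```python
import re

# Hypothetical pattern: one line of the form "Signed-off-by: Name <email>".
SIGN_OFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_sign_off(commit_message: str) -> bool:
    """Return True if any line of the message is a well-formed sign-off."""
    return SIGN_OFF.search(commit_message) is not None


# Placeholder commit messages, not taken from the repository history.
signed = "Fix typo in README\n\nSigned-off-by: Jane Doe <jane@example.com>\n"
unsigned = "Fix typo in README\n"

print(has_sign_off(signed))
print(has_sign_off(unsigned))
```

A check like this could run locally before opening a PR; it only validates the shape of the line, not the identity behind it.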
59+
60+
While not strictly necessary, we encourage you to open an issue prior to your pull request to let the project know to expect your code. This helps the team plan for the next release and may result in your feature being a higher priority, and also decreases the likelihood of two independent contributions that do the same thing.
61+
62+
## Documentation contributions
63+
64+
* Make sure you follow the standards set by the rest of the repo.
65+
* Be concise, but do not omit details. Verbose documentation is preferred to incomplete documentation.
66+
67+
## Getting started (and helping others find their footing)
68+
69+
Anyone may open an issue and apply a `good first issue` or `help wanted` label for others to work on. We only ask that you be responsive to questions from anyone who picks up your issue and decides to work on it.
70+
71+
## Getting help
72+
73+
If you have other questions about this project, please [open an issue](https://github.com/fidelity/seq2pat/issues). To reach the Fidelity OSPO directly, please email [opensource@fmr.com](mailto:opensource@fmr.com).

README.md

Lines changed: 3 additions & 3 deletions
@@ -75,9 +75,9 @@ aggregation_to_patterns = dichotomic_pattern_mining(seq2pat_pos, seq2pat_neg,
7575
# see also intersection, unique_pos, and unique_neg
7676
dpm_patterns = aggregation_to_patterns[DichotomicAggregation.union]
7777

78-
# Most interestingly, we can generate features from DRPM patterns (pat2feat)
79-
# to create machine learning models in downstream tasks, e.g., intent prediction
80-
# To do that, we can the input sequences into one-hot feature vectors
78+
# Most interestingly, we can generate features from DPM patterns via pat2feat
79+
# These features can be used in ML for downstream tasks, e.g., intent prediction
80+
# To do that, we turn the input sequences into one-hot feature vectors
8181
# Binary features denote existence of found patterns in each sequence
8282
pat2feat = Pat2Feat()
8383
sequences = sequences_pos + sequences_neg
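
To make the idea of binary pattern features concrete, the following is a minimal standalone sketch. It does not use or reproduce the `Pat2Feat` implementation: the helper names `contains_pattern` and `one_hot_features`, the toy sequences, and the toy patterns are all assumptions, and a pattern is assumed to "exist" in a sequence when its items occur in order, not necessarily contiguously, with no further constraints.

```python
from typing import List, Sequence


def contains_pattern(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the pattern's items appear in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched item,
    # so later pattern items must be found after earlier ones.
    return all(item in remaining for item in pattern)


def one_hot_features(sequences: Sequence[Sequence[str]],
                     patterns: Sequence[Sequence[str]]) -> List[List[int]]:
    """One row per sequence, one binary column per pattern."""
    return [[int(contains_pattern(seq, pat)) for pat in patterns]
            for seq in sequences]


# Toy data, invented for illustration only.
sequences_pos = [["A", "B", "C"], ["A", "C", "D"]]
sequences_neg = [["B", "D"], ["C", "A"]]
patterns = [["A", "C"], ["B", "D"]]

features = one_hot_features(sequences_pos + sequences_neg, patterns)
print(features)  # [[1, 0], [1, 0], [0, 1], [0, 0]]
```

Each row can then be fed to a downstream classifier as a fixed-length feature vector, regardless of the length of the original sequence.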
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
# Evaluations of ML models using Seq2Pat features
2+
This directory contains the source code and notebooks to train and evaluate various ML models using Seq2Pat features.
3+
This source repo has been used to implement the experiments in our papers:
4+
* [[AAAI-IAAI'22] Seq2Pat: Sequence-to-Pattern Generation for Constraint-based Sequential Pattern Mining](https://ojs.aaai.org/index.php/AAAI/article/view/21542),
5+
* [[KDF'22] Dichotomic Pattern Mining with Applications to Intent Prediction from Semi-Structured Clickstream Datasets](https://arxiv.org/abs/2201.09178),
6+
* [[Frontiers in AI 2022] Dichotomic Pattern Mining Integrated With Constraint Reasoning for Digital Behavior Analysis](https://www.frontiersin.org/articles/10.3389/frai.2022.868085/full).
7+
8+
## Running on sample data
9+
We use the same sample dataset introduced in the `dichotomic_pattern_mining` notebook. The generated features are combined with the original sequence data to train the downstream ML models.
10+
The new sample dataset containing features can be found in the `data` folder.
11+
12+
## Requirements
13+
Please run `pip install -q -r requirements.txt` to install the packages used in the source code and notebooks.
14+
15+
## Compared ML models
16+
The notebooks under the `notebooks` directory show how the following models are trained and evaluated with different combinations of features:
17+
18+
| MODEL | FEATURE SPACE |
19+
| ----------- | ----------- |
20+
| LightGBM | Seq2Pat Patterns |
21+
| Shallow_NN | Seq2Pat Patterns |
22+
| LSTM | Clickstream |
23+
| LSTM_seq2pat | Clickstream + Seq2Pat Patterns |
24+
25+
`scripts/benchmark.py` runs the models over multiple train/test partitions of the data to compare their average performance.
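
The idea of averaging over repeated train/test partitions can be sketched as follows. This is not the code in `scripts/benchmark.py`: the function names, the majority-class baseline standing in for a real model, and the toy data are all assumptions chosen to keep the example dependency-free.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[int], int]  # (binary feature vector, label)


def split(data: Sequence[Example], test_fraction: float,
          rng: random.Random) -> Tuple[List[Example], List[Example]]:
    """Shuffle a copy of the data and cut it into (train, test)."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline(train: List[Example], test: List[Example]) -> float:
    """Stand-in model: predict the most common training label; return accuracy."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return mean(1.0 if label == majority else 0.0 for _, label in test)


def benchmark(data: Sequence[Example],
              evaluate: Callable[[List[Example], List[Example]], float],
              n_runs: int = 5, test_fraction: float = 0.25,
              seed: int = 0) -> float:
    """Average the score of `evaluate` over several random partitions."""
    rng = random.Random(seed)
    scores = [evaluate(*split(data, test_fraction, rng)) for _ in range(n_runs)]
    return mean(scores)


# Toy dataset, invented for illustration only.
data = [([1, 0], 1), ([1, 0], 1), ([0, 1], 0), ([0, 0], 0),
        ([1, 1], 1), ([0, 1], 0), ([1, 0], 1), ([0, 0], 1)]

print(benchmark(data, majority_baseline, n_runs=4))
```

Fixing the seed makes the partitions reproducible, so different models can be compared on the same sequence of splits by passing a different `evaluate` function.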
26+
27+
