python script to modify date format #14

Open
wants to merge 319 commits into
base: upload-data-to-db
319 commits
ab57209
:rocket: Blogs Updated
actions-user Oct 5, 2021
541a1d3
:rocket: Blogs Updated
actions-user Oct 6, 2021
317fcd2
:rocket: Blogs Updated
actions-user Oct 7, 2021
a542bc5
:rocket: Blogs Updated
actions-user Oct 8, 2021
fbddae9
:rocket: Blogs Updated
actions-user Oct 9, 2021
09a1cce
:rocket: Blogs Updated
actions-user Oct 10, 2021
11f9a8d
:rocket: Blogs Updated
actions-user Oct 11, 2021
c75cb2d
:rocket: Blogs Updated
actions-user Oct 12, 2021
7008a13
:rocket: Blogs Updated
actions-user Oct 13, 2021
1a2a5bf
:rocket: Blogs Updated
actions-user Oct 14, 2021
fefc322
:rocket: Blogs Updated
actions-user Oct 15, 2021
4526537
:rocket: Blogs Updated
actions-user Oct 16, 2021
20147f6
:rocket: Blogs Updated
actions-user Oct 17, 2021
cb82177
:rocket: Blogs Updated
actions-user Oct 18, 2021
a09856f
:rocket: Blogs Updated
actions-user Oct 19, 2021
c2fd589
:rocket: Blogs Updated
actions-user Oct 20, 2021
ceb7e5b
:rocket: Blogs Updated
actions-user Oct 21, 2021
af5a92f
:rocket: Blogs Updated
actions-user Oct 22, 2021
e8b821c
:rocket: Blogs Updated
actions-user Oct 23, 2021
551d3b0
:rocket: Blogs Updated
actions-user Oct 24, 2021
94f9696
:rocket: Blogs Updated
actions-user Oct 25, 2021
88303c0
:rocket: Blogs Updated
actions-user Oct 26, 2021
44602a5
:rocket: Blogs Updated
actions-user Oct 27, 2021
5d79270
:rocket: Blogs Updated
actions-user Oct 28, 2021
d46f636
:rocket: Blogs Updated
actions-user Oct 29, 2021
f024f3b
:rocket: Blogs Updated
actions-user Oct 30, 2021
5dce8e2
:rocket: Blogs Updated
actions-user Oct 31, 2021
19f9d51
:rocket: Blogs Updated
actions-user Nov 1, 2021
0c30173
:rocket: Blogs Updated
actions-user Nov 2, 2021
9fd61b5
:rocket: Blogs Updated
actions-user Nov 3, 2021
0dd2a38
:rocket: Blogs Updated
actions-user Nov 4, 2021
9012d27
:rocket: Blogs Updated
actions-user Nov 5, 2021
390e4c5
:rocket: Blogs Updated
actions-user Nov 6, 2021
9a6d1a5
:rocket: Blogs Updated
actions-user Nov 7, 2021
0b38193
:rocket: Blogs Updated
actions-user Nov 8, 2021
64c7570
:rocket: Blogs Updated
actions-user Nov 9, 2021
8e6eee7
:rocket: Blogs Updated
actions-user Nov 10, 2021
ea29186
:rocket: Blogs Updated
actions-user Nov 11, 2021
9145ad7
:rocket: Blogs Updated
actions-user Nov 12, 2021
22490ba
:rocket: Blogs Updated
actions-user Nov 13, 2021
7ad1546
:rocket: Blogs Updated
actions-user Nov 14, 2021
c1df173
:rocket: Blogs Updated
actions-user Nov 15, 2021
a25dd48
:rocket: Blogs Updated
actions-user Nov 16, 2021
409e2a7
:rocket: Blogs Updated
actions-user Nov 17, 2021
f2a908b
:rocket: Blogs Updated
actions-user Nov 18, 2021
68c72ec
:rocket: Blogs Updated
actions-user Nov 19, 2021
9e21a7e
:rocket: Blogs Updated
actions-user Nov 20, 2021
45baaf3
:rocket: Blogs Updated
actions-user Nov 21, 2021
534d82f
:rocket: Blogs Updated
actions-user Nov 22, 2021
68136b5
:rocket: Blogs Updated
actions-user Nov 23, 2021
efddbf7
:rocket: Blogs Updated
actions-user Nov 24, 2021
e568cce
:rocket: Blogs Updated
actions-user Nov 25, 2021
c72876d
:rocket: Blogs Updated
actions-user Nov 26, 2021
6b08601
:rocket: Blogs Updated
actions-user Nov 27, 2021
8652ae2
:rocket: Blogs Updated
actions-user Nov 28, 2021
0a46b05
:rocket: Blogs Updated
actions-user Nov 29, 2021
fecd1ae
:rocket: Blogs Updated
actions-user Nov 30, 2021
e020392
:rocket: Blogs Updated
actions-user Dec 1, 2021
2090689
:rocket: Blogs Updated
actions-user Dec 2, 2021
bfbf367
:rocket: Blogs Updated
actions-user Dec 3, 2021
ab5c87a
:rocket: Blogs Updated
actions-user Dec 4, 2021
07deded
:rocket: Blogs Updated
actions-user Dec 5, 2021
1c00994
:rocket: Blogs Updated
actions-user Dec 6, 2021
3c9fef4
:rocket: Blogs Updated
actions-user Dec 7, 2021
bd6a87f
:rocket: Blogs Updated
actions-user Dec 8, 2021
b3a31f3
:rocket: Blogs Updated
actions-user Dec 9, 2021
2b40703
:rocket: Blogs Updated
actions-user Dec 10, 2021
ea3c316
:rocket: Blogs Updated
actions-user Dec 11, 2021
564cf6d
:rocket: Blogs Updated
actions-user Dec 12, 2021
5d50d91
:rocket: Blogs Updated
actions-user Dec 13, 2021
0ab7489
:rocket: Blogs Updated
actions-user Dec 14, 2021
a9efcb6
:rocket: Blogs Updated
actions-user Dec 15, 2021
7d100f4
:rocket: Blogs Updated
actions-user Dec 16, 2021
53a9199
:rocket: Blogs Updated
actions-user Dec 17, 2021
a87563a
:rocket: Blogs Updated
actions-user Dec 18, 2021
e536616
:rocket: Blogs Updated
actions-user Dec 19, 2021
225173e
:rocket: Blogs Updated
actions-user Dec 20, 2021
aa30fd3
:rocket: Blogs Updated
actions-user Dec 21, 2021
684f126
:rocket: Blogs Updated
actions-user Dec 22, 2021
117c493
:rocket: Blogs Updated
actions-user Dec 23, 2021
beaba39
:rocket: Blogs Updated
actions-user Dec 24, 2021
433b42f
:rocket: Blogs Updated
actions-user Dec 25, 2021
5df5b2f
:rocket: Blogs Updated
actions-user Dec 26, 2021
43039d3
:rocket: Blogs Updated
actions-user Dec 28, 2021
f259f47
:rocket: Blogs Updated
actions-user Dec 29, 2021
8fb6691
:rocket: Blogs Updated
actions-user Dec 30, 2021
a4a4488
:rocket: Blogs Updated
actions-user Dec 31, 2021
9267823
:rocket: Blogs Updated
actions-user Jan 1, 2022
e7b3158
:rocket: Blogs Updated
actions-user Jan 2, 2022
09f7774
:rocket: Blogs Updated
actions-user Jan 3, 2022
ae311f8
:rocket: Blogs Updated
actions-user Jan 4, 2022
fea3a75
:rocket: Blogs Updated
actions-user Jan 5, 2022
f0bdb77
:rocket: Blogs Updated
actions-user Jan 6, 2022
5925c88
:rocket: Blogs Updated
actions-user Jan 7, 2022
dab4254
:rocket: Blogs Updated
actions-user Jan 8, 2022
743f744
:rocket: Blogs Updated
actions-user Jan 9, 2022
a4d472b
:rocket: Blogs Updated
actions-user Jan 10, 2022
be6fab7
:rocket: Blogs Updated
actions-user Jan 11, 2022
aa902b9
:rocket: Blogs Updated
actions-user Jan 12, 2022
7f8d2e6
:rocket: Blogs Updated
actions-user Jan 13, 2022
a2df804
:rocket: Blogs Updated
actions-user Jan 14, 2022
2e03a2e
:rocket: Blogs Updated
actions-user Jan 15, 2022
7eb84cb
:rocket: Blogs Updated
actions-user Jan 16, 2022
3d18e1e
:rocket: Blogs Updated
actions-user Jan 17, 2022
042a74f
:rocket: Blogs Updated
actions-user Jan 18, 2022
282d703
:rocket: Blogs Updated
actions-user Jan 19, 2022
d56bec0
:rocket: Blogs Updated
actions-user Jan 20, 2022
b3ae335
:rocket: Blogs Updated
actions-user Jan 21, 2022
fb7f847
:rocket: Blogs Updated
actions-user Jan 22, 2022
ab769c7
:rocket: Blogs Updated
actions-user Jan 23, 2022
ae603be
:rocket: Blogs Updated
actions-user Jan 24, 2022
d1a97f8
:rocket: Blogs Updated
actions-user Jan 25, 2022
c5d2ee7
:rocket: Blogs Updated
actions-user Jan 26, 2022
db153b2
:rocket: Blogs Updated
actions-user Jan 27, 2022
60421b1
:rocket: Blogs Updated
actions-user Jan 28, 2022
87247c6
:rocket: Blogs Updated
actions-user Jan 29, 2022
6cca1f8
:rocket: Blogs Updated
actions-user Jan 30, 2022
81bb039
:rocket: Blogs Updated
actions-user Jan 31, 2022
57fc5a6
:rocket: Blogs Updated
actions-user Feb 1, 2022
368bdb9
:rocket: Blogs Updated
actions-user Feb 2, 2022
7769907
:rocket: Blogs Updated
actions-user Feb 3, 2022
e96dd6f
:rocket: Blogs Updated
actions-user Feb 4, 2022
5051b60
:rocket: Blogs Updated
actions-user Feb 5, 2022
3e4dd09
:rocket: Blogs Updated
actions-user Feb 6, 2022
85a9e03
:rocket: Blogs Updated
actions-user Feb 7, 2022
7fbac90
:rocket: Blogs Updated
actions-user Feb 8, 2022
28d3f40
:rocket: Blogs Updated
actions-user Feb 9, 2022
5f38ff7
:rocket: Blogs Updated
actions-user Feb 10, 2022
93221f5
:rocket: Blogs Updated
actions-user Feb 11, 2022
338d10f
:rocket: Blogs Updated
actions-user Feb 12, 2022
5938b32
:rocket: Blogs Updated
actions-user Feb 13, 2022
39bfaff
:rocket: Blogs Updated
actions-user Feb 14, 2022
ba8d8da
:rocket: Blogs Updated
actions-user Feb 15, 2022
391840c
:rocket: Blogs Updated
actions-user Feb 16, 2022
3f6f6b1
:rocket: Blogs Updated
actions-user Feb 17, 2022
6d1b530
:rocket: Blogs Updated
actions-user Feb 18, 2022
e0b5f50
:rocket: Blogs Updated
actions-user Feb 19, 2022
27c7300
:rocket: Blogs Updated
actions-user Feb 20, 2022
1f6fa52
:rocket: Blogs Updated
actions-user Feb 21, 2022
c2a1b37
:rocket: Blogs Updated
actions-user Feb 22, 2022
ed268e6
:rocket: Blogs Updated
actions-user Feb 23, 2022
bcf9f1a
:rocket: Blogs Updated
actions-user Feb 24, 2022
9eb2d1a
:rocket: Blogs Updated
actions-user Feb 25, 2022
25a8bee
:rocket: Blogs Updated
actions-user Feb 26, 2022
e2c165c
:rocket: Blogs Updated
actions-user Feb 27, 2022
b108a15
:rocket: Blogs Updated
actions-user Feb 28, 2022
dbde8eb
:rocket: Blogs Updated
actions-user Mar 1, 2022
49f99cf
:rocket: Blogs Updated
actions-user Mar 2, 2022
936181b
:rocket: Blogs Updated
actions-user Mar 3, 2022
2cf01f3
:rocket: Blogs Updated
actions-user Mar 4, 2022
9befa74
:rocket: Blogs Updated
actions-user Mar 5, 2022
e06392f
:rocket: Blogs Updated
actions-user Mar 6, 2022
a6323b9
:rocket: Blogs Updated
actions-user Mar 7, 2022
ac5a066
:rocket: Blogs Updated
actions-user Mar 8, 2022
df293a1
:rocket: Blogs Updated
actions-user Mar 9, 2022
329a75e
:rocket: Blogs Updated
actions-user Mar 10, 2022
16b16cd
:rocket: Blogs Updated
actions-user Mar 11, 2022
f0d46a7
:rocket: Blogs Updated
actions-user Mar 12, 2022
94e5af9
:rocket: Blogs Updated
actions-user Mar 13, 2022
3357748
:rocket: Blogs Updated
actions-user Mar 14, 2022
692e9b4
:rocket: Blogs Updated
actions-user Mar 15, 2022
19647a9
:rocket: Blogs Updated
actions-user Mar 16, 2022
55f7ee3
:rocket: Blogs Updated
actions-user Mar 17, 2022
417eda6
:rocket: Blogs Updated
actions-user Mar 18, 2022
85b9ab1
:rocket: Blogs Updated
actions-user Mar 19, 2022
b11a1f6
:rocket: Blogs Updated
actions-user Mar 20, 2022
9f2a553
:rocket: Blogs Updated
actions-user Mar 21, 2022
13dc9ad
:rocket: Blogs Updated
actions-user Mar 22, 2022
02881d2
:rocket: Blogs Updated
actions-user Mar 23, 2022
4ee2a4d
:rocket: Blogs Updated
actions-user Mar 24, 2022
390ee86
:rocket: Blogs Updated
actions-user Mar 25, 2022
cbcd12b
:rocket: Blogs Updated
actions-user Mar 26, 2022
3e0c21c
:rocket: Blogs Updated
actions-user Mar 27, 2022
5f1d341
:rocket: Blogs Updated
actions-user Mar 28, 2022
126dc56
:rocket: Blogs Updated
actions-user Mar 29, 2022
adbdcf2
:rocket: Blogs Updated
actions-user Mar 30, 2022
a0d3f12
:rocket: Blogs Updated
actions-user Mar 31, 2022
ae6e26c
:rocket: Blogs Updated
actions-user Apr 1, 2022
b532d0e
:rocket: Blogs Updated
actions-user Apr 2, 2022
7356f97
:rocket: Blogs Updated
actions-user Apr 3, 2022
cfddbdc
:rocket: Blogs Updated
actions-user Apr 4, 2022
6c18efd
:rocket: Blogs Updated
actions-user Apr 5, 2022
faf99d7
:rocket: Blogs Updated
actions-user Apr 6, 2022
872f9cd
:rocket: Blogs Updated
actions-user Apr 7, 2022
c3ddde7
:rocket: Blogs Updated
actions-user Apr 8, 2022
e9febb5
:rocket: Blogs Updated
actions-user Apr 9, 2022
46b65ba
:rocket: Blogs Updated
actions-user Apr 10, 2022
2226356
:rocket: Blogs Updated
actions-user Apr 11, 2022
38337f8
:rocket: Blogs Updated
actions-user Apr 12, 2022
2dfb26c
:rocket: Blogs Updated
actions-user Apr 13, 2022
c90b797
:rocket: Blogs Updated
actions-user Apr 14, 2022
7821d1c
:rocket: Blogs Updated
actions-user Apr 15, 2022
50411a2
:rocket: Blogs Updated
actions-user Apr 16, 2022
efaa462
:rocket: Blogs Updated
actions-user Apr 17, 2022
a8a65ce
:rocket: Blogs Updated
actions-user Apr 18, 2022
8693acd
:rocket: Blogs Updated
actions-user Apr 19, 2022
9cbc17d
:rocket: Blogs Updated
actions-user Apr 20, 2022
84a01fc
:rocket: Blogs Updated
actions-user Apr 21, 2022
6c1f75e
:rocket: Blogs Updated
actions-user Apr 22, 2022
e3dcb4e
:rocket: Blogs Updated
actions-user Apr 23, 2022
b386435
:rocket: Blogs Updated
actions-user Apr 24, 2022
b533913
:rocket: Blogs Updated
actions-user Apr 25, 2022
7524cbc
:rocket: Blogs Updated
actions-user Apr 26, 2022
49c192c
:rocket: Blogs Updated
actions-user Apr 27, 2022
14b93cf
:rocket: Blogs Updated
actions-user Apr 28, 2022
85f38bb
:rocket: Blogs Updated
actions-user Apr 29, 2022
4e252ea
:rocket: Blogs Updated
actions-user Apr 30, 2022
9bff413
:rocket: Blogs Updated
actions-user May 1, 2022
d28ae0a
:rocket: Blogs Updated
actions-user May 2, 2022
619b46f
:rocket: Blogs Updated
actions-user May 3, 2022
15aea30
:rocket: Blogs Updated
actions-user May 4, 2022
a8e09dc
:rocket: Blogs Updated
actions-user May 5, 2022
8775c09
:rocket: Blogs Updated
actions-user May 6, 2022
729f5ca
:rocket: Blogs Updated
actions-user May 7, 2022
15b7699
:rocket: Blogs Updated
actions-user May 8, 2022
24a335c
:rocket: Blogs Updated
actions-user May 9, 2022
5d6a757
:rocket: Blogs Updated
actions-user May 10, 2022
6444fac
:rocket: Blogs Updated
actions-user May 11, 2022
79f3ee6
:rocket: Blogs Updated
actions-user May 12, 2022
83cef3c
:rocket: Blogs Updated
actions-user May 13, 2022
525af6d
:rocket: Blogs Updated
actions-user May 14, 2022
b1a02bc
:rocket: Blogs Updated
actions-user May 15, 2022
551a6dd
:rocket: Blogs Updated
actions-user May 16, 2022
944bf3b
:rocket: Blogs Updated
actions-user May 17, 2022
8e0c083
:rocket: Blogs Updated
actions-user May 18, 2022
f7090d4
:rocket: Blogs Updated
actions-user May 19, 2022
aa8e64f
:rocket: Blogs Updated
actions-user May 20, 2022
e06bcce
:rocket: Blogs Updated
actions-user May 21, 2022
20430f4
:rocket: Blogs Updated
actions-user May 22, 2022
eb4dd05
:rocket: Blogs Updated
actions-user May 23, 2022
2bd68aa
:rocket: Blogs Updated
actions-user May 24, 2022
104e944
:rocket: Blogs Updated
actions-user May 25, 2022
973552d
:rocket: Blogs Updated
actions-user May 26, 2022
1737365
:rocket: Blogs Updated
actions-user May 27, 2022
fa116c1
:rocket: Blogs Updated
actions-user May 28, 2022
fa5046d
:rocket: Blogs Updated
actions-user May 29, 2022
fc6828f
:rocket: Blogs Updated
actions-user May 30, 2022
529b53e
:rocket: Blogs Updated
actions-user May 31, 2022
918d942
:rocket: Blogs Updated
actions-user Jun 1, 2022
84eb329
:rocket: Blogs Updated
actions-user Jun 2, 2022
460c4ef
:rocket: Blogs Updated
actions-user Jun 3, 2022
4a2d864
:rocket: Blogs Updated
actions-user Jun 4, 2022
d8c1f32
:rocket: Blogs Updated
actions-user Jun 5, 2022
35103ce
:rocket: Blogs Updated
actions-user Jun 6, 2022
676dd36
:rocket: Blogs Updated
actions-user Jun 7, 2022
047275f
:rocket: Blogs Updated
actions-user Jun 8, 2022
7df650e
:rocket: Blogs Updated
actions-user Jun 9, 2022
b89111a
:rocket: Blogs Updated
actions-user Jun 10, 2022
7e3402d
:rocket: Blogs Updated
actions-user Jun 11, 2022
648948d
:rocket: Blogs Updated
actions-user Jun 12, 2022
38 changes: 38 additions & 0 deletions .github/workflows/blogs-data.yaml
@@ -0,0 +1,38 @@
name: Automated WebScraping & Uploading of Blogs

on:
  workflow_dispatch:
  schedule:
    - cron: "0 0 * * *" # runs every day at 00:00 UTC

jobs:
  scrape-latest:
    runs-on: ubuntu-latest
    steps:

      - name: Checkout repo
        uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: Install dependencies & execute script
        run: |
          python -m pip install --upgrade pip
          python -m pip install yake
          python -m pip install scrapy
          python -m pip install pymongo
          python -m pip install bs4
          python -m pip install dnspython
          ls -la
          cd ./Auto_Update_Data
          python Auto_Update_Data/spiders/Auto-Update.py
      - name: Adding Log Files
        run: |
          git config --local user.email "[email protected]"
          git config --local user.name "GitHub Action"
          git pull
          git add .
          git commit -m ":rocket: Blogs Updated"
          git push

51 changes: 51 additions & 0 deletions .github/workflows/papers-data.yml
@@ -0,0 +1,51 @@
name: Automated Uploading of Papers Data

on:
  workflow_dispatch:
  # schedule:
  #   - cron: "0 4 * * SUN" # runs every Sunday at 04:00 UTC

jobs:
  scrape-latest:
    runs-on: ubuntu-latest
    steps:

      - name: Checkout repo
        uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: Downloading Papers Dataset from Kaggle
        run: |
          python -m pip install --upgrade pip
          python -m pip install kaggle
          kaggle datasets download Cornell-University/arxiv
        env:
          KAGGLE_USERNAME: ${{ secrets.KAGGLEUSERNAME }}
          KAGGLE_KEY: ${{ secrets.KAGGLEKEY }}
      - name: Unzip Downloaded File
        uses: montudor/action-zip@v1
        with:
          args: unzip -qq arxiv.zip -d arxiv
      - name: Moving Files to Data Upload Directory
        run: |
          ls -la
          sudo mv ./arxiv/arxiv-metadata-oai-snapshot.json ./data-upload
      - name: Installing Dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install dask[bag] --upgrade
          python -m pip install pymongo dnspython pdfplumber uuid yake pandas tqdm
      - name: Executing Python Script to Upload Papers
        run: |
          cd ./data-upload
          python PaperUpdater.py
      - name: Adding Log Files
        run: |
          git config --local user.email "[email protected]"
          git config --local user.name "GitHub Action"
          git pull
          git add .
          git commit -m ":rocket: Papers Updated"
          git push
Empty file.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
303 changes: 303 additions & 0 deletions Auto_Update_Data/Auto_Update_Data/appendkeywords.py
@@ -0,0 +1,303 @@
# -*- coding: utf-8 -*-
"""AppendKeywords.ipynb

Automatically generated by Colaboratory.

Original file is located at
https://colab.research.google.com/drive/1ytvi7MyZlGZfUhinWHJiqKe3agIyW4_x
"""


# Dictionary of keywords
# Key: Searching words
# Value: Displayed words

keywords = {"Machine Learning": "Machine Learning",
"Supervised Learning": "Supervised Learning",
"Unsupervised Learning": "Unsupervised Learning",
"Multilabel Classification": "Multilabel Classification",
"Clustering": "Clustering",
"K-Means": "K-Means",
"DBSCAN": "DBSCAN",
"Hierarchical Clustering": "Hierarchical Clustering",
"Deep Learning": "Deep Learning",
"Data Mining": "Data Mining",
"Linear regression": "Linear regression",
"Logistic regression": "Logistic regression",
"SVM": "SVM",
"Natural Language Processing": "Natural Language Processing",
"Computer Vision": "Computer Vision",
"KNN": "KNN",
"Random forest": "Random forest",
"Decision Tree": "Decision Tree",
"Regularization": "Regularization",
"Ensemble Learning": "Ensemble Learning",
"Gradient Boosting": "Gradient Boosting",
"Feature Selection": "Feature Selection",
"Reinforcement Learning": "Reinforcement Learning",
"Virtual Reality": "Virtual Reality",
"Augmented reality": "Augmented reality",
"Autonomous driving": "Autonomous driving",
"Optics": "Optics",
"Biology": "Biology",
"C++": "C++",
"Java": "Java",
"Python": "Python",
"React JS": "React JS",
"Computer Network": "Computer Networks", # remove s
"Frontend": "Frontend",
"Backend": "Backend",
"High Scalability": "High Scalability",
"Cloud computing": "Cloud computing",
"Parallel Computing": "Parallel Computing",
"CUDA": "CUDA",
"Distributed System": "Distributed Systems", # remove s
"Apache ZooKeeper": "Apache ZooKeeper",
"Streaming analytic": "Streaming analytics",
"Model Selection": "Model Selection",
"Model Evaluation": "Model Evaluation",
"Apache Kafka": "Apache Kafka",
"HDFS": "HDFS",
"Amazon S3": "Amazon S3",
"Pub-Sub": "Pub-Sub",
"Leader Election": "Leader Election",
"Clock Synchronization": "Clock Synchronization",
"Graph": "Graphs", # remove s
"Information Retrieval": "Information Retrieval",
"SQL": "SQL",
"Graph Database": "Graph Database",
"Database Management": "Database Management",
"Storage": "Storage",
"Memor": "Memory",
"Garbage Collection": "Garbage Collection",
"Map-Reduce": "Map-Reduce",
"Network Protocol": "Network Protocols", # remove s
"Cyber Security": "Cyber Security",
"Assembly Language": "Assembly Language",
"Computational Complexity Theor": "Computational Complexity Theory",
"Computer Architecture": "Computer Architecture",
"Human-Computer Interface": "Human-Computer Interface",
"Data Structure": "Data Structures", # remove s
"Discrete Mathematic": "Discrete Mathematics",
"Hacking": "Hacking",
"Quantum Computing": "Quantum Computing",
"Robotic": "Robotics", # remove s
"Engineering Practice": "Engineering Practices", # remove s
"Software Tool": "Software Tools", # remove s
"Mathematical Logic": "Mathematical Logic",
"Graph Theor": "Graph Theory",
"Computational Geometr": "Computational Geometry",
"Compiler": "Compilers", # remove s
"Distributed Computing": "Distributed Computing",
"Software Engineering": "Software Engineering",
"Bioinformatic": "Bioinformatics", # remove s
"Computational Chemistry": "Computational Chemistry",
"Computational Neuroscience": "Computational Neuroscience",
"Computational physics": "Computational physics",
"Numerical algorithm": "Numerical algorithms", # remove s
"JavaScript": "JavaScript",
"HTML": "HTML",
"Web Development": "Web Development",
"App Development": "App Development",
"CSS": "CSS",
"PHP": "PHP",
"BlockChain": "BlockChain",
"Hardware": "Hardware",
"VLSI": "VLSI",
"Cluster Computing": "Cluster Computing",
"Kubernetes": "Kubernetes",
"Go": "Go-Lang",
"File System": "File Systems", # remove s
"Statistic": "Statistics", # remove s
"Optimization": "Optimization",
"Knowledge Graph": "Knowledge Graph",
"RNN": "RNN",
"CNN": "CNN",
"Physical Design": "Physical Design",
"Memory management": "Memory management",
"PCA": "PCA",
"LDA": "LDA",
"Feature Engineering": "Feature Engineering",
"Data manipulation": "Data manipulation",
"ACID": "ACID",
"BASE": "BASE",
"Consistency": "Consistency",
"Disaster recovery": "Disaster recovery",
"Replication": "Replication",
"Fault tolerance": "Fault tolerance",
"Deployment": "Deployment",
"Processor": "Processors", # remove s
"Multi-Threading": "Multi-Threading",
"Queue": "Queue",
"Stack": "Stack",
"Dynamic Programming": "Dynamic Programming",
"Graph Traversal": "Graph Traversal",
"Device": "Devices", # remove s
"Data analysis": "Data analysis",
"Probability": "Probability",
"Mathematic": "Mathematics", # remove s
"Genomic": "Genomics", # remove s
"Data Infrastructure": "Data Infrastructure",
"Software Principles and Practices": "Software Principles and Practices",
"Image Processing": "Image Processing",
"Audio Processing": "Audio Processing",
"Signal Processing": "Signal Processing",
"Pattern Recognition": "Pattern Recognition",
"Computation and Language": "Computation and Language",
"Artificial Intelligence": "Artificial Intelligence",
"Computation and Language": "Computation and Language",
"Computational Complexit": "Computational Complexity",
"Computational Engineering": "Computational Engineering",
"Finance": "Finance", # remove "and Science" from "Finance, and Science"
"Computational Geometry": "Computational Geometry",
"Game Theory": "Game Theory", # remove "Computer Science" from "Computer Science and Game Theory"
"Computer Vision": "Computer Vision", # break down from "Computer Vision and Pattern Recognition"
"Pattern Recognition": "Pattern Recognition", # break down from "Computer Vision and Pattern Recognition"
"Computers and Society": "Computers and Society",
"Cryptography and Security": "Cryptography and Security",
"Data Structure": "Data Structures", # break down from "Data Structures and Algorithms"
"Algorithm": "Algorithms", # break down from "Data Structures and Algorithms"
"Database": "Databases", # break down from "Databases; Digital Libraries"
"Digital Librar": "Digital Libraries", # break down from "Databases; Digital Libraries"
"Distributed Computing": "Distributed Computing", # break down from "Distributed, Parallel, and Cluster Computing"
"Parallel Computing": "Parallel Computing", # break down from "Distributed, Parallel, and Cluster Computing"
"Cluster Computing": "Cluster Computing", # break down from "Distributed, Parallel, and Cluster Computing"
"Emerging Technolog": "Emerging Technologies",
"Formal Language": "Formal Languages", # break down from "Formal Languages and Automata Theory"
"Automata Theory": "Automata Theory", # break down from "Formal Languages and Automata Theory"
"General Literature": "General Literature",
"Graphic": "Graphics", # remove s
"Human-Computer Interaction": "Human-Computer Interaction",
"Information Theory": "Information Theory",
"Logic in Computer Science": "Logic in Computer Science",
"Mathematical Software": "Mathematical Software",
"Multiagent System": "Multi-agent Systems", # remove s from "Systems"
"Multi-agent System": "Multi-agent Systems", # remove s from "Systems" and add -
"Multimedia": "Multimedia",
"Networking and Internet Architecture": "Networking and Internet Architecture",
"Neural and Evolutionary Computing": "Neural and Evolutionary Computing",
"Numerical Analysis": "Numerical Analysis",
"Operating System": "Operating Systems", # remove s from "Systems"
"Performance": "Performance",
"Programming Language": "Programming Languages", # remove s
"Social and Information Networks": "Social and Information Networks",
"Software Engineering": "Software Engineering",
"Sound": "Sound",
"Symbolic Computation": "Symbolic Computation",
"Systems and Control": "Systems and Control"
}

def countKeywords(text, keywords):
    '''Count occurrences of keywords in the text; return a dict mapping each displayed word to its count.'''
    d = {}
    text = ' ' + text + ' '
    # Abbreviations list
    abbreviations = ["SVM", "KNN", "CUDA", "HDFS", "SQL", "HTML", "CSS", "PHP",
                     "VLSI", "RNN", "CNN", "PCA", "LDA", "ACID", "BASE"]

    for search_word, display_word in keywords.items():
        # 'Go' can be a substring of many words; to be precise, search for the word " Go "
        if search_word == "Go":
            search_word = ' ' + search_word + ' '

        # Lowercase the word if it's not an abbreviation
        elif search_word not in abbreviations:
            search_word = search_word.lower()

        # Count occurrences of the search word (case-sensitive; the text itself is not lowercased here)
        oc = text.count(search_word)

        # Add to the dictionary if the word occurs one or more times
        if oc > 0:
            d[display_word] = oc

    return d
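
A minimal sketch of how the matching in `countKeywords` behaves on a short input. The function is restated under the hypothetical name `count_keywords` so the block runs on its own, and the four-entry keyword table, the two-entry abbreviation set, and the sample sentence are made up for illustration. Because only the search term is lowercased, the capitalized "Machine Learning" in the sample is not counted, and "Go." does not match the padded " Go " term.

```python
# Illustrative restatement of the counting logic; names and data are hypothetical.
ABBREVIATIONS = {"SQL", "CNN"}  # reduced stand-in for the full abbreviations list


def count_keywords(text, keywords):
    """Count substring occurrences of each search term; return {display_word: count}."""
    counts = {}
    text = ' ' + text + ' '  # pad so " Go " can match at the edges
    for search_word, display_word in keywords.items():
        if search_word == "Go":
            search_word = ' Go '
        elif search_word not in ABBREVIATIONS:
            search_word = search_word.lower()
        n = text.count(search_word)  # case-sensitive substring count
        if n > 0:
            counts[display_word] = n
    return counts


sample_keywords = {
    "Graph": "Graphs",
    "Go": "Go-Lang",
    "SQL": "SQL",
    "Machine Learning": "Machine Learning",
}
sample_text = ("I like Go and SQL. Going to Go. "
               "Machine Learning is fun; machine learning on a graph database.")

result = count_keywords(sample_text, sample_keywords)
print(result)
```

Each of the four display words is counted once here: the second "Go" is followed by a period, and only the lowercase "machine learning" matches the lowercased search term.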

from bs4 import BeautifulSoup
import re
import string
import copy

# Function to remove HTML tags
# def removeHTMLTags(text):
#     return BeautifulSoup(text, 'html.parser').get_text()


# Function to collapse runs of whitespace (including escape characters such as \n and \t) into single spaces
def removeExtraWhitespaceEsc(text):
    #pattern = r'^\s+$|\s+$'
    pat = r'^\s*|\s\s*'
    return re.sub(pat, ' ', text).strip()


# Function to remove commas and periods
def removeCommasPeriods(text):
    pat = r'[.,]+'
    return re.sub(pat, '', text)


# Function to remove words that include a special character
def removeSpecialCharacterWords(text):
    # The pattern matches any word containing a character other than letters,
    # digits, underscores and whitespace, so hyphenated words are removed too
    pat = r'[a-zA-Z0-9]*[^a-zA-Z0-9_\s]+[a-zA-Z0-9]*'
    return re.sub(pat, '', text)


def clean_data(text):
    '''
    Clean text
    '''
    #clean_text = removeHTMLTags(text)
    clean_text = removeExtraWhitespaceEsc(text)
    clean_text = removeCommasPeriods(clean_text)
    clean_text = removeSpecialCharacterWords(clean_text)

    return clean_text
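
A small sketch of what the three cleaning steps do to one string, assuming the same regular expressions as above. The snake_case function names and the sample input are hypothetical. It shows that words containing a hyphen or `+` (for example "K-Means" and "C++") are dropped entirely, and that the gaps they leave are not collapsed afterwards.

```python
import re


def collapse_whitespace(text):
    # Same pattern as removeExtraWhitespaceEsc: whitespace runs become one space.
    return re.sub(r'^\s*|\s\s*', ' ', text).strip()


def remove_commas_periods(text):
    return re.sub(r'[.,]+', '', text)


def remove_special_character_words(text):
    # Drops any word that contains a character outside letters, digits,
    # underscores and whitespace.
    return re.sub(r'[a-zA-Z0-9]*[^a-zA-Z0-9_\s]+[a-zA-Z0-9]*', '', text)


def clean(text):
    text = collapse_whitespace(text)
    text = remove_commas_periods(text)
    return remove_special_character_words(text)


raw = "  K-Means,\n and   C++ are fun.  "
cleaned = clean(raw)
print(repr(cleaned))
```

If the removed words matter for keyword extraction, one option would be to whitelist `-` and `+` in the last pattern; that would be a behavior change and is not part of this diff.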

import yake
def keywordsFromYAKE(text, numOfKeywords):
    '''
    Extracts keywords from text by using YAKE
    '''

    language = "en"
    max_ngram_size = 2 # max number of words in generated keywords
    deduplication_threshold = 0.1
    custom_kw_extractor = yake.KeywordExtractor(lan=language, n=max_ngram_size, dedupLim=deduplication_threshold, top=numOfKeywords, features=None)
    # extract_keywords returns (keyword, score) pairs; keep only the keywords
    kws = custom_kw_extractor.extract_keywords(text)
    result = [x for x, y in kws]

    return result

def keywordsFromBlog(blog, keywords):
    '''
    Extracts keywords from a blog.
    str blog: the text content of the blog
    Dict keywords: a dictionary of searched word and displayed word
    Return
        A list of up to five keywords
    '''

    text = blog

    N_KEYWORDS = 5

    # Keywords from performing keyword matching
    occ = countKeywords(text, keywords)

    # Get a list of the top 5 words, ordered by number of occurrences
    result = list(dict(sorted(occ.items(), key=lambda x: x[1], reverse=True)).keys())[:N_KEYWORDS]
    #print('kwm: {}'.format(result))

    # If the result has fewer than 5 keywords, fill the remaining slots with YAKE
    if len(result) < N_KEYWORDS:
        n_left = N_KEYWORDS - len(result)
        yake_kws = keywordsFromYAKE(clean_data(text), n_left)
        #print('yake_kws: {}'.format(yake_kws))
        result += [x for x in yake_kws if x not in result]

    return result
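
The top-N selection with fallback can be sketched on its own as follows. This is an illustration under assumptions, not the code in this diff: `top_keywords` and `fake_extractor` are hypothetical names, `fake_extractor` stands in for the YAKE call so the block runs without third-party packages, and `collections.Counter.most_common` replaces the `sorted(...)` expression. The sketch also shows why the result can hold fewer than five entries: only `n_left` candidates are requested, and any that duplicate an existing keyword are filtered out.

```python
from collections import Counter

N_KEYWORDS = 5


def top_keywords(counts, fallback, n=N_KEYWORDS):
    """Return up to n keywords: most frequent matches first, then fallback candidates."""
    ranked = [word for word, _ in Counter(counts).most_common(n)]
    if len(ranked) < n:
        candidates = fallback(n - len(ranked))
        # Duplicates are dropped without requesting replacements.
        ranked += [w for w in candidates if w not in ranked]
    return ranked


def fake_extractor(k):
    # Hypothetical stand-in for keywordsFromYAKE(clean_data(text), k).
    return ["SQL", "storage engine", "index"][:k]


matched = {"SQL": 4, "Databases": 2, "Python": 1}
picked = top_keywords(matched, fake_extractor)
print(picked)
```

Here three keywords come from matching, two candidates are requested from the fallback, and one of them ("SQL") is a duplicate, so four keywords are returned.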

