Commit 1b081a7

Avoid checking the whole list every time to reduce transfer cost to DB
1 parent f667aeb commit 1b081a7

4 files changed: +17 -4 lines changed

.github/workflows/probe.yml

Lines changed: 7 additions & 0 deletions
@@ -35,6 +35,13 @@ jobs:
           python-version: 3.9
           cache: 'pip'

+      - name: Python Runtime Cache
+        id: python-runtime-cache
+        uses: actions/cache@v3
+        with:
+          path: ${{ env.pythonLocation }}
+          key: ${{ runner.os }}-${{ env.pythonLocation }}-${{ hashFiles('requirements.txt') }}
+
       - name: Install Python Dependencies
         run: pip install --upgrade -r requirements.txt

.github/workflows/static.yml

Lines changed: 3 additions & 2 deletions
@@ -13,7 +13,7 @@ on:
     types: [ generate-gh-pages ]

   schedule:
-    - cron: "*/20 * * * *"
+    - cron: "*/15 * * * *"

 # Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
 permissions:
@@ -40,6 +40,7 @@ jobs:
     steps:
       - name: My Host
         run: |
+          date
           sudo apt install -y -qq moreutils # https://unix.stackexchange.com/questions/26728/prepending-a-timestamp-to-each-line-of-output-from-a-command
           echo -e "free -h:\n`free -h`" | ts
           echo
@@ -51,7 +52,7 @@ jobs:
           echo
           echo "nproc: `nproc`" | ts
           echo
-          echo -e "curl -s ifconfig.me/all:\n`curl -s ifconfig.me/all`" | ts
+          echo -e "curl -s ifconfig.me/all:\n`curl -s --max-time 30 ifconfig.me/all`" | ts

       - name: Export vars to env
         env:

db/image.py

Lines changed: 6 additions & 2 deletions
@@ -1,6 +1,7 @@
 # TODO: need a separate model?
 import logging
 import os
+import random
 import time

 from sqlalchemy import String, column, Values, select
@@ -21,7 +22,10 @@
 def expire():
     start = time.time()
     removed = 0
-    for img_files in chunks(os.listdir(config.image_dir), 500):
+    all_files = list(os.listdir(config.image_dir))
+    random.shuffle(all_files) # avoid checking whole list everytime to reduce transfer cost to DB
+    candidates = all_files[:1000]
+    for img_files in chunks(candidates, 500):
         values = Values(column('name', String), name='v').data(list(map(lambda x: (x,), img_files)))
         stmt = select(values).join(Summary, Summary.image_name == values.c.name,
                                    isouter=True # Add this to implement left outer join
@@ -32,7 +36,7 @@ def expire():
                 os.remove(os.path.join(config.image_dir, image_name[0]))
                 removed += 1
     cost = (time.time() - start) * 1000
-    logger.info(f'removed {removed} feature images, cost(ms): {cost:.2f}')
+    logger.info(f'removed {removed}/{len(candidates)} feature images, cost(ms): {cost:.2f}')


 def chunks(lst, n):
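
Note on the sampling change in `expire()`: each run now checks at most 1000 randomly chosen files against the DB, so with N files in the directory a given file is checked with probability min(1, 1000/N) per run, and orphaned images are cleaned up over several runs instead of in one pass. A minimal sketch of the same idea follows, assuming `random.sample` as an alternative to shuffle-then-slice. `pick_candidates` is a hypothetical helper that is not part of this commit, and the body of `chunks` is a guess, since the diff shows only its signature.

```python
import random
from typing import Iterator, List, Sequence


def pick_candidates(names: Sequence[str], limit: int = 1000) -> List[str]:
    """Return up to `limit` names chosen uniformly at random, without repeats.

    Hypothetical helper: the commit itself shuffles the full list and slices it.
    """
    return random.sample(list(names), min(limit, len(names)))


def chunks(lst: Sequence[str], n: int) -> Iterator[Sequence[str]]:
    """Yield successive slices of `lst` with at most `n` items each.

    Assumed body: the diff only shows the signature `chunks(lst, n)`.
    """
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


if __name__ == "__main__":
    files = [f"img_{i}.png" for i in range(2500)]  # stand-in for os.listdir(...)
    candidates = pick_candidates(files)
    batches = list(chunks(candidates, 500))
    print(len(candidates), [len(b) for b in batches])  # prints: 1000 [500, 500]
```

`random.sample` avoids shuffling the whole listing when only a fixed-size subset is needed; the behaviour per run should otherwise match the committed code.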

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ openai==0.28.1
 torch==2.1.0
 bert-extractive-summarizer==0.10.1
 transformers==4.36.0
+numpy==1.26.4
 python-dotenv==1.0.0
 python_slugify==8.0.1
 sqlalchemy==2.0.21
