62 commits
ec55ecd
chg: [cookiejar] import lacus local storage
Terrtia Apr 9, 2025
5e6002a
chg: [merge] merge master
Terrtia Oct 31, 2025
351f0d6
chg: [cookiejar] add UI local storage view + fix add cookies acl
Terrtia Nov 3, 2025
81fd358
new: [gpt] added claude-code and openai codex files
SteveClement Nov 9, 2025
0a219d8
fix: [cicd] Brought ci/cd to this century ;)
SteveClement Nov 9, 2025
0d58d0d
chg: [CI/CD] optimize pip cache usage to prevent no space left issues
SteveClement Nov 9, 2025
f0ed109
chg: [crawler] crawling request, user cookiejar local storage
Terrtia Nov 10, 2025
d4438ec
chg: [PDF] add support for PDF Files. processing + correlation + cont…
Terrtia Nov 13, 2025
7409e1a
chg: [file metadata] save pdf file metadata + new author object + cor…
Terrtia Nov 14, 2025
498fca5
chg: [pdf] add option to translate a PDF
Terrtia Nov 18, 2025
ee2a6fd
fix: [pdf] fix correlation show object
Terrtia Nov 18, 2025
11637e7
Add some more tests for crawler API endpoints
cavedave Nov 18, 2025
7666d8e
Fix GitHub Actions workflow: add disk cleanup and optimize matrix
cavedave Nov 19, 2025
00004ad
Merge pull request #614 from cavedave/feature/api-crawler-tests
Terrtia Nov 21, 2025
dd7522d
chg: [file-name] process file-name: trackers + modules
Terrtia Nov 25, 2025
a33ae37
Merge branch 'master' into local_storage
Terrtia Nov 25, 2025
cfe27f5
Merge branch 'master' into local_storage
Terrtia Nov 26, 2025
e0c241d
chg: [v6.6] add update
Terrtia Nov 26, 2025
0e457c8
fix: [api] remove crawler duplicate cookiejar importer
Terrtia Nov 26, 2025
6c26ada
Merge branch 'master' of github.com:CIRCL/AIL-framework
Terrtia Nov 26, 2025
3715306
fix: [pdf template] add missing template
Terrtia Nov 26, 2025
71e1253
fix: [Global + Tracker term] fix warning + content of invalid text mi…
Terrtia Nov 26, 2025
af73a12
fix: [v6.6 update] update pylacus
Terrtia Nov 26, 2025
b82d08b
chg: [tracker regex] improve perf
Terrtia Nov 26, 2025
a437cd6
fix: [file-name] fix tracker file-name filtering + template
Terrtia Nov 26, 2025
7dc83de
chg: [pdf card] show Author
Terrtia Nov 26, 2025
42dfdd8
fix: [file-name] fix base url
Terrtia Nov 26, 2025
5d1ce31
fix: [pdf] fix empty author set
Terrtia Nov 27, 2025
621fd3e
fix: [correlation] fix file-name -> pdf correlation
Terrtia Nov 27, 2025
56c6701
fix: [pdf] fix pdf limit size to 100 mb by default
Terrtia Nov 27, 2025
e91c4b6
fix: [pdf] fix max_pdf_size config name
Terrtia Nov 27, 2025
2947358
fix: [tracker] add space between : and url
Terrtia Nov 27, 2025
460355e
fix: [pdf] fix max_size_config
Terrtia Nov 27, 2025
888a7ff
fix: [trackers] fix remove object + improve url navigation
Terrtia Nov 27, 2025
e90ad71
fix: [test] fix crawler test rel path + kwargs
Terrtia Nov 27, 2025
99c37a0
fix: [Images/Screenshots] Add missing return statement in get_descrip…
cavedave Dec 1, 2025
7871665
Merge pull request #312 from cavedave/fix/images-screenshots-issues-v2
Terrtia Dec 2, 2025
7bb14dc
chg: [pdf + translation] improve pdf translation + save translated pd…
Terrtia Dec 4, 2025
1275478
Merge branch 'master' of github.com:ail-project/ail-framework
Terrtia Dec 4, 2025
4c1548e
chg: [pdf] reduce translated pdf size
Terrtia Dec 4, 2025
ebfb0b1
fix: [Translation module] fix launcher
Terrtia Dec 4, 2025
6af98ad
fix: [card pdf] hide author if None
Terrtia Dec 4, 2025
a611d6f
fix: [pdf] fix temp dir
Terrtia Dec 5, 2025
d8ffd55
chg [pdf] improve translation layout
Terrtia Dec 5, 2025
c63d410
fix: [pdf] fix downloaded pdfa name
Terrtia Dec 5, 2025
f694c5d
fix: [chat importer] None if image size > 5Mb
Terrtia Dec 5, 2025
2fc4fdd
chg: [onion lookup] hide titles from unsafe domains
Terrtia Dec 12, 2025
817c426
chg: [trackers] match objs, add view pdf btn if correlation filename …
Terrtia Dec 16, 2025
50da50c
fix: [pdf] fix translation overlapping bbox
Terrtia Dec 18, 2025
96222d9
fix: [chat importer] fix none images if size image > limit
Terrtia Dec 19, 2025
4cf60c0
fix: [flask] fix error 500 logger for not logged users
Terrtia Dec 22, 2025
2b2dc55
fix: [tracker] fix none pdf btn when multiple pdfs correlate with the…
Terrtia Dec 22, 2025
7489c81
fix: [correlation engine] load direct correlations first in the corre…
Terrtia Dec 22, 2025
24ad7cf
chg: [show correlation] add btns: Show Direct Correlations (Level 0) …
Terrtia Dec 22, 2025
7d988da
chg: [onion lookup] remove non onion strings
Terrtia Jan 5, 2026
baeefbc
chg: [codereader] chg gif logger level
Terrtia Jan 5, 2026
ab09bbd
chg: [stats] add fct to reset feeders names
Terrtia Jan 8, 2026
870358f
chg: [tracker + retro hunt] item, only extract current tracker/retro …
Terrtia Jan 8, 2026
5fb8750
chg: [tracker + retro hunt] item, only extract current tracker/retro …
Terrtia Jan 8, 2026
c9d0508
fix: [item] fix extractor tracker uuid
Terrtia Jan 8, 2026
635f5db
Merge branch 'master' into CI/CD
SteveClement Jan 8, 2026
578d10b
chg: [installer] Check if a precompiled kvrocks can be installed
SteveClement Jan 8, 2026
44 changes: 41 additions & 3 deletions .github/workflows/ail_framework_test.yml
@@ -17,9 +17,27 @@ jobs:
# The type of runner that the job will run on
runs-on: ubuntu-latest

-strategy:
-matrix:
-python-version: ['3.7', '3.8', '3.9', '3.10']
+# TODO: Matrix strategy for Python versions is defined but never used.
+# Currently all jobs use the same system Python, making this redundant.
+# Either add 'actions/setup-python' to use matrix.python-version, or remove the matrix.
+#
+# To enable multi-version Python testing:
+#
+# Step 1: Uncomment the matrix below (defines the Python versions to test):
+# strategy:
+# matrix:
+# python-version: ['3.7', '3.8', '3.9', '3.10']
+#
+# Step 2: Add this step after checkout (before "Free up disk space"):
+# - name: Set up Python ${{ matrix.python-version }}
+# uses: actions/setup-python@v4
+# with:
+# python-version: ${{ matrix.python-version }}
+#
+# ORIGINAL (commented out - not used, makes tests 4x slower with no benefit):
+# strategy:
+# matrix:
+# python-version: ['3.7', '3.8', '3.9', '3.10']


# Steps represent a sequence of tasks that will be executed as part of the job
@@ -30,6 +48,26 @@ jobs:
submodules: 'recursive'
fetch-depth: 500

+# ---------------------------------------------
+# NEW STEP: clean up disk BEFORE installing deps
+# ---------------------------------------------
+- name: Free up disk space
+run: |
+echo "Disk usage BEFORE cleanup:"
+df -h
+# Safe: Clear APT cache and lists (can be regenerated)
+sudo apt-get clean
+sudo rm -rf /var/lib/apt/lists/*
+# Probably safe: Remove tools AIL doesn't need (check if exist first)
+[ -d /usr/share/dotnet ] && sudo rm -rf /usr/share/dotnet || true
+[ -d /opt/ghc ] && sudo rm -rf /opt/ghc || true
+[ -d /usr/local/lib/android ] && sudo rm -rf /usr/local/lib/android || true
+# Risky but needed: Remove hosted tool cache (contains Python, Node, etc.)
+# AIL workflow uses system Python, so this should be safe
+[ -d /opt/hostedtoolcache ] && sudo rm -rf /opt/hostedtoolcache || true
+echo "Disk usage AFTER cleanup:"
+df -h
+# ---------------------------------------------

# Runs a single command using the runners shell
- name: Install AIL
1 change: 1 addition & 0 deletions .gitignore
@@ -18,6 +18,7 @@ PASTES
CRAWLED_SCREENSHOT
IMAGES
FAVICONS
+FILES
BASE64
HASHS
DATA_ARDB
4 changes: 3 additions & 1 deletion bin/LAUNCH.sh
@@ -206,6 +206,8 @@ function launching_scripts {
sleep 0.1
screen -S "Script_AIL" -X screen -t "D4_client" bash -c "cd ${AIL_BIN}/core; ${ENV_PY} ./D4_client.py; read x"
sleep 0.1
+screen -S "Script_AIL" -X screen -t "Translation" bash -c "cd ${AIL_BIN}/modules; ${ENV_PY} ./Translation.py; read x"
+sleep 0.1

screen -S "Script_AIL" -X screen -t "UpdateBackground" bash -c "cd ${AIL_BIN}; ${ENV_PY} ./update-background.py; read x"
sleep 0.1
@@ -619,7 +621,7 @@ function launch_tests() {
echo -e $GREEN"\t* Flask: $isflasked"$DEFAULT
echo -e ""
echo -e ""
-python3 -m nose2 --start-dir $tests_dir --coverage $bin_dir --with-coverage test_api test_modules
+python3 -m nose2 --start-dir $tests_dir --coverage $bin_dir --with-coverage test_api test_modules test_api_crawler
exit $?
}

1 change: 1 addition & 0 deletions bin/crawlers/Crawler.py
@@ -281,6 +281,7 @@ def enqueue_capture(self, task_uuid, priority):
user_agent=task.get_user_agent(),
proxy=task.get_proxy(),
cookies=task.get_cookies(),
+storage=task.get_local_storage(),
with_favicon=True,
force=force,
general_timeout_in_sec=90) # TODO increase timeout if onion ????
2 changes: 1 addition & 1 deletion bin/exporter/MailExporter.py
@@ -143,7 +143,7 @@ def export(self, tracker, obj, matches=[]):
body += f'\nMatch {nb}: {match[0]}\nExtract:\n{match[1]}\n\n'
nb += 1

-ail_link = f'AIL url:{obj.get_link()}\n\n'
+ail_link = f'AIL url: {obj.get_link()}\n\n'
for mail in tracker.get_mails():
if ail_users.exists_user(mail):
body = ail_link + body
5 changes: 4 additions & 1 deletion bin/importer/feeders/Default.py
@@ -61,6 +61,9 @@ def get_json_meta(self):
def get_meta(self):
return self.json_data.get('meta')

+def get_meta_field(self, field, default=None):
+return self.json_data.get('meta', {}).get(field, default)
+
def get_payload(self):
return self.json_data.get('data')

@@ -77,7 +80,7 @@ def get_gzip64_content(self):
return self.json_data.get('data')

def get_obj_type(self):
-meta = self.get_json_meta()
+meta = self.get_meta()
return meta.get('type', 'item')

## OVERWRITE ME ##
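The new `get_meta_field` accessor can be illustrated with a minimal, self-contained sketch. `FeederSketch` is a hypothetical stand-in rather than the real `DefaultFeeder`; only the two accessor bodies are taken from the diff above, and the sample payloads are invented for illustration.

```python
class FeederSketch:
    """Hypothetical stand-in for a feeder; only the metadata accessors are modeled."""

    def __init__(self, json_data):
        self.json_data = json_data

    def get_meta(self):
        # Returns None when the 'meta' key is absent.
        return self.json_data.get('meta')

    def get_meta_field(self, field, default=None):
        # Falls back to an empty dict when 'meta' is absent, then to `default`
        # when the requested field is missing.
        return self.json_data.get('meta', {}).get(field, default)


# Invented sample payloads.
with_meta = FeederSketch({'meta': {'type': 'pdf', 'file_metadata': {'Author': 'alice'}}})
without_meta = FeederSketch({'data': 'payload'})

print(with_meta.get_meta_field('type'))             # pdf
print(without_meta.get_meta_field('type', 'item'))  # item
print(without_meta.get_meta())                      # None
```

In this sketch the fallback only covers a missing `meta` key. A payload that stores `None` under `meta` would still raise `AttributeError`, because `dict.get` returns the stored `None` instead of the default.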
85 changes: 67 additions & 18 deletions bin/importer/feeders/abstract_chats_feeder.py
@@ -12,6 +12,8 @@
import sys
import time

+import pymupdf4llm
+
from abc import ABC

sys.path.append(os.environ['AIL_BIN'])
@@ -20,12 +22,14 @@
##################################
from importer.feeders.Default import DefaultFeeder
from lib.ail_core import get_chat_instance_name
+from lib.objects import Authors
from lib.objects.Chats import Chat
from lib.objects import ChatSubChannels
from lib.objects import ChatThreads
from lib.objects import Images
from lib.objects import Items
from lib.objects import Messages
+from lib.objects import PDFs
from lib.objects import FilesNames
# from lib.objects import Files
from lib.objects import UsersAccount
@@ -168,6 +172,8 @@ def get_obj(self):
instance_name = get_chat_instance_name(self.get_chat_instance_uuid())
item_id = f'{instance_name}/{d[0:4]}/{d[4:6]}/{d[6:8]}/{self.json_data["data-sha256"]}.gz'
self.obj = Items.Item(item_id)
+elif obj_type == 'pdf':
+self.obj = PDFs.PDF(self.json_data['data-sha256'])
else:
obj_id = Messages.create_obj_id(self.get_chat_instance_uuid(), chat_id, message_id, timestamp, thread_id=thread_id)
self.obj = Messages.Message(obj_id)
@@ -191,10 +197,11 @@ def _process_chat(self, meta_chat, date, new_objs=None): #TODO NONE DATE???

if meta_chat.get('icon'):
img = Images.create(meta_chat['icon'], b64=True)
-img.add(date, chat)
-chat.set_icon(img.get_global_id())
-if new_objs:
-new_objs.add(img)
+if img:
+img.add(date, chat)
+chat.set_icon(img.get_global_id())
+if new_objs:
+new_objs.add(img)

if meta_chat.get('username'):
username = Username(meta_chat['username'], self.get_chat_protocol())
@@ -225,9 +232,10 @@ def process_chat(self, new_objs, obj, date, timestamp, feeder_timestamp, reply_i

if meta.get('icon'):
img = Images.create(meta['icon'], b64=True)
-img.add(date, chat)
-chat.set_icon(img.get_global_id())
-new_objs.add(img)
+if img:
+img.add(date, chat)
+chat.set_icon(img.get_global_id())
+new_objs.add(img)

if meta.get('username'):
username = Username(meta['username'], self.get_chat_protocol())
@@ -324,9 +332,10 @@ def _process_user(self, meta, date, timestamp, new_objs=None):

if meta.get('icon'):
img = Images.create(meta['icon'], b64=True)
-img.add(date, user_account)
-user_account.set_icon(img.get_global_id())
-new_objs.add(img)
+if img:
+img.add(date, user_account)
+user_account.set_icon(img.get_global_id())
+new_objs.add(img)

if meta.get('info'):
user_account.set_info(meta['info'])
@@ -363,9 +372,10 @@ def process_sender(self, new_objs, obj, date, timestamp):

if meta.get('icon'):
img = Images.create(meta['icon'], b64=True)
-img.add(date, user_account)
-user_account.set_icon(img.get_global_id())
-new_objs.add(img)
+if img:
+img.add(date, user_account)
+user_account.set_icon(img.get_global_id())
+new_objs.add(img)

if meta.get('info'):
user_account.set_info(meta['info'])
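The `if img:` guard repeated in the hunks above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's code: `ImageSketch`, `create_image` and `process_icon` are hypothetical names, and the idea that image creation returns `None` above a size limit is inferred from the "None if image size > 5Mb" commit title, not from the `Images.create` source.

```python
import base64
from dataclasses import dataclass

# Assumed limit, loosely based on the "image size > 5Mb" commit title.
MAX_IMAGE_SIZE = 5 * 1024 * 1024


@dataclass(frozen=True)
class ImageSketch:
    """Hypothetical image object, described only by its decoded size."""
    size: int


def create_image(content, b64=False, max_size=MAX_IMAGE_SIZE):
    """Hypothetical stand-in for image creation: returns None above max_size."""
    if b64:
        content = base64.b64decode(content)
    if len(content) > max_size:
        return None
    return ImageSketch(size=len(content))


def process_icon(meta, new_objs, max_size=MAX_IMAGE_SIZE):
    """Register the icon only when an image object was actually created."""
    if not meta.get('icon'):
        return None
    img = create_image(meta['icon'], b64=True, max_size=max_size)
    if img:  # same guard as in the diff: skip images that were not created
        new_objs.add(img)
    return img


small_icon = base64.b64encode(b'x' * 10).decode()
objs = set()
accepted = process_icon({'icon': small_icon}, objs)              # image registered
rejected = process_icon({'icon': small_icon}, objs, max_size=5)  # too large, skipped
```

Without the guard, the rejected case would call methods on `None` and abort processing of the whole message; with it, the icon is simply left unset.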
@@ -405,7 +415,8 @@ def process_meta(self): # TODO CHECK MANDATORY FIELDS
media_name = self.get_media_name()
if media_name:
print(media_name)
-FilesNames.FilesNames().create(media_name, date, obj)
+f = FilesNames.FilesNames().create(media_name, date, obj)
+objs.add(f)

for reaction in self.get_reactions():
obj.add_reaction(reaction['reaction'], int(reaction['count']))
@@ -430,13 +441,50 @@ def process_meta(self): # TODO CHECK MANDATORY FIELDS

if self.obj.type == 'image':
obj = Images.create(self.get_message_content())
-obj.add(date, message)
-obj.set_parent(obj_global_id=message.get_global_id())
+if obj:
+obj.add(date, message)
+obj.set_parent(obj_global_id=message.get_global_id())
+
+# FILENAME
+media_name = self.get_media_name()
+if media_name:
+f = FilesNames.FilesNames().create(media_name, date, message, file_obj=obj)
+objs.add(f)
+
+elif self.obj.type == 'pdf':
+# content
+if not self.obj.exists():
+obj = PDFs.create(self.obj.id, self.get_message_content())
+if not obj:
+raise Exception('PDF not created, Size limit reached')
+obj.set_parent(obj_global_id=message.get_global_id())
+
+pdf_meta = self.get_meta_field('file_metadata')
+if pdf_meta:
+obj.set_file_meta(pdf_meta)
+print(pdf_meta)
+if 'Author' in pdf_meta:
+print(pdf_meta['Author'])
+author = Authors.create(pdf_meta['Author'], obj)
+author.add(date, obj)
+
+md_content = pymupdf4llm.to_markdown(obj.get_filepath())
+item_id = f'pdf/{date[0:4]}/{date[4:6]}/{date[6:8]}/{obj.id}.gz'
+item = Items.Item(item_id)
+if not item.exists():
+item.create(md_content, content_type='str')
+objs.add(item)
+print(item_id)
+obj.add_children('item', '', item_id)
+obj.add_correlation('item', '', item_id)
+
+self.obj.add(date, message)

# FILENAME
media_name = self.get_media_name()
if media_name:
-FilesNames.FilesNames().create(media_name, date, message, file_obj=obj)
+f = FilesNames.FilesNames().create(media_name, date, message, file_obj=self.obj)
+objs.add(f)

elif self.obj.type == 'item':
obj = self.obj
@@ -447,8 +495,9 @@ def process_meta(self): # TODO CHECK MANDATORY FIELDS
# FILENAME
media_name = self.get_media_name()
if media_name:
-file_name = FilesNames.FilesNames().create(media_name, date, message, file_obj=obj)
+f = file_name = FilesNames.FilesNames().create(media_name, date, message, file_obj=obj)
file_name.add_correlation('item', '', obj.id)
+objs.add(f)

for obj in objs: # TODO PERF avoid parsing metas multiple times
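The item id built in the PDF branch follows a date-partitioned layout. The sketch below reproduces only that string construction; `build_pdf_item_id` is a hypothetical helper, and the `YYYYMMDD` date format is an assumption inferred from the slicing (`date[0:4]`, `date[4:6]`, `date[6:8]`) in the diff.

```python
def build_pdf_item_id(date, pdf_id):
    """Build the item id for the Markdown rendering of a PDF.

    Assumptions: `date` is a YYYYMMDD string and `pdf_id` is the PDF object id
    (a sha256 hex digest in the diff). The validation is added for the sketch only.
    """
    if len(date) != 8 or not date.isdigit():
        raise ValueError(f'expected a YYYYMMDD date, got {date!r}')
    return f'pdf/{date[0:4]}/{date[4:6]}/{date[6:8]}/{pdf_id}.gz'


print(build_pdf_item_id('20251113', 'abc123'))  # pdf/2025/11/13/abc123.gz
```

Keying the path on the PDF id means the same file received on the same date maps to the same item, which is consistent with the `if not item.exists()` check before `item.create(...)`.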
