This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Commit a068ad7

Add information on uploaded media to user export command. (#15107)
1 parent 452b009 commit a068ad7

5 files changed: +136 -16 lines changed


changelog.d/15107.feature

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
Add media information to the command line [user data export tool](https://matrix-org.github.io/synapse/v1.79/usage/administration/admin_faq.html#how-can-i-export-user-data).

docs/usage/administration/admin_faq.md

Lines changed: 58 additions & 16 deletions
@@ -70,13 +70,55 @@ output-directory
7070
│ ├───state
7171
│ ├───invite_state
7272
│ └───knock_state
73-
└───user_data
74-
├───account_data
75-
│ ├───global
76-
│ └───<room_id>
77-
├───connections
78-
├───devices
79-
└───profile
73+
├───user_data
74+
│ ├───account_data
75+
│ │ ├───global
76+
│ │ └───<room_id>
77+
│ ├───connections
78+
│ ├───devices
79+
│ └───profile
80+
└───media_ids
81+
└───<media_id>
82+
```
83+
84+
The `media_ids` folder contains only the metadata of the media uploaded by the user.
85+
It does not contain the media itself.
86+
Furthermore, only the `media_ids` that Synapse manages itself are exported.
87+
If another media repository (e.g. [matrix-media-repo](https://github.com/turt2live/matrix-media-repo))
88+
is used, the data must be exported separately.
89+
90+
With the `media_ids` the media files can be downloaded.
91+
Media that have been sent in encrypted rooms are only retrieved in encrypted form.
92+
The following script can help with downloading the media files:
93+
94+
```bash
95+
#!/usr/bin/env bash
96+
97+
# Parameters
98+
#
99+
# source_directory: Directory which contains the export with the media_ids.
100+
# target_directory: Directory into which all files are to be downloaded.
101+
# repository_url: Address of the media repository or media worker.
102+
# serverName: Name of the server (`server_name` from homeserver.yaml).
103+
#
104+
# Example:
105+
# ./download_media.sh /tmp/export_data/media_ids/ /tmp/export_data/media_files/ http://localhost:8008 matrix.example.com
106+
107+
source_directory=$1
108+
target_directory=$2
109+
repository_url=$3
110+
serverName=$4
111+
112+
mkdir -p "$target_directory"
113+
114+
for file in "$source_directory"/*; do
115+
filename=$(basename "${file}")
116+
url=$repository_url/_matrix/media/v3/download/$serverName/$filename
117+
echo "Downloading $filename - $url"
118+
if ! wget -o /dev/null -P "$target_directory" "$url"; then
119+
echo "Could not download $filename"
120+
fi
121+
done
80122
```
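The URL construction performed by the shell script can also be sketched in Python, for example on hosts without `wget`. This is a minimal sketch and not part of the commit: the function name `build_download_urls` is hypothetical, and it assumes the export layout described above (one file per media ID inside `media_ids/`) and the same `/_matrix/media/v3/download/<server_name>/<media_id>` path that the script uses.

```python
import os
from typing import List
from urllib.parse import quote


def build_download_urls(
    source_directory: str, repository_url: str, server_name: str
) -> List[str]:
    """Return one download URL per file name found in source_directory.

    Hypothetical helper: mirrors the URL scheme of the shell script above.
    """
    base = repository_url.rstrip("/")
    urls = []
    for media_id in sorted(os.listdir(source_directory)):
        # Each exported file is named after the media ID it describes.
        urls.append(
            f"{base}/_matrix/media/v3/download/{server_name}/"
            f"{quote(media_id, safe='')}"
        )
    return urls
```

The returned URLs can then be handed to any HTTP client, for instance `urllib.request.urlretrieve(url, target_path)` from the standard library.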
81123

82124
Manually resetting passwords
@@ -87,7 +129,7 @@ can reset a user's password using the [admin API](../../admin_api/user_admin_api
87129

88130
I have a problem with my server. Can I just delete my database and start again?
89131
---
90-
Deleting your database is unlikely to make anything better.
132+
Deleting your database is unlikely to make anything better.
91133

92134
It's easy to make the mistake of thinking that you can start again from a clean
93135
slate by dropping your database, but things don't work like that in a federated
@@ -102,7 +144,7 @@ Come and seek help in https://matrix.to/#/#synapse:matrix.org.
102144

103145
There are two exceptions when it might be sensible to delete your database and start again:
104146
* You have *never* joined any rooms which are federated with other servers. For
105-
instance, a local deployment which the outside world can't talk to.
147+
instance, a local deployment which the outside world can't talk to.
106148
* You are changing the `server_name` in the homeserver configuration. In effect
107149
this makes your server a completely new one from the point of view of the network,
108150
so in this case it makes sense to start with a clean database.
@@ -115,7 +157,7 @@ Using the following curl command:
115157
curl -H 'Authorization: Bearer <access-token>' -X DELETE https://matrix.org/_matrix/client/r0/directory/room/<room-alias>
116158
```
117159
`<access-token>` - can be obtained in riot by looking in the riot settings, down the bottom is:
118-
Access Token:\<click to reveal\>
160+
Access Token:\<click to reveal\>
119161

120162
`<room-alias>` - the room alias, eg. #my_room:matrix.org this possibly needs to be URL encoded also, for example %23my_room%3Amatrix.org
121163

@@ -152,13 +194,13 @@ What are the biggest rooms on my server?
152194
---
153195

154196
```sql
155-
SELECT s.canonical_alias, g.room_id, count(*) AS num_rows
156-
FROM
157-
state_groups_state AS g,
158-
room_stats_state AS s
159-
WHERE g.room_id = s.room_id
197+
SELECT s.canonical_alias, g.room_id, count(*) AS num_rows
198+
FROM
199+
state_groups_state AS g,
200+
room_stats_state AS s
201+
WHERE g.room_id = s.room_id
160202
GROUP BY s.canonical_alias, g.room_id
161-
ORDER BY num_rows desc
203+
ORDER BY num_rows desc
162204
LIMIT 10;
163205
```
164206

synapse/app/admin_cmd.py

Lines changed: 10 additions & 0 deletions
@@ -44,6 +44,7 @@
4444
)
4545
from synapse.storage.databases.main.events_worker import EventsWorkerStore
4646
from synapse.storage.databases.main.filtering import FilteringWorkerStore
47+
from synapse.storage.databases.main.media_repository import MediaRepositoryStore
4748
from synapse.storage.databases.main.profile import ProfileWorkerStore
4849
from synapse.storage.databases.main.push_rule import PushRulesWorkerStore
4950
from synapse.storage.databases.main.receipts import ReceiptsWorkerStore
@@ -86,6 +87,7 @@ class AdminCmdSlavedStore(
8687
RegistrationWorkerStore,
8788
RoomWorkerStore,
8889
ProfileWorkerStore,
90+
MediaRepositoryStore,
8991
):
9092
def __init__(
9193
self,
@@ -235,6 +237,14 @@ def write_account_data(
235237
with open(account_data_file, "a") as f:
236238
json.dump(account_data, fp=f)
237239

240+
def write_media_id(self, media_id: str, media_metadata: JsonDict) -> None:
241+
file_directory = os.path.join(self.base_directory, "media_ids")
242+
os.makedirs(file_directory, exist_ok=True)
243+
media_id_file = os.path.join(file_directory, media_id)
244+
245+
with open(media_id_file, "w") as f:
246+
json.dump(media_metadata, fp=f)
247+
238248
def finished(self) -> str:
239249
return self.base_directory
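To make the resulting on-disk layout concrete, the following standalone sketch reproduces what `write_media_id` does (one JSON file per media ID under `media_ids/`) and reads the file back. It is an illustration only: the free functions and the `read_media_id` helper are hypothetical and do not exist in Synapse, and the metadata keys are taken from the test in this commit.

```python
import json
import os
import tempfile
from typing import Any, Dict


def write_media_id(
    base_directory: str, media_id: str, media_metadata: Dict[str, Any]
) -> None:
    """Write the metadata of one media item to <base_directory>/media_ids/<media_id>."""
    file_directory = os.path.join(base_directory, "media_ids")
    os.makedirs(file_directory, exist_ok=True)
    with open(os.path.join(file_directory, media_id), "w") as f:
        json.dump(media_metadata, fp=f)


def read_media_id(base_directory: str, media_id: str) -> Dict[str, Any]:
    """Hypothetical counterpart: load the metadata written by write_media_id."""
    with open(os.path.join(base_directory, "media_ids", media_id)) as f:
        return json.load(f)


# Round trip in a throwaway directory.
with tempfile.TemporaryDirectory() as export_dir:
    write_media_id(export_dir, "media_1", {"media_id": "media_1", "media_length": 50})
    loaded = read_media_id(export_dir, "media_1")
```

Because the file is opened with mode `"w"`, writing the same media ID twice overwrites the earlier file instead of appending to it.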
240250

synapse/handlers/admin.py

Lines changed: 38 additions & 0 deletions
@@ -252,23 +252,49 @@ async def export_user_data(self, user_id: str, writer: "ExfiltrationWriter") ->
252252
profile = await self.get_user(UserID.from_string(user_id))
253253
if profile is not None:
254254
writer.write_profile(profile)
255+
logger.info("[%s] Written profile", user_id)
255256

256257
# Get all devices the user has
257258
devices = await self._device_handler.get_devices_by_user(user_id)
258259
writer.write_devices(devices)
260+
logger.info("[%s] Written %s devices", user_id, len(devices))
259261

260262
# Get all connections the user has
261263
connections = await self.get_whois(UserID.from_string(user_id))
262264
writer.write_connections(
263265
connections["devices"][""]["sessions"][0]["connections"]
264266
)
267+
logger.info("[%s] Written %s connections", user_id, len(connections["devices"][""]["sessions"][0]["connections"]))
265268

266269
# Get all account data the user has global and in rooms
267270
global_data = await self._store.get_global_account_data_for_user(user_id)
268271
by_room_data = await self._store.get_room_account_data_for_user(user_id)
269272
writer.write_account_data("global", global_data)
270273
for room_id in by_room_data:
271274
writer.write_account_data(room_id, by_room_data[room_id])
275+
logger.info(
276+
"[%s] Written account data for %s rooms", user_id, len(by_room_data)
277+
)
278+
279+
# Get all media ids the user has
280+
limit = 100
281+
start = 0
282+
while True:
283+
media_ids, total = await self._store.get_local_media_by_user_paginate(
284+
start, limit, user_id
285+
)
286+
for media in media_ids:
287+
writer.write_media_id(media["media_id"], media)
288+
289+
logger.info(
290+
"[%s] Written %d media_ids of %s",
291+
user_id,
292+
(start + len(media_ids)),
293+
total,
294+
)
295+
if (start + limit) >= total:
296+
break
297+
start += limit
272298

273299
return writer.finished()
274300

@@ -359,6 +385,18 @@ def write_account_data(
359385
"""
360386
raise NotImplementedError()
361387

388+
@abc.abstractmethod
389+
def write_media_id(self, media_id: str, media_metadata: JsonDict) -> None:
390+
"""Write the media's metadata of a user.
391+
Exports only the metadata, as this can be fetched from the database via
392+
read-only access. In order to access the files, a connection to the correct
393+
media repository would be required.
394+
395+
Args:
396+
media_id: ID of the media.
397+
media_metadata: Metadata of one media file.
398+
"""
399+
362400
@abc.abstractmethod
363401
def finished(self) -> Any:
364402
"""Called when all data has successfully been exported and written.

tests/handlers/test_admin.py

Lines changed: 29 additions & 0 deletions
@@ -23,6 +23,7 @@
2323
from synapse.api.room_versions import RoomVersions
2424
from synapse.rest.client import knock, login, room
2525
from synapse.server import HomeServer
26+
from synapse.types import UserID
2627
from synapse.util import Clock
2728

2829
from tests import unittest
@@ -323,3 +324,31 @@ def test_account_data(self) -> None:
323324
args = writer.write_account_data.call_args_list[1][0]
324325
self.assertEqual(args[0], "test_room")
325326
self.assertEqual(args[1]["m.per_room"]["b"], 2)
327+
328+
def test_media_ids(self) -> None:
329+
"""Tests that media's metadata get exported."""
330+
331+
self.get_success(
332+
self._store.store_local_media(
333+
media_id="media_1",
334+
media_type="image/png",
335+
time_now_ms=self.clock.time_msec(),
336+
upload_name=None,
337+
media_length=50,
338+
user_id=UserID.from_string(self.user2),
339+
)
340+
)
341+
342+
writer = Mock()
343+
344+
self.get_success(self.admin_handler.export_user_data(self.user2, writer))
345+
346+
writer.write_media_id.assert_called_once()
347+
348+
args = writer.write_media_id.call_args[0]
349+
self.assertEqual(args[0], "media_1")
350+
self.assertEqual(args[1]["media_id"], "media_1")
351+
self.assertEqual(args[1]["media_length"], 50)
352+
self.assertGreater(args[1]["created_ts"], 0)
353+
self.assertIsNone(args[1]["upload_name"])
354+
self.assertIsNone(args[1]["last_access_ts"])
