Commit 49de5ee

committed
Merge branch 'master' into stable
2 parents 28ccd5b + 1433f8f commit 49de5ee

21 files changed

+1038
-457
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,3 +9,4 @@ tests/dumps/dump_dealers_vins.rdb
99
tests/dumps/dump_random_lists.rdb
1010
tests/dumps/dump_sorted_sets.rdb
1111

12+
.idea/*

.travis.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,8 @@ language: python
22
python:
33
- "2.6"
44
- "2.7"
5+
- "3.4"
6+
- "3.5"
57

68
script: python run_tests
79

CHANGES

Lines changed: 14 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,15 @@
1-
* 0.1.1
2-
* Fixed lzf decompression
3-
* Standard python project layout
4-
* Python script to automatically create test RDB files
5-
* Adds test cases
6-
* Adds setup.py to easily install this library
7-
* Adds MIT license
1+
* 0.1.9
2+
* python 3 support
3+
* rdb v8 (redis 4.0) support
4+
* binary to string conversion fixes
5+
* use ujson/cStringIO/python-lzf if they're available
6+
* filter keys by size
7+
* bugfixes parsing sorted sets
8+
* fix setup.py dependencies and remove requirements.txt file
9+
10+
* 0.1.8
11+
* fix a crash in the memory profiler recently introduced.
12+
13+
* 0.1.7
14+
* rdb v7 (redis 3.2) support
815

9-
* 0.1.0
10-
* Initial version
11-
* Specification for RDB file format

README.md

Lines changed: 105 additions & 44 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,7 @@ Rdbtools is written in Python, though there are similar projects in other langua
1414

1515
Pre-Requisites :
1616

17-
1. python 2.x and pip.
18-
2. redis-py is optional and only needed to run test cases.
17+
1. redis-py is optional and only needed to run test cases.
1918

2019
To install from PyPI (recommended) :
2120

@@ -27,29 +26,83 @@ To install from source :
2726
cd redis-rdb-tools
2827
sudo python setup.py install
2928

30-
## Converting dump files to JSON ##
29+
# Command line usage examples
30+
31+
Every run of the rdb tool requires a command that specifies what to do with the parsed RDB data.
32+
Valid commands are: json, diff, justkeys, justkeyvals and protocol.
33+
34+
JSON from a two database dump:
35+
36+
> rdb --command json /var/redis/6379/dump.rdb
3137

32-
Parse the dump file and print the JSON on standard output
38+
[{
39+
"user003":{"fname":"Ron","sname":"Bumquist"},
40+
"lizards":["Bush anole","Jackson's chameleon","Komodo dragon","Ground agama","Bearded dragon"],
41+
"user001":{"fname":"Raoul","sname":"Duke"},
42+
"user002":{"fname":"Gonzo","sname":"Dr"},
43+
"user_list":["user003","user002","user001"]},{
44+
"baloon":{"helium":"birthdays","medical":"angioplasty","weather":"meteorology"},
45+
"armadillo":["chacoan naked-tailed","giant","Andean hairy","nine-banded","pink fairy"],
46+
"aroma":{"pungent":"vinegar","putrid":"rotten eggs","floral":"roses"}}]
3347

34-
rdb --command json /var/redis/6379/dump.rdb
48+
## Filter parsed output
49+
50+
Only process keys that match the regex, and only print key and values:
51+
52+
> rdb --command justkeyvals --key "user.*" /var/redis/6379/dump.rdb
53+
54+
user003 fname Ron,sname Bumquist,
55+
user001 fname Raoul,sname Duke,
56+
user002 fname Gonzo,sname Dr,
57+
user_list user003,user002,user001
3558

36-
Only process keys that match the regex
59+
Only process hashes starting with "a", in database 2:
3760

38-
rdb --command json --key "user.*" /var/redis/6379/dump.rdb
61+
> rdb -c json --db 2 --type hash --key "a.*" /var/redis/6379/dump.rdb
3962

40-
Only process hashes starting with "a", in database 2
63+
[{},{
64+
"aroma":{"pungent":"vinegar","putrid":"rotten eggs","floral":"roses"}}]
65+
66+
## Converting dump files to JSON ##
67+
68+
The `json` command output is UTF-8 encoded JSON.
69+
By default, the callback tries to parse RDB data as UTF-8, escaping characters outside the printable ASCII range with the `\u` notation and bytes that are not valid UTF-8 with `\x`.
70+
Attempting to decode RDB data can corrupt binary data; this can be avoided by using the `--escape raw` option.
71+
Another option is to use `-e base64` to Base64-encode binary data.
72+
73+
74+
Parse the dump file and print the JSON on standard output:
75+
76+
> rdb -c json /var/redis/6379/dump.rdb
77+
78+
[{
79+
"Citat":["B\u00e4ttre sent \u00e4n aldrig","Bra karl reder sig sj\u00e4lv","Man ska inte k\u00f6pa grisen i s\u00e4cken"],
80+
"bin_data":"\\xFE\u0000\u00e2\\xF2"}]
81+
82+
Parse the dump file to raw bytes and print the JSON on standard output:
4183

42-
rdb --command json --db 2 --type hash --key "a.*" /var/redis/6379/dump.rdb
84+
> rdb -c json /var/redis/6379/dump.rdb --escape raw
4385

86+
[{
87+
"Citat":["B\u00c3\u00a4ttre sent \u00c3\u00a4n aldrig","Bra karl reder sig sj\u00c3\u00a4lv","Man ska inte k\u00c3\u00b6pa grisen i s\u00c3\u00a4cken"],
88+
"bin_data":"\u00fe\u0000\u00c3\u00a2\u00f2"}]
4489

4590
## Generate Memory Report ##
4691

47-
Running with the `-c memory` generates a CSV report with the approximate memory used by that key.
92+
Running with `-c memory` generates a CSV report with the approximate memory used by each key. `--bytes C` and `--largest N` can be used to limit output to keys larger than C bytes, or to the N largest keys.
4893

49-
rdb -c memory /var/redis/6379/dump.rdb > memory.csv
94+
> rdb -c memory /var/redis/6379/dump.rdb --bytes 128 -f memory.csv
95+
> cat memory.csv
5096

97+
database,type,key,size_in_bytes,encoding,num_elements,len_largest_element
98+
0,list,lizards,241,quicklist,5,19
99+
0,list,user_list,190,quicklist,3,7
100+
2,hash,baloon,138,ziplist,3,11
101+
2,list,armadillo,231,quicklist,5,20
102+
2,hash,aroma,129,ziplist,3,11
51103

52-
The generated CSV has the following columns - Database Number, Data Type, Key, Memory Used in bytes and Encoding.
104+
105+
The generated CSV has the following columns: Database Number, Data Type, Key, Memory Used in bytes, RDB Encoding type, Number of Elements and Length of the Largest Element.
53106
Memory usage includes the key, the value and any other overheads.
54107

55108
Note that the memory usage is approximate. In general, the actual memory used will be slightly higher than what is reported.
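The CSV report described above can be post-processed with standard tooling. The following is a minimal sketch, not part of rdbtools, that ranks keys by `size_in_bytes` using Python's `csv` module. The sample rows are copied from the README example; the `largest_keys` helper and the `SAMPLE_REPORT` name are hypothetical and exist only for illustration.

```python
import csv
import io

# Sample rows copied from the README example; a real report would be
# read from the file written by `rdb -c memory ... -f memory.csv`.
SAMPLE_REPORT = """database,type,key,size_in_bytes,encoding,num_elements,len_largest_element
0,list,lizards,241,quicklist,5,19
0,list,user_list,190,quicklist,3,7
2,hash,baloon,138,ziplist,3,11
2,list,armadillo,231,quicklist,5,20
2,hash,aroma,129,ziplist,3,11
"""


def largest_keys(report_file, n=3):
    """Return (key, size_in_bytes) pairs for the n largest keys (hypothetical helper)."""
    rows = csv.DictReader(report_file)
    sized = [(row["key"], int(row["size_in_bytes"])) for row in rows]
    # Sort by size, largest first, and keep the first n entries.
    return sorted(sized, key=lambda item: item[1], reverse=True)[:n]


top = largest_keys(io.StringIO(SAMPLE_REPORT), n=2)
print(top)  # [('lizards', 241), ('armadillo', 231)]
```

For a report on disk, the same helper could be called with an open file object in place of the `io.StringIO` wrapper.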
@@ -62,17 +115,13 @@ The memory report should help you detect memory leaks caused by your application
62115

63116
Sometimes you just want to find the memory used by a particular key, and running the entire memory report on the dump file is time consuming.
64117

65-
In such cases, you can use the `redis-memory-for-key` command
66-
67-
Example :
118+
In such cases, you can use the `redis-memory-for-key` command:
68119

69-
redis-memory-for-key person:1
120+
> redis-memory-for-key person:1
70121

71-
redis-memory-for-key -s localhost -p 6379 -a mypassword person:1
72-
73-
Output :
122+
> redis-memory-for-key -s localhost -p 6379 -a mypassword person:1
74123

75-
Key "person:1"
124+
Key person:1
76125
Bytes 111
77126
Type hash
78127
Encoding ziplist
@@ -88,20 +137,20 @@ NOTE :
88137

89138
First, use the --command diff option, and pipe the output to standard sort utility
90139

91-
rdb --command diff /var/redis/6379/dump1.rdb | sort > dump1.txt
92-
rdb --command diff /var/redis/6379/dump2.rdb | sort > dump2.txt
140+
> rdb --command diff /var/redis/6379/dump1.rdb | sort > dump1.txt
141+
> rdb --command diff /var/redis/6379/dump2.rdb | sort > dump2.txt
93142

94143
Then, run your favourite diff program
95144

96-
kdiff3 dump1.txt dump2.txt
145+
> kdiff3 dump1.txt dump2.txt
97146

98-
To limit the size of the files, you can filter on keys using the --key=regex option
147+
To limit the size of the files, you can filter on keys using the `--key` option
99148

100149
## Emitting Redis Protocol ##
101150

102-
You can convert RDB file into a stream of [redis protocol](http://redis.io/topics/protocol) using the "protocol" command.
151+
You can convert RDB file into a stream of [redis protocol](http://redis.io/topics/protocol) using the `protocol` command.
103152

104-
rdb --command protocol /var/redis/6379/dump.rdb
153+
> rdb --c protocol /var/redis/6379/dump.rdb
105154

106155
*4
107156
$4
@@ -113,37 +162,49 @@ You can convert RDB file into a stream of [redis protocol](http://redis.io/topic
113162
$8
114163
Sripathi
115164

116-
You can pipe the output to netcat and re-import a subset of the data.
117-
For example, if you want to shard your data into two redis instances, you can use the --key flag to select a subset of data,
165+
You can pipe the output to netcat and re-import a subset of the data.
166+
For example, if you want to shard your data into two redis instances, you can use the --key flag to select a subset of data,
118167
and then pipe the output to a running redis instance to load that data.
119-
120168
Read [Redis Mass Insert](http://redis.io/topics/mass-insert) for more information on this.
121169

122-
## Using the Parser ##
170+
When printing protocol output, the `--escape` option can be used with `printable` or `utf8` to avoid non-printable and control characters.
171+
172+
## Using the Parser ##
123173

124-
import sys
125174
from rdbtools import RdbParser, RdbCallback
175+
from rdbtools.encodehelpers import bytes_to_unicode
126176

127-
class MyCallback(RdbCallback) :
128-
''' Simple example to show how callback works.
177+
class MyCallback(RdbCallback):
178+
''' Simple example to show how callback works.
129179
See RdbCallback for all available callback methods.
130180
See JsonCallback for a concrete example
131-
'''
132-
def set(self, key, value, expiry):
133-
print('%s = %s' % (str(key), str(value)))
134-
181+
'''
182+
183+
def __init__(self):
184+
super(MyCallback, self).__init__(string_escape=None)
185+
186+
def encode_key(self, key):
187+
return bytes_to_unicode(key, self._escape, skip_printable=True)
188+
189+
def encode_value(self, val):
190+
return bytes_to_unicode(val, self._escape)
191+
192+
def set(self, key, value, expiry, info):
193+
print('%s = %s' % (self.encode_key(key), self.encode_value(value)))
194+
135195
def hset(self, key, field, value):
136-
print('%s.%s = %s' % (str(key), str(field), str(value)))
137-
196+
print('%s.%s = %s' % (self.encode_key(key), self.encode_key(field), self.encode_value(value)))
197+
138198
def sadd(self, key, member):
139-
print('%s has {%s}' % (str(key), str(member)))
140-
141-
def rpush(self, key, value) :
142-
print('%s has [%s]' % (str(key), str(value)))
143-
199+
print('%s has {%s}' % (self.encode_key(key), self.encode_value(member)))
200+
201+
def rpush(self, key, value):
202+
print('%s has [%s]' % (self.encode_key(key), self.encode_value(value)))
203+
144204
def zadd(self, key, score, member):
145205
print('%s has {%s : %s}' % (str(key), str(member), str(score)))
146206

207+
147208
callback = MyCallback()
148209
parser = RdbParser(callback)
149210
parser.parse('/var/redis/6379/dump.rdb')
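The difference between the default and `--escape raw` JSON outputs in the README can be reproduced with the standard library alone. The sketch below is an illustration of the underlying encoding behaviour, not rdbtools' own implementation: it assumes the default mode is equivalent to decoding the stored bytes as UTF-8, and that raw mode is equivalent to mapping each byte to the code point of the same value (which is what Latin-1 decoding does).

```python
import json

# The UTF-8 bytes Redis would store for the first "Citat" entry.
stored = "B\u00e4ttre sent \u00e4n aldrig".encode("utf-8")

# Decoding as UTF-8 recovers the original characters, so each
# non-ASCII character becomes a single \uXXXX escape in the JSON.
as_utf8 = json.dumps(stored.decode("utf-8"))

# Treating every byte as its own code point (Latin-1) keeps the bytes
# intact, so each two-byte UTF-8 sequence shows up as two escapes.
as_raw = json.dumps(stored.decode("latin-1"))

print(as_utf8)  # "B\u00e4ttre sent \u00e4n aldrig"
print(as_raw)   # "B\u00c3\u00a4ttre sent \u00c3\u00a4n aldrig"
```

The raw form is longer but loses nothing: encoding the decoded string back with Latin-1 returns the exact stored bytes, which is why it is the safer choice for binary values.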

rdbtools/__init__.py

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,10 @@
11
from rdbtools.parser import RdbCallback, RdbParser, DebugCallback
2-
from rdbtools.callbacks import JSONCallback, DiffCallback, ProtocolCallback
3-
from rdbtools.memprofiler import MemoryCallback, PrintAllKeys, StatsAggregator
2+
from rdbtools.callbacks import JSONCallback, DiffCallback, ProtocolCallback, KeyValsOnlyCallback, KeysOnlyCallback
3+
from rdbtools.memprofiler import MemoryCallback, PrintAllKeys, StatsAggregator, PrintJustKeys
44

5-
__version__ = '0.1.8'
5+
__version__ = '0.1.9'
66
VERSION = tuple(map(int, __version__.split('.')))
77

88
__all__ = [
9-
'RdbParser', 'RdbCallback', 'JSONCallback', 'DiffCallback', 'MemoryCallback', 'ProtocolCallback', 'PrintAllKeys']
9+
'RdbParser', 'RdbCallback', 'JSONCallback', 'DiffCallback', 'MemoryCallback', 'ProtocolCallback', 'KeyValsOnlyCallback', 'KeysOnlyCallback', 'PrintJustKeys']
1010
