Commit 0dfa408

more on docs
1 parent f1e90d9 commit 0dfa408

File tree

6 files changed: +170 / -1 lines changed

docs/changelog.rst

Lines changed: 7 additions & 0 deletions
@@ -15,6 +15,13 @@ are used for versioning (schema follows below):
    0.3.4 to 0.4).
 - All backwards incompatible changes are mentioned in this document.
 
+0.17.6
+------
+2019-04-08
+
+- Minor fixes.
+- Additions to the docs.
+
 0.17.5
 ------
 2019-04-03

docs/documentation.rst.distrib

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ Contents:
    global_aggregations
    configuration_tweaks
    source_backend
+   indexing_troubleshooting
    demo
    frontend_demo
    changelog

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -265,6 +265,7 @@ Contents:
    global_aggregations
    configuration_tweaks
    source_backend
+   indexing_troubleshooting
    demo
    frontend_demo
    changelog

docs/indexing_troubleshooting.rst

Lines changed: 159 additions & 0 deletions
Indexing troubleshooting
========================

When indexing lots of data (millions of records), you might get timeout
exceptions.

A couple of possible solutions (complementary) are listed below. All of
them are independent of each other, so you may use just one, a couple, or
all of them. It's totally up to you.

Timeout
-------

For re-indexing, you might want to increase the timeout to avoid timeout
exceptions.

To do that, make a new settings file (`indexing`) and add the following:

*settings/indexing.py*

.. code-block:: python

    from .base import *  # Import from your main/production settings.

    # Override the Elasticsearch configuration and provide a custom timeout.
    ELASTICSEARCH_DSL = {
        'default': {
            'hosts': 'localhost:9200',
            'timeout': 60,  # Custom timeout.
        },
    }

Then rebuild your search index, specifying the indexing settings:

.. code-block:: sh

    ./manage.py search_index --rebuild -f --settings=settings.indexing

Note that you may as well specify the timeout in your global settings.
However, if you're happy with how things work in production (except for
the indexing part), you may do as suggested (keep a separate indexing
settings module).
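The effect of the override can be illustrated without Django at all: the
indexing settings module simply re-declares ``ELASTICSEARCH_DSL`` on top of
the base configuration, so only the indexing run sees the longer timeout. A
standalone sketch (the dict names here are illustrative, not part of the
package):

```python
# Stand-in for the config declared in settings/base.py.
BASE_ELASTICSEARCH_DSL = {
    'default': {
        'hosts': 'localhost:9200',
    },
}

# Stand-in for settings/indexing.py: reuse the base connection config
# and add the longer timeout on top of it.
INDEXING_ELASTICSEARCH_DSL = {
    'default': {
        **BASE_ELASTICSEARCH_DSL['default'],
        'timeout': 60,
    },
}

# The production config stays untouched; only the indexing run changes.
print(INDEXING_ELASTICSEARCH_DSL['default']['timeout'])  # 60
```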

Chunk size
----------

Note that this feature is (yet) *only available in the forked version*
`barseghyanartur/django-elasticsearch-dsl
<https://github.com/barseghyanartur/django-elasticsearch-dsl/tree/mjl-index-speedup-2-additions>`_.

Install it as follows:

.. code-block:: sh

    pip install https://github.com/barseghyanartur/django-elasticsearch-dsl/archive/mjl-index-speedup-2-additions.zip

Specify the `chunk_size` param as follows (we set `chunk_size` to 50 in
this case):

.. code-block:: sh

    ./manage.py search_index --rebuild -f --chunk-size=50
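Conceptually, the chunk size caps how many documents go into a single bulk
request. A minimal sketch of that behaviour (a plain generator, not the
fork's actual code):

```python
def chunked(iterable, chunk_size):
    """Yield lists of at most ``chunk_size`` items from ``iterable``."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # Trailing, possibly smaller, chunk.
        yield chunk


# With --chunk-size=50, 120 records would be sent as bulk requests of
# 50, 50 and 20 documents.
sizes = [len(chunk) for chunk in chunked(range(120), 50)]
print(sizes)  # [50, 50, 20]
```

Smaller chunks mean smaller, shorter-lived bulk requests, which is exactly
what helps against timeouts on slow clusters.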

Use parallel indexing
---------------------

Parallel indexing speeds things up drastically. In my tests I got a speed
boost of 66 percent on 1.6 million records.

Note that this feature is (yet) *only available in the forked versions*
`barseghyanartur/django-elasticsearch-dsl
<https://github.com/barseghyanartur/django-elasticsearch-dsl/tree/mjl-index-speedup-2-additions>`_
or
`mjl/django-elasticsearch-dsl <https://github.com/mjl/django-elasticsearch-dsl/tree/mjl-index-speedup>`_.

Install it as follows:

*barseghyanartur/django-elasticsearch-dsl fork*

.. code-block:: sh

    pip install https://github.com/barseghyanartur/django-elasticsearch-dsl/archive/mjl-index-speedup-2-additions.zip

*mjl/django-elasticsearch-dsl fork*

.. code-block:: sh

    pip install https://github.com/mjl/django-elasticsearch-dsl/archive/mjl-index-speedup.zip

In order to make use of it, set `parallel_indexing` to True on the
document Meta.

*yourapp/documents.py*

.. code-block:: python

    class LocationDocument(DocType):

        # ...

        class Meta(object):
            """Meta options."""

            model = Location
            parallel_indexing = True
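The idea behind the speed-up can be sketched in isolation: instead of
sending bulk requests strictly one after another, chunks of documents are
pushed through a pool of concurrent workers. The snippet below is only a
conceptual stand-in (the fork's real implementation differs), using a
thread pool and a fake bulk call:

```python
from multiprocessing.dummy import Pool  # Thread-backed Pool.


def bulk_index(chunk):
    """Stand-in for a bulk-index request; returns the number of docs sent."""
    return len(chunk)


chunks = [list(range(50)) for _ in range(4)]  # 4 chunks of 50 "documents".
with Pool(2) as pool:  # Two concurrent "indexing" workers.
    indexed = sum(pool.map(bulk_index, chunks))
print(indexed)  # 200
```

While one worker waits on Elasticsearch I/O, another can already be sending
its chunk, which is where the observed 66 percent boost comes from.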

Limit the number of items indexed at once
-----------------------------------------

This is very close to the `chunk_size` option shown above, but might work
better on heavy querysets. Instead of processing the entire queryset at
once, it is sliced into pages. So, if you have 2 million records in your
queryset and you wish to index them in chunks of 20 thousand at once,
specify `queryset_pagination` on the document Meta:

*yourapp/documents.py*

.. code-block:: python

    class LocationDocument(DocType):

        # ...

        class Meta(object):
            """Meta options."""

            model = Location
            queryset_pagination = 50
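The difference from `chunk_size` is that here the queryset itself is walked
in successive slices, so no single database fetch has to materialise all
records. A toy sketch of that slicing, using a plain list as a stand-in for
a queryset:

```python
def iter_pages(records, page_size):
    """Yield successive slices of ``records``, ``page_size`` items each,
    the way a paginated queryset would be walked."""
    for start in range(0, len(records), page_size):
        yield records[start:start + page_size]


# 45 "records" paged by 20 come out as three slices of 20, 20 and 5.
pages = list(iter_pages(list(range(45)), 20))
print([len(p) for p in pages])  # [20, 20, 5]
```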

You may even make it dynamic, based on the settings loaded. For instance,
you may have it set to None in production (if you were happy with how
things were) and provide a certain value for it in the dedicated indexing
settings (as already mentioned above).

*settings/base.py*

.. code-block:: python

    # Main/production settings.
    ELASTICSEARCH_DSL_QUERYSET_PAGINATION = None

*settings/indexing.py*

.. code-block:: python

    # Indexing-only settings.
    ELASTICSEARCH_DSL_QUERYSET_PAGINATION = 1000

*yourapp/documents.py*

.. code-block:: python

    from django.conf import settings

    # ...

    class LocationDocument(DocType):

        # ...

        class Meta(object):
            """Meta options."""

            model = Location
            queryset_pagination = settings.ELASTICSEARCH_DSL_QUERYSET_PAGINATION
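A slightly more defensive variant (my own suggestion, not part of the
package) is to read the setting with ``getattr`` and a default, so the
documents module still imports cleanly against settings files that never
declare the value. Self-contained sketch, with a stub standing in for
``django.conf.settings``:

```python
class FakeSettings:
    """Stand-in for ``django.conf.settings`` in this self-contained sketch."""
    ELASTICSEARCH_DSL_QUERYSET_PAGINATION = 1000


settings = FakeSettings()

# Present setting: the configured value is used.
pagination = getattr(settings, 'ELASTICSEARCH_DSL_QUERYSET_PAGINATION', None)
print(pagination)  # 1000

# Absent setting: fall back to None (i.e. no pagination) instead of
# raising AttributeError at import time.
missing = getattr(settings, 'NON_EXISTENT_SETTING', None)
print(missing)  # None
```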

docs_src/indexing_troubleshooting.rst

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ wish to index them by chunks of 20 thousands at once, specify the
             """Meta options."""
 
             model = Location
-            queryset_pagination = 1000
+            queryset_pagination = 50
 
 You may even make it dynamic based on the settings loaded. So, for instance,
 you may have it set to None in production (if you were happy with how things

scripts/prepare_docs.sh

Lines changed: 1 addition & 0 deletions
@@ -11,5 +11,6 @@ cat docs_src/filtering_usage_examples.rst > docs/filtering_usage_examples.rst
 cat docs_src/more_like_this.rst > docs/more_like_this.rst
 cat docs_src/source_backend.rst > docs/source_backend.rst
 cat docs_src/configuration_tweaks.rst > docs/configuration_tweaks.rst
+cat docs_src/indexing_troubleshooting.rst > docs/indexing_troubleshooting.rst
 cat docs_src/global_aggregations.rst > docs/global_aggregations.rst
 cat examples/frontend/README.rst > docs/frontend_demo.rst
