Commit b112aba

add GBit to spelling; clean up text
1 parent a0426eb commit b112aba

2 files changed: +15 -8 lines changed

.github/actions/spelling/allow.txt

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ Fawzi
 Fock
 Foket
 GAPW
+GBit
 GGA
 GPFS
 GPG

docs/guides/internet-access.md

Lines changed: 14 additions & 8 deletions
@@ -25,9 +25,9 @@ $ curl api.ipify.org
 
 Alps is a an excellent machine to simulate, evaluate and analyze data, and communication within Alps is optimized. Communication with external services is often needed to set up a calculation or communicate results to others.
 
-To enable this CSCS has excellent connection (400GBit/s) to SWITCH.ch, that provides internet services to the research and education infrastructure in Switzerland.
+To enable this CSCS has excellent connection (400 GBit/s) to SWITCH.ch, that provides internet services to the research and education infrastructure in Switzerland.
 
-Still communication with external services is not the focus of CSCS, it is rather seen as a way to enable the use of our resources, so for example as explained below from Alps **you should not** put load on services (web sites,...) that do not expect it, for example through **scraping**.
+Still communication with external services is not the focus of CSCS, it is rather seen as a way to enable the use of our resources, so for example as explained below from Alps **do not** put load on services that do not expect it, for example through **scraping**.
 
 ### Shared resources
 
@@ -36,19 +36,25 @@ If you need to heavily interact with external systems there are some caveats tha
 To avoid abuse there are measures in place at CSCS, on the transit networks, and on the remote systems, but these measures are often very blunt and would affect the CSCS as whole, so care should be taken to avoid triggering them.
 We have a good relationship with Switch, so if we trigger some of their fail-safes (for example their anti-DDoS tools), they will contact us. Other might take action without telling us anything.
 
-For example a website might blacklist IPs, or whole subnets from CSCS, which would make the service unavailable for all other CSCS users too. Many sites use content delivery networks (CDN), like Cloudflare, Akamai, or similar, and if those blacklist the CSCS many users will be affected. In addition, once we are blacklisted, it's extremely difficult and long be able to get out of these blacklists.
-As far as we know, they don't publish the levels of the number of requests/queries per second that can trigger this kind of action, for some obvious reason that bad-intentioned people would stay just below this limit...
+For example a website might blacklist IPs, or whole subnets from CSCS, which would make the service unavailable for all other CSCS users too.
+Many sites use content delivery networks (CDN), like Cloudflare, Akamai, or similar, and if those blacklist the CSCS many users will be affected.
+In addition, once we are blacklisted, it's extremely difficult and long be able to get out of these blacklists.
+
+!!! info
+Sites do not publish the number of requests/queries per second that trigger blacklisting, for some obvious reason that bad-intentioned people would stay just below this limit.
 
 So you should be mindful of your usage, in particular of the number of requests to the DNS and the network bandwidth.
 Every access to a different domain will trigger a DNS request, using multiple nodes does not solve the problem, because they will still be hitting the same DNS resolver.
 
-We do have protection in place for our public DNS server, but other DNS servers might decide to blacklist the originator of all those requests.
-On Alps currently we use an internal DNS, which is also used to resolve the different nodes in alps, and does not have special protections against abuse. For this reason **avoid scraping from Alps**, as it could lead to it being blacklisted.
-Resources outside Alps can do 100s of request per second without problems.
+CSCS has protection in place for our public DNS server, but other DNS servers might decide to blacklist the originator of all those requests.
+Alps uses an internal DNS, which is also used to resolve the different nodes in alps, and does not have special protections against abuse.
+For this reason **avoid scraping from Alps**, as it could lead to it being blacklisted.
 
-Given the excellent connection of the CSCS network with SWITCH a sustained use of it can saturate the connection of a large provider (as Google for example) with SWITCH, thus affecting almost any any user trying to query Google from Switzerland.
+!!! info
+Given the excellent connection of the CSCS network with SWITCH a sustained use of it can saturate the connection of a large provider like Google, which would affect all Swiss Google users.
 
 ### Conclusions
+
 Before any large scale sustained use of external resources think carefully about the load you are putting on the CSCS, network and target, both in number of requests and size of the request.
 
 Try to change the perspective: how quickly do you really need the whole data? Can you or should you use resources outside Alps, or even outside CSCS? Maybe geo-distributed?
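
The changed guide text asks users to be mindful of how many requests they send to external services. As an illustration only, here is a minimal Python sketch of client-side throttling. The names `Throttle` and `fetch_all` and the rate of 2 requests per second are hypothetical choices made for this example; neither the commit nor the guide prescribes a limit, and the guide itself notes that sites do not publish their thresholds.

```python
import time
from typing import Callable, Iterable, List, Optional


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait().

    Hypothetical helper for illustration; not part of any CSCS tooling.
    """

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        # Earliest time at which the next request may start.
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return the time slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._interval
        return delay


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    throttle: Throttle,
) -> List[bytes]:
    """Fetch each URL in turn, pausing so the configured rate is not exceeded.

    `fetch` is supplied by the caller (for example a wrapper around an HTTP
    client), so this sketch performs no network access by itself.
    """
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

A caller would create one throttle, for example `Throttle(max_per_second=2.0)` (an assumed value), and share it across all requests from the job. Using a separate throttle per node would not help, because the combined request rate is what the remote service and the shared DNS resolver see.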
