Commit 0c6f8c0

Docs: Remove code upload challenge creation docs(#4474)
* [Documentation] Fix host challenge doc
* [Docs] Remove code-upload related content

---------

Co-authored-by: Rishabh Jain <rishabhjain2018@gmail.com>
1 parent f83901d commit 0c6f8c0

3 files changed, +0 -448 lines changed

docs/source/configuration.md

Lines changed: 0 additions & 4 deletions
@@ -20,10 +20,6 @@ Following fields are required (and can be customized) in the [`challenge_config.
 
 - **remote_evaluation**: True/False (specify whether evaluation will happen on a remote machine or not. Default is `False`)
 
-- **is_docker_based**: True/False (specify whether the challenge is docker based or not. Default is `False`)
-
-- **is_static_dataset_code_upload**: True/False (specify whether the challenge is static dataset code upload or not. Default is `False`)
-
 - **start_date**: Start DateTime of the challenge (Format: YYYY-MM-DD HH:MM:SS, e.g. 2017-07-07 10:10:10) in `UTC` time zone
 
 - **end_date**: End DateTime of the challenge (Format: YYYY-MM-DD HH:MM:SS, e.g. 2017-07-07 10:10:10) in `UTC` time zone
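
The `start_date` and `end_date` fields use the `YYYY-MM-DD HH:MM:SS` format in UTC. As a minimal sketch, such a value can be checked with Python's standard library before it goes into the config. The helper name `parse_challenge_datetime` is hypothetical and not part of EvalAI.

```python
from datetime import datetime, timezone

# Format string matching "YYYY-MM-DD HH:MM:SS" from the field description.
CHALLENGE_DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_challenge_datetime(value):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string and attach the UTC time zone.

    Hypothetical helper for illustration; raises ValueError on a bad format.
    """
    naive = datetime.strptime(value, CHALLENGE_DATETIME_FORMAT)
    return naive.replace(tzinfo=timezone.utc)


start = parse_challenge_datetime("2017-07-07 10:10:10")
print(start.isoformat())
```

A string that does not match the format, such as `2017/07/07`, raises `ValueError`, which makes typos visible before the challenge is uploaded.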

docs/source/evaluation_scripts.md

Lines changed: 0 additions & 279 deletions
@@ -83,196 +83,6 @@ Let's break down what is happening in the above code snippet.
8383
2. Each entry in the list should be a dict that has a key with the corresponding dataset split codename (`train_split` and `test_split` for this example).
8484
3. Each of these dataset split dict contains various keys (`Metric1`, `Metric2`, `Metric3`, `Total` in this example), which are then displayed as columns in the leaderboard.
8585

86-
### Writing Code-Upload Challenge Evaluation
87-
88-
Each challenge has an evaluation script, which evaluates the submission of participants and returns the scores which will populate the leaderboard. The logic for evaluating and judging a submission is customizable and varies from challenge to challenge, but the overall structure of evaluation scripts is fixed due to architectural reasons.
89-
90-
In code-upload challenges, the evaluation is tightly coupled with the agent and environment containers:
91-
92-
1. The agent interacts with environment via actions and provides a 'stop' signal when finished.
93-
2. The environment provides feedback to the agent until 'stop' signal is received, episodes run out or the time limit is over.
94-
95-
The starter templates for code-upload challenge evaluation can be found [here](https://github.com/Cloud-CV/EvalAI-Starters/tree/master/code_upload_challenge_evaluation).
96-
97-
The steps to configure evaluation for code-upload challenges are:
98-
99-
1. **Create an environment**:
100-
There are a few steps involved in creating an environment:
101-
1. *Edit the evaluator_environment*: This class defines the environment (a [gym environment](https://www.gymlibrary.dev/content/environment_creation/) or a [habitat environment](https://github.com/facebookresearch/habitat-lab/blob/b1f2d4791a0065d0791001b72a6c96748a5f9ae0/habitat-lab/habitat/core/env.py)) and other related attributes/methods. Modify the `evaluator_environment` containing a gym environment shown [here](https://github.com/Cloud-CV/EvalAI-Starters/blob/8338085c6335487332f5b57cf7182201b8499aad/code_upload_challenge_evaluation/environment/environment.py#L21-L32):
102-
103-
```python
class evaluator_environment:
    def __init__(self, environment="CartPole-v0"):
        self.score = 0
        self.feedback = None
        self.env = gym.make(environment)
        self.env.reset()

    def get_action_space(self):
        return list(range(self.env.action_space.n))

    def next_score(self):
        self.score += 1
```
117-
118-
There are three methods in this example:
119-
- `__init__`: The initialization method to instantiate and set up the evaluation environment.
120-
- `get_action_space`: Returns the action space of the agent in the environment.
121-
- `next_score`: Returns/updates the reward achieved.
122-
123-
You can add custom methods and attributes which help in interaction with the environment.
124-
125-
2. *Edit the Environment service*: This service is hosted on the [gRPC](https://grpc.io/) server to get actions in form of messages from the agent container. Modify the lines shown [here](https://github.com/Cloud-CV/EvalAI-Starters/blob/8338085c6335487332f5b57cf7182201b8499aad/code_upload_challenge_evaluation/environment/environment.py#L35-L65):
126-
127-
```python
class Environment(evaluation_pb2_grpc.EnvironmentServicer):
    def __init__(self, challenge_pk, phase_pk, submission_pk, server):
        self.challenge_pk = challenge_pk
        self.phase_pk = phase_pk
        self.submission_pk = submission_pk
        self.server = server

    def get_action_space(self, request, context):
        message = pack_for_grpc(env.get_action_space())
        return evaluation_pb2.Package(SerializedEntity=message)

    def act_on_environment(self, request, context):
        global EVALUATION_COMPLETED
        if not env.feedback or not env.feedback[2]:
            action = unpack_for_grpc(request.SerializedEntity)
            env.next_score()
            env.feedback = env.env.step(action)
        if env.feedback[2]:
            if not LOCAL_EVALUATION:
                update_submission_result(
                    env, self.challenge_pk, self.phase_pk, self.submission_pk
                )
            else:
                print("Final Score: {0}".format(env.score))
                print("Stopping Evaluation!")
                EVALUATION_COMPLETED = True
        return evaluation_pb2.Package(
            SerializedEntity=pack_for_grpc(
                {"feedback": env.feedback, "current_score": env.score,}
            )
        )
```
160-
161-
You can modify the relevant parts of the environment service in order to make it work for your case.
162-
You would need to serialize and deserialize the response/request to pass messages between the agent and environment over gRPC. For this, we have implemented two methods which might be useful:
163-
- `unpack_for_grpc`: This method deserializes entities from request/response sent over gRPC. This is useful for receiving messages (for example, actions from the agent).
164-
- `pack_for_grpc`: This method serializes entities to be sent over a request over gRPC. This is useful for sending messages (for example, feedback from the environment).
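
A minimal sketch of these two helpers, assuming they are thin wrappers around `pickle` as in the `agent.py` template from the same removed section (the environment-side implementation may differ). The sample payload values are made up for illustration.

```python
import pickle


def pack_for_grpc(entity):
    # Serialize any picklable Python object to bytes for the gRPC message field.
    return pickle.dumps(entity)


def unpack_for_grpc(entity):
    # Deserialize bytes received over gRPC back into a Python object.
    return pickle.loads(entity)


# Round trip with an illustrative feedback payload (values are hypothetical).
message = {"feedback": (0.1, 1.0, False), "current_score": 3}
wire_bytes = pack_for_grpc(message)
assert isinstance(wire_bytes, bytes)
assert unpack_for_grpc(wire_bytes) == message
```

Because `pickle.loads` can execute arbitrary code while deserializing, this scheme is only appropriate between containers that trust each other.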
165-
166-
**Note**: This is a basic description of the class and the implementations may vary on a case-by-case basis.
167-
168-
3. *Edit the requirements file*: Change the [requirements file](https://github.com/Cloud-CV/EvalAI-Starters/blob/master/code_upload_challenge_evaluation/requirements/environment.txt) according to the packages required by your environment.
169-
170-
4. *Edit environment Dockerfile*: You may choose to modify the [Dockerfile](https://github.com/Cloud-CV/EvalAI-Starters/blob/master/code_upload_challenge_evaluation/docker/environment/Dockerfile) that will set up and run the environment service.
171-
172-
5. *Edit the docker environment variables*: Fill in the following information in the [`docker.env`](https://github.com/Cloud-CV/EvalAI-Starters/blob/master/code_upload_challenge_evaluation/docker/environment/docker.env) file:
173-
174-
```env
175-
AUTH_TOKEN=<Add your EvalAI Auth Token here>
176-
EVALAI_API_SERVER=<https://eval.ai>
177-
LOCAL_EVALUATION = True
178-
QUEUE_NAME=<Go to the challenge manage tab to get challenge queue name.>
179-
```
180-
181-
6. *Create the docker image and upload on ECR*: Create an environment docker image for the created `Dockerfile` by using:
182-
183-
```sh
docker build -f <file_path_to_Dockerfile> <path_to_build_context>
```
186-
187-
Upload the created docker image to ECR:
188-
189-
```sh
190-
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com
191-
docker tag <image_id> <aws_account_id>.dkr.ecr.<region>.amazonaws.com/<my-repository>:<tag>
192-
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/<my-repository>:<tag>
193-
```
194-
195-
Detailed steps for uploading a docker image to ECR can be found [here](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html).
196-
197-
7. *Add the environment image to the challenge configuration for each challenge phase*: For each challenge phase, add the link to the environment image in the [challenge configuration](https://evalai.readthedocs.io/en/latest/configuration.html):
198-
199-
```yaml
...
challenge_phases:
  - id: 1
    ...
    environment_image: <docker image uri>
    ...
```
207-
208-
Example References:
209-
- [Habitat Benchmark](https://github.com/facebookresearch/habitat-lab/blob/b1f2d4791a0065d0791001b72a6c96748a5f9ae0/habitat-lab/habitat/core/benchmark.py): This file contains description of an evaluation class which evaluates agents on the environment.
210-
211-
2. **Create a starter example**:
212-
The participants are expected to submit docker images for their agents which will contain the policy and the methods to interact with the environment.
213-
214-
Like environment, there are a few steps involved in creating the agent:
215-
1. *Create a starter example script*: Please create a starter agent submission and a local evaluation script in order to help the participants perform sanity checks on their code before making submissions to EvalAI.
216-
217-
The `agent.py` file should contain a description of the agent, the methods that the environment expects the agent to have, and a `main()` function to pass actions to the environment.
218-
219-
We provide a template for `agent.py` [here](https://github.com/Cloud-CV/EvalAI-Starters/blob/master/code_upload_challenge_evaluation/agent/agent.py):
220-
221-
```python
import evaluation_pb2
import evaluation_pb2_grpc
import grpc
import os
import pickle
import time

time.sleep(30)

LOCAL_EVALUATION = os.environ.get("LOCAL_EVALUATION")

if LOCAL_EVALUATION:
    channel = grpc.insecure_channel("environment:8085")
else:
    channel = grpc.insecure_channel("localhost:8085")

stub = evaluation_pb2_grpc.EnvironmentStub(channel)

def pack_for_grpc(entity):
    return pickle.dumps(entity)

def unpack_for_grpc(entity):
    return pickle.loads(entity)

flag = None

while not flag:
    base = unpack_for_grpc(
        stub.act_on_environment(
            evaluation_pb2.Package(SerializedEntity=pack_for_grpc(1))
        ).SerializedEntity
    )
    flag = base["feedback"][2]
    print("Agent Feedback", base["feedback"])
    print("*"* 100)
```
259-
260-
**Other Examples**:
261-
- A [random agent](https://github.com/facebookresearch/habitat-challenge/blob/rearrangement-challenge-2022/agents/random_agent.py) from [Habitat Rearrangement Challenge 2022](https://github.com/facebookresearch/habitat-challenge/blob/rearrangement-challenge-2022)
262-
263-
2. *Edit the requirements file*: Change the [requirements file](https://github.com/Cloud-CV/EvalAI-Starters/blob/master/code_upload_challenge_evaluation/requirements/agent.txt) according to the packages required by an agent.
264-
265-
3. *Edit the agent Dockerfile*: You may choose to modify the [Dockerfile](https://github.com/Cloud-CV/EvalAI-Starters/blob/master/code_upload_challenge_evaluation/docker/agent/Dockerfile) which will run the `agent.py` file and interact with the environment.
266-
267-
4. *Edit the docker environment variables*: Fill in the following information in the [`docker.env`](https://github.com/Cloud-CV/EvalAI-Starters/blob/master/code_upload_challenge_evaluation/docker/agent/docker.env) file:
268-
269-
```env
270-
LOCAL_EVALUATION = True
271-
```
272-
273-
Example References:
274-
- [Habitat Rearrangement Challenge 2022 - Random Agent](https://github.com/facebookresearch/habitat-challenge/blob/rearrangement-challenge-2022/agents/random_agent.py): This is an example of a dummy agent created for the [Habitat Rearrangement Challenge 2022](https://eval.ai/web/challenges/challenge-page/1820/overview) which is then sent to the evaluator (here, [Habitat Benchmark](https://github.com/facebookresearch/habitat-lab/blob/b1f2d4791a0065d0791001b72a6c96748a5f9ae0/habitat-lab/habitat/core/benchmark.py)) for evaluation.
275-
27686
### Writing Remote Evaluation Script
27787

27888
Each challenge has an evaluation script, which evaluates the submission of participants and returns the scores which will populate the leaderboard. The logic for evaluating and judging a submission is customizable and varies from challenge to challenge, but the overall structure of evaluation scripts is fixed due to architectural reasons.
@@ -319,92 +129,3 @@ Here are the steps to configure remote evaluation:
319129
The `evaluate()` method also accepts keyword arguments.
320130

321131
**IMPORTANT** ⚠️: If the `evaluate()` method fails due to any reason or there is a problem with the submission, please ensure to raise an `Exception` with an appropriate message.
322-
323-
### Writing Static Code-Upload Challenge Evaluation Script
324-
325-
Each challenge has an evaluation script, which evaluates the submission of participants and returns the scores which will populate the leaderboard. The logic for evaluating and judging a submission is customizable and varies from challenge to challenge, but the overall structure of evaluation scripts is fixed due to architectural reasons.
326-
327-
The starter template for static code-upload challenge evaluation can be found [here](https://github.com/Cloud-CV/EvalAI-Starters/blob/master/evaluation_script/main.py). Note that the evaluation file provided will be used on our submission workers, just like prediction upload challenges.
328-
329-
The steps for writing an evaluation script for a static code-upload based challenge are the same as those for [prediction-upload based challenges](evaluation_scripts.html#writing-an-evaluation-script).
330-
331-
Evaluation scripts are required to have an `evaluate()` function. This is the main function, which is used by workers to evaluate the submission messages.
332-
333-
The syntax of evaluate function is:
334-
335-
```python
def evaluate(test_annotation_file, user_annotation_file, phase_codename, **kwargs):
    pass
```
339-
340-
It receives three arguments, namely:
341-
342-
- `test_annotation_file`: It represents the local path to the annotation file for the challenge. This is the file uploaded by the Challenge host while creating a challenge.
343-
344-
- `user_annotation_file`: It represents the local path of the file submitted by the user for a particular challenge phase.
345-
346-
- `phase_codename`: It is the `codename` of the challenge phase from the [challenge configuration yaml](https://github.com/Cloud-CV/EvalAI-Starters/blob/master/challenge_config.yaml). This is passed as an argument so that the script can take actions according to the challenge phase.
347-
348-
After reading the files, some custom actions can be performed. This varies per challenge.
349-
350-
The `evaluate()` method also accepts keyword arguments. By default, we provide you metadata of each submission to your challenge which you can use to send notifications to your slack channel or to some other webhook service. Following is an example code showing how to get the submission metadata in your evaluation script and send a slack notification if the accuracy is more than some value `X` (X being 90 in the example given below).
351-
352-
```python
import json

import requests


def evaluate(test_annotation_file, user_annotation_file, phase_codename, **kwargs):

    submission_metadata = kwargs.get("submission_metadata")
    print(submission_metadata)

    # Do stuff here
    # Set `score` to 91 as an example

    score = 91
    if score > 90:
        slack_data = kwargs.get("submission_metadata")
        webhook_url = "Your slack webhook url comes here"
        # To know more about slack webhook, checkout this link: https://api.slack.com/incoming-webhooks

        response = requests.post(
            webhook_url,
            data=json.dumps({'text': "*Flag raised for submission:* \n \n" + str(slack_data)}),
            headers={'Content-Type': 'application/json'})

    # Do more stuff here
```
374-
375-
The above example can be modified and used to find if some participant team is cheating or not. There are many more ways for which you can use this metadata.
376-
377-
After all the processing is done, this `evaluate()` should return an output, which is used to populate the leaderboard. The output should be in the following format:
378-
379-
```python
output = {}
output['result'] = [
    {
        'train_split': {
            'Metric1': 123,
            'Metric2': 123,
            'Metric3': 123,
            'Total': 123,
        }
    },
    {
        'test_split': {
            'Metric1': 123,
            'Metric2': 123,
            'Metric3': 123,
            'Total': 123,
        }
    }
]

return output
```
403-
404-
Let's break down what is happening in the above code snippet.
405-
406-
1. `output` should contain a key named `result`, which is a list containing entries per dataset split that is available for the challenge phase in consideration (in the function definition of `evaluate()` shown above, the argument: `phase_codename` will receive the _codename_ for the challenge phase against which the submission was made).
407-
2. Each entry in the list should be a dict that has a key with the corresponding dataset split codename (`train_split` and `test_split` for this example).
408-
3. Each of these dataset split dict contains various keys (`Metric1`, `Metric2`, `Metric3`, `Total` in this example), which are then displayed as columns in the leaderboard.
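
As a minimal sketch of the structure just described, the following builds the `result` list from per-split metrics. The helper `build_output`, the file names, and the metric values are hypothetical and only illustrate the shape of the return value. A real script would compute the scores from the two annotation files.

```python
def build_output(split_metrics):
    """Wrap {split_codename: {metric: value}} into the leaderboard format.

    Hypothetical helper: produces one single-key dict per dataset split.
    """
    return {"result": [{split: metrics} for split, metrics in split_metrics.items()]}


def evaluate(test_annotation_file, user_annotation_file, phase_codename, **kwargs):
    # Placeholder scores; a real script would derive these from the input files.
    metrics = {"Metric1": 123, "Metric2": 123, "Metric3": 123, "Total": 123}
    return build_output({"train_split": dict(metrics), "test_split": dict(metrics)})


output = evaluate("annotations.json", "submission.json", "dev")
for entry in output["result"]:
    for split_codename, scores in entry.items():
        print(split_codename, sorted(scores))
```

Keeping the split-to-metrics mapping separate from the wrapping step makes it harder to return a split entry with more than one key by accident.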
409-
410-
A good example of a well-documented evaluation script for static code-upload challenges is [My Seizure Gauge Forecasting Challenge 2022](https://github.com/seermedical/msg-2022).
