MTurk codebase for the study "SQL vs. Visual Diagrams on time and correctness matching relational query patterns"
Code and instructions for running the study "SQL vs. Visual Diagrams on time and correctness matching relational query patterns" using Amazon Mechanical Turk (MTurk), Heroku, and Postgres.
- Remarks
- MTurk Initial Setup and Overview
- Useful Commands
- Instructions for dealing with MTurk interactions
Note that some fields, such as DATABASE_URL, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY, must be set to your own values when setting up the Postgres database on Heroku and using AWS keys with MTurk.
!!Warning!! Tutorial time is not currently captured correctly due to a database bug.
- Register on https://requester.mturk.com/ for deployment and https://requester.mturk.com/developer/sandbox for testing.
- Deploy to Heroku by committing and pushing the repository with
git push heroku master
- Run post_hits.py to post the HITs on Amazon Mechanical Turk.
- Amazon Mechanical Turk will post your HIT and IFrame your URL when a user accepts it.
- Once a user completes the HIT, it will be logged in the database. For more options, check hit_manager.py.
-
Update WSL
wsl --update
-
Install Ubuntu from the Windows store to keep it current.
-
With VSCode open from Windows, install https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack
-
Within WSL Bash, run VSCode from the folder with:
code .
(See https://code.visualstudio.com/docs/remote/wsl for details.)
-
Upgrade packages:
sudo apt update
sudo apt upgrade
-
Upgrade Python to 3.11 (necessary for Heroku deployment)
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11-full python3.11-dev python3.11-venv gcc
python3.11 -m ensurepip
!Danger! Don't do the following unless you want to risk breaking your terminal! It does, however, let you set the default python3 to be python3.11:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
And choose which one to use as Python3 via the command:
sudo update-alternatives --config python3
-
Install Postgres (and libssl-dev)
sudo apt install postgresql postgresql-contrib libssl-dev
-
Optional: Install pgadmin for managing the DB:
-
If using WSL, install on Windows by downloading from pgadmin.org. See details at StackOverflow.
-
If pure Ubuntu:
sudo apt install pgadmin4
-
-
Set a postgres Ubuntu user password:
sudo passwd postgres
E.g., it56uZ.
-
Set a postgres database user password:
sudo -u postgres psql
Inside the psql shell, set the password. Make sure to set your own value for NEWPASSWORD before running:
ALTER USER postgres PASSWORD 'NEWPASSWORD';
-
Set up the postgres databases and user for the app. Still in the psql shell:
-
Create the database and list the ones present.
CREATE DATABASE rdstudy;
\l
-
Then, create the user flask. Make sure to set your own value for NEWPASSWORD before running:
CREATE USER flask WITH PASSWORD 'NEWPASSWORD';
GRANT ALL PRIVILEGES ON DATABASE rdstudy TO flask;
-
Exit psql by running:
\q
-
-
See status with
service postgresql status
-
Start the server with
sudo service postgresql start
-
To avoid getting connection refused errors, edit the postgresql.conf file.
-
Locate the conf file:
sudo -u postgres psql -c 'SHOW config_file'
-
Edit the file. E.g.:
sudo nano /etc/postgresql/14/main/postgresql.conf
-
In the file, uncomment listen_addresses and change it like so:
listen_addresses = '*'
-
Then restart postgres using
sudo service postgresql restart
-
-
-
Create a .env file that holds your environment variables.
-
Generate a FLASK_SECRET_KEY, e.g., by running this in the Python interpreter:
import os
os.urandom(24)
'\xfd{H\xe5<\x95\xf9\xe3\x96.5\xd1\x01O<!\xd5\xa2\xa0\x9fR"\xa1\xa8'
print(os.urandom(24).hex())
-
Then fill out the .env file something like this, ensuring that you fill in the values for XXXXX below. Use:
- AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from https://requestersandbox.mturk.com/developer for the Sandbox or https://requester.mturk.com/developer for live deployment.
- The FLASK_SECRET_KEY you generated.
- The password you set for the postgres flask account as part of LOCAL_SQLALCHEMY_DATABASE_URI.
FLASK_DEBUG=True
FLASK_APP=rd_study_server.py
LOCAL=True
TESTING=True
AWS_SANDBOX=True
AWS_ACCESS_KEY_ID=XXXXX
AWS_SECRET_ACCESS_KEY=XXXXX
AWS_CHECK_QUAL=True
AWS_ALLOW_QUAL_ERROR=True
FLASK_SECRET_KEY=XXXXX
LOCAL_SQLALCHEMY_DATABASE_URI=postgresql://flask:XXXXX@localhost:5432/rdstudy
SQLALCHEMY_TRACK_MODIFICATIONS=False
WEB_CONCURRENCY=2
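Before starting the server, it can help to confirm that no XXXXX placeholder was left behind. The following is a minimal sketch, not part of the repository: the function names parse_env and find_problems are hypothetical, and the list of required keys is an assumption based on the sample file above.

```python
from pathlib import Path

# Assumption: these are the keys that must be filled in, taken from the
# sample .env above. Adjust the list if your file differs.
REQUIRED_KEYS = [
    "FLASK_APP",
    "AWS_SANDBOX",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "FLASK_SECRET_KEY",
    "LOCAL_SQLALCHEMY_DATABASE_URI",
]


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_problems(values):
    """Return required keys that are missing, empty, or still hold XXXXX."""
    problems = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or "XXXXX" in value:
            problems.append(key)
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.is_file():
        bad = find_problems(parse_env(env_path.read_text()))
        print("Keys to fix:", bad if bad else "none")
    else:
        print(".env not found in the current folder")
```

Run it from the folder that holds the .env file. It only reads the file and never prints the secret values themselves.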
-
-
Install Python requirements in a virtual environment.
-
Install wheel for building packages and ensure libpq-fe.h is available from libpq-dev:
python3.11 -m pip install wheel
sudo apt-get install --reinstall libpq-dev
-
Create the virtual environment:
sudo python3.11 -m venv env
source env/bin/activate
-
Install wheel and then the requirements:
python3.11 -m pip install -r requirements.txt
-
Run db_create.py to populate the database. Currently this only works when run under the VSCode debugger (i.e., debugpy); it is unclear why.
sudo -u postgres psql -d rdstudy
In postgres, \dt shows you the tables, and SELECT * FROM USERS; shows you an empty table with columns.
flask run
To view the running site, use, for example: http://127.0.0.1:5000/?workerId=AA&assignmentId=BB&hitId=CC
Create a pipeline on Heroku that will be created from GitHub.
Locally:
heroku plugins:install heroku-config
heroku login
to switch to our app
heroku domains -a rd-study
where rd-study is the app name on Heroku.
This opens the website:
heroku open -a rd-study
To overwrite existing values, use
heroku config:push --file=.env.live -a rd-study -o
Warning: This fails silently if the file doesn't exist.
Make sure to log in fresh to the latest deployment, then:
heroku run bash --app rd-study
python3 db_create.py
You can test it with gunicorn like so:
gunicorn --preload rd_study_server:app --log-file - --log-level=debug
To view the running site, use, for example:
- Local: http://127.0.0.1:8000/?workerId=AA&assignmentId=BB&hitId=CC
- Live https://rd-study.herokuapp.com?workerId=AA&assignmentId=BB&hitId=CC
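The test URLs above all carry the same three query parameters that MTurk appends when a worker opens a HIT. As a small sketch for generating them, the helper below builds such a URL; the function name study_url is hypothetical, and the AA/BB/CC defaults are just the dummy values used above.

```python
from urllib.parse import urlencode


def study_url(base, worker_id="AA", assignment_id="BB", hit_id="CC"):
    """Build a test URL with the workerId, assignmentId, and hitId parameters."""
    query = urlencode(
        {"workerId": worker_id, "assignmentId": assignment_id, "hitId": hit_id}
    )
    return f"{base.rstrip('/')}/?{query}"


print(study_url("http://127.0.0.1:8000"))
```

Using urlencode keeps the URL valid even if an ID contains characters that need escaping.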
For testing MTurk
heroku config:push --file=.env.sandbox -a rd-study -o
heroku ps:restart -a rd-study
For live MTurk
heroku config:push --file=.env.live -a rd-study -o
heroku ps:restart -a rd-study
Papertrail logging (paid). Note that this plan has a 65MB/day limit, which you can easily exceed even when running 60 participants. We recommend using a higher plan.
heroku addons:create papertrail:fixa
To export logs, you can use the scripts found in /logs/papertrail.
Access Papertrail through the Heroku site.
To see the database with, e.g., PGAdmin:
-
Get the value of DATABASE_URL on Heroku:
heroku config:get DATABASE_URL -a rd-study
It is of the form
postgres://USERNAME:PASSWORD@HOST:PORT/DATABASE
-
Set under Connection:
- Hostname/address: HOST
- Port: PORT
- Maintenance database: DATABASE
- Username: USERNAME
- Password: PASSWORD
-
Set under Advanced:
- DB restriction: DATABASE
-
Click Save.
-
Navigate to the database > Schemas > public > Tables > users. Right-click and select View/Edit Data > All Rows.
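Splitting the DATABASE_URL by hand into the pgAdmin fields is error-prone, so here is a minimal sketch that does it with the standard library. The function name pgadmin_fields is hypothetical, and the URL in the example is made up.

```python
from urllib.parse import unquote, urlsplit


def pgadmin_fields(database_url):
    """Split postgres://USERNAME:PASSWORD@HOST:PORT/DATABASE into pgAdmin fields."""
    parts = urlsplit(database_url)
    return {
        "Hostname/address": parts.hostname,
        "Port": parts.port,
        "Maintenance database": parts.path.lstrip("/"),
        # Credentials may be percent-encoded inside a URL, so decode them.
        "Username": unquote(parts.username or ""),
        "Password": unquote(parts.password or ""),
    }


# Made-up example value; paste your own DATABASE_URL instead.
for field, value in pgadmin_fields("postgres://user:pw@example.com:5432/dbname").items():
    print(f"{field}: {value}")
```

The Maintenance database value is also what goes into DB restriction under Advanced.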
Here are some options you can create:
- .env.local.sandbox for local development and sandbox grading
- .env.local.live for local development and MTurk live grading
- .env.sandbox.test to use for testing the MTurk Sandbox site.
- .env.sandbox for more production-ready testing on the MTurk Sandbox site. Turns off error display to users and requires qualifications.
- .env.live.test to use for the live MTurk website.
- .env.live to use for the live MTurk website. Turns off error display to users and requires qualifications.
Create your AWS account and an associated MTurk account.
- Ensure your environment variables are set. You can use a line like this to load one of the environment files into environment variables, in this case .env.sandbox.test:
set -o allexport && source .env.sandbox.test && set +o allexport
Likewise, for the actual grading of the submitted HITs:
set -o allexport && source .env.live && set +o allexport
Note! All your .env files need to have LF and not CRLF line endings for this to work properly. Otherwise, you'll get errors like
botocore.exceptions.HTTPClientError: An HTTP Client raised an unhandled exception: Invalid header value.
You can check this with, e.g., cat -t .env.sandbox.test.
You can check variables in general with printenv | grep AWS.
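If you prefer to check line endings from Python rather than with cat, the sketch below reports which files contain CRLF. It is an illustration only: the names has_crlf and to_lf are hypothetical, and the script deliberately reports without rewriting anything.

```python
import sys
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """Return True if the raw file contents contain Windows (CRLF) line endings."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Return the contents with every CRLF replaced by LF."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    # Usage: python check_crlf.py .env.sandbox.test .env.live
    for name in sys.argv[1:]:
        path = Path(name)
        if not path.is_file():
            print(f"{name}: not found")
        elif has_crlf(path.read_bytes()):
            print(f"{name}: contains CRLF line endings")
        else:
            print(f"{name}: OK (LF only)")
```

To fix a flagged file, write to_lf(path.read_bytes()) back with path.write_bytes, or change the line-ending setting in your editor.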
Creates a qualification using questions from qualification_questions.xml and answers from qualification_answers.xml.
Uses the AWS_SANDBOX, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY environment variables.
!!!WARNING!!! hard-coded text for the qualification details! Make sure to at least change the Name and hard-coded bits in post_hits.py.
Run in the terminal. Pass in one of these arguments:
- test: Creates the basic qualification.
- custom: Creates a custom qualification for invited workers only, e.g., those who had errors taking the test.
- test_taken: Creates a test taken qualification to eliminate workers who have taken the test previously.
E.g., inside the virtual environment, you'll need to run both:
python ./create_qualification.py test
python ./create_qualification.py test_taken
Record the QualificationIds to use in post_hits.py for the qualification_id and taken_test_qualification_id variables.
If you get a RequestError about having a QualificationType with this name already, you need to change the hard-coded Name= part of the file or delete the existing qualification at
https://requestersandbox.mturk.com/qualification_types or https://requester.mturk.com/qualification_types.
Creates a HIT. Uses the AWS_SANDBOX, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY environment variables.
!!!WARNING!!! hard-coded text!
!!!WARNING!!! The HITs you create programmatically here do NOT show up on the web management interface! Amazon has deprecated that feature (aargh!).
-
Update the <ExternalURL> tag in external_question.xml to be the URL of your Heroku app.
-
Update all these hard-coded elements in post_hits.py (some documentation is in the MTurk docs), and read the file!
- qualification_id: The basic qualification.
- custom_qualification_id: A custom qualification for invited workers.
- taken_test_qualification_id: A test taken qualification to eliminate workers who previously took the test.
- base_pay: The lowest level of reward.
- approval_percentage
- minimum_qualification_score
- title_str
- description-str
- MaxAssignments
- LifetimeInSeconds
- AssignmentDurationInSeconds
Run inside the virtual environment with one of these arguments:
- full: Regular full-duration HIT.
- pilot: Shorter pilot HIT.
- custom WID QID: Post a custom HIT for the worker with ID WID who has been given a custom qualification with ID QID.
E.g.,
python ./post_hits.py full
Has lots of code for various things. Make sure to read the code before running it! Run in the terminal. Pass in one of these arguments followed by parameters:
- summary: Provides a summary of the last 100 HITs.
!!!Warning!!! Everything below needs to be checked to see if it needs a paginator added to handle more than 100 records.
- balance: Gets current prepaid HIT balance.
- clear: Deletes all HITs except the ones in a !!!WARNING!!! hard-coded except_list. Will auto-reject all assignments pending in the HIT!
- extend NUM: Add NUM more assignments. !!!WARNING!!! hard-coded hit_id.
- hits_detail HID1 HID2: Get details for two HIT IDs.
- get_assignments HID STATUS: Get assignments for HIT ID HID with status STATUS, one of ['Approved', 'Rejected', 'Submitted'].
- get_worker_id_list HID: Get worker IDs for HIT with ID HID that are Approved or Rejected.
- approve_qualifications QID: Approve qualifications for qualification ID QID. !!!WARNING!!! hard-coded accept_list in approve_qualifications definition.
- update_expiration HID: Update the expiration for HIT ID HID. !!!WARNING!!! hard-coded ExpireAt in update_expiration definition.
- give_worker_qualification QID WID: Give qualification with ID QID to worker with ID WID.
- set_taken_test_qualification QID WFILE: Read worker IDs from WFILE, which has one ID per line, and ADD to each worker the qualification with ID QID.
- remove_qualification QID WFILE: Read worker IDs from WFILE, which has one ID per line, and REMOVE from each worker the qualification with ID QID.
- get_workers_with_qualification QID: List the workers with qualification ID QID.
- get_qualification_score QID WID: Get the qualification score on qualification with ID QID for worker with ID WID.
- notify_workers_with_qualification QFILE TFILE: Notify all workers listed in the qualified workers file QFILE (one ID per line) that are in the file of workers that haven't taken the HIT TFILE (one ID per line). !!!WARNING!!! Hard-coded advertisement message.
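Regarding the pagination warning above: MTurk list operations return at most 100 records per call, plus a NextToken when more remain. The sketch below shows one way to follow that token. It is not the code in hit_manager.py; the helper name collect_all is hypothetical, and it assumes the callable you pass in accepts MaxResults and NextToken and returns a dict, as the boto3 MTurk client's list operations do.

```python
def collect_all(fetch_page, items_key, page_size=100, **params):
    """Call fetch_page repeatedly, following NextToken, and return every item.

    fetch_page: a callable such as a boto3 MTurk client list method (assumption).
    items_key:  the response key holding the records, e.g. "HITs".
    params:     extra keyword arguments passed through on every call.
    """
    items = []
    kwargs = dict(params, MaxResults=page_size)
    while True:
        response = fetch_page(**kwargs)
        items.extend(response.get(items_key, []))
        token = response.get("NextToken")
        if not token:
            # No token means this was the last page.
            return items
        kwargs["NextToken"] = token
```

With a boto3 client created as client = boto3.client("mturk", ...), a call such as collect_all(client.list_hits, "HITs") would gather all HITs rather than the first 100. Check the response key of each list operation in the boto3 documentation before reusing this for other commands.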
Deals with submissions.
!!!Warning!!! Hard-coded messages to workers here, including the reject_message variable.
Depends on the REMOTE_DATABASE_URI environment variable being set to point to the Heroku Postgres database. Note: This will change regularly! There are two ways to get this value:
-
Access through Heroku site, e.g., https://dashboard.heroku.com/apps/rd-study/settings
-
Use the Heroku CLI:
heroku config:get DATABASE_URL -a rd-study
It is of the form
postgres://USERNAME:PASSWORD@HOST:PORT/DATABASE
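One thing to watch for when copying this value into REMOTE_DATABASE_URI: SQLAlchemy 1.4 and later no longer accept the postgres:// scheme and expect postgresql:// instead. Whether that applies here depends on the SQLAlchemy version pinned in requirements.txt, which is not shown in this README, so treat the following as a sketch under that assumption. The function name to_sqlalchemy_uri is hypothetical.

```python
def to_sqlalchemy_uri(database_url):
    """Rewrite a postgres:// URL to the postgresql:// scheme; leave others alone."""
    prefix = "postgres://"
    if database_url.startswith(prefix):
        return "postgresql://" + database_url[len(prefix):]
    return database_url


# Made-up example value; paste the output of heroku config:get instead.
print(to_sqlalchemy_uri("postgres://USERNAME:PASSWORD@HOST:5432/DATABASE"))
```

Only the scheme changes; the username, password, host, port, and database name are passed through untouched.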
Ensure the environment variables are set. E.g., for live payment:
set -o allexport && source .env.local.live && set +o allexport
!!!Warning!!! Currently these commands claim to update the DB to record who was paid, but they do not actually do so. Use the contents of the /logs folder to check status.
Pass in one of these arguments:
- batch_grade HID: Check Submitted assignments for a given HIT ID and Approve them.
- batch_grade_test HID: Check Submitted assignments for a given HIT ID and Approve them.
- send_manual_bonus WID AID: Send a bonus to a given worker ID for a given assignment ID (because we accepted but didn't send a bonus the first time).
- reject AID FEEDBACK: Reject the given assignment ID with the given feedback. E.g., rejecting speeders.
- approve AID: Approve the given assignment ID like normal.
- grade AID WID: Grade and approve HITs as necessary for a given assignment ID and worker ID.
E.g., using your HID:
python ./approve_hits.py batch_grade HID