A Naturally-Occurring Dataset Based on Stack Exchange Data

SEDE
SEDE (Stack Exchange Data Explorer) is new dataset for Text-to-SQL tasks with more than 12,000 SQL queries and their natural language description. It’s based on a real usage of users from the Stack Exchange Data Explorer platform, which brings complexities and challenges never seen before in any other semantic parsing dataset like including complex nesting, dates manipulation, numeric and text manipulation, parameters, and most importantly: under-specification and hidden-assumptions.
Paper (NLP4Prog workshop at ACL2021): Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data.
Setup Instructions
Create a new Python 3.7 virtual environment:
python3.7 -m venv .venv
Activate the virtual environment:
source .venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Add the project directory to python PATH:
export PYTHONPATH=/your/projects-directories/sede:$PYTHONPATH
One can run all