Haphazard scripts for scraping bitcoin/bitcoin data from GitHub
This is a quick-and-dirty tool used to scrape bitcoin/bitcoin pull request and
commentary data.
Each output/
folder contains
comments.json
: an aggregated list of both issue and review comments, in Github’s
original formatcommits.json
: a list of commit objects corresponding to the PR, in Github’s
original formatpr.json
: the pull request object, in Github’s original formatcomments_abbrev.csv
: abbreviated representation of each comment in CSV formatpr_abbrev.csv
: abbreviated representation of the PR in CSV formatdone
: the datetime we retrieved the PR data
Limitations
Right now this doesn’t really handle open PRs (or PRs that are expected to be updated)
properly since it will not refresh data once the done
sentinel is created. This could
be fixed by comparing various timestamps to the done
sentinel and overwriting.