Open source self-hosted web archiving

ArchiveBox is a powerful self-hosted internet archiving solution written in Python. You feed it URLs of pages you want to archive, and it saves them to disk in a variety of formats, depending on your configuration and the content of each page.
Run ArchiveBox via Docker Compose (recommended), Docker, Apt, Brew, or Pip (see below).
apt/brew/pip3 install archivebox # shorthand: use one of apt, brew, or pip3
archivebox init # run this in an empty folder
archivebox add 'https://example.com' # start adding URLs to archive
curl https://example.com/rss.xml | archivebox add # or add via stdin
archivebox schedule --every=day https://example.com/rss.xml
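The quickstart commands above can also be driven from a script. The following is a minimal Python sketch, not part of ArchiveBox itself: the helper names `quickstart_commands` and `run_quickstart` and the `./archive-data` folder are hypothetical, and the only CLI invocations used are the `archivebox init` and `archivebox add` commands shown above. It defaults to a dry run that only prints the commands, so it is safe to try before ArchiveBox is installed.

```python
import shlex
import subprocess
from pathlib import Path


def quickstart_commands(urls):
    """Return the CLI invocations for initializing a collection and adding URLs.

    Hypothetical helper: it only assembles the `archivebox init` and
    `archivebox add <url>` commands from the quickstart, nothing else.
    """
    commands = [["archivebox", "init"]]
    for url in urls:
        commands.append(["archivebox", "add", url])
    return commands


def run_quickstart(data_dir, urls, dry_run=True):
    """Print the quickstart commands and, unless dry_run is set, execute them.

    Assumes `archivebox` is on PATH when dry_run=False.
    """
    data_dir = Path(data_dir)
    commands = quickstart_commands(urls)
    for cmd in commands:
        print("$ " + shlex.join(cmd))
        if not dry_run:
            # `archivebox init` is meant to be run inside the data folder,
            # so create it if needed and use it as the working directory.
            data_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, cwd=data_dir, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: shows what would be executed without touching the disk.
    run_quickstart("./archive-data", ["https://example.com"], dry_run=True)
```

Passing `dry_run=False` runs each command in the data folder and stops at the first failure, because `check=True` raises on a non-zero exit status.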
For each URL added, ArchiveBox saves several types of HTML snapshot (wget, Chrome headless, singlefile), a PDF, a screenshot, a WARC archive, any git repositories,