Download page including assets loaded at runtime

curl and wget can archive a site, but even with the most exotic command-line switches they omit assets that JavaScript loads dynamically at runtime. Neither tool executes scripts, so it never sees those requests, and the archived site will not work for later offline viewing. To download all the assets, including dynamically loaded data, we can generate a HAR (HTTP Archive) file from the Network panel of the developer tools in Chrome.

This is useful for archiving a page: you can be quite certain that the original functionality will keep working even if the web assets are no longer online, since you have local copies to host yourself.

The HAR file contains only the assets loaded so far, so before generating it, make sure you have triggered every runtime behavior you want to capture (for example by scrolling, opening menus, or navigating within the app).
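Before extracting anything, it can help to check what the HAR file actually recorded. A HAR file is plain JSON with the requests stored under `log.entries`, so a few lines of Python are enough to list them. This is a minimal sketch, not part of any tool mentioned here; the function name `list_captured` is my own.

```python
import json
from pathlib import Path


def list_captured(har: dict) -> list[tuple[int, str, str]]:
    """Return (status, mime_type, url) for every request recorded in the HAR."""
    rows = []
    for entry in har["log"]["entries"]:
        response = entry["response"]
        # "content" and "mimeType" may be missing on some entries, so default to "".
        mime = response.get("content", {}).get("mimeType", "")
        rows.append((response["status"], mime, entry["request"]["url"]))
    return rows


def load_har(path: str) -> dict:
    """Parse a saved HAR file (it is ordinary JSON) into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `list_captured(load_har("page.har"))` on a saved file (the filename is a placeholder) gives one row per request. If an asset you expected is not in the list, go back to the page, trigger it, and save the HAR again.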

This is also useful for offline debugging of an SPA: you can ask users to reproduce the bug and then send you the HAR file. You will see the assets their UI loaded and the API responses it received.
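For debugging, the API responses are often the interesting part. The sketch below pulls the JSON responses out of a parsed HAR dict, assuming the file was saved with response bodies included. Bodies live in `response.content.text` and are base64-encoded when `content.encoding` says so. The helper names `response_body` and `json_responses` are made up for this example.

```python
import base64
import json


def response_body(entry: dict) -> bytes:
    """Return the recorded response body as bytes (empty if none was saved)."""
    content = entry["response"].get("content", {})
    text = content.get("text", "")
    if content.get("encoding") == "base64":
        return base64.b64decode(text)
    return text.encode("utf-8")


def json_responses(har: dict):
    """Yield (method, url, status, body) for entries whose MIME type mentions JSON."""
    for entry in har["log"]["entries"]:
        mime = entry["response"].get("content", {}).get("mimeType", "")
        if "json" in mime:
            request = entry["request"]
            yield (request["method"], request["url"],
                   entry["response"]["status"], response_body(entry))
```

Passing each body through `json.loads` then shows the data the user's browser received at the moment the bug occurred. Filtering on the MIME type is a rough heuristic; an API that returns another content type would need a different filter, such as a URL prefix.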

Once the HAR file is saved, you can use the har-extractor tool to extract it and then serve the content locally. For me everything worked the first time:

npx har-extractor <harfile> --output /path/to/output
# Then serve the content locally from the output directory
cd /path/to/output
python3 -m http.server 8080

