As per SSWConsulting/SSW.Website#1263, we need a way to track which pages have been migrated to the v3 website.
This repository serves as a tracker of which pages have been migrated; we add the zz prefix to each page that we have migrated to the v3 Next.js site.
We are not adding the zz prefix to pages on the original repo because we want those pages to stay live, so we can compare the new pages with them.
- v1 site - https://dev.azure.com/ssw2/ssw.website
- Bypass Front Door (to see old pages) - https://prod.ssw.com.au/
Effectively, you can treat /archive like a project with an automated deployment pipeline. There is a GitHub Action that automatically syncs the contents of /archive to blob storage, so all you need to do to update one of the migrated pages is get your PR merged. After your PR is merged into main, you will be able to view your updates on the website under ssw.com.au/archive. These changes are synced, meaning that if a file is deleted from the repo it will also be deleted from blob storage.
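The sync itself is handled by the GitHub Action, but conceptually it behaves something like the sketch below (hypothetical; the container name, connection-string variable, and use of the Azure SDK for Python are assumptions, and the real workflow may use a different tool entirely):

```python
# Hypothetical sketch of the /archive -> blob storage sync (the real GitHub Action may differ)
import os
from azure.storage.blob import ContainerClient

ARCHIVE_DIR = "archive"   # folder in this repo that gets synced
CONTAINER = "archive"     # assumed container name
conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]

container = ContainerClient.from_connection_string(conn_str, CONTAINER)

# Upload every file under /archive, preserving its relative path as the blob name
local_files = set()
for root, _, files in os.walk(ARCHIVE_DIR):
    for name in files:
        path = os.path.join(root, name)
        blob_name = os.path.relpath(path, ARCHIVE_DIR).replace("\\", "/")
        local_files.add(blob_name)
        with open(path, "rb") as f:
            container.upload_blob(name=blob_name, data=f, overwrite=True)

# Because the sync is two-way, blobs with no matching local file are deleted
for blob in container.list_blobs():
    if blob.name not in local_files:
        container.delete_blob(blob.name)
```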
Important note: If you add or delete one of the pages in the repo, you will need to update the sitemap. You can do this by running sitemap_generator.py at the base of the repo (SSW.Website-v1-Progress).
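For illustration only, a sitemap regeneration of this kind usually just walks the archived pages and rewrites sitemap.xml; the sketch below is hypothetical (folder name, base URL, and output format are assumptions), and the real sitemap_generator.py may work differently:

```python
# Hypothetical sketch only - see sitemap_generator.py for the actual implementation
import os

BASE_URL = "https://www.ssw.com.au/archive"   # assumed base URL for archived pages
ARCHIVE_DIR = "archive"                       # assumed folder containing archived pages

urls = []
for root, _, files in os.walk(ARCHIVE_DIR):
    for name in files:
        if name.endswith((".html", ".aspx")):
            rel = os.path.relpath(os.path.join(root, name), ARCHIVE_DIR).replace("\\", "/")
            urls.append(f"  <url><loc>{BASE_URL}/{rel}</loc></url>")

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    f.write("\n".join(urls))
    f.write("\n</urlset>\n")
```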
- Python 3.x - https://www.python.org/downloads/
- Latest Chrome Driver - you can download it from here: https://googlechromelabs.github.io/chrome-for-testing/#stable
This script will save every page from the v1 website as HTML files in the `history` folder. It will scan each page, locate any images, and save them in the `history` folder as well, preserving the original path of the images on the v1 site.
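At a high level, the archiving step looks something like the sketch below (simplified and hypothetical; the real html_archiver.py also runs the fix_* steps described later in this README):

```python
# Simplified sketch of the page-archiving step (see html_archiver.py for the real logic)
import os
from urllib.parse import urlparse

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service("C:\\selenium\\chromedriver.exe")   # point this at your Chrome Driver
driver = webdriver.Chrome(service=service)

def archive_page(url: str, out_dir: str = "history") -> None:
    driver.get(url)                                   # render the page in Chrome
    soup = BeautifulSoup(driver.page_source, "html5lib")

    # The real script also applies the fix_* steps described below (scripts, CSS, images, links)

    page_path = os.path.join(out_dir, urlparse(url).path.lstrip("/") or "index.html")
    os.makedirs(os.path.dirname(page_path), exist_ok=True)
    with open(page_path, "w", encoding="utf-8") as f:
        f.write(str(soup))                            # save the rendered page as HTML
```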
- Open the `html_archiver.py` file, go to the line of code below, and make sure it points to your Chrome Driver (see prerequisites section above): `service = Service("C:\\selenium\\chromedriver.exe")`
- Install the required modules: `pip install requests`, `pip install selenium`, `pip install bs4`, `pip install html5lib`
- If using macOS, go to the line of code below and make sure you use a forward slash (`/`) instead of `\\`: `split_path = item_path.split("\\")`
- To generate the `history` folder, run the following command: `python html_archiver.py`
Script + CSS Linking - `fix_scripts` and `fix_css`

Modified versions of the existing core CSS and JS files live in the `history` folder, and will be added to each page if required. Files include:

- `jquery.js` - jQuery
- `menu.js` - Megamenu script
- `moment.js` - Moment.js
- `ssw_pigeon.js` - the most important script to include, the entire JS bundle for most of the v1 site
- `ssw_raven_print.css`
- `ssw_raven.css`
- All contents of the `css` directory (`Base.css`, `Content.css`, etc.) - only for archiving Standards

The script also removes all `iframe` and `script` tags to ensure we do not receive unnecessary noise + interaction (e.g. Chatbase, Google Analytics, etc.) when viewing these archived pages.
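As a rough illustration of what `fix_scripts`, `fix_css`, and the tag stripping do (hypothetical sketch; the file names match the list above, but the output paths and exact logic are assumptions - see html_archiver.py):

```python
# Hypothetical sketch of fix_scripts / fix_css style behaviour
from bs4 import BeautifulSoup

def fix_scripts_and_css(html: str) -> str:
    soup = BeautifulSoup(html, "html5lib")

    # Strip iframes and scripts so archived pages don't pull in Chatbase, Google Analytics, etc.
    for tag in soup.find_all(["iframe", "script"]):
        tag.decompose()

    # Re-link the modified core JS bundle and stylesheets stored in the history folder
    for js in ["jquery.js", "menu.js", "moment.js", "ssw_pigeon.js"]:
        soup.body.append(soup.new_tag("script", src=f"/archive/{js}"))
    for css in ["ssw_raven.css", "ssw_raven_print.css"]:
        soup.head.append(soup.new_tag("link", rel="stylesheet", href=f"/archive/{css}"))

    return str(soup)
```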
Image Downloading - `fix_images`

By default, the script will download all images from every page it saves from the v1 site and save them in the `history` folder. It saves them in the same relative path as the original image on the v1 site, e.g. ssw.com.au/ssw/images/Raven/SSWLogo.svg will save to `history/images/Raven/SSWLogo.svg`.
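Interpreting the example above, the URL-to-path mapping amounts to something like this (a sketch assuming the v1 /ssw root is dropped, which is what the example shows):

```python
from urllib.parse import urlparse

# e.g. "https://www.ssw.com.au/ssw/images/Raven/SSWLogo.svg" -> "history/images/Raven/SSWLogo.svg"
def local_image_path(image_url: str, out_dir: str = "history") -> str:
    path = urlparse(image_url).path      # "/ssw/images/Raven/SSWLogo.svg"
    if path.startswith("/ssw/"):
        path = path[len("/ssw"):]        # drop the v1 "/ssw" root so paths line up
    return out_dir + path                # "history/images/Raven/SSWLogo.svg"
```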
Image Replacement - `fix_images`

By updating the `IMAGE_REPLACEMENTS` dictionary in the `html_archiver.py` file, you can replace any image on the v1 site with another image. The key (e.g. `adam_thumb.jpg`) matches any image URL that ends with it (e.g. https://www.ssw.com.au/ssw/Standards/Images/adam_thumb.jpg); that URL is replaced with a non-broken image (e.g. https://www.ssw.com.au/ssw/Events/Training/Images/adam_thumb.jpg), and that image is saved instead of the broken original.
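For example, an entry like the following (illustrative values taken from the URLs above; check html_archiver.py for the actual dictionary shape and matching logic):

```python
# Illustrative IMAGE_REPLACEMENTS entry - any image URL ending in the key is swapped for the value
IMAGE_REPLACEMENTS = {
    "adam_thumb.jpg": "https://www.ssw.com.au/ssw/Events/Training/Images/adam_thumb.jpg",
}

def replace_image_url(src: str) -> str:
    for suffix, replacement in IMAGE_REPLACEMENTS.items():
        if src.endswith(suffix):
            return replacement   # download and save this image instead of the broken one
    return src
```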
Link Replacement - `fix_links`

This function replaces broken links, as well as links to pages that have already been archived (i.e. have `za` as the file prefix in the `SSW.Website.WebUI` folder).
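A rough sketch of that behaviour (hypothetical; the archive URL and prefix check are assumptions, and the broken-link handling is omitted - the actual rules live in html_archiver.py):

```python
# Hypothetical sketch of fix_links-style behaviour
import os
from bs4 import BeautifulSoup

WEBUI_DIR = "SSW.Website.WebUI"   # assumed local checkout of the v1 WebUI folder

def fix_links(html: str) -> str:
    soup = BeautifulSoup(html, "html5lib")
    for a in soup.find_all("a", href=True):
        page = a["href"].rstrip("/").split("/")[-1]
        # If the target page already has the "za" prefix in the WebUI folder, point at the archive
        if page and os.path.exists(os.path.join(WEBUI_DIR, "za" + page)):
            a["href"] = "/archive/" + page
    return str(soup)
```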
Banner Addition - `add_archive_header`

This function adds a banner to the top of the page to indicate that the page has been archived.

Figure: The banner added to each page
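Something along these lines (sketch only; the real banner markup, class name, and wording are defined in html_archiver.py):

```python
# Sketch of add_archive_header-style behaviour
from bs4 import BeautifulSoup

def add_archive_header(html: str) -> str:
    soup = BeautifulSoup(html, "html5lib")
    banner = soup.new_tag("div", attrs={"class": "archive-banner"})   # class name is an assumption
    banner.string = "This page has been archived - see the new site at ssw.com.au"
    soup.body.insert(0, banner)   # put the banner at the very top of the page
    return str(soup)
```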
Index Page Creation - `output_index_page`

This function generates a table of the pages that have been archived, with links to the old pages.

Figure: Index page example for /Training
To generate the `todos.md` file, run the following command:
python todo_outputter_md.py
This will find and list all the .aspx files in the project with their associated directories, and tick off pages that have zz or za at the start of their file names.
Additionally, to generate a `todos-notdone.md` file that lists all the pages that are not yet migrated, run the following command:
python todo_outputter_md.py --notdone-only
To generate the `todos.csv` file, run the following command:
python todo_outputter_csv.py
This will output a file called `todos.csv` that can be exported to Excel if required. The status of a page is determined by the prefix of the page name (more detail on prefixes below).
All migrated files in your branch will automatically be pushed to blob storage when your PR is merged into main. Alternatively, you can push the files to blob storage by triggering the workflow manually here.
- Turn off the v1 website's server
- Remove this repo once all pages have been migrated to v3
- zz - migrated to v3 or contains no content
- zr - redirects to another page
- za - migrated to archive
- Page - URL of the page
- Status - Either Done, Archived, Redirect or To-do
- Priority - value in powers of 2, e.g. 1, 2, 4, etc. - higher number = higher priority
- Complexity - value in powers of 2, e.g. 1, 2, 4, etc. - higher number = higher complexity
- Notes - additional info if required
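As a sketch, the prefix-to-status mapping described above amounts to something like this (illustrative only; see todo_outputter_csv.py for the real rules):

```python
# Illustrative mapping from file-name prefix to CSV status
def page_status(file_name: str) -> str:
    if file_name.startswith("zz"):
        return "Done"        # migrated to v3 or contains no content
    if file_name.startswith("zr"):
        return "Redirect"    # redirects to another page
    if file_name.startswith("za"):
        return "Archived"    # migrated to archive
    return "To-do"
```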
The video below explains how to archive pages using the script and how to submit changes to the site.