By Judy Thai.
ApplyUC is the web application that students use to apply for admission to the University of California’s nine undergraduate campuses. In 2020, UC dropped the requirement for standardized test scores like the SAT and ACT, which led to an unprecedented two-year surge in applications. This increased demand contributed to the applyUC undergraduate application being intermittently unavailable when thousands of “eleventh-hour” applications were submitted before the 11:59 PM deadline on November 30.
As a high-profile application in UC Office of the President’s (UCOP) portfolio, applyUC’s availability problems were starting to impact the university’s reputation, and applyUC needed to be able to handle the continued and increased demand. As the application cycle approached in 2022, there could not be another outage.
An aggressive timeline: six weeks to launch
In April 2022, leadership in the IT and the Graduate, Undergraduate and Equity Affairs (GUEA) departments at UC Office of the President decided to move applyUC to Amazon Web Services (AWS). The goal was to provide the system with increased scalability, stability, and swifter responses to potential system outages.
The IT and GUEA teams had already planned other system changes for the year, however, so the AWS move would be in addition to work already in progress. For example, the IT team was in the middle of upgrading the operating system on all the servers, and that is not a task that can be rolled back or abandoned.
ApplyUC operates on the same application cycle every year. We had to complete the move to AWS before July 1, 2022, when applyUC also opens for admission for the winter/spring term. So, the goal was to move to AWS on June 17, 2022, giving the team a couple of weeks to catch any issues before applicants started logging in.
This meant we had six weeks to complete a project that typically takes six to twelve months.
The approach and trade-offs
The project started with two major requirements: (1) we needed more and larger servers, and (2) we had to adhere to an aggressive timeline.
There are different ways to move an application to AWS, but, given our six-week timeframe, there was no time to rewrite and retest the code. We could only entertain the fastest option, which is to “rehost” the application as-is in the cloud and make the necessary configuration changes. It wasn’t apparent at first, but we realized that we needed a project manager to ensure we adhered to the timeline, since the IT team was focused on doing the work.
The steps to success
- First, the infrastructure had to be created. That involved building out the network and servers, installing software, and adjusting configurations for AWS. This had to be done for each environment: development, quality assurance (QA), load test, and production.
- Next, we had to test the system. That is how we realized we hadn’t accounted for moving email functionality to AWS. It’s an essential feature because the system sends messages to students about account creation, password resets, confirmation of application submission, etc. Given the short timeframe and the remaining tasks that had to be completed, we continued using the email server in the data center, rather than moving it to AWS and introducing an unknown variable.
- We made other concessions as well. ApplyUC executes dozens of batch processes behind the scenes to support operational processes, such as loading auxiliary data. There wasn’t enough time in the schedule to move the batch processes to AWS before launch, so we left them at the data center and moved them post-launch. Additionally, we had allotted three weeks for GUEA to perform user acceptance testing (UAT), but we ran out of time and UAT was compressed into a few days.
- At this point in the project, we had only one week to cut over the development, QA, and production environments to AWS. But we did it! We managed to move all of the environments in time, with the production environment going live as scheduled on June 17, 2022.
Twice as many users
We also needed to load-test applyUC in AWS so we could identify any performance issues and make any necessary adjustments. In a load test session, a software program creates the scenario of a certain number of people simultaneously performing actions on a system. The IT team watches the system metrics as the software simulates the demand that applyUC might experience in the final days of November.
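To make the idea concrete, here is a minimal sketch of what a load test program does, written in Python. It is an illustration only, not the vendor tool we used: the function names are hypothetical, and the simulated user action simply sleeps for a moment instead of sending a real request to applyUC.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def simulated_user_action(user_id: int) -> float:
    """Stand-in for one applicant action; returns elapsed seconds.

    Hypothetical: a real load test would send an HTTP request to a
    test environment here instead of sleeping.
    """
    start = time.perf_counter()
    time.sleep(0.01)  # pretend the server took about 10 ms to respond
    return time.perf_counter() - start


def run_load_test(concurrent_users: int) -> dict:
    """Run one action per simulated user in parallel and summarize latency."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = list(pool.map(simulated_user_action, range(concurrent_users)))
    return {
        "users": concurrent_users,
        "mean_latency_s": mean(latencies),
        "max_latency_s": max(latencies),
    }


if __name__ == "__main__":
    # Step the load up, the way a load test ramps toward the expected peak.
    for users in (10, 50, 100):
        print(run_load_test(users))
```

In a real session, the team watches server metrics such as CPU and response time as the user count climbs, looking for the point at which the system starts to struggle.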
In prior years, we only had the budget to load-test up to 20,000 users, but given the high visibility of the system and potential for issues, we had to test up to 100,000 users. We were able to negotiate with the load test vendor, who agreed to additional load tests at no extra cost!
Since the system sees most of its traffic in November, we completed the load tests in July and August, after moving to AWS. However, the load tests weren’t going well. At around 37,000 users, the system started struggling and was unable to recover. We brought in performance engineers to conduct an end-to-end evaluation of the system. The application and infrastructure teams performed additional research to identify what was leading to the system’s struggles.
We made a few system adjustments based on everyone’s recommendations and research, and that did the trick. In the final load test, applyUC was able to support 75,000 concurrent users performing various actions on the system for over an hour. That’s twice as many concurrent users as we’ve ever seen!
Now we had no more time to continue testing. We had to move the configuration changes to the production environment by October 1, which is when students can begin submitting their applications.
The system did not encounter any issues or outages in 2022, even though it received almost 15,000 submissions in the last hour, a new record (up from 12,503 in 2021)! It’s interesting to note that when the system struggled in 2021, CPU utilization on the servers reached 91%. In AWS in 2022, the system used barely 4% of the CPU.
Some things went well. The IT team had been building their skills with monitoring tools since 2021, so they were able to create informative dashboards that executives could use to easily understand how the system was performing. Other, more in-depth dashboards were helpful for the technical team members.
Some things could have gone better. The decision to move to AWS should have been made sooner, giving the team more time and requiring fewer trade-offs. A project manager should have been assigned from the very start, freeing up the teams to focus on existing commitments and the additional AWS work.
We had to make difficult choices and trade-offs. The email server had to stay behind, and we conducted the load test later than usual in the year. Also, when applyUC was located in the data center, we put the business continuity (BC) system in AWS. After we moved the system to AWS, we had to move the BC to a different AWS region to provide redundancy.
Higher education is not known for moving quickly. That we got this done in six weeks is a testament to dedicated employees supporting the UC mission. There is still more work to do, but we accomplished what we set out to achieve: ensuring the stability of the applyUC system during the peak period so students can apply for admission and shape their futures.
[Cover photo caption: UC Application website screenshot.]