Providing Students with Standardized, Cloud-Based Programming Environments at Term’s Start (for Free)

David J. Malan

CS50
Mar 23, 2023

Each year, upwards of 1,000 educators in computer science gather in some city for the Technical Symposium on Computer Science Education, aka SIGCSE, organized by ACM’s Special Interest Group on Computer Science Education, which “addresses problems common among educators working to develop, implement and/or evaluate computing programs, curricula, and courses.” This year, everyone gathered in Toronto, Canada, where CS50’s team co-authored two workshops, one panel, one paper, one birds (bird?) of a feather, and one lightning talk. Among the workshops was Providing Students with Standardized, Cloud-Based Programming Environments at Term’s Start (for Free), in which we presented CS50’s own experience with the same with students at Harvard College, Harvard Extension School, Yale University, and beyond, ultimately introducing attendees hands-on to GitHub Codespaces, which underlies code.cs50.io, CS50’s own adaptation thereof:

For CS50 at Harvard, we have long provided students with a standardized programming environment, to avoid start-of-term technical difficulties that might otherwise arise if students had to install and configure compilers, interpreters, and debuggers on their own Macs and PCs. (For many students, “hello, world” is challenge enough on day 0, without also encountering “command not found” at the same time!) We originally provided students with shell accounts on a university-managed cluster of systems. We then transitioned to a cloud-based equivalent so as to manage the systems ourselves, root access and all. We transitioned thereafter to client-side virtual machines, to scale to more students and enable GUI-based assignments. We have since transitioned to web-based environments, complete with code tabs, terminal windows, and file explorers, initially implemented atop AWS Cloud9 and now, most recently, GitHub Codespaces, an implementation of Visual Studio (VS) Code in the cloud, free for teachers and students alike. In this workshop, we’ll discuss the pedagogical and technological advantages and disadvantages of every approach and focus most of our time, hands-on, on using and configuring GitHub Codespaces itself for teaching and learning. Along the way, attendees will learn how to create their own Docker images and “devcontainers” for their own classes and any languages they teach. Attendees will learn what is possible educationally by writing their own VS Code extensions as well. And how, at term’s end, to “offboard” students to VS Code itself on their own Macs and PCs, so as to continue programming independent of Codespaces.

In this post, a summary of the workshop itself! And for the curious, the workshop’s slides.

Overview of potential solutions for standardized programming environments

As of 2007, CS50 still provided students with accounts on a cluster of Linux systems to which they could SSH. On that cluster, they had access to their own NFS-mounted home directories, inside of which they could run commands like gcc, gdb, et al. Back in 1996, when I took CS50 myself, we would (insecurely!) telnet to something similar!

Terminal window via which a student like John Harvard might SSH to the university’s Linux cluster, known at the time as the New Instructional Computing Environment (nice.harvard.edu).

Because the university administered the cluster (for courses other than ours as well), we did not have root access, so we couldn’t easily install or update software for CS50 alone. Invariably, too, students would encounter technical difficulties outside of business hours, 9 AM to 5 PM, which we ourselves could not fix. (And students’ own programming hours were more like 5 PM to 9 AM!) And so, in 2008, we re-created this cluster ourselves in the cloud, using Amazon EC2, which had debuted two years prior:

CS50’s own cluster of virtual machines in the cloud.

In the cloud, we had precisely the root access we’d sought, but we soon found system administration a full-time job of its own (and a distraction from teaching), with any technical difficulties now up to us to solve.

We eventually transitioned in 2012 to client-side virtual machines, whereby students would download and install a “CS50 Appliance” (running Linux) on their own Macs and PCs instead. Not only did this approach eliminate our cloud-based stressor, it also allowed us to scale to tens of thousands of students via edX, where CS50 became a massive open online course (MOOC). And, because the appliance included Xfce, it even enabled GUI-based problem sets. The appliance, though, was not without its own tech-support headaches. At one point, because of a bug in VirtualBox, simply closing the lid of one’s laptop with the appliance still running could “brick” it. And Windows updates would often break students’ virtual network adapters.

The CS50 Appliance was a virtual machine running Xfce atop Fedora (later Ubuntu) Linux that students would install on their own Macs and PCs.

Fortunately, by 2015, web-based integrated development environments (IDEs) had become viable, thanks to cloud computing, improved internet speeds, and advances in JavaScript. And we replaced the CS50 Appliance with CS50 IDE, a browser-based programming environment built atop Cloud9, then an Amsterdam-based startup, later AWS Cloud9. The IDE offered a tabbed code editor, a graphical file explorer, and, most important pedagogically, a terminal window, all of which were connected to per-student Docker containers in the cloud. The IDE effectively provided each student with their own server in the cloud, to which they even had root access. A catch, though, was that we ourselves had to maintain those containers, and so we found ourselves system administrators again.

CS50 IDE was a browser-based programming environment built atop AWS Cloud9.

We have since transitioned to GitHub Codespaces, though, atop which we’ve built code.cs50.io, which effectively provides each student with their own installation of Visual Studio Code (VS Code) in the cloud, complete with a tabbed code editor, a file explorer, and a terminal window, all connected to their own Docker container, otherwise known as a “codespace.” Those codespaces, though, are administered by GitHub atop Azure, which has freed us to focus all the more on teaching itself. And, thanks to a VS Code API, we now spend more time developing pedagogically motivated extensions for VS Code itself than on DevOps more generally.

CS50 students now use VS Code in the cloud via code.cs50.io, CS50’s adaptation of GitHub Codespaces for students and teachers.

Client-side and server-side alternatives to Codespaces abound, though none with quite the same extensibility and support model. Over the years, we ourselves have not just tried but have actually used and/or liked (👍) several of the below.

Client-Side

Server-Side

Customizing Codespaces by writing one’s own Dockerfile

The codespaces that code.cs50.io creates for students are based on an image, cs50/codespace, that we ourselves maintain via a Dockerfile. That image inherits from another image, cs50/cli, which we also maintain via a Dockerfile for cli50, which is essentially a headless version of the same.

Collectively, those Dockerfiles pre-install software that students might need for CS50 itself, including clang for compiling, gdb for debugging, and cowsay for cowsaying. Within the Dockerfiles, for instance, are instructions like these:

RUN apt update && apt install --no-install-recommends --yes clang gdb cowsay

Anytime we push changes to those Dockerfiles to GitHub, a GitHub Action rebuilds our images, thereafter pushing them to Docker Hub as well as to GitHub’s own Container registry.
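For a rough sense of what such a rebuild involves, here is a minimal sketch (not CS50's actual GitHub Action) that prints the docker commands one might run to build an image once and push it to both registries. The function name publish_commands is hypothetical, and the ghcr.io/ prefix simply assumes that the image uses the same name on GitHub's Container registry as on Docker Hub.

```shell
#!/usr/bin/env bash
# Sketch only: print (rather than run) the commands that would build an image
# and push it to Docker Hub and to GitHub's Container registry (ghcr.io).
# Assumes the same owner/name is used on both registries.

publish_commands() {
    local image="$1"          # e.g., cs50/cli
    local tag="${2:-latest}"  # defaults to latest

    echo "docker build --tag $image:$tag ."
    echo "docker tag $image:$tag ghcr.io/$image:$tag"
    echo "docker push $image:$tag"
    echo "docker push ghcr.io/$image:$tag"
}
```

After logging in to each registry with docker login, the printed commands could be reviewed and then run by hand from the directory containing the Dockerfile.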

If you have Docker installed on your own computer, in fact, you can try out cs50/cli itself with:

docker run --interactive --tty cs50/cli

Or, more succinctly:

docker run -it cs50/cli

We also pre-install (after compiling from source) other software for students with more-complicated instructions like these, rather than expect them to copy/paste (and troubleshoot!) the same:

RUN cd /tmp && \
    curl https://www.python.org/ftp/python/3.11.2/Python-3.11.2.tgz --output Python-3.11.2.tgz && \
    tar xzf Python-3.11.2.tgz && \
    rm --force Python-3.11.2.tgz && \
    cd Python-3.11.2 && \
    ./configure && \
    make && \
    make install && \
    cd .. && \
    rm --force --recursive Python-3.11.2 && \
    ln --relative --symbolic /usr/local/bin/pip3 /usr/local/bin/pip && \
    ln --relative --symbolic /usr/local/bin/python3 /usr/local/bin/python && \
    pip3 install --upgrade pip

Of course, it’s not necessary to use CS50’s own Dockerfiles, or even code.cs50.io, in order to use Codespaces yourself. Students and teachers alike can apply to be verified for benefits at https://education.github.com/discount_requests/application to use Codespaces (and more) for free.

Once verified, create a new (private) repository at https://github.com/new, and be sure to Add a README file (or any other file) so that the new repository isn’t bare. Then, visit the new repository’s Code tab (whose URL should be of the form https://github.com/{OWNER}/{REPO}), click the green <> Code button, click the Codespaces tab beneath it, and Create codespace on main. Within a few seconds, you should see VS Code, connected to a codespace running GitHub’s default image, based on this Dockerfile (because you didn’t supply your own).
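Those same steps can be approximated from a terminal with GitHub CLI (gh). Below is a minimal sketch, assuming gh is installed and already authenticated via gh auth login. The function names and the DRY_RUN variable are hypothetical, introduced here only so that the commands can be previewed before anything is actually created.

```shell
#!/usr/bin/env bash
# Sketch: create a private repository with a README, then a codespace on main.
# Assumes GitHub CLI (gh) is installed and authenticated.

# Print the command if DRY_RUN is set; otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_repo_and_codespace() {
    local owner="$1"  # your GitHub username or organization
    local repo="$2"   # name for the new repository

    # --add-readme ensures that the new repository isn't empty
    run gh repo create "$owner/$repo" --private --add-readme

    # gh may still prompt for a machine type
    run gh codespace create --repo "$owner/$repo" --branch main
}
```

For instance, setting DRY_RUN=1 and then calling create_repo_and_codespace OWNER REPO (with your own values for OWNER and REPO) prints the two gh commands without running them.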

Of course, that image might not have everything you want:

$ cowsay
bash: cowsay: command not found

And so you’re welcome to create your own Dockerfile too! Return to your repository’s Code tab, click Add file, click Create new file beneath it, and create a new file called Dockerfile with lines like these:

FROM ubuntu:latest

RUN apt update && apt install --yes git python3-pip
RUN pip install cowsay

Or push the same to the repository using git in the usual way.

Then, via similar steps, create a file called .devcontainer.json (with the leading dot) with lines like these, so that GitHub knows to build an image using your Dockerfile:

{
"build": { "dockerfile": "Dockerfile" }
}
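If you'd rather create both files from a terminal, here is a minimal sketch that writes the same Dockerfile and .devcontainer.json into a directory, presumably a local clone of your repository. The function name scaffold_devcontainer is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write the Dockerfile and .devcontainer.json from this post into a
# directory (by default, the current one).

scaffold_devcontainer() {
    local dir="${1:-.}"
    mkdir -p "$dir"

    # Image with git, pip, and (via pip) cowsay
    cat > "$dir/Dockerfile" <<'EOF'
FROM ubuntu:latest

RUN apt update && apt install --yes git python3-pip
RUN pip install cowsay
EOF

    # Tells GitHub to build the codespace's image from that Dockerfile
    cat > "$dir/.devcontainer.json" <<'EOF'
{
    "build": { "dockerfile": "Dockerfile" }
}
EOF
}
```

After running scaffold_devcontainer inside a clone, the usual git add Dockerfile .devcontainer.json, git commit, and git push would publish both files to GitHub.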

Then, one more time, return to your repository’s Code tab, click the green <> Code button again, and click the + button to Create a codespace on main again, this time using your own Dockerfile. That codespace might take longer to build, because it isn’t yet cached by GitHub. But you should eventually see VS Code atop a codespace running your very own image, which you can confirm with a command like:

# cowsay hello, world
  ____________
| hello, world |
  ============
            \
             \
               ^__^
               (oo)\_______
               (__)\       )\/\
                   ||----w |
                   ||     ||

Rather than create a new codespace each time you edit your Dockerfile (or .devcontainer.json), you can edit the copy of the same within an existing codespace and then Rebuild Container via VS Code’s command palette to see the changes more quickly. But for the changes to persist for future codespaces, too, you should commit them to the actual GitHub repository.

Within an existing codespace, meanwhile, you can modify VS Code’s Settings and Themes in the usual way, via the activity bar’s gear icon, and even install Extensions.

Customizing Codespaces further by defining one’s own .devcontainer.json

You can alternatively customize settings, themes, and extensions for VS Code in .devcontainer.json itself. And you can even specify a Docker image that’s already been built and pushed to a registry, a la CS50’s own. For instance, the below would prescribe that a codespace be built using Docker Hub’s official image for Python. (No need for a Dockerfile.) And it would pre-install Microsoft’s Python extension and GitHub CLI, the latter of which happens to be installable as a feature (available among others) so that you needn’t resort to your own Dockerfile just to install it. The below would also activate VS Code’s GitHub Dark Default theme, which CS50 itself uses.

{
    "image": "python:latest",
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python"
            ],
            "settings": {
                "workbench.colorTheme": "GitHub Dark Default"
            }
        }
    },
    "features": {
        "ghcr.io/devcontainers/features/github-cli:1": {}
    }
}

Quite a bit more can be configured via .devcontainer.json as well, so much so that there’s a formal specification for such. In our own JSON, not only do we prescribe our own Docker image and simplify VS Code’s interface for students via settings, we also forward several TCP ports, set environment variables, and even run a postCreateCommand.
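As a rough illustration of those additional properties, here is a minimal sketch that writes a .devcontainer.json that forwards a port, sets an environment variable, and runs a postCreateCommand. The property names (forwardPorts, containerEnv, postCreateCommand) come from the specification, but every value below is hypothetical and not taken from CS50's own JSON, as is the function name write_devcontainer.

```shell
#!/usr/bin/env bash
# Sketch: write a .devcontainer.json with a forwarded port, an environment
# variable, and a command to run once after the container is created.
# All values are illustrative only.

write_devcontainer() {
    local dir="${1:-.}"
    mkdir -p "$dir"

    cat > "$dir/.devcontainer.json" <<'EOF'
{
    "image": "python:latest",
    "forwardPorts": [8080],
    "containerEnv": {
        "FLASK_APP": "app.py"
    },
    "postCreateCommand": "pip install flask"
}
EOF
}
```

As with any change to .devcontainer.json, a codespace would need to be created or rebuilt for these settings to take effect.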

Readying Codespaces for one’s own class

CS50’s own adaptation of Codespaces at code.cs50.io essentially automates creation of repositories and codespaces for students. But much of the same can be achieved via GitHub Classroom, too, which uses GitHub’s own APIs similarly.

Once verified for benefits, a teacher can create an organization on GitHub and then, via GitHub Global Campus at https://education.github.com/globalcampus/teacher, click the green Upgrade to GitHub Team button to upgrade that organization. A teacher can then create a New classroom at https://classroom.github.com/classrooms using that same organization. Within that organization, a teacher can then create any number of template repositories, each of which can have its own .devcontainer.json (and, optionally, Dockerfile) along with any starter code. Each of those templates can then be used for a New assignment within Classroom. Students can then “accept” each assignment via an invitation URL, which will create within the teacher’s organization a repository based on that template just for that student (to which they can push). And via that repository’s green <> Code button, the student can create their own codespace for it.

Contact

Email sysadmins@cs50.harvard.edu with questions!

Appendix

Acknowledgements

Special thanks to CS50’s own Bernie Longboy, Carter Zenke, Charlie Liu, Doug Lloyd, and Rongxin Liu as well as GitHub’s own Matthew Dyson and Per Hammer for their help with this workshop!

And our thanks to Amazon, GitHub, and Microsoft for their support of this work. At the time of writing, Malan is also consulting part-time for GitHub as a Professor in Residence.
