All assignments are due at 8:00 PM on the due date. There is a late penalty of 7.5% per day, for up to a maximum of 2 days. All assignments will be posted at least 2 weeks prior to their due dates. We will have a mix of written and programming assignments. All assignments will be posted on this page and should be submitted using Canvas.
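The late-penalty arithmetic can be sketched as follows. This is a minimal illustration, not the official grading rule. It assumes that the 7.5% is deducted from the earned score for each day late and that a partial day counts as a full day. Because the policy above does not say what happens after 2 days, the sketch simply rejects such submissions.

```python
import math

PENALTY_PER_DAY = 0.075  # 7.5% per day late, from the policy above
MAX_LATE_DAYS = 2        # the penalty applies for at most 2 days


def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Apply the late penalty.

    Assumptions (not stated in the policy): the penalty is a fraction of the
    earned score, and any partial day late counts as a full day.
    """
    if hours_late <= 0:
        return raw_score
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_LATE_DAYS:
        # Hypothetical handling: the policy does not say what happens after
        # 2 days, so this sketch refuses to compute a score.
        raise ValueError("submission is beyond the 2-day late window")
    return raw_score * (1 - PENALTY_PER_DAY * days_late)


print(late_adjusted_score(100, 5))   # 1 day late under these assumptions
print(late_adjusted_score(100, 30))  # 2 days late under these assumptions
```

Under these assumptions, a 100-point submission that is 30 hours late counts as 2 days late and keeps 85% of its score.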
The tentative assignment release schedule is listed below:
| Assignment | Posted | Due |
|---|---|---|
| CSx55-HW1 | 8/25/25 | 9/24/25 |
| CS455-HW2 | 9/10/25 | 10/8/25 |
| CS455-HW3 | 10/1/25 | 10/29/25 |
| CS455-HW4 | 10/22/25 | 11/20/25 |
| CS555-HW2 | 9/10/25 | 10/8/25 |
| CS555-HW3 | 10/1/25 | 10/29/25 |
| CS555-HW4 | 10/22/25 | 11/19/25 |
| CSx55-TermProject | | Multiple deliverables (see below) |
For testing purposes, here's a tmux script that multiplexes your terminal for easy navigation across multiple computers.
CS455/555: HW1 Using Minimum Spanning Trees to Route Packets in a Network Overlay
The objective of this assignment is to get you familiar with coding in a distributed setting where you need to manage the underlying communications between nodes. Upon completion of this assignment, you will have a set of reusable classes that you can draw upon. As part of this assignment, you will be: (1) constructing a logical overlay over a distributed set of nodes, and then (2) computing minimum spanning trees to route packets in the system. Additional details are available here.
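The core computation in step (2) can be sketched as building a minimum spanning tree over the weighted overlay and then routing along the unique tree path between two nodes. This sketch uses Kruskal's algorithm with union-find as one possible choice (Prim's algorithm works too); the node names and link weights are hypothetical, and the actual overlay setup and messaging protocol are described in the assignment handout.

```python
from collections import defaultdict, deque


def kruskal_mst(nodes, edges):
    """Return MST edges via Kruskal's algorithm. edges: iterable of (weight, u, v)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):  # consider the lightest links first
        ru, rv = find(u), find(v)
        if ru != rv:               # (u, v) joins two components, so no cycle
            parent[ru] = rv
            mst.append((w, u, v))
    return mst


def route(mst, src, dst):
    """Return the hop-by-hop path from src to dst along the tree (BFS), or None."""
    adj = defaultdict(list)
    for _, u, v in mst:
        adj[u].append(v)
        adj[v].append(u)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


edges = [(1, "A", "B"), (4, "A", "C"), (2, "B", "C"), (5, "C", "D"), (7, "B", "D")]
mst = kruskal_mst("ABCD", edges)
print(mst)                   # [(1, 'A', 'B'), (2, 'B', 'C'), (5, 'C', 'D')]
print(route(mst, "A", "D"))  # ['A', 'B', 'C', 'D']
```

Because a spanning tree has exactly one path between any two nodes, routing over it never loops, and each node only needs the tree to decide its next hop.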
CS455: HW2 Synchronization and Coordination Using Thread Pools
The objective of this assignment is to help you become comfortable with threads and synchronization mechanisms. A further objective is to introduce the role that data structures and locking mechanisms play in designing concurrent programs. Additional details are available here.
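As a minimal sketch of the pattern, a fixed set of worker threads can pull tasks from a shared queue while a lock guards shared state. It is written in Python for brevity and is not the assignment's required design. The class name `ThreadPool`, the sentinel-based shutdown, and the counter task are illustrative assumptions. Python's `queue.Queue` already synchronizes the task queue internally; a from-scratch pool would protect its own list with a lock and a condition variable.

```python
import queue
import threading


class ThreadPool:
    """A minimal fixed-size thread pool: workers block on a shared task queue."""

    def __init__(self, num_workers):
        self.tasks = queue.Queue()
        self.workers = [threading.Thread(target=self._worker, daemon=True)
                        for _ in range(num_workers)]
        for w in self.workers:
            w.start()

    def _worker(self):
        while True:
            task = self.tasks.get()   # blocks until a task is available
            if task is None:          # sentinel: shut this worker down
                self.tasks.task_done()
                return
            func, args = task
            try:
                func(*args)
            finally:
                self.tasks.task_done()

    def submit(self, func, *args):
        self.tasks.put((func, args))

    def shutdown(self):
        self.tasks.join()             # wait until every queued task has finished
        for _ in self.workers:
            self.tasks.put(None)      # one sentinel per worker
        for w in self.workers:
            w.join()


counter = 0
counter_lock = threading.Lock()


def increment(n):
    global counter
    for _ in range(n):
        with counter_lock:  # counter += 1 is read-modify-write, so it needs the lock
            counter += 1


pool = ThreadPool(4)
for _ in range(10):
    pool.submit(increment, 1000)
pool.shutdown()
print(counter)  # 10000
```

Removing the `with counter_lock:` line can cause lost updates, because two threads may read the same old value before either writes back.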
CS555: HW2 Distributed Load Balancing of Computational Tasks Using Thread Pools
As part of this assignment, you will be leveraging thread pools in a distributed environment to alleviate computational load imbalances across a set of computational nodes. The computational task being load balanced in this assignment is similar to the proof-of-work computation performed in cryptocurrencies such as Bitcoin. Additional details are available here.
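The hash puzzle behind proof of work can be sketched as a nonce search. This sketch assumes SHA-256 over a payload followed by an 8-byte big-endian nonce, with difficulty expressed as a required number of leading zero bits. The assignment's actual task format and difficulty are given in its handout, and the payload `b"task-42"` is hypothetical.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zero bits before this byte's highest 1 bit
        break
    return bits


def mine(payload: bytes, difficulty: int):
    """Try nonces 0, 1, 2, ... until SHA-256(payload + nonce) has at least
    `difficulty` leading zero bits; return (nonce, hex digest)."""
    nonce = 0
    while True:
        digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce, digest.hex()
        nonce += 1


nonce, digest_hex = mine(b"task-42", 12)
print(nonce, digest_hex)
```

Each extra bit of difficulty doubles the expected number of hashes (about 2^difficulty attempts on average). Finding a nonce is expensive, but checking one takes a single hash.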
CS455: HW3 Analyzing Air Quality Data in the United States Using MapReduce
The objective of this assignment is to gain experience in developing MapReduce programs. As part of this assignment, you will be working with data collected from the EPA’s Air Quality System (AQS). You will be developing MapReduce programs that parse and process recordings of temperature and criteria gas levels at various outdoor monitors. You will be using Apache Hadoop (version 3.4.0) to implement this assignment. Additional details are available here.
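The map, shuffle, and reduce dataflow can be simulated in plain Python to show how a per-key aggregate is computed. This sketch does not use Hadoop. The four-field record layout and the "mean value per state and parameter" query are hypothetical and are not the actual AQS schema or the assigned questions.

```python
from collections import defaultdict

# Hypothetical, simplified records: state,monitor_id,parameter,value
# (the real AQS files use a different and much richer schema).
RECORDS = [
    "CO,0801,Ozone,0.041",
    "CO,0802,Ozone,0.045",
    "CO,0801,Temperature,71.0",
    "UT,4901,Ozone,0.050",
]


def mapper(line):
    """Map phase: emit ((state, parameter), value) for each record."""
    state, _monitor_id, parameter, value = line.split(",")
    yield (state, parameter), float(value)


def reducer(key, values):
    """Reduce phase: average all values that share a key."""
    values = list(values)
    yield key, sum(values) / len(values)


def run_local(records):
    """Simulate map -> shuffle (group by key) -> reduce in a single process."""
    groups = defaultdict(list)
    for line in records:
        for key, value in mapper(line):
            groups[key].append(value)
    results = {}
    for key in sorted(groups):  # each Hadoop reducer also sees its keys in sorted order
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


for (state, parameter), mean in run_local(RECORDS).items():
    print(state, parameter, f"{mean:.3f}")
```

The shuffle step is the part Hadoop performs for you between the two phases. It guarantees that every value for a given key reaches the same reducer call.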
CS555: HW3 Implementing the Pastry P2P System
As part of this assignment, you will be implementing a structured P2P system. Specifically, you will be implementing the Pastry P2P network, where each peer has a 16-bit identifier, so the system can support up to 64K (2^16 = 65,536) peers. This assignment will account for 10 points towards your cumulative course grade. Additional details are available here.
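Pastry routes by prefix matching. This sketch assumes each 16-bit identifier is written as 4 hexadecimal digits (b = 4 bits per digit), so a peer's routing table has 4 rows of 16 entries. Row r holds peers that share exactly r leading digits with the local ID, and the column is the peer's next digit. The helper names, the dictionary-based table, and the example IDs are hypothetical, and leaf-set handling is omitted.

```python
def to_hex_id(node_id: int) -> str:
    """Render a 16-bit identifier as 4 hex digits (assumes b = 4 bits per digit)."""
    if not 0 <= node_id < 2**16:
        raise ValueError("identifier must fit in 16 bits")
    return f"{node_id:04x}"


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def routing_table_slot(local_id: str, other_id: str):
    """Row = shared prefix length; column = other_id's first differing digit."""
    row = shared_prefix_len(local_id, other_id)
    if row == len(local_id):
        return None  # identical identifiers
    return row, int(other_id[row], 16)


def next_hop(local_id: str, key: str, routing_table: dict):
    """Forward to a peer whose ID shares at least one more digit with `key`.
    routing_table maps (row, col) -> peer ID. A None result means the entry is
    empty; real Pastry then falls back to the leaf set (not shown here)."""
    slot = routing_table_slot(local_id, key)
    if slot is None:
        return local_id
    return routing_table.get(slot)


print(to_hex_id(0x65A1))                          # 65a1
print(routing_table_slot("65a1", "65f3"))         # (2, 15)
print(next_hop("65a1", "6f00", {(1, 15): "6fe2"}))  # 6fe2
```

Each hop extends the matched prefix by at least one digit, so with 4-digit IDs a message reaches its destination region in at most about 4 routing-table hops.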
CS455: HW4 Analyzing the MovieLens Dataset Using Spark
The objective of this assignment is to gain experience in developing Spark programs. As part of this assignment, you will be working with the MovieLens dataset, which describes ratings and free-text tagging activities from MovieLens, a movie recommendation service. This dataset was created by GroupLens and is primarily hosted at Kaggle. You will be using Apache Spark to implement this assignment. Additional details are available here.
CS555: HW4 Building a Distributed, Replicated, and Fault-Tolerant File System: Contrasting Replication and Erasure Coding
The objective of this assignment is to build a distributed, failure-resilient file system. The fault tolerance for files is achieved using two techniques: replication and erasure coding. This assignment has several sub-items associated with it. Additional details are available here.
Reed-Solomon [Erasure Coding] Jar File
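The storage trade-off between the two techniques can be sketched numerically, along with the simplest possible erasure code. The XOR single-parity scheme below is a deliberately simpler substitute for Reed-Solomon and is not the provided jar's API. It tolerates only one lost shard, whereas Reed-Solomon with m parity shards tolerates m. The parameters (3 replicas, 6 data + 3 parity shards) and shard contents are hypothetical examples.

```python
from functools import reduce


def storage_overhead_replication(replicas: int) -> float:
    """Each chunk is stored `replicas` times; survives replicas - 1 lost copies."""
    return float(replicas)


def storage_overhead_erasure(k: int, m: int) -> float:
    """k data shards + m parity shards; an MDS code such as Reed-Solomon can
    rebuild the data from any k of the k + m shards."""
    return (k + m) / k


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_single_parity(shards):
    """Simplest erasure code (m = 1): parity = XOR of all equal-length data shards."""
    return reduce(xor_bytes, shards)


def recover_missing(surviving_shards, parity):
    """Rebuild one lost data shard by XOR-ing the parity with every survivor."""
    return reduce(xor_bytes, surviving_shards, parity)


shards = [b"abcd", b"efgh", b"ijkl"]
parity = encode_single_parity(shards)
rebuilt = recover_missing([shards[0], shards[2]], parity)  # pretend shards[1] was lost
print(rebuilt)                          # b'efgh'
print(storage_overhead_replication(3))  # 3.0
print(storage_overhead_erasure(6, 3))   # 1.5
```

In this example, 3-way replication triples storage to survive 2 failures, while a (6, 3) Reed-Solomon layout would use 1.5x storage to survive any 3 lost shards. The cost is extra computation during encoding and repair.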
Term Project & Paper: Scalable Distributed Analytics
[Group assignment: Teams of 2-3 CS455 students and 2 CS555 students]
As part of this assignment, you will be doing a term project that involves using Apache Spark, TensorFlow, or PyTorch to perform analytics over two or more spatial datasets; a rich set of datasets is available at https://urban-sustain.org. Your system or application should execute on at least 10 machines. The problem should be data-intensive and/or compute-intensive. Additional details are available here.
[TP-D0] Friday, September 26th, 2025 @ 8:00 pm [Team composition]
[TP-D1] Friday, October 10th, 2025 @ 8:00 pm [Term Project Pitch Presentation Slides]
[TP-D2] Wednesday, December 3rd, 2025 @ 8:00 pm [Software Submission]
[TP-D3] Friday, December 5th, 2025 @ 8:00 pm [Report]
[TP-D4] Friday, December 5th, 2025 @ 8:00 pm [PowerPoint Presentation]