UE19CS322: Big Data
Undergraduate course covering distributed computing, large-scale data processing, and big data frameworks including Hadoop and Spark. Topics included MapReduce, HDFS, iterative graph algorithms, Spark DataFrames, and Spark Streaming with MLlib.
Instructors: Dr. KV Subramaniam, Dr. Prafullata K Auradkar, Prof. Animesh Giri
Term: Fall 2021
Location: PES University, Bengaluru
Role
Teaching Assistant under Dr. KV Subramaniam, Dr. Prafullata K Auradkar, and Prof. Animesh Giri at PES University during the Fall 2021 semester. Part of a four-person TA team alongside Vishesh Purnananda, Ansh Sarkar, and Pavan Appanna.
Responsibilities
- Designed and graded 4 assignments and 2 projects for 600+ enrolled students
- Built and maintained a custom submission portal with automated evaluation pipelines
- Delivered hands-on lab sessions on Hadoop and Spark setup and configuration
- Conducted beta testing phases and managed changelogs for assignment releases
Course Website
github.com/Cloud-Computing-Big-Data/cloud-computing-big-data.github.io
The original course website is permanently down. This page and its linked materials are an archive of the last released version of the course content, preserved here for reference. No further updates will be made.
Schedule
| Date | Topic | Materials |
|---|---|---|
| Sep 2 | A0: Word Count using MapReduce. Ungraded mandatory assignment to verify Hadoop installation. Students ran a MapReduce job to count word occurrences in Alice in Wonderland by Lewis Carroll. | |
| Sep 2 | A1: Analysis of US Road Accident Data. Two-task MapReduce assignment analyzing a large US road accident dataset. Tasks involved record count per hour and multi-attribute filtering with sorted output. | |
| Oct 9 | A2: PageRank with Graph Embeddings for Wikipedia. Two-task iterative MapReduce assignment implementing a modified PageRank on an English Wikipedia hyperlink network (2013 snapshot, SNAP). Incorporated cosine similarity between page embeddings into rank contribution. | |
| Oct 25 | A3: Earth Surface Temperature Analysis using Spark. Two-task Spark DataFrame assignment analyzing Berkeley Earth surface temperature data (1750–present). Tasks involved city-level and country-level temperature comparisons against global averages. | |
| Nov 1 | Project 1: Yet Another Hadoop (YAH). Students built a mini-HDFS from scratch in Python, replicating the NameNode/DataNode architecture with data replication and Hadoop-style job scheduling. | |
| Nov 1 | Project 2: Machine Learning with Spark Streaming. Students applied incremental machine learning on large data streams using PySpark MLlib and Spark Streaming, simulating real-world constrained batch processing scenarios. | |
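
To illustrate the kind of job A0 asked for, the map, shuffle, and reduce phases of a word count can be sketched in plain Python. This is a minimal single-process sketch, not the course's reference solution and not a Hadoop job: the function names `mapper`, `reducer`, and `word_count`, and the lowercase, letters-only tokenization, are assumptions made for the example.

```python
import re
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line.

    Assumption: words are runs of ASCII letters, compared case-insensitively.
    """
    for word in re.findall(r"[a-z]+", line.lower()):
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all counts emitted for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle/sort, and reduce in one process."""
    # Map: apply the mapper to every input line.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle and sort: order by key so equal words sit next to each other,
    # mimicking what the framework does between the two phases.
    pairs.sort(key=itemgetter(0))
    # Reduce: call the reducer once per distinct word.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    sample = ["Alice was beginning to get very tired", "So Alice got up"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer would run as separate tasks over HDFS blocks, and the framework would perform the sort between them; the sketch only mirrors that data flow.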