UE19CS322: Big Data | Aditeya Baral

Instructor: KV Subramaniam, Prafullata K Auradkar, Animesh Giri

Term: Fall 2021

Location: PES University, Bengaluru

Role

Teaching Assistant under KV Subramaniam, Prafullata K Auradkar, and Animesh Giri at PES University during the Fall 2021 semester. Part of a four-person TA team alongside Vishesh Purnananda, Ansh Sarkar, and Pavan Appanna.

Responsibilities

Designed and graded 4 assignments and 2 projects for 600+ enrolled students
Built and maintained a custom submission portal with automated evaluation pipelines
Delivered hands-on lab sessions on Hadoop and Spark setup and configuration
Conducted beta testing phases and managed changelogs for assignment releases

Course Website

github.com/Cloud-Computing-Big-Data/cloud-computing-big-data.github.io

The original course website is permanently down. This page and its linked materials are an archive of the last released version of the course content, preserved here for reference. No further updates will be made.

Schedule

Date	Topic	Materials
Sep 2	A0: Word Count using MapReduce. Ungraded mandatory assignment to verify the Hadoop installation. Students ran a MapReduce job to count word occurrences in Alice in Wonderland by Lewis Carroll.	Assignment Spec, Hadoop Installation Guide
Sep 2	A1: Analysis of US Road Accident Data. Two-task MapReduce assignment analyzing a large US road accident dataset. Tasks involved counting records per hour and multi-attribute filtering with sorted output.	Assignment Spec
Oct 9	A2: PageRank with Graph Embeddings for Wikipedia. Two-task iterative MapReduce assignment implementing a modified PageRank on an English Wikipedia hyperlink network (2013 snapshot, SNAP). The rank contribution incorporated cosine similarity between page embeddings.	Assignment Spec
Oct 25	A3: Earth Surface Temperature Analysis using Spark. Two-task Spark DataFrame assignment analyzing Berkeley Earth surface temperature data (1750–present). Tasks involved city-level and country-level temperature comparisons against global averages.	Assignment Spec, Spark Installation Guide
Nov 1	Project 1: Yet Another Hadoop (YAH). Students built a mini-HDFS from scratch in Python, replicating the NameNode/DataNode architecture with data replication and Hadoop-style job scheduling.	Project Spec
Nov 1	Project 2: Machine Learning with Spark Streaming. Students applied incremental machine learning to large data streams using PySpark MLlib and Spark Streaming, simulating real-world constrained batch-processing scenarios.	Project Spec
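
For readers unfamiliar with the MapReduce model behind A0, the word-count job can be sketched in plain Python. This is an illustrative local simulation, not the course's reference solution: the function names (`mapper`, `shuffle`, `reducer`, `word_count`), the tokenization rule, and the sample lines are all assumptions made for the example, and the actual assignment ran on a Hadoop installation rather than in a single process.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one line of text."""
    # Assumed tokenization: lowercase, runs of ASCII letters only.
    for word in re.findall(r"[a-z]+", line.lower()):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key, as the framework would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; the assignment used the full book text.
    sample = ["the rabbit saw the clock", "The clock was late"]
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}\t{count}")
```

On a real cluster the shuffle step is performed by the framework between the map and reduce tasks; it is written out here only to make the data flow visible.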