Himel Dev web: [Link] himeldev@gmail.com
Statement of Purpose for Computer Science Ph.D. Program
The Harvard Magazine article, ‘Why Big Data Is a Big Deal’, featuring Professor Gary King’s views on big data
is an interesting read. I agree with Professor King’s view that the big data revolution is not about the
quantity of data; rather, it is about the revolutionary things that we can do with it. My scholarship rests
on the same belief about the management of big data, with a particular focus on designing improved algorithms
for making sense of data. I feel that the big data revolution regularly exposes the field of data management
to formidable challenges in the form of demanding queries. Some of these are primitive queries reappearing with
one or more added challenges such as scalability and privacy, whereas others are new queries emerging from the
availability of large, complex, modern datasets. For example, classical graph queries such as shortest
path queries are reappearing in various domains (e.g., road networks, social networks) with latency and privacy
challenges. Likewise, the availability of heterogeneous interaction data from social networks regularly gives
rise to newer queries, such as detecting specific events and their associated members. My research objective is to answer
such challenging queries to uncover new opportunities for a better human life.
Research Background: Inspired by practical applications such as load balancing and friend recommendation,
I worked on the problem of ‘Interactive Community Detection’ in my undergraduate thesis. I proposed
a novel community detection technique, the first to consider both the structure of the social network
and the interactions among its users while detecting communities. Based on this work, I published a full paper at
DASFAA 2014, within six months of my thesis proposal. Later on, I realized that community detection from the raw
data of a social network might reveal sensitive information about the involved parties, such as how strongly a
user is involved in which communities. To address this issue, I incorporated privacy into the community
detection problem to formulate a novel problem, ‘Privacy Preserving Community Detection’. To solve this
problem, I proposed a method to generate privacy-preserving versions of a social graph, which can be used to
identify communities analogous to the native communities in the original graph. Based on the
initial outcome of this work and my prior work on community detection, I published two posters at SIGMOD
2014. In addition, for my thesis on community detection and privacy, I won the Undergraduate Awards
(2014) in the Computer Science category.
During my undergraduate thesis, I benefited greatly from my earlier research experience, which
began long before the thesis, when I was a junior student working on cloud data management. I identified
a pivotal privacy problem in single-provider cloud architectures: a provider is able to mine
a client’s complete data to extract sensitive information. To tackle this problem, I proposed a
multi-provider distributed approach that involves (i) splitting data into fragments, where individual fragments cannot
leak meaningful information, and (ii) distributing these fragments among multiple providers. Based on this
work, I published a full paper at SC’12 Companion, which has received 10 citations so far. In addition, I
developed a system that keeps track of computational provenance records in the cloud, and later published the
outcome at DASFAA’14 Companion. While working on these cloud data problems, I came across several active
research issues, including load balancing, which in turn led me to the problem of community detection and thus
shaped my thesis. More importantly, these early research experiences bolstered my confidence in doing
independent research and helped me recognize my broad research interest in data management.
Upon recognizing my research interest, I worked on some emerging sub-areas of data management such as
crowdsourcing. I developed a real-time crowd-powered testbed that allows a social media user to get a real-time
evaluation of a proposed social media post. While developing this system, I learned that some problems are
better solved by combining human and machine intelligence. I published the primary outcome of this work as
an extended abstract at CHI 2014, and the complete work is currently under review at IUI 2015.
My early-stage research has allowed me to attend reputable conferences such as SIGMOD and DASFAA as
an undergraduate student. I presented my work at these conferences, participated in numerous discussions
with faculty members and graduate students from universities around the world, asked many questions, and answered
some. I also participated in the prestigious Heidelberg Laureate Forum (HLF), where I had the opportunity
to interact with Abel, Fields, and Turing laureates and received some stimulating research directions. I
strongly believe that attending such events has given me invaluable experience, which later
culminated in a research vision.
Research Vision: My experience so far confirms that I enjoy research, particularly in the field of data
management, and my vision builds on this conviction. My vision is to actively contribute to the research and
advancement of cutting-edge tools and techniques in the data management field, more specifically those for
resolving queries. These queries can be either primitive queries with newer challenges or entirely new queries;
either way, I expect them to have many real-world applications. Moreover, I have no reservations about
the crowd-powered processing of queries, as long as it serves the intended purpose. One such problem is the
crowd-powered processing of personalized community detection/verification queries in social networks. Though
community detection is a well-studied problem, there is little work on personalized community detection, which
involves identifying a particular community satisfying a set of topological, topical, and other (e.g., subjective)
constraints. For example, suppose that a jersey store is interested in identifying a group of friends who support
the Illinois Fighting Illini football team, in order to offer them a discount on buying Fighting Illini jerseys in
bulk. State-of-the-art community detection techniques cannot identify such a group because mixed (topological,
topical, and subjective) constraints are involved. Applying a sequence of methods, such as topical filtering
followed by topological community detection, may just handle this case, but it will fail for queries with
more sophisticated subjective constraints. A more natural response to such queries can be crowd
processing. Human workers can easily deal with subjective constraints and thus can serve as effective tools for
resolving such personalized queries. This crowd-powered approach raises a number of research issues,
such as how to efficiently divide the problem into unit operations and combine the results of these operations
(optimization issues) and how to verify the results of individual operations and the final result (verification issues).
I believe that exploring these issues will be very interesting and can lead to a promising solution to this problem.
Having spent the last couple of years exploring the domain of data management, I am now interested in pursuing
doctoral studies, as they would allow me to do more focused research. When I was looking for prospective Ph.D.
programs, the Department of Computer Science at the University of Illinois at Urbana-Champaign stood out
for its impressive array of research opportunities in the field of data management. I am highly motivated
by Professor Jiawei Han’s research on information network analysis. His work on link-based data mining has
been a source of inspiration. I am familiar with Professor Aditya Parameswaran’s work on crowd-powered query
processing. His recent work on automating the recommendation of data visualizations strongly matches my
research interests. I have also had the opportunity to go through some of Professor Kevin C. Chang’s work on network
mining over the web and social media, which strongly matches my research interests as well. The recent research
projects of these faculty members are very intriguing and seem to be opening many sub-areas of research that I would
love to explore. For these reasons, I have applied to the Computer Science Ph.D. program at the University of
Illinois at Urbana-Champaign.