Date of Award

December 2019

Degree Type


Degree Name

Doctor of Philosophy (PhD)


Electrical Engineering and Computer Science


Sucheta Soundarajan


Communities, Sampling, Social Networks

Subject Categories



Communities are one of the most important structures in social networks. Understanding communities is useful in many applications, such as suggesting a friend for a user in an online friendship network or recommending a product for a user in an e-commerce network. However, before studying anything about communities, researchers first need to collect appropriate data. Getting complete access to the data for community studies is unrealistic in most cases. In this work, we address the problem of crawling networks to identify community structure. First, we present a network sampling technique to crawl the community structure of dynamic networks when there is a limit on the number of nodes that can be queried. The process begins by obtaining a sample for the first time step. In subsequent time steps, the crawling process is guided by community structure discoveries made in the past. Experiments comparing the proposed approach with certain baseline techniques show that it achieves at least a 35% performance increase when the total query budget is fixed over the entire period and at least an 8% increase when the query budget is fixed per time step. Second, we propose a technique to sample communities in node-attributed edge streams when there is a limit on the maximum number of nodes that can be stored. The process learns whether the nodal information can characterize communities. If it can, the nodal information is leveraged together with the structural information to generate representative communities. If the nodal information does not characterize communities, only structural information is considered in assigning nodes to communities. The proposed approach provides a performance improvement of up to about 5 times that of the baselines. Finally, we investigate factors that characterize the evolution of communities with respect to the number of active users. We perform this investigation on the Reddit social media platform.
We begin by analyzing individual conversations in one community and then see how the findings generalize to other communities. The first community studied is Reddit’s changemyview. In addition to being a rich data source, the changemyview community has an interesting property: members whose views are changed award points to the users who successfully changed their minds. From the changemyview community, we observe that the linguistic style and interactions of community members can significantly differentiate susceptible and non-susceptible users. Next, we examine other communities (subreddits) and investigate how the user behaviors observed in changemyview relate to patterns of community evolution. We learn that the linguistic style and interactions of members in a community can also significantly distinguish the different parts of the community's evolution with respect to the number of active users.


Open Access

Included in

Engineering Commons