The PIKE Group








Research Projects

$ Last generated: Wed Jan 22 10:34:19 2020 EST $

Current Projects

The Precision Learning (PL) project, supported by NSF, is a collaboration among Penn State, U. Arkansas, and WPI to study causal relationships among diverse factors of students (e.g., skills, interests, demographics) and learning media (e.g., topic/length of videos, demographics of speakers) and to develop: (1) a deep learning based video analysis algorithm to extract fine-grained metadata from videos, and (2) a fairness-aware personalized recommendation algorithm to match students with the right educational videos (as in "Precision Medicine") with little discrimination and bias.
The Label Generation (LabGen) project, supported by NSF, addresses settings where real-world machine learning and deep learning applications come with only limited label information (i.e., a small amount of labeled data or no labeled data), and proposes three novel solutions: (1) labeled data generation with limited labeled data, (2) labeled data generation with weak supervision, and (3) labeled data generation with limited human involvement.
Supported by NSF, the BRUCE project will develop two tools to empower the cybersecurity workforce: (1) a comprehensive vertical search engine for finding cybersecurity learning resources and extracting detailed metadata using fine-grained Named Entity Recognition (NER) methods; and (2) a fairness-aware recommendation engine that matches cybersecurity job postings with job seekers' skills and competencies while avoiding discrimination and bias in its recommendations. In particular, the proposed research will design graph homomorphism based mapping algorithms between posted jobs and job seekers with varying structures and vocabulary, going beyond simple syntactic/semantic matching.
The interdisciplinary Memory Illusion project, partially supported by an SSRI seed grant and NSF, aims to provide reliable information and improve people's trust in what they read online using techniques from machine learning and theory from psychology. The project will advance state-of-the-art machine learning methods to model the psychological phenomenon known as memory illusion: memory errors that individuals make when remembering, interpreting, and making inferences from past experiences and knowledge.
The SysFake (pronounced "Cease Fake") project, partially supported by an ICS seed grant, ORAU, NSF#1, and NSF#2, aims to develop effective computational techniques to detect and prevent fake news, develop educational materials and pedagogy to raise awareness of fake news, and investigate ways to better train people not to fall for fake news.
The REU Site: Machine Learning in Cybersecurity program at Penn State, sponsored by NSF, is a 10-week intensive summer research program for undergraduate students in the US. The theme of the Penn State program lies at the intersection of Machine Learning and Cybersecurity. Research topics include fake news mitigation, cognitive minds to predict fraud, smart contract fraud, crowdsourcing and misbehavior, adversarial production-run failure diagnosis, astroturfing, privacy in conversational agents, etc. The program is one of many NSF CISE REU Site programs nationwide.
The NSF-sponsored SFS program (CyberCorps) at Penn State recruits and trains a small number of bright undergraduate and graduate students studying or researching a broad range of cybersecurity areas, and aims to place them in federal or state government cybersecurity positions upon graduation. The College of IST currently leads the SFS program at Penn State.

Past Projects

The Likes-R-Us project, partially supported by an IST seed grant and Samsung, aims to study various properties and latent relationships found among "Like" activities and the "Like" networks formed from them. We study questions involving the structures, influences, and contexts found in Like networks, and their impact on users in social media.
The HUman-in-the-Loop Computation (HULC) project investigates novel problems and solutions that use both machines and humans in harmony for computationally challenging tasks arising in the Databases, Information Retrieval, Recommender Systems, and WWW domains. For instance, we develop novel algorithms to efficiently compute Top-k queries via crowdsourcing, and collaborative filtering techniques in which the sparsity and cold start problems are addressed by crowdsourcing.
The Social Circle project, partially supported by NSF and Microsoft, develops methods to: (1) detect discrepancies between users' information sharing expectations and actual information disclosure; (2) design a user-centered yet computationally efficient formal model of user privacy in social networks; and (3) develop a mechanism to effectively enforce privacy policies in the proposed model. In particular, this project develops the concept of Social Circles to model social network access within a Restricted Access and Limited Control framework.
The Samsung-sponsored SUM (Social User Mining) project aims to develop novel algorithmic solutions, working prototypes, and innovative applications for mining diverse demographic and profile information of users (e.g., gender, age, marital/parental status, home location, job, political opinions, religion) in social network sites such as Facebook, Twitter, and Foursquare.
In the MOP (Mobile-Optimized Page) project, we investigate techniques to improve the mobile-friendliness of web pages (i.e., whether pages are rendered well in mobile web browsers) in diverse contexts and applications.
The concept of Web Services has recently been proposed as a means to achieve the true Intelligent Semantic Web paradigm, but it still lacks many necessary functionalities. In the Atherton project, we are particularly interested in developing a new Web Services modeling methodology, a Web Services generation and composition framework, and Web Services based applications.
The LeeDeo (LEarning and Educational viDEO search engine and digital library framework) project attempts to build web-scale academic video digital libraries using the search engine paradigm. In particular, we are developing techniques to automatically crawl and identify so-called academic videos, and to extract relevant video metadata therein.
The NSF-sponsored I-SEE project develops three creative learning modules in Second Life on topics in computer security: (1) a learning module to improve students' understanding of what can be done to help businesses protect themselves against obvious security threats; (2) a learning module to develop a deep understanding of the inner workings of complex security software and hardware; and (3) a learning module that promotes awareness and education of security issues using Second Life as a digital storytelling platform.
In the Data Linkage project, we revisit the traditional record linkage problem to cope with novel challenges such as the intricate interplay of match vs. merge steps, increased scalability, and agile adaptivity. In particular, as solutions to these challenges, we study four sub-problems: the googled linkage, parallel linkage, group linkage, and adaptive linkage problems.
Is journal A in field X better than journal B in field Y? Which venues are the best places to submit your papers? Currently, there is no universally agreed-upon method to answer these interesting questions. In the AppleRank project, we aim to develop a novel framework that can better rank bibliographic entities (termed "apples").
To meet the need for flexible and efficient access control systems for the XML model, the L3 project explores how to leverage techniques developed for relational access control systems.
In the QUAGGA project, we investigate research issues in "Quality of Data" (QoD) to achieve clean databases. The existence of poor or erroneous data in databases causes the so-called Garbage-in, Garbage-out problem. As the sources of data become more diverse, their formats more heterogeneous, and the volume of data grows rapidly, maintaining and improving the quality of such data gets harder. Therefore, in this project, we study database-centric data quality and cleaning problems.
The OpenDBLP project renovates the old DBLP system into a Web Services based online digital library, where not only human users but also software agents can issue queries to retrieve bibliographic information. As a killer application, the DBLP system itself was also completely simulated with a Google flavor.