Question Answering on Linked Open Data: Past, Present, and Future

Friday, February 21, 2020 - 10:15 am
Innovation Center, Room 2277
Speaker: Dr. Saeedeh Shekarpour
Affiliation: University of Dayton
Location: Innovation Center, Room 2277
Time: Friday 02/21/2020 (10:15 - 11:15 am)

Abstract: Question Answering (QA) systems are becoming an inspiring model for the future of search engines. While the datasets underlying QA systems have recently been promoted from unstructured datasets to structured datasets with highly enriched semantic metadata, QA systems still face serious challenges and therefore do not meet users' expectations. In particular, question answering over interlinked data sources raises new challenges due to two inherent characteristics. First, different datasets employ heterogeneous schemas, and each one may contain only part of the answer to a given question. Second, constructing a federated formal query across different datasets requires exploiting links between them on both the schema and instance levels (a sketch of such a federated query appears after the outline below). This talk will proceed in the following three directions:
  1. SINA strategies for addressing the salient challenges of QA systems [1,2,3]
  2. Re-engineering Question Answering Systems [4,5,6]
  3. A look to the future
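To make the federation challenge above concrete, here is a minimal sketch of a federated SPARQL query issued from Python via the SPARQLWrapper library, using SPARQL 1.1's SERVICE keyword to reach across two linked datasets. The endpoints, vocabulary, and query are illustrative assumptions for exposition; they do not reproduce SINA's actual query construction.

```python
# Minimal sketch: a federated query across two linked datasets via the
# SPARQL 1.1 SERVICE keyword. Endpoints and query are illustrative only.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?drug ?fact WHERE {
      ?drug a dbo:Drug ;
            owl:sameAs ?otherDrug .        # instance-level link between datasets
      SERVICE <https://bio2rdf.org/sparql> {  # second dataset's endpoint (assumed)
        ?otherDrug ?p ?fact .
      }
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["drug"]["value"], row["fact"]["value"])
```

The point of the sketch is the shape of the problem: the answer is assembled from triples in two datasets, joined through an owl:sameAs link, which is exactly the schema- and instance-level linking the abstract describes.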
Minor Part (20 minutes): This part covers the other research directions in which I am involved, such as challenges related to misinformation, information extraction, and social media analytics.

Bio: Saeedeh Shekarpour is an assistant professor at the University of Dayton, Ohio. She completed her Ph.D. at the University of Bonn in Germany, then spent one year as a postdoctoral researcher in the EIS research group at the University of Bonn and two years as a postdoctoral researcher at the Knoesis research center. She is passionate about conducting advanced research in the following fields: (i) knowledge representation in AI technologies such as question answering and chatbot technology, (ii) information disorder (fake news, harassing language), (iii) ontology development, and (iv) text and knowledge analytics. She has published her research results in the most renowned conferences and journals of her field, including the World Wide Web conference, AAAI, NAACL, the Web Intelligence conference, Web Science, the Journal of Web Semantics, the Semantic Web Journal, and PLOS ONE.

Getting Started in Open Source: ACM Student Group

Thursday, February 6, 2020 - 07:00 pm
Swearingen Engineering Center, room 2A17
If you aren't familiar, ACM is one of computer science's oldest and most established professional organizations. Our university ACM chapter primarily hosts weekly student talks by undergraduate and graduate students in CSE. This semester these will usually be held on Thursday nights at 7 in the CSE Student Lounge (SWGN 2A17). We also do a lot else, so consider joining our mailing list (email kennethj@email.sc.edu) or checking out our website: http://acm.cse.sc.edu

While most students who come to our meetings are in CSE, we welcome students from across the university to attend. There is no time commitment expected, so come as often or as seldom as you wish. Furthermore, while some talks will certainly be more advanced than others, we believe there is something of interest in every presentation for all levels of technical experience. Only an interest in computing is required to receive your free pizza!

This week we will be returning to our normal cadence with a presentation by undergraduate Josh Nelson. There will be FREE PIZZA from 7:00-7:15, before Josh's talk begins. The details:

---------------------------------------------------
"Getting Started in Open Source"
Thursday, February 6th, 2020, 7:00-8:15 pm
Swearingen Engineering Center, room 2A17
FREE PIZZA
All majors and backgrounds, technical or otherwise, welcome!
---------------------------------------------------

Josh will cover the key components of getting started with contributing to open source projects. This includes:
- finding a project
- deciding what to work on
- communicating with the maintainers
- making your first pull request (PR)

Josh will use his real PRs and projects as examples of open source best practices (and not-so-best practices). There will be a discussion of how to use common open-source tools like git, markdown, continuous integration (CI) providers, and unit tests as they pertain to contributing to a project. If time permits, we can help attendees find projects they'd be interested in contributing to. While this talk caters to beginners, there is surely something of interest for programmers of all skill levels. Come with questions (and bring a friend)! Information about the talk can be found on our website at https://acm.cse.sc.edu/events/2020-02-06.html.

In other news... be on the lookout for more information about our Spring 2020 ACM Code-a-Thon! Please note that details are still being finalized; I'll send out a standalone announcement email with all the details as soon as we divine them ourselves. Details will be posted to the website as well: https://acm.cse.sc.edu/events/2020-code-a-thon.html.

As always, please feel free to reach out to me with any questions! (Just respond to this email to get in touch.)

Best,
Kenneth Johnson
ACM Communications Chair

Parsimonious Sociology Theory Construction: From a Computational Framework to Semantic-Based Parsimony Analysis

Wednesday, February 5, 2020 - 03:00 pm
Meeting Room 2267, Innovation Center
Author: Mingzhe Du
Advisor: Dr. Jose Vidal
Date: Feb 5, 2020
Time: 3:00 pm
Place: Meeting Room 2267, Innovation Center

Abstract: In the social sciences, theory construction is the research process of building testable scientific theories that explain and predict observed phenomena in the natural world. The new conceptual ideas and meanings of theories are conveyed through carefully chosen definitions and terms. The principle of parsimony, an important criterion for evaluating the quality of theories (exemplified by Occam's Razor), mandates that we minimize the number of definitions (terms) used in a given theory. Conventional methods for theory construction and parsimony analysis are based on heuristic approaches. However, it is not always easy for young researchers to fully understand the theoretical work in a given area because of the problem of "tacit knowledge", which often makes results lack coherence and logical integrity. In this research, we propose to address this problem in three parts.

In the first part of this study, we present Wikitheoria, a generic knowledge aggregation framework that facilitates the parsimonious approach to theory construction with a cloud-based theory modularization platform and semantic-based algorithms for minimizing the number of definitions. The approach is demonstrated and evaluated using modularized theories from the database and sociological definitions retrieved from the system lexicon and the sociological literature. This study demonstrates the effectiveness of using a cloud-based knowledge aggregation system and semantic analysis models to promote parsimonious sociological theory construction.

The second part of the study focuses on semantic-based parsimony analysis. We introduce an embedding-based approach that uses machine learning models to reduce semantically similar sociological definitions, where definitions are encoded with word embeddings and sentence embeddings. Given that several types of embeddings exist, we compare the definitions' encodings with the goal of understanding which embeddings are more suitable for knowledge representation and which classifiers are more capable of capturing semantic similarity in the task of parsimonious theory construction.

In the final part of this study, we propose SOREC, a novel semantic content-based recommendation system (CBRS) with a supervised machine learning model for theoretical parsimony evaluation, which checks the semantic consistency of definitions while theories are being constructed. Specifically, we evaluate an XGBoost tree-based classifier with a combination of low-level and high-level features on our dataset. The proposed CBRS substantially outperforms a conventional matrix factorization-based CBRS in suggesting semantically related sociological definitions. This study provides a solid baseline for future work on computing the semantic similarity of sociological definitions. Moreover, theory construction is a common research process in many human-science disciplines, such as psychology, criminology, and other social sciences, and the results of this study can be applied to theory construction in those disciplines as well.
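As a rough illustration of the embedding-based comparison described in the second part, the sketch below encodes definitions with sentence embeddings and flags highly similar pairs as candidates for merging. The library, model name, sample definitions, and similarity threshold are assumptions for exposition, not the dissertation's actual configuration.

```python
# Minimal sketch: flag semantically similar definition pairs with sentence
# embeddings. Library, model, and threshold are illustrative assumptions.
from itertools import combinations

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

definitions = {
    "status": "A position in a social structure that carries expectations.",
    "role": "The set of behaviors expected of a person in a given position.",
    "norm": "A shared rule that prescribes appropriate behavior in a group.",
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
names = list(definitions)
embeddings = model.encode([definitions[n] for n in names])

# Compare every pair of definitions; pairs above the threshold become
# candidates for consolidation, in the spirit of parsimony analysis.
SIMILARITY_THRESHOLD = 0.6  # assumed cutoff
for i, j in combinations(range(len(names)), 2):
    score = cosine_similarity([embeddings[i]], [embeddings[j]])[0, 0]
    if score >= SIMILARITY_THRESHOLD:
        print(f"{names[i]} ~ {names[j]}: similarity {score:.2f}")
```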

Ensembles of Many Weak Defenses are Strong: Defending Deep Neural Networks Against Adversarial Attacks

Friday, December 6, 2019 - 02:20 pm
Storey Innovation Center (Room 1400)
Speakers: Ying Meng and Jianhai Su

Abstract: Despite achieving state-of-the-art performance across many domains, deep neural networks (DNNs) are highly vulnerable to subtle adversarial perturbations. Many defense approaches have been proposed in recent years, and many of them have since been shown to be ineffective. An early study suggests that ensembles created by combining multiple weak defenses are still weak. However, we observe that it is possible to construct effective ensembles from many weak defenses. In this work, we implement and present 5 strategies for constructing ensembles from many (possibly weak) defenses that transform the inputs (e.g., rotation, shifting, noising, denoising, and many more) before feeding them to the classifier. We test our ensembles on MNIST with adversarial examples generated by various adversaries (27 sets generated by 9 different attack methods, such as FGSM, JSMA, and One-Pixel) and investigate the factors that may impact the effectiveness of an ensemble model. We evaluate our ensembles under 4 threat models (white-box, gray-box, black-box, and zero-knowledge attacks). We also study and attempt to explain, empirically, how a transformation blocks perturbations generated by an adversary.
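The following sketch illustrates the general idea of a transformation-based ensemble defense: each input is classified several times under different transformations, and the predictions are combined by majority vote. The specific transformations, the voting rule, and the classifier interface are illustrative assumptions; the talk's five ensemble strategies are not reproduced here.

```python
# Minimal sketch of an ensemble of input-transformation defenses. It assumes
# a trained classifier exposing a batch predict() that returns integer class
# labels (an assumed interface); transformations and voting are illustrative.
import numpy as np
from scipy import ndimage


def transform_variants(image):
    """Return several transformed copies of a single input image."""
    return [
        image,                                             # identity
        ndimage.rotate(image, angle=10, reshape=False),    # small rotation
        ndimage.shift(image, shift=(1, 0)),                # one-pixel shift
        image + np.random.normal(0.0, 0.05, image.shape),  # additive noise
        ndimage.median_filter(image, size=2),              # light denoising
    ]


def ensemble_predict(classifier, image):
    """Classify each transformed copy, then take a majority vote."""
    votes = [
        int(classifier.predict(variant[np.newaxis, ...])[0])
        for variant in transform_variants(image)
    ]
    return np.bincount(votes).argmax()
```

The intuition, as in the abstract, is that a perturbation crafted against the plain classifier rarely survives every transformation, so the vote of many weak defenses can be strong.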

An Overlay Architecture for Pattern Matching

Monday, November 25, 2019 - 09:30 am
Meeting Room 2265, Innovation Center
DISSERTATION DEFENSE
Department of Computer Science and Engineering
University of South Carolina

Author: Rasha Karakchi
Advisor: Dr. Bakos
Date: Nov 25th, 2019
Time: 9:30 am
Place: Meeting Room 2265, Innovation Center

Abstract: Deterministic and Non-deterministic Finite Automata (DFAs and NFAs) comprise the fundamental unit of work for many emerging big data applications, motivating recent efforts to develop Domain-Specific Architectures (DSAs) that exploit the fine-grain parallelism available in automata workloads. This dissertation presents NAPOLY (Non-deterministic Automata Processor OverLaY), an overlay architecture and associated software that attempt to maximally exploit on-chip memory parallelism for NFA evaluation. To avoid the upper bound on NFA size that commonly affects prior efforts, NAPOLY is optimized for runtime reconfiguration, allowing full reconfiguration in tens of microseconds. NAPOLY is also parameterizable, allowing offline generation of a repertoire of overlay configurations with different trade-offs between state capacity and transition capacity. In this dissertation, we evaluate NAPOLY on automata applications packaged in the ANMLZoo benchmarks using our proposed state-mapping heuristic and an off-the-shelf SAT solver, and we compare NAPOLY's performance against existing CPU and GPU implementations. The results show that NAPOLY performs best on larger benchmarks with more active states and higher report frequency, outperforming the best state-of-the-art CPU and GPU implementations on 10 of the 12 benchmarks in the suite. To the best of our knowledge, this is the first example of a runtime-reprogrammable FPGA-based automata processor overlay.
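For readers unfamiliar with automata workloads, the sketch below is a minimal software model of NFA evaluation: at every input symbol, all currently active states take their transitions at once, which is the fine-grain parallelism an overlay like NAPOLY maps onto on-chip memory. The toy automaton is an assumption for illustration, not NAPOLY's architecture.

```python
# Conceptual model of NFA evaluation over an input stream. In hardware, all
# active states update in parallel per symbol; here we model that with sets.
from collections import defaultdict

# Hypothetical NFA: transitions[state][symbol] -> set of next states.
transitions = defaultdict(dict)
transitions[0] = {"a": {0, 1}, "b": {0}}  # state 0 loops, may spawn state 1
transitions[1] = {"b": {2}}               # "a" then "b" reaches state 2
reporting_states = {2}                    # states that report a match


def run_nfa(input_stream, start_states=frozenset({0})):
    """Advance the full set of active states one symbol at a time."""
    active = set(start_states)
    for position, symbol in enumerate(input_stream):
        next_active = set()
        for state in active:  # these updates are what hardware parallelizes
            next_active |= transitions[state].get(symbol, set())
        active = next_active
        for state in active & reporting_states:
            print(f"report: state {state} at position {position}")


run_nfa("abab")
```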

Building AI, with Trust

Monday, November 18, 2019 - 10:15 am
Storey Innovation Center (Room 2277)
Presentation Topics/Keywords: Artificial Intelligence, Trust, Rating Services, Real-World Applications, Conversation Agents

Speaker: Dr. Biplav Srivastava, Distinguished Data Scientist and Master Inventor, IBM; ACM Distinguished Scientist; ACM Distinguished Speaker; IEEE Senior Member
Time: Monday, November 18, 10:15 - 11:15 am
Location: Storey Innovation Center (Room 2277)

Abstract: Artificial Intelligence (AI), a well-established sub-discipline of computer science, is considered a key technology for addressing society's pressing challenges in areas as diverse as the environment, health, finance, and city services. However, early AI adoption has also raised issues such as whether humans can trust a system and whether its output is fair. In this talk, I will discuss our recent work on two promising AI technologies: market intelligence using online product reviews and data exploration using conversation agents (chatbots). Then, I will describe our novel idea of rating AI services for trust as a third-party service and how it can help AI users, developers, business leaders, and regulators make better decisions. The talk will conclude with a perspective on open and collaborative multi-disciplinary innovations.

Bio: Dr. Biplav Srivastava is presently a Distinguished Data Scientist and Master Inventor at IBM's Chief Analytics Office. With over two decades of research experience in Artificial Intelligence, Services Computing, and Sustainability, most of it at IBM Research, Biplav is also an ACM Distinguished Scientist and Distinguished Speaker and an IEEE Senior Member. Biplav mostly works with open data, APIs, and AI-based analytics to create decision-support tools. In AI, his focus is on promoting goal-oriented, ethical, human-machine collaboration via natural interfaces using domain and user models, learning, and planning. He applies these techniques in areas of social as well as commercial relevance, with a focus on developing countries (e.g., transportation, health, and governance). Biplav's work has led to many science firsts and high-impact commercial innovations ($B+), 150+ papers, 50+ US patents issued, and awards for papers, demos, and hacks. He has interacted with commercial customers, universities, and governments, served on standards bodies, and assisted business leaders on technical issues. More details about him are at: http://researcher.watson.ibm.com/researcher/view.php?person=us-biplavs

Dynamo: Low latency data distribution from database to servers

Friday, November 15, 2019 - 02:20 pm
Innovation Center, Room 1400
Speaker: Prunthaban Kanthakumar
Affiliation: Google
Location: Innovation Center, Room 1400
Time: Friday 11/15/2019 (2:20 - 3:10 pm)

Abstract: Many applications at Google are structured with their source of truth stored in a transactional database, while the data is required by servers distributed worldwide. For fast and efficient computation, these servers store the data in memory. Further, the database changes continuously, so we need to update the in-memory views of this large number of servers in real time. For example, in the Google Search Ads application, advertiser configuration is stored in a database, and to compute ads in a fast and scalable manner this data is loaded into the memory of various servers spread worldwide. In this talk, we describe our solution to this data distribution problem and the challenges we encountered in providing a highly reliable and low-latency service.

Bio: Prunthaban is the Technical Lead Manager of a critical infrastructure team within Google Search Ads, responsible for building a system that performs large-scale data extraction and transformation from a database and reliably distributes the data to servers globally in near real time.
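To make the data-distribution pattern concrete, here is a minimal sketch of a server keeping an in-memory view in sync with an ordered stream of database changes. All names and structures are invented for illustration; this is not Google's actual system, which is not public in this form.

```python
# Minimal sketch: maintain an in-memory view from an ordered change stream.
# Everything here is an illustrative assumption, not Google's implementation.
import queue
import threading


class InMemoryView:
    """A server-side cache updated from a stream of (key, value) changes."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def apply_change(self, key, value):
        with self._lock:
            if value is None:          # tombstone: the row was deleted
                self._data.pop(key, None)
            else:
                self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def consume_changes(change_queue, view):
    """Tail the change stream and update the view in near real time."""
    while True:
        key, value = change_queue.get()  # blocks until the next change
        view.apply_change(key, value)


# Hypothetical usage: a producer would push database commits onto the queue.
changes = queue.Queue()
view = InMemoryView()
threading.Thread(target=consume_changes, args=(changes, view), daemon=True).start()
changes.put(("advertiser:42", {"budget": 1000}))
```

The hard parts of the real problem, as the abstract notes, are reliability and latency at global scale: ordering, retries, and catch-up after a server restart, none of which this toy sketch addresses.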

Data Analysis for Insider's Misuse Detection

Tuesday, November 12, 2019 - 03:00 pm
Meeting Room 2267, Innovation Center
DISSERTATION DEFENSE
Department of Computer Science and Engineering
University of South Carolina

Author: Ahmed Saaudi
Advisor: Dr. Farkas
Date: Nov 12th, 2019
Time: 3:00 pm
Place: Meeting Room 2267, Innovation Center

Abstract: Malicious insiders increasingly harm organizations by leaking classified data to unauthorized entities. Detecting insider misuse in computer systems is a challenging problem. In this dissertation, we propose two approaches to detect such threats: a probabilistic graphical model-based approach and a deep learning-based approach. We investigate logs of computer-based activities to discover patterns of misuse, modeling users' behaviors as sequences of computer-based events.

For the probabilistic graphical model-based approach, we propose an unsupervised model for insider misuse detection. Specifically, we develop a Stochastic Gradient Descent method to learn Hidden Markov Models (SGD-HMM) with the goal of analyzing user log data. We propose the use of varying granularity levels to represent users' log data: session-based, day-based, and week-based. A user's normal behavior is modeled with SGD-HMM, and the model is then used to detect any deviation from that behavior. We also propose a Sliding Window Technique (SWT) that identifies malicious activity by considering the recent history of the user's activities. We evaluate the experimental results in terms of the Receiver Operating Characteristic (ROC) curve; the area under the curve (AUC) represents the model's performance with respect to the separability of the classes, with higher AUC scores indicating better performance. Combining SGD-HMM with SWT yields AUC values between 0.81 and 0.9, depending on the window size, which is superior to current solutions.

For the deep learning-based approach, we propose a supervised model for insider misuse detection using natural language processing with deep learning. We examine textual event logs to investigate the semantic meaning behind a user's behavior. The proposed approach consists of character embeddings and deep learning networks involving Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. We develop three deep learning models: CNN, LSTM, and CNN-LSTM, and we run a 10-fold subject-independent cross-validation procedure to evaluate them. Moreover, we use the proposed approach to investigate networks with deeper and wider structures, studying how increasing the number of CNN or LSTM layers, the number of nodes per layer, or both at once affects model performance. Our deep learning-based approach shows promising behavior: the CNN model classifies normal samples best, with an AUC score of 0.88, a 32% false-negative rate, and an 8% false-positive rate; the LSTM model detects malicious samples best, with an AUC score of 0.873, a 0% false-negative rate, and a 37% false-positive rate; and the CNN-LSTM model shows moderate performance on both normal and insider samples, with an AUC score of 0.862, a 16% false-negative rate, and a 17% false-positive rate.

Our results indicate that machine learning approaches can be effectively deployed to detect insider misuse. However, labeled data is difficult to obtain, and the high prevalence of normal behavior alongside limited misuse activity creates a highly unbalanced data set, which impacts the performance of our models.
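As a rough illustration of the character-level deep learning models described above, the sketch below builds a small CNN-LSTM binary classifier in Keras. The vocabulary size, sequence length, and layer sizes are assumptions for exposition, not the dissertation's actual settings.

```python
# Illustrative sketch of a character-level CNN-LSTM classifier of the kind
# described in the abstract. All hyperparameters here are assumed values.
import tensorflow as tf

VOCAB_SIZE = 128  # assumed character vocabulary size
SEQ_LEN = 400     # assumed length of an encoded log sequence

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN,), dtype="int32"),
    tf.keras.layers.Embedding(VOCAB_SIZE, 32),          # character embeddings
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.LSTM(64),                           # temporal dependencies
    tf.keras.layers.Dense(1, activation="sigmoid"),     # normal vs. misuse
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.AUC(name="auc")],         # AUC, as in the abstract
)
model.summary()
```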

My Journey in Artificial Intelligence

Thursday, November 7, 2019 - 07:00 pm
Gressette Room, Harper College, 3rd floor
Dr. Marco Valtorta is a professor of Computer Science and Engineering in the College of Engineering and Computing at the University of South Carolina. He received a laurea degree with highest honors in electrical engineering from the Politecnico di Milano, Milan, Italy, in 1980. After his graduate work in Computer Science at Duke University, he joined the Commission of the European Communities in Brussels, Belgium, where he worked as a project officer for the European Strategic Programme in Information Technologies from 1985 to 1988. In August 1988, he joined the faculty at UofSC in the Department of Computer Science, where he primarily does research in Artificial Intelligence. His first research result, known as "Valtorta's theorem" and obtained in 1980, was recently (2011) described as "seminal" and as "an important theoretical limit of usefulness" for heuristics computed by search in an abstracted problem space. Most of his more recent research has been in the area of uncertainty in artificial intelligence. His 2006 proof, with graduate student Yimin Huang, of the completeness of Pearl's do-calculus of intervention settled a 13-year-old conjecture. His students have won best paper awards at the Conference on Uncertainty in Artificial Intelligence (1993, 2006) and the International Conference on Information Quality (2006). He was undergraduate director for the Department of Computer Science from 1993 to 1999 and was awarded the College of Science and Mathematics Outstanding Advisor Award in 1997. In addition to his teaching and research, he has served in numerous capacities at the departmental level (e.g., chair of the tenure and promotion committee and of the colloquium committee), the college level (e.g., the College of Engineering and Computing scholarship committee), and the university level (e.g., faculty senator; the committee on curricula and courses; the committee on instructional development; the university committee on tenure and promotion). In April 2016, Valtorta was elected chair of the university faculty senate, and he began a two-year term as chair in August 2017.

Development of a National-Scale Big Data Analytics Pipeline to Study the Potential Impacts of Flooding on Critical Infrastructure and Communities

Thursday, November 7, 2019 - 02:00 pm
Meeting Room 2267, Innovation Center
DISSERTATION DEFENSE
Department of Computer Science and Engineering
University of South Carolina

Author: Nattapon Donratanapat
Advisors: Dr. Jose Vidal and Dr. Vidya Samadi
Date: Nov 7th, 2019
Time: 2:00 pm
Place: Meeting Room 2267, Innovation Center

Abstract: With the rapid development of the Internet and mobile devices, crowdsourcing techniques have emerged to facilitate data processing and problem solving, particularly for flood emergencies. We developed the Flood Analytics Information System (FAIS) as a Python interface that gathers Big Data from multiple servers and analyzes flood hazards in real time. The interface uses crowd intelligence and machine learning to provide flood warnings and river-level information, along with natural language processing of tweets during flooding events, with the aim of improving situational awareness for flood risk managers and other stakeholders. We demonstrated and tested FAIS across the lower Pee Dee Basin in the Carolinas, where Hurricane Florence caused extensive damage and disruption. Our research aim was to develop and test an integrated solution, based on real-time Big Data, for stakeholder map-based dashboard visualizations that is applicable to other countries and a range of weather-driven emergency situations. The application allows the user to submit search requests to USGS and Twitter through criteria that modify the request URLs sent to the data sources. The prototype successfully identifies a dynamic set of at-risk areas using web-based river-level and flood-warning API sources, and the list of prioritized areas can be updated every 15 minutes as environmental information and conditions change.
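As an illustration of the kind of web-based river-level source FAIS queries, the sketch below pulls current gage-height readings from the public USGS Instantaneous Values service. The site number is an arbitrary example, and FAIS's actual request-building logic is not reproduced here.

```python
# Minimal sketch: query the public USGS Instantaneous Values web service for
# gage height. The site number is an assumed example, not FAIS's actual query.
import requests

USGS_IV_URL = "https://waterservices.usgs.gov/nwis/iv/"
params = {
    "format": "json",
    "sites": "02110400",     # example USGS site number (assumed)
    "parameterCd": "00065",  # 00065 = gage height, in feet
    "siteStatus": "active",
}

response = requests.get(USGS_IV_URL, params=params, timeout=30)
response.raise_for_status()

# The JSON response carries a list of time series, each with site metadata
# and a list of timestamped values; print the latest reading per series.
for ts in response.json()["value"]["timeSeries"]:
    site = ts["sourceInfo"]["siteName"]
    latest = ts["values"][0]["value"][-1]
    print(f"{site}: gage height {latest['value']} ft at {latest['dateTime']}")
```

Polling a request like this on a 15-minute cycle matches the update cadence the abstract describes for refreshing the list of at-risk areas.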