DL-Learner – A Framework for Inductive Learning on the Semantic Web
Authors: Lorenz Bühmann, Jens Lehmann, Patrick Westphal and Simon Bin
Keywords: System Description, Machine Learning, Supervised Learning, Semantic Web, OWL, RDF
This paper is an extended summary of the journal paper "DL-Learner – A framework for inductive learning on the Semantic Web", published in the Journal of Web Semantics, Volume 39, 2016. In this system paper, we describe the DL-Learner framework. It is beneficial in various data and schema analytic tasks, with applications in standard machine learning scenarios, e.g. life sciences, as well as Semantic Web specific applications such as ontology learning and enrichment. Since its creation in 2007, it has become the main OWL and RDF-based software framework for supervised structured machine learning; it includes several algorithm implementations and usage examples, and has applications built on top of it.
Building knowledge maps of Web graphs
Authors: Valeria Fionda, Giuseppe Pirrò and Claudio Gutierrez
Keywords: Web Maps, Web Region, RDF, Web of Linked Data
We study the problem of building knowledge maps of graph-like information. We live in the digital era and, as with the Earth, the Web is simply too large and its interrelations too complex for anyone to grasp much of it through direct observation. Thus, the problem of applying cartographic principles to digital landscapes as well is intriguing. We introduce a mathematical formalism that captures the general notion of a map of a graph and enables its development and manipulation in a semi-automated way. We describe an implementation of our formalism on the Web of Linked Data graph and discuss algorithms that efficiently generate and combine (via an algebra) regions and maps. Finally, we discuss examples of knowledge maps built with a tool implementing our framework.
Authors: Ulle Endriss and Umberto Grandi
Keywords: Social Choice Theory, Collective Rationality, Impossibility Theorems, Graph Theory, Modal Logic, Preference Aggregation, Belief Merging, Consensus Clustering, Argumentation Theory, Social Networks
Graph aggregation is the process of computing a single output graph that constitutes a good compromise between several input graphs, each provided by a different source. One needs to perform graph aggregation in a wide variety of situations, e.g., when applying a voting rule (graphs as preference orders), when consolidating conflicting views regarding the relationships between arguments in a debate (graphs as abstract argumentation frameworks), or when computing a consensus between several alternative clusterings of a given dataset (graphs as equivalence relations). Other potential applications include belief merging, data integration, and social network analysis. In this short paper, we review a recently introduced formal framework for graph aggregation that is grounded in social choice theory. Our focus is on understanding which properties shared by the individual input graphs will transfer to the output graph returned by a given aggregation rule. Our main result is a powerful impossibility theorem that generalises Arrow’s seminal result regarding the aggregation of preference orders to a large collection of different types of graphs. We also provide a discussion of existing and potential applications of graph aggregation.
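The question of which properties transfer from the inputs to the output can be illustrated with a minimal sketch. It assumes the majority rule as the aggregation rule and uses three preference orders encoded as edge sets; the function names `majority_rule` and `is_transitive` are hypothetical and are not taken from the paper. Each input graph is transitive, yet the aggregated graph contains a cycle, which is the classic Condorcet paradox.

```python
from itertools import product


def majority_rule(graphs, nodes):
    """Keep an edge iff a strict majority of the input graphs contain it."""
    threshold = len(graphs) / 2
    return {
        (x, y)
        for x, y in product(nodes, repeat=2)
        if sum((x, y) in g for g in graphs) > threshold
    }


def is_transitive(edges):
    """True iff (x, y) and (y, z) in edges always implies (x, z) in edges."""
    return all(
        (x, z) in edges
        for (x, y) in edges
        for (y2, z) in edges
        if y == y2
    )


nodes = ["a", "b", "c"]
# An edge (x, y) reads "x is preferred to y"; each graph is a strict linear order.
g1 = {("a", "b"), ("b", "c"), ("a", "c")}  # a > b > c
g2 = {("b", "c"), ("c", "a"), ("b", "a")}  # b > c > a
g3 = {("c", "a"), ("a", "b"), ("c", "b")}  # c > a > b

result = majority_rule([g1, g2, g3], nodes)
print(sorted(result))
print(is_transitive(result))
```

Here every individual graph passes `is_transitive`, while the majority graph does not, so transitivity is a property that this particular rule fails to preserve.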
Comparing sampling methods analytically for graph size estimation
Authors: Jianguo Lu
Keywords: graph sampling, network size estimation, random node, random edge, random walk
This paper shows that random edge sampling outperforms random node sampling in graph size estimation, with a performance ratio proportional to the normalized graph degree variance. This result is particularly important in the era of big data, when data are typically large and scale-free, resulting in large degree variance. We derive the result by first giving the variances of random node and random edge estimators. A simpler and more intuitive result is obtained by assuming that the data is large and the degree distribution follows a power law. Most existing works compare sampling methods empirically, hence the conclusions are often data dependent. This is the first work that derives the result analytically.
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-based Features
Authors: Cataldo Musto, Pasquale Lops, Marco De Gemmis and Giovanni Semeraro
Keywords: Recommender Systems, Machine Learning, Linked Open Data
In this contribution we propose a hybrid recommendation framework based on classification algorithms such as Random Forests and Naive Bayes, which are fed with several heterogeneous groups of features. We split our features into two classes: classic features, such as popularity-based, collaborative and content-based ones, and extended features gathered from the Linked Open Data (LOD) cloud. The latter comprise basic features (e.g. the genre of a movie or the writer of a book) and graph-based features calculated from the topological characteristics of the tripartite representation connecting users, items and properties in the LOD cloud. In the experimental session we evaluate the effectiveness of our framework when varying the groups of features used, and results show that both LOD-based and graph-based features positively affect the overall performance of the algorithm, especially in highly sparse recommendation scenarios. Our approach also outperforms several state-of-the-art recommendation techniques, thus confirming the insights behind this research. This extended abstract summarizes the content of the journal paper published in Knowledge-Based Systems.
Processing Social Media Messages in Mass Emergency: Survey Summary
Authors: Muhammad Imran, Carlos Castillo, Fernando Diaz and Sarah Vieweg
Keywords: social media, disaster response, emergency management
Millions of people increasingly use social media to share information during disasters and mass emergencies. Information available on social media, particularly in the early hours of an event when few other sources are available, can be extremely valuable for emergency responders and decision makers, helping them gain situational awareness and plan relief efforts. Processing social media content to obtain such information, however, involves solving multiple challenges: parsing brief and informal messages, handling information overload, and prioritizing different types of information. These challenges can be mapped to classical information processing operations such as filtering, classifying, ranking, aggregating, extracting, and summarizing. This work highlights these challenges and presents state-of-the-art computational techniques to deal with social media messages, focusing on their application to crisis scenarios.
Surviving the Web: A Journey into Web Session Security
Authors: Stefano Calzavara, Riccardo Focardi, Marco Squarcina and Mauro Tempesta
Keywords: Web sessions, HTTP cookies, web attacks, web defenses
We survey the most common attacks against web sessions, i.e., attacks which target honest web browser users establishing an authenticated session with a trusted web application. We then review existing security solutions which prevent or mitigate the different attacks, by evaluating them along four different axes: protection, usability, compatibility and ease of deployment. Based on this survey, we identify five guidelines that, to different extents, have been taken into account by the designers of the different proposals we reviewed. We believe that these guidelines can be helpful for the development of innovative solutions approaching web security in a more systematic and comprehensive way.
How to Assess and Rank User-Generated Content on Web?
Authors: Elaheh Momeni, Claire Cardie and Nicholas Diakopoulos
Keywords: User-generated Content, Online Media, Ranking, Quality Assessment
User-generated content (UGC) on the Web, especially on social media platforms, facilitates the association of additional information with digital resources and online social topics; so, it can provide valuable supplementary content. However, UGC varies in quality and, consequently, raises the challenge of how to maximize its utility for a variety of end-users, in particular in the age of misinformation. This study aims to provide researchers and Web data curators with comprehensive answers to the following questions: What are the existing approaches and methods for assessing and ranking UGC? What features and metrics have been used successfully to assess and predict UGC value across a range of application domains? This survey is composed of a systematic review of approaches for assessing and ranking UGC: results obtained by identifying and comparing methodologies within the context of short text-based UGC on the Web. This survey categorizes existing assessment and ranking approaches into four framework types and discusses the main contributions and considerations of each type. Furthermore, it suggests a need for further experimentation and encourages the development of new approaches for the assessment and ranking.
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowdsourcing
Authors: Maribel Acosta, Elena Simperl, Fabian Flöck and Maria Esther Vidal
Keywords: SPARQL, Query Processing, Completeness, RDF, Crowdsourcing
We propose HARE, a SPARQL query engine that encompasses human-machine query processing to augment the completeness of query answers. We empirically assessed the effectiveness of HARE on 50 SPARQL queries over DBpedia. Experimental results clearly show that our solution accurately enhances answer completeness.
Presenting and Preserving the Change in Taxonomic Knowledge for Linked Data
Authors: Rathachai Chawuthai, Hideaki Takeda, Vilas Wuwongse and Utsugi Jinbo
Keywords: Biodiversity Informatics, Change in Taxonomy, Knowledge Representation, Knowledge Exchange, Linked Data, Ontology, RDF, Semantic Web, Taxonomic Data
Linked Open Data (LOD) technology enables a web of data and exchangeable knowledge graphs over the Internet. However, knowledge changes everywhere and all the time, which makes linking data precisely a challenge: terms and concepts may be misinterpreted or misunderstood because their meanings can differ across time periods and communities. To address this issue, we introduce an approach to the preservation of knowledge graphs, and we select the biodiversity domain for our case studies because knowledge in this domain changes frequently and all changes are clearly documented. Our work produces an ontology, transformation rules, and an application that demonstrate the feasibility of presenting and preserving knowledge graphs and of providing open and accurate access to linked data. It covers changes in names and their relationships across different times and communities, as can be seen in the cases of taxonomic knowledge.
COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution
Authors: Mehrdad Farajtabar, Manuel Gomez-Rodriguez, Yichen Wang, Shuang Li, Hongyuan Zha and Le Song
Keywords: Network Structure, Information Diffusion, Coevolution, Point Process, Hawkes Process, Survival Analysis
Information diffusion in online social networks is affected by the underlying network topology, but it also has the power to change it. Online users are constantly creating new links when they are exposed to new information sources, and in turn these links are altering the way information spreads. However, these two highly intertwined stochastic processes – information diffusion and network evolution – have typically been studied separately, ignoring their co-evolutionary dynamics. In this work, we propose a temporal point process model, COEVOLVE, for such joint dynamics, allowing the intensity of one process to be modulated by that of the other. The model allows us to efficiently simulate interleaved diffusion and network events, and generate traces obeying common diffusion and network patterns observed in real-world networks. Moreover, we develop a convex optimization framework to learn the parameters of the model from historical diffusion and network evolution traces. Experiments on both synthetic data and real data gathered from Twitter show that our model provides a good fit to the data as well as more accurate predictions than alternatives.
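As background for the point-process vocabulary used here, the following is a minimal sketch of simulating a single self-exciting (Hawkes) process with an exponential kernel via Ogata's thinning method. It is not the COEVOLVE model, which couples diffusion and link-creation processes; the function name `simulate_hawkes` and the parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) for a univariate Hawkes process.

    Intensity: lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i)) over past
    events t_i. Assumes mu > 0 and alpha < beta (so the process does not explode).
    """
    rng = random.Random(seed)
    events = []

    def intensity(s):
        # Contribution of every event at or before time s.
        return mu + sum(alpha * math.exp(-beta * (s - ti)) for ti in events if ti <= s)

    t = 0.0
    while True:
        # The intensity only decays between events, so its current value
        # is an upper bound until the next event occurs.
        lam_bar = intensity(t)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        # Accept the candidate time with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t):
            events.append(t)
    return events


events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.6, horizon=50.0, seed=42)
print(len(events), "events")
```

Each accepted event raises the intensity by `alpha`, which makes further events temporarily more likely; a joint model in the spirit of the abstract would additionally let events of one process modulate the intensity of another.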
Joint Label Inference in Networks
Authors: Deepayan Chakrabarti, Stanislav Funiak, Jonathan Chang and Sofus Macskassy
Keywords: semi-supervised, label propagation, label inference
We consider the problem of inferring node labels in a partially labeled graph where each node in the graph has multiple label types and each label type has a large number of possible labels. Existing approaches such as Label Propagation fail to consider interactions between the label types. Our proposed method, called EdgeExplain, explicitly models these interactions, while still allowing scalable inference under a distributed message-passing architecture. On a large subset of the Facebook social network, collected in a previous study, EdgeExplain outperforms label propagation for several label types, with lifts of up to 120% for recall@1 and 60% for recall@3.
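For context, the baseline mentioned above can be sketched in a few lines. This is a simplified, hypothetical variant of label propagation for a single label type (synchronous majority voting over neighbours, with ties broken alphabetically); it is not EdgeExplain and does not model interactions between label types. The graph, the node names and the city labels are invented for illustration.

```python
from collections import Counter


def label_propagation(adjacency, seed_labels, max_iters=20):
    """Propagate labels from seed nodes by repeated neighbour majority vote."""
    labels = dict(seed_labels)
    for _ in range(max_iters):
        updated = dict(labels)
        for node in sorted(adjacency):
            if node in seed_labels:
                continue  # seed labels are observed and never overwritten
            votes = Counter(labels[n] for n in adjacency[node] if n in labels)
            if votes:
                best = max(votes.values())
                # Deterministic tie-break: alphabetically smallest top label.
                updated[node] = min(l for l, c in votes.items() if c == best)
        if updated == labels:
            break  # converged
        labels = updated
    return labels


# Two triangles joined by the bridge c-d; one labeled node in each triangle.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}
inferred = label_propagation(adjacency, {"a": "Lisbon", "f": "Oslo"})
print(inferred)
```

Because each label type would be propagated independently in this scheme, it cannot exploit the fact that, say, a shared hometown may already explain a friendship, which is the kind of interaction the abstract says EdgeExplain models explicitly.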