This project studies the inference problem in XML documents. First, we consider the problem of generating partial views of XML documents that are both secure and free of semantic conflicts. The techniques for generating single-level DTDs for partial views are designed within the context of DTD-based multi-level security classification. The model defines and manipulates two graphs: a Minimum Semantic Conflict Graph (MSCG) and a Multi-Plane DTD Graph (MPG). The MSCG contains all semantic relationships among the XML tags that must be preserved within any partial view; intuitively, it ensures that the generated views are free of semantic conflicts. The MPG captures the structural relationships among tags together with their security classifications. Secure views can be generated from an MPG0 (i.e., an MPG that has no edges outside the targeted security space) by ignoring the unauthorized security planes. A set of procedures is defined to restructure a general MPG into an MPG0 according to the corresponding MSCG. The core technical contribution is the development of the MSCG and MPG concepts and of the procedures that reduce a general MPG to an MPG0, from which a secure partial view free of semantic conflicts is generated.
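
The interplay between the two graphs can be illustrated with a minimal Python sketch. It is not the project's procedure: the representation (a list of parent-child edges, an integer level per tag, MSCG relations as tag pairs) and the names `secure_view`, `reachable`, and `semantic_conflicts` are assumptions made for illustration only. The sketch drops every edge that touches an unauthorized plane and then reports MSCG pairs that the naive view no longer preserves, which is the situation the restructuring into an MPG0 is meant to avoid.

```python
from collections import defaultdict


def secure_view(edges, level, clearance):
    """Keep only parent->child edges whose endpoints are both authorized.

    Assumption: a tag is authorized when its level is <= the clearance.
    """
    return [(p, c) for p, c in edges
            if level[p] <= clearance and level[c] <= clearance]


def reachable(edges, root):
    """Return the set of tags reachable from root by following edges."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for nxt in children[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def semantic_conflicts(view_edges, root, mscg, level, clearance):
    """List MSCG pairs that are fully authorized but broken in the view.

    Assumption: a relation (a, b) is preserved when both tags are still
    reachable from the root of the view.
    """
    visible = reachable(view_edges, root)
    conflicts = []
    for a, b in mscg:
        both_authorized = level[a] <= clearance and level[b] <= clearance
        if both_authorized and not (a in visible and b in visible):
            conflicts.append((a, b))
    return conflicts


# Hypothetical DTD: "billing" is public but hangs under a secret "record".
edges = [("patient", "name"), ("patient", "record"),
         ("record", "diagnosis"), ("record", "billing")]
level = {"patient": 0, "name": 0, "record": 1, "diagnosis": 1, "billing": 0}
mscg = [("name", "billing")]

view = secure_view(edges, level, clearance=0)
print(view)
print(semantic_conflicts(view, "patient", mscg, level, clearance=0))
```

In this hypothetical DTD, simply ignoring the secret plane also removes the public `billing` tag, so the relation between `name` and `billing` is lost. A restructuring step would have to re-attach `billing` inside the authorized space before the view is generated.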

The second part of the project studies the vulnerability of distributed XML documents to inference attacks that exploit data duplicated at different locations and under different formats. We show that ontologies represent a threat to information confidentiality. We present a secure solution to this problem that uses ontologies to detect security violations among distributed XML documents. We propose the Oxsegin architecture, an Ontology guided XML Security Engine, designed to detect illegal data inference via exploitation of ontologies. Oxsegin detects information that is replicated under different security classifications and formats within a collection of XML documents. The technical core of this part of the project is the development of the Probabilistic Inference Module (PIM) used in Oxsegin. PIM operates on the DTD files corresponding to the XML documents and on ontology class hierarchies to identify tags that might be involved in illegal inferences. Every illegal inference is marked with a security violation pointer. A confidence level coefficient attached to the security violation pointer measures the probability of the security breach. A user-controlled XML data-level analysis is available to confirm detected violation pointers, in which case the confidence level is set to its maximum value of one.
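
A DTD-level detection pass of this kind might look like the following Python sketch. It is a simplified illustration under stated assumptions, not the PIM itself: the `Tag` and `ViolationPointer` records, the `parent_of` dictionary encoding the class hierarchy, and in particular the confidence formula `0.5 ** (distance + 1)` are all hypothetical placeholders, since the description above does not specify how the coefficient is computed.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Tag:
    doc: str      # DTD file the tag comes from
    name: str     # tag name in that DTD
    level: int    # security classification, higher is more restricted
    concept: str  # ontology class the tag is mapped to (assumed given)


@dataclass
class ViolationPointer:
    low: Tag
    high: Tag
    confidence: float


def ancestors(concept, parent_of):
    """Return [concept, parent, grandparent, ...]; assumes no cycles."""
    chain = [concept]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def hierarchy_distance(a, b, parent_of):
    """Subclass steps between a and b, or None if neither subsumes the other."""
    up_a, up_b = ancestors(a, parent_of), ancestors(b, parent_of)
    if b in up_a:
        return up_a.index(b)
    if a in up_b:
        return up_b.index(a)
    return None


def find_violations(tags, parent_of):
    """Flag tag pairs from different DTDs that share a concept chain
    but carry different security classifications."""
    pointers = []
    for t1, t2 in combinations(tags, 2):
        if t1.doc == t2.doc or t1.level == t2.level:
            continue
        dist = hierarchy_distance(t1.concept, t2.concept, parent_of)
        if dist is None:
            continue
        low, high = sorted((t1, t2), key=lambda t: t.level)
        # Placeholder coefficient: stays below 1 until data-level confirmation.
        pointers.append(ViolationPointer(low, high, 0.5 ** (dist + 1)))
    return pointers


# Hypothetical hierarchy and tags.
parent_of = {"Salary": "Compensation", "Bonus": "Compensation"}
tags = [Tag("hr.dtd", "salary", 2, "Salary"),
        Tag("payroll.dtd", "pay", 0, "Salary"),
        Tag("public.dtd", "comp", 0, "Compensation")]
for p in find_violations(tags, parent_of):
    print(p.low.doc, p.low.name, "->", p.high.doc, p.high.name, p.confidence)
```

The coefficient is kept strictly below one at this stage so that only a data-level check can raise it to the maximum.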

Third, we show that large collections of distributed XML documents are exposed to inference attacks through correlated data held at different locations, under different formats, and with different security classifications. The inference attack uses a custom-built ontology to target confidential information. We develop a secure solution to this problem by designing an XML security engine, Oxsegin (Ontology guided XML Security Engine), with integrated access to an ontology module. The main technical contribution to security research is the methodology for using ontologies as an aid for a security engine. The Correlated Inference Procedure, designed to detect correlated information under different security classifications and formats, is the core of the Probabilistic Inference Module (PIM) within the security engine. PIM operates on the DTD files corresponding to the XML documents and uses an ontology class hierarchy to identify tags that might be part of a potential security violation. Every potential illegal inference is marked with a security violation pointer. A confidence level coefficient attached to the security violation pointer measures the probability of the security breach. A user-controlled XML data-level analysis may then be applied to confirm detected violation pointers, assigning them the maximum confidence level.
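
The data-level confirmation step described above can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration under assumptions: the dictionary layout of a violation pointer (`low_tag`, `high_tag`, `confidence`), the helper names `tag_values` and `confirm`, and the rule that one shared text value is enough to confirm a pointer are all hypothetical and are not taken from the Oxsegin design.

```python
import xml.etree.ElementTree as ET


def tag_values(xml_text, tag):
    """Collect the stripped text of every element named `tag`."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter(tag)
            if el.text and el.text.strip()}


def confirm(pointer, low_xml, high_xml):
    """Return a copy of the pointer with confidence 1.0 if the two tags
    share at least one value; otherwise return the pointer unchanged.

    Assumption: an exact match of text values counts as confirmation.
    """
    shared = (tag_values(low_xml, pointer["low_tag"])
              & tag_values(high_xml, pointer["high_tag"]))
    if shared:
        return {**pointer, "confidence": 1.0, "evidence": sorted(shared)}
    return pointer


# Hypothetical documents: the same figure appears under two classifications.
public_doc = "<staff><pay>52000</pay></staff>"
secret_doc = "<hr><salary> 52000 </salary></hr>"
pointer = {"low_tag": "pay", "high_tag": "salary", "confidence": 0.5}

print(confirm(pointer, public_doc, secret_doc))
```

Because the check reads document contents rather than DTDs, it is kept as a separate, explicitly invoked step, matching the user-controlled nature of the analysis.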

Publications:

Implementation: