Neuro-Symbolic Legal Knowledge Graph for LLM Reasoning

Master's thesis & ML research  ·  DiliTrust, Paris

A system that grounds LLM answers over complex enterprise legal data in an explicit knowledge graph, so responses come with traceable evidence and a visible reasoning path instead of unverifiable black-box output.

The core contribution is query-time (ephemeral) KG construction: rather than maintaining a massive pre-built graph, an LLM-driven agent decides what to retrieve from live source systems at query time and assembles a small, task-specific subgraph on the fly. This addresses two problems at once: hallucination, because the LLM reasons over structured, retrieved facts rather than loosely matched text, and staleness, because answers are always built from live data. The approach was validated through a three-way architecture comparison (ephemeral KG vs. pre-materialized KG vs. relational baseline) on answer correctness, faithfulness, and latency, using LLM-as-judge evaluation (DeepEval).
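As a rough illustration of the query-time construction idea described above, the sketch below assembles a small subgraph from only the sources a question appears to need. Everything in it is an assumption made for illustration: the `SOURCES` dictionary stands in for live enterprise APIs, and `plan_sources` is a keyword router standing in for the LLM-driven agent, which in the real system presumably makes this decision.

```python
from dataclasses import dataclass, field


@dataclass
class Subgraph:
    """A small, throwaway graph built for a single question."""
    nodes: dict = field(default_factory=dict)   # node id -> attribute dict
    edges: list = field(default_factory=list)   # (source id, relation, target id)

    def add_edge(self, src, rel, dst):
        self.nodes.setdefault(src, {})
        self.nodes.setdefault(dst, {})
        self.edges.append((src, rel, dst))


# Hypothetical stand-ins for live source systems; each returns edge triples.
SOURCES = {
    "ownership": lambda: [("HoldCo", "OWNS", "OpCo")],
    "contracts": lambda: [("Alice", "SIGNED", "Contract-1")],
    "governance": lambda: [("Alice", "MEMBER_OF", "Board-OpCo")],
}

# Hypothetical routing table; an LLM agent would replace this lookup.
KEYWORDS = {
    "ownership": ("own", "stake"),
    "contracts": ("contract", "signed"),
    "governance": ("board", "authority"),
}


def plan_sources(question):
    """Pick which sources to call for this question (placeholder for the agent)."""
    q = question.lower()
    return [name for name, words in KEYWORDS.items() if any(w in q for w in words)]


def build_ephemeral_graph(question):
    """Fetch only the planned sources and assemble a task-specific subgraph."""
    graph = Subgraph()
    for name in plan_sources(question):
        for src, rel, dst in SOURCES[name]():
            graph.add_edge(src, rel, dst)
    return graph


graph = build_ephemeral_graph("Who signed the contract?")
print(graph.edges)
```

The graph is discarded after the answer is produced, so nothing needs to be kept in sync with the source systems between queries.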

The ETL pipeline pulls heterogeneous legal data through enterprise APIs into an embedded property graph (Kuzu/LadyBug) with more than 700 entities and 1,100 typed edges spanning companies, individuals, governance bodies, ownership stakes, and contracts. Cross-domain linking lets questions that span previously siloed modules, such as "which contracts were signed by someone without active signing authority?", be answered as natural multi-hop graph traversals.
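To make the multi-hop example concrete, here is a minimal in-memory sketch of the signing-authority question. The record shapes (`contracts`, `signatures`, `authorities`) and the rule that authority must be valid on the signing date are assumptions for illustration; the actual graph schema and query are not described here.

```python
from datetime import date


def unauthorized_signatures(contracts, signatures, authorities):
    """Return (contract, person) pairs where the signer lacked authority.

    contracts:   dict of contract id -> company the contract binds
    signatures:  list of (contract id, person, signing date)
    authorities: list of (person, company, valid from, valid to or None)
    """
    flagged = []
    for contract_id, person, signed_on in signatures:
        company = contracts[contract_id]            # hop 1: contract -> company
        authorized = any(                            # hop 2: person -> authority -> company
            p == person
            and c == company
            and start <= signed_on
            and (end is None or signed_on <= end)
            for p, c, start, end in authorities
        )
        if not authorized:
            flagged.append((contract_id, person))
    return flagged


contracts = {"C-1": "OpCo", "C-2": "OpCo"}
signatures = [
    ("C-1", "Alice", date(2024, 3, 1)),
    ("C-2", "Bob", date(2024, 6, 1)),
]
authorities = [
    ("Alice", "OpCo", date(2023, 1, 1), None),              # still active
    ("Bob", "OpCo", date(2022, 1, 1), date(2023, 12, 31)),  # expired before signing
]
print(unauthorized_signatures(contracts, signatures, authorities))
```

In a property graph the same logic would be a single pattern match across contract, signer, and authority nodes, which is what makes the cross-domain links useful.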

Governance-violation detection combines symbolic rules with the LLM layer. Circular-ownership cycle detection and deontic compliance rules operationalize the Violation Situation Pattern (EKAW 2026), so that each violation is a persistent, auditable graph object with a PROV-O provenance trail. The interactive explainer (above) lets users ask questions in natural language and watch answers unfold as an explorable D3 force-directed graph, with focusable nodes, attribute explosion, relationship expansion, a breadcrumb reasoning trail, and an expandable panel showing the written answer, reasoning trace, and graph query.
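The circular-ownership check mentioned above can be sketched as a standard depth-first search over directed ownership edges. This is a generic textbook approach, not necessarily the one used in the project, and the adjacency-dict input format is an assumption.

```python
def find_ownership_cycle(owns):
    """Return one ownership cycle as a list of entities, or None.

    owns maps each owner to the entities it holds a stake in.
    The returned list starts and ends with the same entity.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, fully explored
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for child in owns.get(node, ()):
            state = color.get(child, WHITE)
            if state == GRAY:
                # child is already on the current path: we closed a loop
                return path[path.index(child):] + [child]
            if state == WHITE:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in list(owns):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


stakes = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
print(find_ownership_cycle(stakes))
```

The recursive form is fine for a graph of this size (hundreds of entities); a much deeper ownership chain would call for an iterative stack instead.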

Python Kuzu / LadyBug Cypher LLM Agents DeepEval React 18 D3.js v7 RDF / SHACL / PROV-O

Fine-Grained Bird Species Classification

CentraleSupelec  ·  Kaggle Competition (CUB-200 subset)

Built a fine-grained image classification pipeline that raises accuracy on a CUB-200 subset from a 55% baseline to 93%, using an EVA-02 Large Vision Transformer with domain-specific pretraining on iNaturalist. The performance gap over a standard ViT baseline comes primarily from pretraining domain alignment: iNaturalist bird imagery matches the CUB distribution far more closely than ImageNet does.

The pipeline adds a two-stage detection front-end: YOLOv8 for initial localization, followed by Grounding DINO for bounding boxes at 99%+ precision, so the classifier sees tightly cropped birds rather than noisy full-scene inputs. A 5-fold ensemble with test-time augmentation is combined with specialist binary classifiers trained on taxonomically confused species pairs, which were identified through systematic error analysis.
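One plausible way to combine the fold ensemble, test-time augmentation, and pair specialists is sketched below. The averaging of class probabilities and the routing rule (defer to a specialist when the top two classes form a known confused pair) are assumptions for illustration; the function names and the `specialist` callable are hypothetical.

```python
def average_probs(prob_vectors):
    """Element-wise mean of equally sized probability vectors."""
    n = len(prob_vectors)
    return [sum(column) / n for column in zip(*prob_vectors)]


def ensemble_predict(fold_view_probs, confused_pairs=None, specialist=None):
    """Predict a class index from per-fold, per-augmentation probabilities.

    fold_view_probs: list over folds of lists over augmented views,
                     each view being a class-probability vector.
    confused_pairs:  set of frozensets of class indices (assumed input).
    specialist:      callable taking a pair and returning the chosen class
                     index (hypothetical stand-in for a binary classifier).
    """
    flat = [probs for fold in fold_view_probs for probs in fold]
    mean = average_probs(flat)
    ranked = sorted(range(len(mean)), key=lambda i: mean[i], reverse=True)
    top1, top2 = ranked[0], ranked[1]
    pair = frozenset((top1, top2))
    if confused_pairs and specialist and pair in confused_pairs:
        return specialist(pair)   # assumed routing rule
    return top1


# Two folds, two augmented views each, three classes.
folds = [
    [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]],
    [[0.2, 0.7, 0.1], [0.3, 0.6, 0.1]],
]
print(ensemble_predict(folds))
```

Averaging probabilities rather than taking a majority vote keeps each model's confidence in play, which matters most on the visually similar pairs the specialists are meant to resolve.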

PyTorch EVA-02 Vision Transformer YOLOv8 Grounding DINO iNaturalist Fine-Grained Classification

arXiv Semantic Search App

A semantic search application for arXiv articles built to compare two vector database backends: PGVector (PostgreSQL vector extension) and ChromaDB. Users can run single natural-language queries against the arXiv dataset or enter a full benchmarking mode with manual or file-uploaded query batches, then visualize latency results through box plots, density distributions, and latency-over-time charts.
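The benchmarking mode described above boils down to timing each query and summarizing the distribution. The sketch below shows that loop under stated assumptions: `search` is a hypothetical callable wrapping whichever backend is under test, and the nearest-rank 95th percentile is one of several common definitions, not necessarily the one the app reports.

```python
import math
import statistics
import time


def benchmark(search, queries):
    """Run each query once and return per-query latencies in milliseconds."""
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Summary statistics of the kind a box plot would display."""
    ordered = sorted(latencies_ms)
    p95_index = math.ceil(0.95 * len(ordered)) - 1   # nearest-rank percentile
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


def fake_search(query):
    """Hypothetical backend stand-in; a real one would query PGVector or ChromaDB."""
    return [query.upper()]


timings = benchmark(fake_search, ["graph neural networks", "vector databases"])
print(summarize(timings))
```

Keeping the raw per-query latencies, rather than only the summary, is what allows the density and latency-over-time views to be drawn from the same run.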

Embeddings are generated via TensorFlow Hub's Universal Sentence Encoder. The project also adds PGVector to the standard ann-benchmarks suite — extending the community benchmark with a relational vector DB that had not previously been included.
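For readers unfamiliar with embedding-based search, the core ranking step can be sketched as follows. This uses tiny hand-written vectors in place of real Universal Sentence Encoder output and assumes cosine similarity as the metric; the metric actually configured in either backend is not stated here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy 2-d vectors standing in for sentence embeddings (illustrative only).
docs = {"paper-a": [1.0, 0.0], "paper-b": [0.0, 1.0], "paper-c": [1.0, 1.0]}
print(top_k([1.0, 0.1], docs, k=2))
```

A vector database replaces this exhaustive scan with an index, trading a small amount of recall for much lower latency, which is the trade-off the benchmark comparison measures.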

Streamlit PGVector ChromaDB PostgreSQL TensorFlow Hub Universal Sentence Encoder Python

ANN Benchmarks: Vector Database Performance Study

A fork and extension of the standard ann-benchmarks framework, adding PGVector and ChromaDB alongside the classical approximate nearest-neighbor libraries. It benchmarks ANN search on GloVe embeddings at four dimensionalities (25, 50, 100, and 200), using pre-split HDF5 datasets with ground-truth top-100 neighbors.
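Ground-truth neighbor lists like these are what make a recall measurement possible. The sketch below computes a simplified, id-based recall@k; it is an illustration of the idea and may differ from how the framework itself defines and computes recall.

```python
def recall_at_k(retrieved, ground_truth, k):
    """Mean fraction of the true top-k neighbors found in the returned top-k.

    retrieved:    one list of neighbor ids per query, as returned by the index
    ground_truth: one list of exact neighbor ids per query, nearest first
    """
    total = 0.0
    for got, truth in zip(retrieved, ground_truth):
        hits = len(set(got[:k]) & set(truth[:k]))
        total += hits / k
    return total / len(retrieved)


# Two queries with hypothetical neighbor ids.
returned = [[1, 2, 3], [4, 5, 6]]
exact = [[1, 3, 9], [7, 8, 4]]
print(recall_at_k(returned, exact, k=3))
```

Plotting this value against queries per second for each index configuration gives the usual recall versus throughput curve used to compare ANN backends.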

This forms the benchmarking backbone behind the arXiv Semantic Search App: running the same PGVector vs. ChromaDB comparison at the ANN level, on standardized public embeddings, provides a principled basis for interpreting the application-layer latency results.

Python Docker PGVector ChromaDB GloVe HDF5 ANN-Benchmarks