Modeling Archaeological Knowledge as a Graph Database

Divakar Kumar included in categories Neo4j Graph Database Azure and series Archaios

2025-10-05 2025-10-05 2188 words 11 minutes

https://dev-to-uploads.s3.amazonaws.com/uploads/articles/058m4iyme4odyhkszsz1.png

1. Why Graph Databases for Archaeological Data?

Archaeological discovery isn’t just about storing data — it’s about understanding relationships:

A site HAS_COMPONENT (hillshade imagery, NDVI data, agent analysis)
A user CREATED a site AT a specific timestamp
A site BELONGS_TO a category (settlement, burial, defensive)
An analysis DETECTED_FEATURE with a confidence score
A site HAS_COORDINATES at a specific location

Traditional relational databases force you to model these relationships as foreign keys and join tables. Querying “Find all sites within 5km that have burial features detected with >80% confidence, created by users who also discovered defensive sites” becomes a nightmare of JOINs.

In Neo4j, this query is natural:

MATCH (u:User)-[:CREATED]->(defensiveSite:ArchaeologicalSite)-[:BELONGS_TO]->(:Category {name: 'Defensive'})
MATCH (u)-[:CREATED]->(site:ArchaeologicalSite)-[:HAS_FEATURE]->(f:ArchaeologicalFeature)
WHERE f.name = 'Burial' AND f.confidence > 0.8
  // point.distance() replaces distance(), which was removed in Neo4j 5
  AND point.distance(site.location, point({latitude: $lat, longitude: $lon})) < 5000
RETURN DISTINCT site

Archaios uses Neo4j to model the entire knowledge graph of archaeological discoveries — from raw LiDAR uploads to AI-detected features to multi-agent debate transcripts.

2. The Archaeological Knowledge Graph Model

Here’s how Archaios structures its graph:

(User)
  ├─[:CREATED]─>(ArchaeologicalSite)
  │               ├─[:HAS_COORDINATES]─>(Coordinates)
  │               ├─[:BELONGS_TO]─>(Category)
  │               ├─[:HAS_COMPONENT]─>(SiteComponent)
  │               ├─[:HAS_ANALYSIS]─>(AnalysisResult)
  │               │                    └─[:DETECTED_FEATURE]─>(ArchaeologicalFeature)
  │               └─[:HAS_FEATURE]─>(ArchaeologicalFeature)
  └─[:DISCOVERED]─>(Discovery)

Node Types

User - Archaeologists and researchers

Properties: id, name, oid, provider, photoUrl, createdAt
Represents authenticated users from Azure AD

ArchaeologicalSite - Discovered sites

Properties: siteId, name, latitude, longitude, status, type, isPossibleArchaeologicalSite
Core entity connecting all data

Coordinates - Geographic locations

Properties: latitude, longitude, location (Neo4j point type)
Enables spatial queries

Category - Site classifications

Properties: name (Settlement, Defensive, Burial, Agricultural, etc.)
Shared across multiple sites for categorization

SiteComponent - Processed imagery and data

Properties: name, imageUrl, componentId, type (Hillshade, NDVI, TrueColor, etc.)
Links to Azure Blob Storage URLs

AnalysisResult - Vision AI analysis groups

Properties: groupName (TopographyGroup, SpectralGroup), caption, tags
Groups related features together

ArchaeologicalFeature - Detected features

Properties: name, confidence, description, featureType
Individual archaeological features (walls, ditches, terraces, etc.)

Relationship Types

CREATED - User created a site

Properties: timestamp

HAS_COORDINATES - Site has geographic location

BELONGS_TO - Site belongs to category

HAS_COMPONENT - Site has imagery/data component

Properties: timestamp

HAS_ANALYSIS - Site has AI analysis result

DETECTED_FEATURE - Analysis detected a feature

Properties: confidence

HAS_FEATURE - Direct site-to-feature link (for quick queries)

Properties: confidence

3. Creating the Site Graph

When a user uploads a LiDAR file, the first step is creating the site node:

public async Task CreateArchaeologicalSiteAsync(ArchaeologicalSite site)
{
    // Create site node
    var siteProperties = new
    {
        id = site.Id,
        siteId = site.SiteId,
        name = site.Name,
        description = site.Description ?? string.Empty,
        latitude = site.Latitude,
        longitude = site.Longitude,
        location = site.Location ?? $"point({site.Latitude}, {site.Longitude})",
        category = site.Category ?? "Uncategorized",
        status = site.Status ?? string.Empty,
        lastUpdated = site.LastUpdated.ToUniversalTime().ToString("o"),
        isKnownSite = site.IsKnownSite
    };

    await _neo4jRepository.CreateNodeAsync("ArchaeologicalSite", "id", siteProperties);

    // Create coordinates node
    var coordinates = new
    {
        name = $"{site.Name} Coordinates",
        latitude = site.Latitude,
        longitude = site.Longitude,
        location = $"point({site.Latitude}, {site.Longitude})"
    };

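    // Merge on "location" (not "latitude") so the key matches the lookup used for the relationship below
    await _neo4jRepository.CreateNodeAsync("Coordinates", "location", coordinates);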

    // Link site to coordinates
    await _neo4jRepository.CreateRelationshipAsync(
        "ArchaeologicalSite", "id", site.Id,
        "Coordinates", "location", coordinates.location,
        "HAS_COORDINATES"
    );

    // Create category node
    var categoryProperties = new { name = siteProperties.category };
    await _neo4jRepository.CreateNodeAsync("Category", "name", categoryProperties);

    // Link site to category
    await _neo4jRepository.CreateRelationshipAsync(
        "ArchaeologicalSite", "id", site.Id,
        "Category", "name", categoryProperties.name,
        "BELONGS_TO"
    );

    // Link user to site
    if (site.ArchaiosUser != null)
    {
        await _neo4jRepository.CreateRelationshipAsync(
            "User", "id", site.ArchaiosUser.Id,
            "ArchaeologicalSite", "id", site.Id,
            "CREATED",
            new { timestamp = DateTime.UtcNow.ToString("o") }
        );
    }

    // Create component nodes (imagery)
    foreach (var component in site.Components ?? Enumerable.Empty<SiteComponent>())
    {
        var componentProperties = new
        {
            name = component.Name,
            state = component.State,
            siteId = site.Id,
            componentId = component.ComponentId ?? Guid.NewGuid().ToString(),
            type = component.Type ?? "Feature"
        };

        // Note: merging on "name" assumes component names are unique across sites
        await _neo4jRepository.CreateNodeAsync("SiteComponent", "name", componentProperties);

        await _neo4jRepository.CreateRelationshipAsync(
            "ArchaeologicalSite", "id", site.Id,
            "SiteComponent", "name", componentProperties.name,
            "HAS_COMPONENT"
        );
    }
}

Key Pattern: MERGE Instead of CREATE

Notice we use CreateNodeAsync with a keyProperty parameter. Under the hood, this uses Neo4j’s MERGE:

MERGE (n:ArchaeologicalSite {id: $keyValue})
ON CREATE SET n = $props
ON MATCH SET n = $props

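This ensures idempotency: if the orchestration retries, we don’t create duplicate nodes. Note that SET n = $props replaces every property on a matched node, so a retry also removes properties written later by other steps (such as serializedAgentAnalysis in section 8); SET n += $props would keep them.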
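
MERGE on its own does not stop two concurrent transactions from each creating the same node; a uniqueness constraint on the merge key closes that gap and also backs the lookup with an index. The sketch below only generates the constraint statements (Neo4j 5 syntax) for label/key pairs that appear as merge keys in this article. The constraint names and the UniqueConstraint helper are illustrative assumptions, not part of Archaios.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Label/key pairs used as merge keys by the CreateNodeAsync calls in this article.
var mergeKeys = new (string Label, string Key)[]
{
    ("ArchaeologicalSite", "id"),
    ("AnalysisResult", "id"),
    ("ArchaeologicalFeature", "id"),
    ("Category", "name"),
};

// Hypothetical helper: builds one idempotent constraint statement (Neo4j 5 syntax).
static string UniqueConstraint(string label, string key) =>
    $"CREATE CONSTRAINT {label.ToLowerInvariant()}_{key}_unique IF NOT EXISTS " +
    $"FOR (n:{label}) REQUIRE n.{key} IS UNIQUE";

List<string> statements = mergeKeys
    .Select(pair => UniqueConstraint(pair.Label, pair.Key))
    .ToList();

// Each statement would be sent to Neo4j once at startup; here they are only printed.
foreach (var statement in statements)
{
    Console.WriteLine(statement);
}
```

Because of IF NOT EXISTS, running the statements on every deployment is harmless.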

4. Storing Analysis Results and Features

After the multi-agent team analyzes the site, we store their findings as a graph:

[Function("StoreAnalysisResultsRelationships")]
public async Task Run([ActivityTrigger] AnalysisRelationshipsRequest request)
{
    _logger.LogInformation("Creating analysis result relationships for site {SiteId}", request.SiteId);

    foreach (var analysisEntry in request.AnalysisResults)
    {
        string groupName = analysisEntry.Key;  // "TopographyGroup", "SpectralGroup"
        var analysisResult = analysisEntry.Value;

        if (groupName == "Error" || analysisResult.Features == null)
            continue;

        // Create analysis node
        string analysisNodeId = $"{request.SiteId}_{groupName}";
        var analysisNodeProperties = new
        {
            id = analysisNodeId,
            siteId = request.SiteId,
            groupName = groupName,
            caption = analysisResult.Caption ?? string.Empty,
            tags = string.Join(",", analysisResult.Tags ?? new List<string>()),
            timestamp = DateTime.UtcNow.ToString("o")
        };

        await _neo4jRepository.CreateNodeAsync("AnalysisResult", "id", analysisNodeProperties);

        // Link site to analysis
        await _neo4jRepository.CreateRelationshipAsync(
            "ArchaeologicalSite", "siteId", request.SiteId,
            "AnalysisResult", "id", analysisNodeId,
            "HAS_ANALYSIS");
            
        // Create feature nodes
        foreach (var feature in analysisResult.Features)
        {
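            // A fresh GUID per run means a retried activity adds new feature nodes instead of
            // matching existing ones; a deterministic id would keep this step idempotent.
            string featureId = $"{analysisNodeId}_{Guid.NewGuid()}";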
            var featureProperties = new
            {
                id = featureId,
                analysisId = analysisNodeId,
                name = feature.Name ?? "Unknown",
                confidence = feature.Confidence,
                description = feature.Description ?? string.Empty,
                featureType = feature.Name
            };
            
            await _neo4jRepository.CreateNodeAsync("ArchaeologicalFeature", "id", featureProperties);
            
            // Link analysis to feature (with confidence on relationship)
            await _neo4jRepository.CreateRelationshipAsync(
                "AnalysisResult", "id", analysisNodeId,
                "ArchaeologicalFeature", "id", featureId,
                "DETECTED_FEATURE",
                new { confidence = feature.Confidence });
            
            // Direct site-to-feature link (optimization for queries)
            await _neo4jRepository.CreateRelationshipAsync(
                "ArchaeologicalSite", "siteId", request.SiteId,
                "ArchaeologicalFeature", "id", featureId,
                "HAS_FEATURE",
                new { confidence = feature.Confidence });
        }
    }
}

Why Two Relationship Paths?

(Site)-[:HAS_ANALYSIS]->(AnalysisResult)-[:DETECTED_FEATURE]->(Feature)
(Site)-[:HAS_FEATURE]->(Feature)

The first path preserves the grouping structure (Topography vs Spectral analysis). The second path enables fast direct queries: “Show all sites with burial features >80% confidence.”

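This is a common graph modeling pattern: denormalize for query performance. The cost is that confidence is now written in three places (the feature node and both relationships), so any later update has to touch all of them.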

5. Querying the Graph

Retrieving a site with all its related data is a single Cypher query:

public async Task<ArchaeologicalSite> GetSiteBySiteIdAsync(string Id)
{
    var result = await _boltClient.Cypher
        .Match("(site:ArchaeologicalSite {id: $siteId})")
        .OptionalMatch("(site)-[:HAS_COMPONENT]->(component:SiteComponent)")
        .OptionalMatch("(user:User)-[:CREATED]->(site)")
        .OptionalMatch("(site)-[:HAS_ANALYSIS]->(analysis:AnalysisResult)")
        .OptionalMatch("(analysis)-[:DETECTED_FEATURE]->(feature:ArchaeologicalFeature)")
        .WithParam("siteId", Id)
        .With("site, collect(DISTINCT component) AS components, user, analysis, collect(DISTINCT feature) AS features")
        .With("site, components, user, collect({analysis: analysis, features: features}) AS analysisGroups")
        .Return((site, components, user, analysisGroups) => new
        {
            Site = site.As<ArchaeologicalSite>(),
            User = user.As<ArchaiosUser>(),
            Components = components.As<List<SiteComponent>>(),
            AnalysisGroups = analysisGroups.As<List<AnalysisGroupResult>>()
        })
        .ResultsAsync;

    var siteResult = result.FirstOrDefault();
    if (siteResult == null)
        return null; // no site matched the given id

    // Map analysis groups back to site model
    if (siteResult.AnalysisGroups != null)
    {
        siteResult.Site.AnalysisResults = new Dictionary<string, AnalysisResult>();
        
        foreach (var group in siteResult.AnalysisGroups.Where(g => g.analysis != null))
        {
            var analysisResult = new AnalysisResult
            {
                Caption = group.analysis.caption,
                GroupName = group.analysis.groupName,
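                // RemoveEmptyEntries avoids a single empty tag when no tags were stored
                Tags = group.analysis.tags.Split(',', StringSplitOptions.RemoveEmptyEntries).ToList(),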
                Features = group.features.Select(f => new DetectedFeature
                {
                    Name = f.name,
                    Confidence = f.confidence,
                    Description = f.description
                }).ToList()
            };

            siteResult.Site.AnalysisResults[group.analysis.groupName] = analysisResult;
        }
    }

    return siteResult.Site;
}

This single query traverses:

Site properties
Related components (imagery)
Creator user
Analysis results with grouped features

In SQL, this would require 4-5 JOINs and complex result mapping.

6. Spatial Queries with Neo4j Points

Neo4j has native support for geographic points and distance calculations:

MATCH (site:ArchaeologicalSite)
WHERE point.distance(site.location, point({latitude: 52.3456, longitude: -1.2345})) < 5000
RETURN site

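We store coordinates on the node as a location property:

location = $"point({site.Latitude}, {site.Longitude})"

Note that this C# expression produces a plain string, not a Neo4j point. For the distance calculations to work, location has to be a real spatial value, created in Cypher with point({latitude: $lat, longitude: $lon}); a string that merely looks like a point will not match the spatial filters.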
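
A string such as "point(52.3456, -1.2345)" is not a spatial value, so one way to get a real point is to pass latitude and longitude as numeric parameters and let Cypher's point() function build it. This is a minimal sketch; the BuildSetLocation helper and the parameter names are assumptions made for illustration and are not taken from the Archaios code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the Cypher text and parameters that store a
// real WGS-84 point on a site, instead of a string that merely looks like one.
static (string Cypher, Dictionary<string, object> Parameters) BuildSetLocation(
    string siteId, double latitude, double longitude)
{
    if (latitude < -90 || latitude > 90)
        throw new ArgumentOutOfRangeException(nameof(latitude));
    if (longitude < -180 || longitude > 180)
        throw new ArgumentOutOfRangeException(nameof(longitude));

    const string cypher =
        "MATCH (site:ArchaeologicalSite {siteId: $siteId}) " +
        "SET site.location = point({latitude: $latitude, longitude: $longitude})";

    var parameters = new Dictionary<string, object>
    {
        ["siteId"] = siteId,
        ["latitude"] = latitude,
        ["longitude"] = longitude,
    };
    return (cypher, parameters);
}

var (locationQuery, locationParameters) = BuildSetLocation("site-123", 52.3456, -1.2345);
Console.WriteLine(locationQuery);
```

With Neo4j.Driver, the pair could then be passed to session.RunAsync(locationQuery, locationParameters).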

This enables queries like:

“Find sites within 10km of this location”
“Find the nearest archaeological site to these coordinates”
“Cluster sites by geographic proximity”

Combined with feature relationships:

MATCH (site:ArchaeologicalSite)-[:HAS_FEATURE]->(f:ArchaeologicalFeature)
WHERE f.name = 'Burial'
  AND f.confidence > 0.8
  AND point.distance(site.location, point({latitude: $lat, longitude: $lon})) < 10000
RETURN site, f
ORDER BY point.distance(site.location, point({latitude: $lat, longitude: $lon}))

This finds burial sites within 10km, ordered by proximity.
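
For WGS-84 points Neo4j reports distances in meters, computed with the haversine formula on a spherical Earth. A local version of the same formula can be handy for sanity-checking thresholds such as 5000 or 10000 in unit tests. This is only a sketch: the HaversineMeters name and the mean Earth radius of 6,371,000 m are assumptions, so results may differ slightly from the database's.

```csharp
using System;

// Great-circle distance in meters between two latitude/longitude pairs (degrees).
static double HaversineMeters(double lat1, double lon1, double lat2, double lon2)
{
    const double EarthRadiusMeters = 6_371_000; // assumed mean radius
    static double ToRadians(double degrees) => degrees * Math.PI / 180.0;

    double dLat = ToRadians(lat2 - lat1);
    double dLon = ToRadians(lon2 - lon1);
    double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
             + Math.Cos(ToRadians(lat1)) * Math.Cos(ToRadians(lat2))
             * Math.Sin(dLon / 2) * Math.Sin(dLon / 2);

    // Clamp guards against rounding pushing the square root slightly above 1
    return 2 * EarthRadiusMeters * Math.Asin(Math.Min(1.0, Math.Sqrt(a)));
}

// One degree of longitude along the equator
double oneDegree = HaversineMeters(0, 0, 0, 1);
Console.WriteLine($"{oneDegree:F0} m");
```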

7. Graph Algorithms for Discovery

Neo4j’s Graph Data Science library enables pattern detection across the knowledge graph:

Community Detection

Find clusters of related sites:

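public async Task<IDictionary<string, double>> FindCommunitiesAsync(string graphName)
{
    // gds.louvain.stream takes the name of a projected GDS graph, not a node label
    var query = @"
        CALL gds.louvain.stream($graphName)
        YIELD nodeId, communityId
        WITH gds.util.asNode(nodeId) AS node, communityId
        RETURN node.id AS nodeId, communityId";

    // Queries run through a session (IDriver itself exposes no RunAsync)
    var session = _driver.AsyncSession();
    try
    {
        var cursor = await session.RunAsync(query, new { graphName });
        var records = await cursor.ToListAsync();

        // Returns dictionary: { "site-123": 1, "site-456": 1, "site-789": 2 }
        // Sites 123 and 456 are in the same community (connected by similar features)
        return records.ToDictionary(
            r => r["nodeId"].As<string>(),
            r => (double)r["communityId"].As<long>());
    }
    finally
    {
        await session.CloseAsync();
    }
}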

This can reveal:

Sites created by the same user
Sites with similar detected features
Geographic clusters of settlements
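
GDS algorithms such as gds.louvain.stream run on a named in-memory projection, so a graph has to be projected first with gds.graph.project. The sketch below builds the parameters for such a call. The graph name archaios-sites, the choice of labels and relationship types, and the BuildProjectionParameters helper are illustrative assumptions rather than the project's actual setup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cypher for GDS 2.x: project selected labels and relationship types into memory.
const string projectQuery =
    "CALL gds.graph.project($graphName, $labels, $relationships)";

// Hypothetical helper: every relationship type is projected as UNDIRECTED so that
// sites are connected through the users and categories they share.
static Dictionary<string, object> BuildProjectionParameters(
    string graphName, IEnumerable<string> labels, IEnumerable<string> relationshipTypes)
{
    var relationships = relationshipTypes.ToDictionary(
        type => type,
        type => (object)new Dictionary<string, object> { ["orientation"] = "UNDIRECTED" });

    return new Dictionary<string, object>
    {
        ["graphName"] = graphName,
        ["labels"] = labels.ToList(),
        ["relationships"] = relationships,
    };
}

var projection = BuildProjectionParameters(
    "archaios-sites",
    new[] { "User", "ArchaeologicalSite", "Category" },
    new[] { "CREATED", "BELONGS_TO" });

Console.WriteLine(projectQuery);
Console.WriteLine(string.Join(", ", projection.Keys));
```

The same graph name is then what the algorithm call expects as its first argument.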

PageRank for Importance

Rank sites by their centrality in the graph:

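public async Task<IDictionary<string, double>> RankNodesAsync(string graphName)
{
    // graphName is the name of a projected GDS graph, not a node label
    var query = @"
        CALL gds.pageRank.stream($graphName)
        YIELD nodeId, score
        WITH gds.util.asNode(nodeId) AS node, score
        RETURN node.id AS nodeId, score
        ORDER BY score DESC";

    var session = _driver.AsyncSession();
    try
    {
        var cursor = await session.RunAsync(query, new { graphName });
        var records = await cursor.ToListAsync();
        // Higher score = more connected site (many features, components, relationships)
        return records.ToDictionary(r => r["nodeId"].As<string>(), r => r["score"].As<double>());
    }
    finally { await session.CloseAsync(); }
}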

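Sites with:

More detected features
More components (imagery types)
More user interactions

rank higher, which may indicate archaeological significance. PageRank rewards incoming relationships, and most relationships in this model point away from the site, so this ranking assumes the graph was projected with UNDIRECTED orientation.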

8. Storing Agent Conversations

The multi-agent debate is serialized and stored directly on the site node:

public async Task UpdateSiteAgentAnalysisAsync(string siteId, List<AgentChatMessage> agentMessages)
{
    string serializedMessages = JsonConvert.SerializeObject(agentMessages);

    await _boltClient.Cypher
        .Match("(site:ArchaeologicalSite {siteId: $siteId})")
        .Set("site.serializedAgentAnalysis = $messages")
        .WithParam("siteId", siteId)
        .WithParam("messages", serializedMessages)
        .ExecuteWithoutResultsAsync();
}

When retrieving the site, we deserialize it:

if (!string.IsNullOrEmpty(site.SerializedAgentAnalysis))
{
    site.AgentAnalysis = JsonConvert.DeserializeObject<List<AgentChatMessage>>(
        site.SerializedAgentAnalysis
    );
}

This preserves the entire conversation — users can see why the AI team approved or rejected their site.

9. Repository Pattern with Neo4jClient

Archaios uses Neo4jClient for Cypher query building and Neo4j.Driver for raw driver access:

public class Neo4jRepository : INeo4jRepository
{
    private readonly IBoltGraphClient _boltClient;  // Neo4jClient (high-level)
    private readonly IDriver _driver;                // Neo4j.Driver (low-level)

    public async Task CreateNodeAsync<T>(string label, string keyProperty, T properties)
    {
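        // MERGE cannot match on a null key, so fail early with a clear message
        var propertyValue = typeof(T).GetProperty(keyProperty)?.GetValue(properties)
            ?? throw new ArgumentException($"'{keyProperty}' is missing or null", nameof(properties));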
        
        await _boltClient.Cypher
            .Merge($"(n:{label} {{{keyProperty}: $keyValue}})")
            .OnCreate()
            .Set("n = $props")
            .OnMatch()
            .Set("n = $props")
            .WithParams(new
            {
                keyValue = propertyValue,
                props = properties
            })
            .ExecuteWithoutResultsAsync();
    }

    public async Task CreateRelationshipAsync(
        string startNodeLabel, string startNodeProperty, string startNodeValue,
        string endNodeLabel, string endNodeProperty, string endNodeValue,
        string relationshipType, object properties = null)
    {
        var query = _boltClient.Cypher
            .Match($"(start:{startNodeLabel})", $"(end:{endNodeLabel})")
            .Where($"start.{startNodeProperty} = $startValue")
            .AndWhere($"end.{endNodeProperty} = $endValue")
            .WithParam("startValue", startNodeValue)
            .WithParam("endValue", endNodeValue);

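        // MERGE (not CREATE) keeps retries from adding duplicate relationships
        query = query.Merge($"(start)-[r:{relationshipType}]->(end)");

        if (properties != null)
        {
            // Parameter maps are not allowed inside MERGE patterns, so set them afterwards
            query = query.Set("r += $props")
                         .WithParam("props", properties);
        }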

        await query.ExecuteWithoutResultsAsync();
    }
}

Why both libraries?

Neo4jClient - Fluent API for building type-safe Cypher queries
Neo4j.Driver - Raw driver for Graph Data Science algorithms and advanced queries

10. Benefits of Graph Modeling

🔗 Natural Relationship Queries

“Find all sites created by users who also discovered burial features”
SQL: Complex self-joins, temp tables
Cypher: MATCH (u:User)-[:CREATED]->(s1)-[:HAS_FEATURE]->(:ArchaeologicalFeature {name: 'Burial'}) MATCH (u)-[:CREATED]->(s2) RETURN DISTINCT s2

📍 Spatial + Graph Hybrid

“Find defensive sites within 5km of water sources that have high-confidence AI analysis”
Combines geographic proximity with graph traversal

🧠 Knowledge Discovery

Community detection reveals site clusters
PageRank identifies archaeologically significant sites
Shortest path finds connections between discoveries
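
Shortest path is mentioned here but not shown elsewhere in the article. As a rough sketch, Cypher's shortestPath() can look for a chain of relationships between two sites. In most Neo4j versions the hop limit of a variable-length pattern cannot be supplied as a query parameter, so the hypothetical BuildShortestPathQuery helper below validates it and writes it into the query text, while the site ids stay as parameters. The helper name, the parameter names, and the limit of 10 hops are assumptions.

```csharp
using System;

// Hypothetical helper: builds a shortestPath query between two sites.
static string BuildShortestPathQuery(int maxHops)
{
    // The hop limit ends up in the query text, so only accept a small, sane range
    if (maxHops < 1 || maxHops > 10)
        throw new ArgumentOutOfRangeException(nameof(maxHops), "Expected 1 to 10 hops");

    return
        "MATCH (a:ArchaeologicalSite {siteId: $fromSiteId}), " +
        "(b:ArchaeologicalSite {siteId: $toSiteId}) " +
        $"MATCH path = shortestPath((a)-[*..{maxHops}]-(b)) " +
        "RETURN path";
}

string pathQuery = BuildShortestPathQuery(4);
Console.WriteLine(pathQuery);
```

The relationship pattern is left untyped and undirected here, so a path may run through a shared user or category as well as through features.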

⚡ Performance at Scale

Traversal cost depends on the part of the graph a query touches, not on the total size of the data
Deep SQL JOIN chains tend to get more expensive with each additional level
Neo4j is optimized for traversals

🎯 Semantic Richness

Relationships have types (CREATED, DETECTED_FEATURE) and properties (confidence, timestamp)
The graph is the domain model — no impedance mismatch

11. Key Takeaways

🗺️ Graph databases excel at relationship-heavy domains like archaeology, social networks, knowledge management

📐 Node-relationship-property model is more intuitive than tables and foreign keys for complex domains

🔄 MERGE ensures idempotency — critical for distributed systems with retries

🌍 Neo4j native spatial support combines geographic and graph queries seamlessly

🧮 Graph algorithms (community detection, PageRank, shortest path) reveal hidden patterns

📊 Cypher queries read like natural language descriptions of graph patterns

🏛️ Domain-driven design maps directly to graph structure without ORM translation layers

12. Get Started

Ready to model your complex domain as a knowledge graph?

🔗 GitHub Repository: https://github.com/Cloud-Jas/Archaios

Graph databases transform how we model and query complex, relationship-heavy domains. By representing archaeological knowledge as a graph, Archaios enables spatial-semantic queries that would be much harder to express in a relational database. If you’re working with connected data, consider going graph!

#Neo4j #GraphDatabase #Cypher #KnowledgeGraph #Archaeology #Azure #DataModeling #MicrosoftMVP