154 changes: 154 additions & 0 deletions BITSTRING_INDEXING.md
@@ -0,0 +1,154 @@
# BitString Indexing Implementation

This document describes the bitstring indexing implementation added to Platform.Data.Doublets. It can be used as an alternative to, or alongside, the existing tree-based indexing approaches.

## Overview

Bitstring indexing uses bit arrays (`System.Collections.BitArray`) to store and query relationships between links. Each link has its own bit array, and each bit position records whether a particular relationship is present. This approach is particularly effective for:

- High-performance bitwise operations (AND, OR, XOR)
- Efficient set intersection and union operations
- Compact storage when relationships are dense (one bit per possible position)
- Bitwise computations that map directly onto hardware word and vector instructions
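
As a rough illustration of the set operations listed above, the following sketch uses only `System.Collections.BitArray` from the .NET base library. The `BitSetSketch` class and its method names are hypothetical and are not part of Platform.Data.Doublets; the bit positions mirror the ones used in the unit tests.

```csharp
using System;
using System.Collections;

// Hypothetical helper, not part of Platform.Data.Doublets.
public static class BitSetSketch
{
    // BitArray.And/Or/Xor mutate the receiver, so each helper works on a copy.
    public static BitArray Intersect(BitArray left, BitArray right) => new BitArray(left).And(right);

    public static BitArray Union(BitArray left, BitArray right) => new BitArray(left).Or(right);

    public static BitArray SymmetricDifference(BitArray left, BitArray right) => new BitArray(left).Xor(right);

    public static void Main()
    {
        var first = new BitArray(16);
        first[0] = true;
        first[5] = true;

        var second = new BitArray(16);
        second[0] = true;
        second[10] = true;

        Console.WriteLine(Intersect(first, second)[0]);           // True: bit 0 is set in both
        Console.WriteLine(Intersect(first, second)[5]);           // False: bit 5 is set only in first
        Console.WriteLine(Union(first, second)[10]);              // True: bit 10 is set in second
        Console.WriteLine(SymmetricDifference(first, second)[0]); // False: bit 0 is set in both
    }
}
```

`BitArray.And`, `Or`, and `Xor` throw an `ArgumentException` when the two operands differ in length, so an implementation that lets bitstrings grow independently would need to bring them to a common length first.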

## Architecture

### Core Components

1. **`IndexTreeType.BitStringIndex`** - New enum value (4) added to support bitstring indexing
2. **`IBitStringTreeMethods<TLinkAddress>`** - Interface extending `ILinksTreeMethods` with bitstring-specific operations
3. **`BitStringIndexMethodsBase<TLinkAddress>`** - Base implementation for Split memory model
4. **`LinksBitStringIndexMethodsBase<TLinkAddress>`** - Base implementation for United memory model

### Implementation Classes

#### Split Memory Model
- `InternalLinksBitStringIndexMethods<TLinkAddress>` - For internal links data
- `ExternalLinksBitStringIndexMethods<TLinkAddress>` - For external links indices

#### United Memory Model
- `LinksSourcesBitStringIndexMethods<TLinkAddress>` - For source-based indexing
- `LinksTargetsBitStringIndexMethods<TLinkAddress>` - For target-based indexing

## Key Features

### Bitwise Operations
```csharp
// Perform efficient set operations on relationships
BitArray intersection = bitStringMethods.BitwiseAnd(key1, key2);
BitArray union = bitStringMethods.BitwiseOr(key1, key2);
BitArray symmetricDifference = bitStringMethods.BitwiseXor(key1, key2); // bits set in exactly one operand
```

### Bit Manipulation
```csharp
// Set/get individual bits representing relationships
bitStringMethods.SetBit(linkId, position, true);
bool hasRelationship = bitStringMethods.GetBit(linkId, position);
```

### Usage Tracking
```csharp
// Count and enumerate link usages
int usageCount = bitStringMethods.CountSetBits(linkId);
bitStringMethods.EachUsage(rootLink, handler);
```
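
For intuition, counting set bits can be done one 32-bit word at a time rather than bit by bit. The sketch below is an assumption about how such a count could be written with `BitArray.CopyTo` and `System.Numerics.BitOperations.PopCount` (available from .NET Core 3.0); it is not taken from the actual `CountSetBits` implementation, and `PopCountSketch` is a hypothetical name.

```csharp
using System;
using System.Collections;
using System.Numerics;

// Hypothetical helper, not part of Platform.Data.Doublets.
public static class PopCountSketch
{
    public static int CountSetBits(BitArray bits)
    {
        // Copy the bits into 32-bit words; BitArray.CopyTo supports int[] targets.
        var words = new int[(bits.Length + 31) / 32];
        bits.CopyTo(words, 0);

        // Defensively mask any bits of the last word that lie beyond bits.Length.
        var remainder = bits.Length % 32;
        if (remainder != 0)
        {
            words[words.Length - 1] &= (1 << remainder) - 1;
        }

        var count = 0;
        foreach (var word in words)
        {
            count += BitOperations.PopCount(unchecked((uint)word));
        }
        return count;
    }

    public static void Main()
    {
        var bits = new BitArray(40);
        bits[0] = true;
        bits[5] = true;
        bits[33] = true;
        Console.WriteLine(CountSetBits(bits)); // 3
    }
}
```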

### Dynamic Resizing
- BitArrays resize automatically when a new relationship falls beyond the current capacity
- Capacity doubles on each resize (2x exponential growth), which keeps the number of reallocations logarithmic in the final size
- Default initial size: 1024 bits (configurable)
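
The doubling strategy described above can be sketched as follows. This is a minimal illustration built on the settable `BitArray.Length` property, not the code from this change; `GrowableBitsSketch` and its members are hypothetical names, and skipping the resize when clearing an out-of-range bit is an assumption.

```csharp
using System;
using System.Collections;

// Hypothetical helper, not part of Platform.Data.Doublets.
public static class GrowableBitsSketch
{
    public const int DefaultInitialSize = 1024;

    public static void SetBit(BitArray bits, int position, bool value)
    {
        if (position >= bits.Length)
        {
            if (!value)
            {
                return; // Bits beyond the current length are already treated as unset.
            }

            // Double the length until the position fits; 'checked' turns int overflow into an exception.
            var newLength = Math.Max(bits.Length, 1);
            while (newLength <= position)
            {
                newLength = checked(newLength * 2);
            }
            bits.Length = newLength; // New bits are initialized to false.
        }
        bits[position] = value;
    }

    public static void Main()
    {
        var bits = new BitArray(DefaultInitialSize);
        SetBit(bits, 3000, true);
        Console.WriteLine(bits.Length); // 4096: 1024 -> 2048 -> 4096
        Console.WriteLine(bits[3000]);  // True
    }
}
```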

## Usage Examples

### Basic Setup
```csharp
using Platform.Data.Doublets.Memory;
using Platform.Data.Doublets.Memory.United.Generic;
using Platform.Memory;

var memory = new HeapResizableDirectMemory();
var constants = new LinksConstants<ulong>(enableExternalReferencesSupport: true);

unsafe
{
    // Reserve space for the header and for ten raw links
    var header = memory.AllocateOrReserve(sizeof(LinksHeader<ulong>));
    var links = memory.AllocateOrReserve(sizeof(RawLink<ulong>) * 10);

    var bitStringMethods = new LinksSourcesBitStringIndexMethods<ulong>(
        constants, (byte*)links, (byte*)header);

    // Use bitstring operations...

    memory.Free();
}
```

### Relationship Management
```csharp
// Establish relationships
bitStringMethods.SetBit(sourceLink, relationshipId, true);
bitStringMethods.SetBit(targetLink, relationshipId, true);

// Search for common relationships
var commonRelation = bitStringMethods.Search(sourceLink, targetLink);
```
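
The unit tests suggest that `Search` returns the link whose bit (at position `linkId - 1`) is set in both bitstrings, and `0` when the two share no bit. A minimal sketch of that reading, assuming the lowest common position wins, is shown below. `SearchSketch` and `FindFirstCommon` are hypothetical names, and the actual implementation may differ.

```csharp
using System;
using System.Collections;

// Hypothetical helper, not part of Platform.Data.Doublets.
public static class SearchSketch
{
    // Returns the 1-based id of the first bit set in both arrays, or 0 if there is none.
    public static ulong FindFirstCommon(BitArray source, BitArray target)
    {
        var length = Math.Min(source.Length, target.Length);
        for (var i = 0; i < length; i++)
        {
            if (source[i] && target[i])
            {
                return (ulong)i + 1; // Bit position i corresponds to link id i + 1.
            }
        }
        return 0;
    }

    public static void Main()
    {
        var source = new BitArray(16);
        var target = new BitArray(16);
        source[2] = true; // both reference link 3
        target[2] = true;

        var other = new BitArray(16);
        other[10] = true;

        Console.WriteLine(FindFirstCommon(source, target)); // 3
        Console.WriteLine(FindFirstCommon(source, other));  // 0
    }
}
```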

### Attach/Detach Operations
```csharp
// Attach child to parent
var parent = parentLink;
bitStringMethods.Attach(ref parent, childLink);

// Detach child from parent
bitStringMethods.Detach(ref parent, childLink);
```

## Performance Characteristics

### Advantages
- **O(1)** bit set/get operations (a set that triggers a resize additionally pays for the copy)
- **Hardware-accelerated** bitwise operations (AND, OR, XOR)
- **Memory efficient** for dense relationship storage: one bit per possible position
- **Parallel processing** friendly - bitwise operations can be vectorized

### Considerations
- **Memory overhead** for sparse relationships: an uncompressed bit array reserves a bit for every position up to its length, whereas a tree stores only the entries that exist
- **Not sorted** - relationships are accessed by position, not in sorted order
- **Dynamic resizing cost** when BitArrays need to expand

## Integration with Existing Code

The bitstring indexing implementation follows the same patterns as existing tree methods:

1. Implements the same `ILinksTreeMethods<TLinkAddress>` interface
2. Provides all required operations: CountUsages, Search, EachUsage, Attach, Detach
3. Can be used as a drop-in replacement for tree-based indexing where callers do not depend on sorted traversal
4. Supports both Split and United memory models

## Testing

Unit tests are provided in `BitStringIndexMethodsTests.cs` covering:

- Basic bitstring operations (SetBit, GetBit, CountSetBits)
- Bitwise operations (AND, OR, XOR)
- Attach/Detach operations
- Search functionality
- Usage enumeration

## Future Enhancements

Potential improvements for future versions:

1. **Compression** - Implement run-length encoding or other compression schemes
2. **Persistence** - Add support for saving/loading bitstring indices to/from disk
3. **Hybrid indexing** - Combine bitstring with tree indexing for optimal performance
4. **SIMD optimization** - Leverage SIMD instructions for even faster bitwise operations
5. **Bloom filters** - Add probabilistic membership testing for large datasets

## Conclusion

The bitstring indexing implementation provides a high-performance alternative to tree-based indexing, particularly suitable for applications requiring:

- Fast set operations on relationships
- Compact storage of dense relationship sets
- Hardware-accelerated bitwise computations
- Simple relationship presence/absence queries

This implementation maintains full compatibility with the existing Platform.Data.Doublets architecture while offering unique performance characteristics for specific use cases.
202 changes: 202 additions & 0 deletions csharp/Platform.Data.Doublets.Tests/BitStringIndexMethodsTests.cs
@@ -0,0 +1,202 @@
using System;
using System.Collections;
using Platform.Data.Doublets.Memory;
using Platform.Data.Doublets.Memory.United.Generic;
using Platform.Memory;
using Xunit;

namespace Platform.Data.Doublets.Tests
{
    public static class BitStringIndexMethodsTests
    {
        [Fact]
        public static void BitStringBasicOperationsTest()
        {
            const ulong firstLink = 1ul;
            const ulong secondLink = 2ul;
            const ulong thirdLink = 3ul;
            const int bitPosition1 = 0;
            const int bitPosition2 = 5;
            const int bitPosition3 = 10;

            var memory = new HeapResizableDirectMemory();
            var constants = new LinksConstants<ulong>(enableExternalReferencesSupport: true);

            unsafe
            {
                var header = memory.AllocateOrReserve(sizeof(LinksHeader<ulong>));
                var links = memory.AllocateOrReserve(sizeof(RawLink<ulong>) * 3);

                var bitStringMethods = new LinksSourcesBitStringIndexMethods<ulong>(constants, (byte*)links, (byte*)header);

                // Test SetBit and GetBit
                bitStringMethods.SetBit(firstLink, bitPosition1, true);
                bitStringMethods.SetBit(firstLink, bitPosition2, true);
                bitStringMethods.SetBit(secondLink, bitPosition1, true);
                bitStringMethods.SetBit(secondLink, bitPosition3, true);

                Assert.True(bitStringMethods.GetBit(firstLink, bitPosition1));
                Assert.True(bitStringMethods.GetBit(firstLink, bitPosition2));
                Assert.False(bitStringMethods.GetBit(firstLink, bitPosition3));

                Assert.True(bitStringMethods.GetBit(secondLink, bitPosition1));
                Assert.False(bitStringMethods.GetBit(secondLink, bitPosition2));
                Assert.True(bitStringMethods.GetBit(secondLink, bitPosition3));

                // Test CountSetBits
                Assert.Equal(2, bitStringMethods.CountSetBits(firstLink));
                Assert.Equal(2, bitStringMethods.CountSetBits(secondLink));
                Assert.Equal(0, bitStringMethods.CountSetBits(thirdLink));

                // Test BitwiseAnd
                var andResult = bitStringMethods.BitwiseAnd(firstLink, secondLink);
                Assert.True(andResult[bitPosition1]); // Both have bit 0 set
                Assert.False(andResult[bitPosition2]); // Only firstLink has bit 5 set
                Assert.False(andResult[bitPosition3]); // Only secondLink has bit 10 set

                // Test BitwiseOr
                var orResult = bitStringMethods.BitwiseOr(firstLink, secondLink);
                Assert.True(orResult[bitPosition1]); // Both have bit 0 set
                Assert.True(orResult[bitPosition2]); // firstLink has bit 5 set
                Assert.True(orResult[bitPosition3]); // secondLink has bit 10 set

                // Test BitwiseXor
                var xorResult = bitStringMethods.BitwiseXor(firstLink, secondLink);
                Assert.False(xorResult[bitPosition1]); // Both have bit 0 set (XOR = false)
                Assert.True(xorResult[bitPosition2]); // Only firstLink has bit 5 set
                Assert.True(xorResult[bitPosition3]); // Only secondLink has bit 10 set

                memory.Free();
            }
        }

        [Fact]
        public static void BitStringAttachDetachTest()
        {
            const ulong rootLink = 1ul;
            const ulong childLink1 = 2ul;
            const ulong childLink2 = 3ul;
            const ulong childLink3 = 4ul;

            var memory = new HeapResizableDirectMemory();
            var constants = new LinksConstants<ulong>(enableExternalReferencesSupport: true);

            unsafe
            {
                var header = memory.AllocateOrReserve(sizeof(LinksHeader<ulong>));
                var links = memory.AllocateOrReserve(sizeof(RawLink<ulong>) * 4);

                var bitStringMethods = new LinksTargetsBitStringIndexMethods<ulong>(constants, (byte*)links, (byte*)header);

                // Test Attach
                var root = rootLink;
                bitStringMethods.Attach(ref root, childLink1);
                bitStringMethods.Attach(ref root, childLink2);
                bitStringMethods.Attach(ref root, childLink3);

                // Test CountUsages
                Assert.Equal(3ul, bitStringMethods.CountUsages(rootLink));

                // Test GetBitString
                var bitString = bitStringMethods.GetBitString(rootLink);
                Assert.True(bitString[int.CreateTruncating(childLink1) - 1]);
                Assert.True(bitString[int.CreateTruncating(childLink2) - 1]);
                Assert.True(bitString[int.CreateTruncating(childLink3) - 1]);

                // Test Detach
                bitStringMethods.Detach(ref root, childLink2);
                Assert.Equal(2ul, bitStringMethods.CountUsages(rootLink));

                var updatedBitString = bitStringMethods.GetBitString(rootLink);
                Assert.True(updatedBitString[int.CreateTruncating(childLink1) - 1]);
                Assert.False(updatedBitString[int.CreateTruncating(childLink2) - 1]);
                Assert.True(updatedBitString[int.CreateTruncating(childLink3) - 1]);

                memory.Free();
            }
        }

        [Fact]
        public static void BitStringSearchTest()
        {
            const ulong sourceLink = 1ul;
            const ulong targetLink = 2ul;
            const ulong connectionLink = 3ul;

            var memory = new HeapResizableDirectMemory();
            var constants = new LinksConstants<ulong>(enableExternalReferencesSupport: true);

            unsafe
            {
                var header = memory.AllocateOrReserve(sizeof(LinksHeader<ulong>));
                var links = memory.AllocateOrReserve(sizeof(RawLink<ulong>) * 3);

                var bitStringMethods = new LinksSourcesBitStringIndexMethods<ulong>(constants, (byte*)links, (byte*)header);

                // Set up a connection: both source and target should reference the same connection
                bitStringMethods.SetBit(sourceLink, int.CreateTruncating(connectionLink) - 1, true);
                bitStringMethods.SetBit(targetLink, int.CreateTruncating(connectionLink) - 1, true);

                // Test Search - should find the connection
                var searchResult = bitStringMethods.Search(sourceLink, targetLink);
                Assert.Equal(connectionLink, searchResult);

                // Test Search with no common connection
                const ulong otherLink = 4ul;
                bitStringMethods.SetBit(otherLink, 10, true); // Different bit position
                var noConnectionResult = bitStringMethods.Search(sourceLink, otherLink);
                Assert.Equal(0ul, noConnectionResult);

                memory.Free();
            }
        }

        [Fact]
        public static void BitStringEachUsageTest()
        {
            const ulong rootLink = 1ul;
            const ulong childLink1 = 2ul;
            const ulong childLink2 = 3ul;

            var memory = new HeapResizableDirectMemory();
            var constants = new LinksConstants<ulong>(enableExternalReferencesSupport: true);

            unsafe
            {
                var header = memory.AllocateOrReserve(sizeof(LinksHeader<ulong>));
                var links = memory.AllocateOrReserve(sizeof(RawLink<ulong>) * 3);

                var bitStringMethods = new LinksTargetsBitStringIndexMethods<ulong>(constants, (byte*)links, (byte*)header);

                // Attach children
                var root = rootLink;
                bitStringMethods.Attach(ref root, childLink1);
                bitStringMethods.Attach(ref root, childLink2);

                // Test EachUsage
                var visitedLinks = new System.Collections.Generic.List<ulong>();
                bitStringMethods.EachUsage(rootLink, link =>
                {
                    visitedLinks.Add(link[0]);
                    return constants.Continue;
                });

                Assert.Equal(2, visitedLinks.Count);
                Assert.Contains(childLink1, visitedLinks);
                Assert.Contains(childLink2, visitedLinks);

                // Test EachUsage with early break
                var limitedVisits = new System.Collections.Generic.List<ulong>();
                bitStringMethods.EachUsage(rootLink, link =>
                {
                    limitedVisits.Add(link[0]);
                    return constants.Break; // Break after first visit
                });

                Assert.Equal(1, limitedVisits.Count);

                memory.Free();
            }
        }
    }
}