jira.xml (forked from spiffe/spike), 2533 lines (2268 loc), 86.1 KB
<?xml version="1.0" encoding="utf-8" ?>
<!--
# \\ SPIKE: Secure your secrets with SPIFFE. — https://spike.ist/
# \\\\\ Copyright 2024-present SPIKE contributors.
# \\\\\\\ SPDX-License-Identifier: Apache-2.0
-->
<!--
ABOUT JIRA.XML
JIRA.XML serves as a sandbox for capturing ideas and drafting issue templates.
It is a tongue-in-cheek jab at how inefficient Jira is at managing tasks and
how, sometimes, simple tools (like a shared, version-controlled, free-form
text file) can make wonders because they are easy to use without any red tape
around them.
JIRA.xml provides a free-form space where we can “think out loud” and sketch
potential issues before deciding which ones to formally create in GitHub. By
working here first, we keep the active issue tracker focused and avoid
cluttering it with early-stage or exploratory thoughts.
-->
<stuff>
<high-level-plan>
<issue order="1">Zero Turtle Pi Grid</issue>
<issue order="3">Secure Secrets CLI UX (similar to Claude auth)</issue>
<watch-list>
<issue>
spike cipher (stream mode) is broken -> Murat.
</issue>
<issue>
PRs to watch:
https://github.com/spiffe/spike-sdk-go/pull/140
https://github.com/spiffe/spike/pull/263
https://github.com/spiffe/spike/pull/268
https://github.com/spiffe/spike/pull/270
https://github.com/spiffe/helm-charts-hardened/pull/665
</issue>
</watch-list>
</high-level-plan>
<task-group theme="immediate">
<issue priority="high">
fix broken things first:
* CI integration test is broken
* recovery/restore is broken
</issue>
<issue>
a demo that focuses just on the architecture and nothing else. (no live
demo, no k8s deployment, etc.)
</issue>
<issue>
Create a new walk-through video recording.
* Brief architecture overview
* Dev startup for bare-metal
* Dev startup for Minikube
* CLI overview
* Mini codewalk
</issue>
<issue severity="important" urgency="moderate">
// TODO: check all database operations (secrets, policies, metadata) and
// ensure that they are retried with exponential backoff.
add retries to everything under:
app/nexus/internal/state/persist
^ they all talk to db; and sqlite can temporarily lock for
a variety of reasons.
</issue>
<issue>
Try SPIKE on a Mac (and create a video)
</issue>
<issue>
spike on windows video.
</issue>
<issue>
func initializeSqliteBackend(rootKey *[32]byte) backend.Backend {
panic if rootkey is nil or empty.
</issue>
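<issue kind="sketch">
a minimal sketch of the proposed nil/empty guard. guardRootKey is a
made-up name for illustration; the real check would live inside
initializeSqliteBackend:
```go
package main

import "fmt"

// guardRootKey sketches the proposed check: refuse to proceed when the
// root key pointer is nil or the key material is all zeros.
func guardRootKey(rootKey *[32]byte) {
	if rootKey == nil {
		panic("initializeSqliteBackend: root key is nil")
	}
	empty := true
	for _, b := range rootKey {
		if b != 0 {
			empty = false
			break
		}
	}
	if empty {
		panic("initializeSqliteBackend: root key is empty (all zeros)")
	}
}

func main() {
	key := [32]byte{1}
	guardRootKey(&key) // valid key: no panic
	fmt.Println("ok")
}
```
</issue>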
<issue>
create a video about how to develop SPIKE on WSL.
</issue>
<issue waitingFor="json mode to be fixed">
demo: encryption as a service
</issue>
</task-group>
<task-group state="parked" until="after-release">
<issue>
const GCMNonceSize = 12
to sdk.
</issue>
<issue>
`spike secret list` right now lists all secrets available and
requires system level access.
we might want to restrict the "list" action to a regular expression
matcher with a policy, so for certain SPIFFE IDs, a `list` call
will only list the secrets that they are allowed to see.
so that a workload (if allowed) can list secrets under
secret/tenants/acme/*
but not secret/tenants/bodega
For that, we likely need to extend the policy structure too
here is how the policies are specified currently:
spike policy create --name=workload-can-write \
--path-pattern="^tenants/demo/db/.*$" \
--spiffeid-pattern="^spiffe://spike\.ist/workload/.*$" \
--permissions="write"
we may need to add a "list" permission type, and that would
limit the policy to list secrets that match to the path pattern only.
if a workload does not have a list access given by a policy, it
won't be able to list secrets.
</issue>
<issue>
duplicate code fragment:
for _, policy := range *policies {
result.WriteString(fmt.Sprintf("ID: %s\n", policy.ID))
result.WriteString(fmt.Sprintf("Name: %s\n", policy.Name))
result.WriteString(fmt.Sprintf("SPIFFE ID Pattern: %s\n",
policy.SPIFFEIDPattern))
result.WriteString(fmt.Sprintf("Path Pattern: %s\n",
policy.PathPattern))
perms := make([]string, 0, len(policy.Permissions))
for _, p := range policy.Permissions {
perms = append(perms, string(p))
}
</issue>
<issue>
duplicate code fragment
func encryptStream(cmd *cobra.Command, api *sdk.API, inFile, outFile string) {
// Validate the input file exists before attempting encryption.
if inFile != "" {
if _, err := os.Stat(inFile); err != nil {
if os.IsNotExist(err) {
cmd.PrintErrf("Error: Input file does not exist: %s\n", inFile)
return
}
cmd.PrintErrf("Error: Cannot access input file: %s\n", inFile)
return
}
</issue>
<issue>
maybe have a dedicated RootKey type of *[32]byte
</issue>
<issue>
move internal stuff to the SDK.
</issue>
<issue>
duplicate code fragment
// Get the specific version
v, exists := secret.Versions[version]
if !exists {
failErr := *sdkErrors.ErrEntityNotFound.Clone()
failErr.Msg = fmt.Sprintf(
"secret with path %s not found for version %v",
path, version,
)
return nil, &failErr
}
</issue>
<issue>
duplicated code fragment:
func (s *DataStore) DeletePolicy(
ctx context.Context, id string,
) *sdkErrors.SDKError {
const fName = "DeletePolicy"
validation.CheckContext(ctx, fName)
s.mu.Lock()
defer s.mu.Unlock()
tx, beginErr := s.db.BeginTx(
ctx, &sql.TxOptions{Isolation: sql.LevelSerializable},
)
if beginErr != nil {
failErr := sdkErrors.ErrTransactionBeginFailed.Wrap(beginErr)
return failErr
}
</issue>
<issue>
duplicated code fragment:
func RouteDecrypt(
w http.ResponseWriter, r *http.Request, audit *journal.AuditEntry,
) *sdkErrors.SDKError {
const fName = "routeDecrypt"
journal.AuditRequest(fName, r, audit, journal.AuditCreate)
// Check if streaming mode based on Content-Type
contentType := r.Header.Get(headerKeyContentType)
streamModeActive := contentType == headerValueOctetStream
if streamModeActive {
// Cipher getter for streaming mode
getCipher := func() (cipher.AEAD, *sdkErrors.SDKError) {
return getCipherOrFailStreaming(w)
}
return handleStreamingDecrypt(w, r, getCipher)
}
</issue>
<issue>
refactor with injections to make the code more testable:
func TestSendShardsToKeepers_NetworkDependentFunction(t *testing.T) {
// The sendShardsToKeepers function has multiple external dependencies that make it
// difficult to test without a significant infrastructure:
// 1. Requires SPIFFE X509Source
// 2. Makes network calls via mTLS clients
// 3. Depends on state management for the root key
// 4. Calls computeShares() and sanityCheck() which have their own dependencies
t.Skip("Skipping sendShardsToKeepers test - requires SPIFFE infrastructure, network connectivity, and state management")
// Note: To properly test this function, you would need to:
// 1. Mock the workloadapi.X509Source
// 2. Mock network.CreateMTLSClientWithPredicate
// 3. Mock net.Post
// 4. Mock state.RootKeyZero()
// 5. Mock computeShares() and sanityCheck()
// 6. Set up test HTTP servers
// 7. Or refactor the code for better testability with dependency injection
}
</issue>
<issue>
g := group.P256
t := uint(env.ShamirThresholdVal() - 1) // Need t+1 shares to reconstruct
n := uint(env.ShamirSharesVal()) // Total number of shares
duplicated code fragment.
</issue>
<issue>
SPIKE Bootstrap tries to reach keepers forever, but it should have a max
timeout (like, say, 20mins), after which it gives up and crashes.
-- configurable.
</issue>
<issue>
ensure db operations are retried a couple times with backoff
before giving up.
SQLite can error out if there is a blocked transaction or
an integrity issue, which a retry can fix.
</issue>
<issue>
BREAKING: API methods should accept a context, so that they can
be cancelled by the consumer without resorting to a wrapper
goroutine.
</issue>
<issue>
Add a SPIKE Lite section for the docs.
But before that crypto stream and json mode needs to be fixed.
SPIKE Lite:
# get pilot's pem and key to test things out.
# this can be part of the documentation too. to test the API directly,
# extract the pem and key, and then do a regular curl:
curl -s -X POST --header "Content-Type:application/octet-stream" \
  --data-binary "This is a test encryption" \
  https://spire-spike-nexus/v1/encrypt -k \
  --cert /tmp/pem/svid.0.pem --key /tmp/pem/svid.0.key -o encrypted
curl -s -X POST --header "Content-Type:application/octet-stream" \
  --data-binary @encrypted \
  https://spire-spike-nexus/v1/decrypt -k \
  --cert /tmp/pem/svid.0.pem --key /tmp/pem/svid.0.key -o decrypted
cat decrypted; echo
</issue>
<issue>
start.sh should test recovery and restore
start.sh should test encryption and decryption
</issue>
<issue>
func DatabaseOperationTimeout() time.Duration {
^ this is not used anywhere. find where it should be used and add it.
</issue>
<issue>
code duplication
// Initialize parameters
g := group.P256
t := uint(env.ShamirThresholdVal() - 1) // Need t+1 shares to reconstruct
n := uint(env.ShamirSharesVal()) // Total number of shares
// Create a secret from our 32-byte key:
rootSecret := g.NewScalar()
if err := rootSecret.UnmarshalBinary(rootKeySeed[:]); err != nil {
failErr := sdkErrors.ErrDataUnmarshalFailure.Wrap(err)
log.FatalErr(fName, *failErr)
}
</issue>
<issue area="helm-charts">
* helm-charts-hardened is still waiting for spike-next merge.
* helm-charts-hardened/spike-next has incorrect assumptions about
components that might need update:
1. ## @param trustRoot.keepers Override which trustRoot Keepers are in
for spike nexus keepers can have multiple trust roots (which is okay)
but for spike keeper, nexus is assumed to have a single trust root.
in a HA setup nexus can have multiple trust roots, so the value
in values.yaml should be an array.
2. SPIKE Nexus statefulset uses both SPIKE_TRUST_ROOT_NEXUS and also
SPIKE_TRUST_ROOT (same for keeper, bootstrap, and pilot)
that's not a bug, but it's confusing and should be documented
in the charts which one is for what. SPIKE_TRUST_ROOT is for self
assignment and SPIKE_TRUST_ROOT_NEXUS etc is for validation functions.
</issue>
<issue>
`make test` is still not parallelized due to env setup.
</issue>
<issue>
when entries are not registered to the SPIRE Server and the
operator tries to use SPIKE, the error messages can be more
explanatory.
</issue>
<issue>
also, empty id or path should raise an error for policies
and also for secrets
</issue>
<issue>
spike-next is still an open branch on helm-charts main.
check if there are still things that need to be merged, and what
needs to be done to push the PR forward.
</issue>
<issue>
Go through GitHub issues and close the completed ones.
</issue>
<issue>
// It's unlikely to have 1000 SPIKE Keepers across the board.
// The indexes start from 1 and increase one-by-one by design.
const maxShardID = 1000
to env var configuration.
</issue>
<issue>
additional memory clearing for the sdk:
https://github.com/spiffe/spike/issues/243
</issue>
<issue>
VSecM has a missing video:
https://vsecm.com/documentation/getting-started/overview/
we can directly remove it too, since VSecM is less of a focus nowadays.
</issue>
<issue>
move this to the SDK: app/spike/internal/trust/spiffeid.go
then any code that does self-authentication such as the following,
will use trust.AuthenticateForXyz() instead:
This...
// I should be SPIKE Nexus.
if !spiffeid.IsNexus(selfSPIFFEID){...crash if not nexus...}
will become
trust.AuthenticateForNexus(selfSPIFFEID)
</issue>
<issue>
SDK. duplicate code fragment:
if source == nil {
return nil, sdkErrors.ErrSPIFFENilX509Source.Clone()
}
r := reqres.SecretMetadataRequest{Path: path, Version: version}
mr, marshalErr := json.Marshal(r)
if marshalErr != nil {
failErr := sdkErrors.ErrDataMarshalFailure.Wrap(marshalErr)
failErr.Msg = "problem generating the payload"
return nil, failErr
}
</issue>
</task-group>
<task-group for="v0.8.2">
<issue>
check all test helpers. some of them can go to the SDK instead.
</issue>
<issue>
Integration test: Ensure SPIKE Nexus caches the root key in memory.
</issue>
<issue>
Integration test: Ensure SPIKE Nexus recovers root key from keepers.
</issue>
<issue>
Integration test: Ensure SPIKE Nexus does not (inadvertently) initialize twice.
Once it's initialized successfully it should not recompute root key
material without manual `spike operator` intervention (because rotating
the root key without re-encrypting secrets will turn the backing store
unreadable)
when the key is lost, it should wait for it to be re-seeded by keepers, or
manually recovered via `spike operator`.
</issue>
<issue>
verify if the keeper has shard before resending it:
send hash of the shard first
if keeper says “I have it”, don’t send the actual shard.
this will make things extra secure.
</issue>
<issue>
Use Case: Each SPIKE Keeper in its own trust domain.
(we can also create a demo once this feature is established)
Details:
The current design assumes that all keepers are in the same trust boundary
(which defaults to spike.ist)
that can make sense for a single-cluster deployment
(each keeper can be tainted to deploy itself on a distinct node for
redundancy)
however; a keeper's SPIFFE ID does not have to have a single trust root.
each SPIKE keeper can (in theory) live in a separate cluster, in its own
trust boundary, depicted by a different trust root.
If that's the case the below function will not be valid
func IsKeeper(id string) bool {
return id == spiffeid.SpikeKeeper()
}
Instead of this the validation should be done against SPIKE_NEXUS_KEEPER_PEERS
and the env var should also contain the trust root of each keeper.
Similarly, the function cannot check the trust root.
it may however verify the part of spiffeID "after" the trust root.
// I should be a SPIKE Keeper.
if !cfg.IsKeeper(selfSpiffeid) {
log.FatalF("Authenticate: SPIFFE ID %s is not valid.\n", selfSpiffeid)
}
So for the `main` function of SPIKE Keeper we'll need a more relaxed
version of IsKeeper.
and the IsKeeper in SPIKE Nexus will validate looking at the env
config.
which also means, SPIKE Keeper can require SPIKE_TRUST_ROOT along
with SPIKE_KEEPER_TLS_PORT to start. at start-keeper-1.sh
</issue>
<issue>
Each keeper is backed by a TPM.
</issue>
<issue>
UpsertPolicy calls LoadAllPolicies to find a policy by name, which is
O(n) and inefficient as the number of policies grows. Add a
LoadPolicyByName method to the backend interface that performs an
indexed lookup directly (SQLite can use an index on the name column).
This would make the lookup O(1) instead of O(n).
Location: app/nexus/internal/state/base/policy.go (UpsertPolicy function)
Backend interface: app/nexus/internal/state/backend/interface.go
</issue>
<issue>
Note: this is non-trivial, but doable.
Periodic rotation of the encryption keys is recommended, even in the
absence of compromise. Due to the nature of the AES-256-GCM encryption
used, keys should be rotated before approximately 2^32 encryptions have
been performed, following the guidelines of NIST publication 800-38D.
This can be achieved by having a separate encryption key protected by
the root key and rotating that encryption key, maybe maintaining a
keyring. This way, we won't have to rotate (or otherwise change) the
shards to rotate the encryption key -- this will also allow the
encryption key to be rotated behind-the-scenes automatically as per
NIST guidance.
</issue>
<issue>
spike dev mode:
- it will not require SPIFFE
- it will be in memory
- it will be a single binary
- it will present a SPIKE Nexus API in that binary.
- regular unsafe `curl` would work.
- would be SDK-compatible.
^ not sure it's worth the effort, but it will be nice-to-have.
</issue>
<issue kind="enhancement">
attribute-based policy control
path "secret/restricted" {
capabilities = ["create"]
allowed_parameters = {
"foo" = []
"bar" = ["zip", "zap"]
}
}
</issue>
<issue>
maybe a default auditor SPIFFE ID that can only read stuff (for Pilot;
not for named admins; named admins will use the policy system instead)
</issue>
<issue>
audit targets:
- file
- syslog
- socket
(if audit targets are enabled, then the command will not execute unless
an audit trail is started)
</issue>
<issue>
maybe ha mode
HA Mode in OpenBao: In HA mode, OpenBao operates with one active server
and multiple standby servers. The active server processes all requests,
while standby servers redirect requests to the active instance. If the
active server fails, one of the standby servers takes over as the new
active instance. This mechanism relies on PostgreSQL's ability to manage
locks and ensure consistency across nodes.
Limitations:
The PostgreSQL backend for OpenBao is community-supported and considered
in an early preview stage, meaning it may have breaking changes or limited
testing in production environments.
While PostgreSQL supports replication and failover mechanisms for its own
HA, these features operate independently of OpenBao's HA mode. Proper
configuration and monitoring of the PostgreSQL cluster are essential to
ensure database-level resilience.
</issue>
<issue>
We need use cases on the website:
- Policy-based access control for workloads
- Secret CRUD operations
- etc
</issue>
<issue priority="medium" severity="medium">
a way to factory-reset SPIKE: reset db; recreate rootkey; delete; etc.
spike operator reset:
deletes and recreates the ~/.spike folder
restarts the initialization flow to rekey keepers.
volkan@spike:~/Desktop/WORKSPACE/spike$ spike secret get /db
Error reading secret: post: Problem connecting to peer
^ I get an error instead of a "secret not found" message.
</issue>
<issue kind="performance,research" severity="low" priority="low" fun="high">
{"time":"2025-04-25T13:24:52.652299515-07:00","level":"INFO","m":"HydrateMemoryFromBackingStore","m":"HydrateMemoryFromBackingStore:
secrets loaded"}
{"time":"2025-04-25T13:24:52.652368182-07:00","level":"INFO","m":"HydrateMemoryFromBackingStore","m":"HydrateMemoryFromBackingStore:
policies loaded"}
^
how can we know that this data has already been pushed?
a brute force way is to hash the last payloads and compare with the hashes of the current payloads.
if hydrated, no need to re-hydrate then.
but that requires two full table scans, json serialization, and hashing.
could there be a better way?
</issue>
<issue priority="high" severity="medium">
Test with different shamir ratios
* 5/3 -- 5 keepers, at least 3 of which should be alive.
* 1/1 -- A single keeper
* 0/0 -- edge case; not sure how it should behave.
* in-memory -- in-memory mode should disregard any keeper config.
</issue>
<issue>
app/nexus/internal/state/backend/sqlite/persist/crypto.go
these are generic enough to move to the SDK.
maybe go through entire internal folders and see what can be ported.
</issue>
<issue>
validateContext(ctx, fName)
this can be an SDK utility function and can be used across the codebase instead of just this package.
</issue>
<issue>
ability to lock nexus programmatically.
`spike operator lock/unlock` => will need the right clusterspiffeid for
the command to work.
^ instead of that, you can run a script that removes all SVID
registrations. That will effectively result in the same thing.
also document this
also create a demo video
</issue>
<issue>
- TODO optimize sqlite default params and also make sure we retry
database operations -- at least check that we have sane defaults.
- ref: https://tenthousandmeters.com/blog/sqlite-concurrent-writes-and-database-is-locked-errors/
</issue>
<issue>
pattern-based random secret generation
VSecM already does it; leverage it from it.
Or, alternatively, move the code to SPIKE SDK and let VSecM use it
from SPIKE.
</issue>
<issue>
dev mode with "zero" keepers.
this exists, but we need a demo video for it.
and maybe documentation update.
</issue>
<issue>
Integration test: Ensure SPIKE Pilot denies any operation when SPIKE
Nexus is not initialized.
</issue>
<issue>
Integration test: Ensure SPIKE Pilot warns the user if SPIKE Nexus is
unreachable.
</issue>
<issue>
Integration tests: Ensure we can create and read secrets.
</issue>
<issue>
Integration test: Ensure we can create and read policies.
</issue>
<issue>
// this pattern is repeated a lot; move to a helper function.
_ = os.Setenv("SPIKE_NEXUS_BACKEND_STORE", "memory")
defer func() {
if original != "" {
_ = os.Setenv("SPIKE_NEXUS_BACKEND_STORE", original)
} else {
_ = os.Unsetenv("SPIKE_NEXUS_BACKEND_STORE")
}
}()
</issue>
<issue>
Test SPIKE Lite setup.
And maybe create a demo video.
</issue>
<issue>
this goes to spike-sdk-go:
func Id() string {
id, err := crypto.RandomString(8)
if err != nil {
id = fmt.Sprintf("CRYPTO-ERR: %s", err.Error())
}
return id
}
</issue>
<issue>
A `--dry-run` feature for spike commands:
it will not create policies, secrets, etc, but just pass validations
and return success/failure responses instead.
useful for integration tests.
</issue>
<issue>
make sure we check Spike.LiteWorkload spiffe id in policies.
also make sure the encryption as a service works.
</issue>
<issue>
Notes: not for VSecM but for SPIKE directly maybe;
also create an ADR first, maybe.
Need to think about this; I'm not sure we need a UI at all.
I am getting more fond of the Claude Code model of triggering a
web-device-PKCE config flow and then doing everything through the CLI.
For VSecM: create a UI that directly talks to SPIKE; deprecate the whole
VSecM secret model. also delete most of the folders. start with a clean slate.
</issue>
<issue>
You have OIDC-related drafts; polish them and move them somewhere visible
as an actionable roadmap.
</issue>
<issue when="later" action="consider">
introduce SPIKE to VSecM
https://github.com/vmware/secrets-manager/issues/1275
</issue>
<issue>
consider using modernize:
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
</issue>
<issue>
for spike consider some of these if not there yet already
https://fluxcd.io/flux/security/
</issue>
<issue>
const KeeperKeep APIURL = "/v1/store/keep"
const KeeperContribute APIURL = "/v1/store/contribute"
const KeeperShard APIURL = "/v1/store/shard"
^ these are unused in the SDK, which may hint that whatever uses them
here should become SDK methods.
</issue>
<issue status="assigned">
make tests concurrent again.
</issue>
<issue waitingFor="upstreamHelmCharts">
Go over entire documentation: There are places to update since
we have helm charts updates.
</issue>
<issue>
Sign generated binaries:
#!/usr/bin/env bash
set -euo pipefail
ART_DIR="${1:-dist}"
MODE="${MODE:-kms}" # kms|keyless|file
KEY="${KEY:-awskms://arn:aws:kms:us-west-2:123456789012:key/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}"
PUB="${PUB:-spike-cosign.pub}"
cd "$ART_DIR"
# 1) checksums
sha256sum * > SHA256SUMS
sign_file() {
local f="$1"
case "$MODE" in
kms)
cosign sign-blob --key "$KEY" --output-signature "$f.sig" "$f"
;;
file)
cosign sign-blob --key cosign.key --output-signature "$f.sig" "$f"
;;
keyless)
COSIGN_EXPERIMENTAL=1 cosign sign-blob --yes \
--output-signature "$f.sig" \
--output-certificate "$f.pem" \
--bundle "$f.bundle" \
"$f"
;;
*)
echo "Unknown MODE=$MODE" >&2; exit 1
;;
esac
}
# 2) sign all artifacts + checksum file
for f in *; do
[[ "$f" =~ \.(sig|pem|bundle)$ ]] && continue
sign_file "$f"
done
# 3) export public key if using KMS
if [[ "$MODE" == "kms" ]]; then
cosign public-key --key "$KEY" > "../$PUB"
fi
echo "Signed. Publish signatures + $PUB (if present)."
</issue>
<issue>
create a list of what is needed for the 1.0 version; i.e., to move SPIKE
out of alpha and make it "ready for production use with a hint of caution".
SPIFFE Org already has standard requirements and a process for that.
Check it out.
</issue>
<issue waitingFor="upstreamHelmCharts">
after helm-charts changes are merged, verify that the quickstart
guide still works.
</issue>
<issue>
audit: right now we send audit logs to std err.
we would need different audit targets later.
</issue>
<issue>
this function is a temporary fix.
context-based handling is already in the SDK;
update it.
func contributeWithContext(
ctx context.Context, api *spike.API,
share secretsharing.Share, keeperID string,
) error {
</issue>
<issue>
// Serve the app.
sdkNet.ServeWithRoute(
appName,
source,
http.Route,
// AllowAll, because any workload can talk to SPIKE Nexus if they
// have a legitimate SPIFFE ID registration entry.
// we might want to further restrict this based on environment
// configuration (for example, a predicate that checks regex
// matching on workload SPIFFE IDs before granting access);
// if the matcher is not provided, AllowAll will be assumed.
predicate.AllowAll,
env.NexusTLSPortVal(),
)
</issue>
<issue>
CLI debug logging to file.
Currently, the spike CLI (spike policy, spike secret, etc.) does not
have any structured logging for debugging purposes. When errors occur,
only user-friendly messages are shown.
Consider adding an optional file-based logging mechanism for the CLI
that can be enabled via an environment variable (e.g., SPIKE_CLI_LOG)
or a flag (e.g., --debug-log=/path/to/file). This would help with
troubleshooting without cluttering the terminal output.
The logs should be structured (JSON) and include error codes, timestamps,
and context information.
</issue>
<issue>
also check out: https://developer.hashicorp.com/vault/docs/concepts/policies
to see if we can amend any updates to the policy rules
(one such update, for example, is limiting what kind of attributes are
allowed, but we should discuss whether that much granularity is worth the
hassle)
</issue>
<issue>
ADR:
* Added the ability to optionally skip database schema creation during SPIKE
initialization. This can be useful if the operator does not want to give
db schema modification privileges to SPIKE to adhere to the principle of
least privilege. The default behavior is to allow automatic schema creation.
Since SPIKE is assumed to own its backing store, limiting its access
does not provide a significant security benefit. Letting SPIKE manage
its own database schema provides operational convenience.
</issue>
<issue>
these may go to the sdk
func GenerateCustomNonce(s *DataStore) ([]byte, error) {
nonce := make([]byte, s.Cipher.NonceSize())
if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
return nil, err
}
return nonce, nil
}
func EncryptWithCustomNonce(s *DataStore, nonce []byte, data []byte) ([]byte, error) {
if len(nonce) != s.Cipher.NonceSize() {
return nil, fmt.Errorf("invalid nonce size: got %d, want %d", len(nonce), s.Cipher.NonceSize())
}
ciphertext := s.Cipher.Seal(nil, nonce, data, nil)
return ciphertext, nil
}
</issue>
<issue>
these serialization and deserialization functions can be
extracted. --- They can even be part of the Go SDK.
It can error, or swallow unknown permissions.
// Deserialize permissions from comma-separated string
if permissionsStr != "" {
permissionStrs := strings.Split(permissionsStr, ",")
policy.Permissions = make([]data.PolicyPermission, len(permissionStrs))
for i, permStr := range permissionStrs {
policy.Permissions[i] = data.PolicyPermission(strings.TrimSpace(permStr))
}
}
</issue>
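<issue kind="sketch">
a minimal sketch of extracting the pair as round-trip helpers. The
local PolicyPermission type stands in for data.PolicyPermission; the
function names are made up:
```go
package main

import (
	"fmt"
	"strings"
)

// PolicyPermission mirrors the data.PolicyPermission string type.
type PolicyPermission string

// serializePermissions joins permissions into the comma-separated
// column value, the counterpart of the deserialization in the note.
func serializePermissions(perms []PolicyPermission) string {
	strs := make([]string, 0, len(perms))
	for _, p := range perms {
		strs = append(strs, string(p))
	}
	return strings.Join(strs, ",")
}

// deserializePermissions round-trips the stored string, trimming
// spaces; deciding whether to error on (or swallow) unknown
// permissions is left to the SDK discussion.
func deserializePermissions(s string) []PolicyPermission {
	if s == "" {
		return nil
	}
	parts := strings.Split(s, ",")
	perms := make([]PolicyPermission, 0, len(parts))
	for _, p := range parts {
		perms = append(perms, PolicyPermission(strings.TrimSpace(p)))
	}
	return perms
}

func main() {
	s := serializePermissions([]PolicyPermission{"read", "write"})
	fmt.Println(s)                         // read,write
	fmt.Println(deserializePermissions(s)) // [read write]
}
```
</issue>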
<issue>
missingFlags = append(missingFlags, "name")
}
if pathPattern == "" {
missingFlags = append(missingFlags, "path-pattern")
}
if SPIFFEIDPattern == "" {
missingFlags = append(missingFlags, "spiffeid-pattern")
}
if permsStr == "" {
missingFlags = append(missingFlags, "permissions")
have these flag names as constants maybe.
</issue>
<issue>
go-spiffe uses context.Context so that we can time out SVID lookup instead
of waiting forever; create an env variable for that and implement it
wherever it makes sense.
</issue>
<issue>
func contains(permissions []data.PolicyPermission,
func hasAllPermissions(
these can be generic helper functions in the sdk.
</issue>
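<issue kind="sketch">
a generic sketch of what those SDK helpers could look like. Contains
and HasAll are made-up names (stdlib slices.Contains already covers the
first one since Go 1.21):
```go
package main

import "fmt"

// Contains reports whether xs contains x; a generic replacement for the
// permission-specific contains helper.
func Contains[T comparable](xs []T, x T) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// HasAll reports whether every element of want appears in have; a
// generic replacement for hasAllPermissions.
func HasAll[T comparable](have, want []T) bool {
	for _, w := range want {
		if !Contains(have, w) {
			return false
		}
	}
	return true
}

func main() {
	have := []string{"read", "write", "list"}
	fmt.Println(Contains(have, "read"))                  // true
	fmt.Println(HasAll(have, []string{"read", "super"})) // false
}
```
</issue>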
<issue>
u, err := url.JoinPath(
keeperAPIRoot, string(apiUrl.KeeperContribute),
)
these should be methods of SDK instead.
</issue>
<issue>
implement this sometime:
ci/wip-draft.txt
and find a place to run it every commit; could be github actions; but also
can take a long time, so could be a dedicated toy server too.
</issue>
<issue>
maybe we can add a configurable "bootstrap timeout" as an env var later. For now, the bootstrap app will try to
bootstrap the thing in an exponentially-backing-off loop until it succeeds.
</issue>
<issue>
update helm-charts-hardened for SPIKE HA setup
based on future/001-spike-ha.md, it should be trivial.
also add a section in the docs about SPIKE HA setup.
</issue>
<issue>
go through internal files, some of them are generic enough to graduate
to the SDK.
</issue>
<issue>
Refactor app/spike/internal/cmd/secret/get.go to extract repeated
marshaling logic (yaml.Marshal, json.Marshal, json.MarshalIndent) into
helper functions. The same marshal-and-print pattern is duplicated
multiple times for different format types.
</issue>
<issue>
func VerifyShamirReconstruction(secret group.Scalar, shares []shamir.Share) {
this is generic enough to go to the sdk.
Check if there are other similar reusable functions that can go.
</issue>
<issue>
validPermsList := "read, write, list, super"
SDK should define these instead.
</issue>
<issue>
these can be made generic and added to the SDK:
app/nexus/internal/state/base/validation.go
</issue>
<issue priority="medium" category="testing">
CLI Command Testing Guidance
This issue documents strategies for testing SPIKE CLI commands with
varying levels of complexity.
== UNIT TESTING (No Mocking Required) ==
The following helper functions can be tested directly without mocking:
1. Policy Commands (app/spike/internal/cmd/policy/):
- filter.go: validUUID() - UUID validation
- format.go: formatPoliciesOutput(), formatPolicy() - output formatting
- validation.go: validatePolicySpec() - policy spec validation
- create.go: readPolicyFromFile(), getPolicyFromFlags() - input parsing
2. Secret Commands (app/spike/internal/cmd/secret/):
- validation.go: validSecretPath() - path validation
- print.go: formatTime(), printSecretResponse() - output formatting
3. Cipher Commands (app/spike/internal/cmd/cipher/):
- io.go: openInput(), openOutput() - file I/O handling
Example test pattern for pure functions:
```go
func TestValidUUID(t *testing.T) {
tests := []struct {
name string
uuid string
expected bool
}{
{"valid UUID", "123e4567-e89b-12d3-a456-426614174000", true},
{"invalid", "not-a-uuid", false},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if validUUID(tt.uuid) != tt.expected {
t.Errorf("validUUID(%q) = %v, want %v",
tt.uuid, !tt.expected, tt.expected)
}
})
}
}
```
== HTTP-LEVEL MOCKING (For API Calls) ==
To test command Run functions that call the SDK API, use httptest.Server
to mock SPIKE Nexus responses:
```go
func TestSecretGetCommand_Success(t *testing.T) {
// Create mock server
server := httptest.NewTLSServer(http.HandlerFunc(
func(w http.ResponseWriter, r *http.Request) {
// Verify request
if r.URL.Path != "/v1/store/secret" {
t.Errorf("unexpected path: %s", r.URL.Path)
}
// Return mock response
resp := reqres.SecretReadResponse{
Data: map[string]string{"key": "value"},
}
json.NewEncoder(w).Encode(resp)
},
))
defer server.Close()
// Configure SDK to use mock server URL
// (requires setting SPIKE_NEXUS_API_URL env var or similar)
t.Setenv("SPIKE_NEXUS_API_URL", server.URL)
// Execute command and verify output
// ...
}
```
Challenges with HTTP mocking:
- SDK uses mTLS, so mock server needs proper TLS config
- May need to mock X509Source or bypass SPIFFE validation
- Consider creating test helpers for common mock scenarios
== INTEGRATION TESTING (With Real SPIKE Nexus) ==