From dcd570ceccc039cfef6a6eeaf0e1e6aa4ca0a9e4 Mon Sep 17 00:00:00 2001
From: Drew Ogle <drew@aperturedata.io>
Date: Wed, 20 Nov 2024 23:17:40 -0500
Subject: [PATCH 1/2] Yolov4 Based Dog Clip Detection

---
 notebooks/video_clips/VideoClips.ipynb | 1032 ++++++++++++++++++++++++
 notebooks/video_clips/yolo4.py         |  201 +++++
 2 files changed, 1233 insertions(+)
 create mode 100644 notebooks/video_clips/VideoClips.ipynb
 create mode 100755 notebooks/video_clips/yolo4.py

diff --git a/notebooks/video_clips/VideoClips.ipynb b/notebooks/video_clips/VideoClips.ipynb
new file mode 100644
index 0000000..46101e9
--- /dev/null
+++ b/notebooks/video_clips/VideoClips.ipynb
@@ -0,0 +1,1032 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "2ef71144-e2c1-4696-a82a-a9c49dae001e",
+   "metadata": {},
+   "source": [
+    "# Creating Video Clips\n",
+    "This notebook demonstrates using a YOLOv4 model to process individual frames and use the resulting output to generate information about the video.\n",
+    "We use detections of labels over sequential frames to generate Clips, which describe the existence of those objects within a specific portion of the video."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cb687190-a53b-45ca-a58c-3029c88dcecb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install --upgrade --force-reinstall git+https://github.com/aperture-data/aperturedb-python.git@frames_images_fix"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8835379a-1c02-44d3-930c-f321279ee0bb",
+   "metadata": {},
+   "source": [
+    "# Install ApertureDB\n",
+    "First we install the ApertureDB Python module and the other modules we need to run our model. Then, we verify our connection."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d444a543-c419-4d24-8db4-1fea7ccdc270",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#!pip install aperturedb tqdm 2>&1 >/dev/null\n",
+    "from aperturedb import Utils\n",
+    "c = Utils.create_connector()\n",
+    "\n",
+    "u = Utils.Utils(c)\n",
+    "u.summary()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7e1a96c2-b0d1-47b6-b836-c0eb0142101f",
+   "metadata": {},
+   "source": [
+    "# Download resources\n",
+    "Now we need to download the Python code that runs the model, `yolo4.py`, and the video we are going to use, `norman.mp4`, a video about a dog riding a bicycle.\n",
+    "\n",
+    "This video was chosen because it includes several labels, and because it has detections that overlap: dogs and bikes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8053c29f-bbaa-43f3-82e1-bfe103f72579",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Now we retrieve the items we are working with:\n",
+    "\n",
+    "# Retrieve the YOLO4 interface\n",
+    "\n",
+    "!wget https://raw.githubusercontent.com/drewaogle/YOLOv4-OpenCV-CUDA-DNN/refs/heads/main/yolo4.py\n",
+    "\n",
+    "# Retrieve video\n",
+    "!wget https://aperturedata-public.s3.us-west-2.amazonaws.com/aperturedb_applications/norman.mp4\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b5dae152-0643-4701-a505-0b331dd9b204",
+   "metadata": {},
+   "source": [
+    "# Run The Detector\n",
+    "Now that we've downloaded our YOLOv4 code, let's run it.\n",
+    "It automatically downloads the weights and some configuration files, about 300 MB in total.\n",
+    "\n",
+    "After downloading or verifying the files, it processes the video. With `no_squash_detections` set to `True` it won't overwrite an existing output directory, so delete that directory to rerun. This code can support hardware acceleration, but is designed so it won't be unwieldy without it. Detection should run at about 3-10 fps without hardware acceleration, and take less than 5 minutes.\n",
+    "\n",
+    "If a download fails halfway through, rerunning the loader will fail because the SHA-2 checksums won't match.\n",
+    "`!rm -r ~/models` will reset the downloads, though."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1fda14d1-dc17-48da-af51-bdeb8d6e468d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import importlib\n",
+    "import yolo4\n",
+    "importlib.reload(yolo4)\n",
+    "from yolo4 import RemoteYOLOv4\n",
+    "# options for detector\n",
+    "class DetectorOptions:\n",
+    "    image='' # path for images\n",
+    "    stream='' # path for stream\n",
+    "    cfg=\"models/yolov4.cfg\" # path to config\n",
+    "    weights=\"models/yolov4.weights\" # path to weights\n",
+    "    namesfile=\"models/coco.names\" # path for output to name mapping\n",
+    "    input_size=416\n",
+    "    use_gpu=False # use GPU or not\n",
+    "    outdir=\"output/norman\"\n",
+    "    no_squash_detections=True # if detections exist, don't rerun.\n",
+    "    def __init__(self, image='',stream=''):\n",
+    "        self.image = image\n",
+    "        self.stream=stream # 'webcam' to open webcam w/ OpenCV\n",
+    "\n",
+    "# now we pull data, and run detection.\n",
+    "dopts = DetectorOptions( stream=\"norman.mp4\")\n",
+    "yolo = RemoteYOLOv4.__new__(RemoteYOLOv4)\n",
+    "yolo.__init__(dopts)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fa362d75-8c06-4adb-a0ba-3b66421a1807",
+   "metadata": {},
+   "source": [
+    "## Now Check Detections\n",
+    "The YOLOv4 code we use writes detections sequentially to a CSV file, so let's load the file and see what the output looks like."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e3053bdb-a78a-4a8e-a454-2507a1477eb9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#Now let's check detections\n",
+    "import pandas as pd\n",
+    "df = pd.read_csv(\"output/norman/detections.csv\")\n",
+    "print(df)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5a8cd9e9-ecaf-4e67-896b-019a6c30f1c3",
+   "metadata": {},
+   "source": [
+    "# Process Into Clips\n",
+    "Now that we've verified that we have the data from the model, we will take the output and process it into Clips.\n",
+    "\n",
+    "We will define a few classes and functions to process our model output.\n",
+    "\n",
+    "- `ClipOptions` - an options class that defines how the clip detection works\n",
+    "- `preprocess` - converts the dataframe into a form the clip detection can use\n",
+    "- `Clip` - a class that defines our output\n",
+    "- `ClipStorage` - a class for maintaining state between the functions\n",
+    "- `process_new_frame` - a function to run when we see a new frame\n",
+    "- `process_row` - a function to run when we see a new detection\n",
+    "- `process` - the function to process the whole video/csv.\n",
+    "\n",
+    "These could be in a single class, but I've left them apart to allow people to take them piece by piece."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2503ff66-92c7-46a6-9dfb-9b55882c79bf",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import logging\n",
+    "# First we'll define the options we're going to use.\n",
+    "class ClipOptions:\n",
+    "    offset_frame=0 # starting offset in frames\n",
+    "    end_frame=-1 # ending offset in frames\n",
+    "    initconf=50 # minimum confidence to start ( 0-100 )\n",
+    "    initlen=5 # minimum detection duration in frames to start a clip\n",
+    "    dropconf=25 # confidence at or below which a detection counts as missed ( 0-100 )\n",
+    "    droplen=5 # number of missed frames that ends a clip\n",
+    "    detections=None  # path to output detections\n",
+    "    video=None # video that the detections is from\n",
+    "    verbose=logging.INFO # moderate amount of info\n",
+    "    flush=False # remove old uuids\n",
+    "    nosave=False # don't add data to db\n",
+    "    label=\"\" # label for video\n",
+    "    def __init__(self,video,detections):\n",
+    "        self.video=video # video file to add\n",
+    "        self.detections=detections"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9b0988e5-6908-4e43-9ddb-2fa63ef8449c",
+   "metadata": {},
+   "source": [
+    "## Set Initial Clip Detection options\n",
+    "We record that `norman.mp4` has been processed into `detections.csv` in our output directory.\n",
+    "We'll set the initial confidence to 45% and require it to be met for 3 frames before we consider a detection worthy of a clip.\n",
+    "Then we'll treat a confidence that drops below 20% for more than 3 frames as the end of a clip."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f3278fea-0abc-4220-b7bb-d36548442165",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# options\n",
+    "clip_opts = ClipOptions( \"norman.mp4\",\"output/norman/detections.csv\" )\n",
+    "clip_opts.label=\"Norman_Bike\"\n",
+    "clip_opts.initconf=45\n",
+    "clip_opts.initlen=3\n",
+    "clip_opts.dropconf=20\n",
+    "clip_opts.droplen=3"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3ce9e797-f44f-4b8c-9255-133084dd890e",
+   "metadata": {},
+   "source": [
+    "## Spot Check Detections\n",
+    "Let's take a look at the detections now that we have defined what the output means, and verify the labeling is correct."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9ce21ed8-93bb-4d4f-8962-3dc77ce947f7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# function to prepare dataframe for work; add columns and trim frames we don't want.\n",
+    "def preprocess(df, args ):\n",
+    "   processed = df\n",
+    "   processed.columns = [\"frame\",\"label\",\"confidence\",\"left\",\"top\",\"width\",\"height\" ]\n",
+    "   processed.drop(processed[processed.frame < args.offset_frame].index, inplace=True)\n",
+    "   if args.end_frame > -1:\n",
+    "      processed.drop(processed[processed.frame > args.end_frame].index,inplace=True)\n",
+    "   return processed\n",
+    "\n",
+    "norman_detects = preprocess( df, clip_opts )\n",
+    "print(norman_detects)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3ec29e04-24b0-4e63-9793-23f6289b3eaf",
+   "metadata": {},
+   "source": [
+    "These look good, and so we will now go and define our processing.\n",
+    "\n",
+    "We'll start with verifying that our mapping works. Always best to verify that your understanding of \"top\" matches with the model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "580724fa-493f-429a-9504-140d412cf551",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from IPython.display import display as ds\n",
+    "import cv2\n",
+    "from PIL import Image\n",
+    "\n",
+    "def display_image_and_bb( num, df ):\n",
+    "\n",
+    "    # we also output the video frames in our model code so we can spot check.\n",
+    "    cv_image = cv2.imread( f\"output/norman/video{num}.jpg\")\n",
+    "\n",
+    "    # Draw a rectangle around the detections\n",
+    "    counter = 0\n",
+    "    for id,coords in df[df[\"frame\"] == num].iterrows():\n",
+    "        left   = coords[\"left\"]\n",
+    "        top    = coords[\"top\"]\n",
+    "        right  = coords[\"left\"] + coords[\"width\"]\n",
+    "        bottom = coords[\"top\"] + coords[\"height\"]\n",
+    "        cv2.rectangle(cv_image, (left, top), (right, bottom), (0, 255, 0), 2)\n",
+    "        y = top - 15 if top - 15 > 15 else top + 15\n",
+    "        cv2.putText(cv_image, coords[\"label\"], (left, y),\n",
+    "                    cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)\n",
+    "        counter += 1\n",
+    "\n",
+    "    cv_image_rgb = cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)\n",
+    "    ds(Image.fromarray(cv_image_rgb))\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "98818068-81a0-4bc9-979e-44fc844ed36c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Now we will select a frame, 150, and see how it looks.\n",
+    "display_image_and_bb( 150, norman_detects)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "900a56b6-f419-420c-a4ef-7ebff38cb7de",
+   "metadata": {},
+   "source": [
+    "### Detection Verification\n",
+    "This is pretty much what we would expect: bike, dog, person, and even a car detection in the background, a nice find.\n",
+    "\n",
+    "That house is *not* a stop sign though. I guess it is seeing the sharp edges and deciding that is the bottom of a stop sign?\n",
+    "\n",
+    "Now we will define `Clip` and `ClipStorage`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6ed6815a-07a8-4fa7-bc42-4f7b2cc89ebb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# simple Clip class for data storage\n",
+    "class Clip:\n",
+    "    def __init__(self, label, start,conf):\n",
+    "        self.label = label\n",
+    "        self.start_frame = start\n",
+    "        self.total_frames = 0 # don't include start in total.\n",
+    "        self.missed_frames = 0\n",
+    "        self.max_confidence = conf\n",
+    "        self.min_confidence = conf\n",
+    "        self.last_frame_seen = None\n",
+    "    def is_active( self, current_frame, drop_len ):\n",
+    "        last_seen = self.start_frame + self.total_frames\n",
+    "        # drop_len is the number of frames that can be missed while the label stays \"active\"\n",
+    "        # drop_len of 1 means active continues if unseen in previous frame.\n",
+    "        return last_seen + drop_len > current_frame\n",
+    "    def __str__(self):\n",
+    "        return f\"(Clip [{self.label}] @ {self.start_frame} + {self.total_frames})\"\n",
+    "    def __repr__(self):\n",
+    "        return f\"C[{self.label} @ {self.start_frame} + {self.total_frames}]\"\n",
+    "    def describe(self):\n",
+    "        return f\"Clip is of {self.label}. First seen at {self.start_frame}, seen until {self.start_frame+self.total_frames}\"\n",
+    "    def as_finished(self):\n",
+    "        return f\"{self.label}_{self.start_frame+self.total_frames}\"\n",
+    "    def add_confidence(self,new_confidence, frame_num):\n",
+    "        self.max_confidence = max(self.max_confidence,new_confidence)\n",
+    "        self.min_confidence = min(self.min_confidence,new_confidence)\n",
+    "        # there could be multiple detections of a given type per frame; we don't track multiple here\n",
+    "        # and are merely looking to know how many frames a label occurs in\n",
+    "        if frame_num == self.last_frame_seen:\n",
+    "            return\n",
+    "        self.last_frame_seen = frame_num\n",
+    "        self.total_frames = self.total_frames + 1 + self.missed_frames\n",
+    "        # when a frame is a hit, we add the missed frames to what is considered the total length.\n",
+    "        self.missed_frames = 0 \n",
+    "    # frames where confidence was below threshold but kept to avoid drop out.\n",
+    "    def add_missed(self,missed_confidence):\n",
+    "        self.missed_frames = self.missed_frames + 1\n",
+    "\n",
+    "class ClipStorage:\n",
+    "    def __init__(self):\n",
+    "        self.active = {} # clips that have been seen, but have not passed the initialization count ( suppresses mis-identification )\n",
+    "        self.registered = {} # clips that are 'valid', and currently \"seen\"\n",
+    "        self.finished = {} # clips that were valid, but dropped off."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "48751f94-50d5-4e6d-983c-d07d6f69f87c",
+   "metadata": {},
+   "source": [
+    "Next we'll define `process_new_frame`: what we want to happen when we reach a new frame. This is important because YOLO can produce multiple detections per frame,\n",
+    "and we are processing the CSV one row at a time.\n",
+    "\n",
+    "Then, `process_row` handles one row from the CSV, which is one detection."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9e2e8318-bd6f-4520-88dd-f0b801f1a9fb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# process events which trigger on new frame.\n",
+    "def process_new_frame( verbose, drop_len, cur_frame, last_frame, storage): \n",
+    "           # drop any which weren't active last frame\n",
+    "           new_active = {}\n",
+    "           new_registered = {}\n",
+    "           for clip in storage.active.values():\n",
+    "               if not clip.is_active( cur_frame, drop_len ):\n",
+    "                   if verbose >= logging.INFO:\n",
+    "                       print(f'New Frame: frame {cur_frame}, Dropped {clip}') \n",
+    "               else:   \n",
+    "                   if verbose:\n",
+    "                       print(f\"New Frame: frame {cur_frame}, kept {clip} in active\")\n",
+    "                   new_active[clip.label] = clip\n",
+    "           for clip in storage.registered.values():\n",
+    "               if not clip.is_active( cur_frame, drop_len ):\n",
+    "                   if verbose >= logging.INFO:\n",
+    "                       print(f'New Frame: frame {cur_frame}, Retired {clip}') \n",
+    "                   storage.finished[ clip.as_finished() ] = clip\n",
+    "               else:\n",
+    "                   new_registered[clip.label] = clip\n",
+    "           if verbose >= logging.DEBUG:\n",
+    "               print(f\"Active dict: {storage.active}\")\n",
+    "           storage.registered = new_registered\n",
+    "           storage.active = new_active   "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e1ae077-3233-485a-b5df-7ffabd6cd382",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# process a row in the detections\n",
+    "# YOLOv4 can detect multiple objects in a frame - this is a single detection in a given frame.\n",
+    "def process_row(verbose, initconf, initlen, dropconf, cur_frame, label, label_confidence, storage):\n",
+    "       if label in storage.active.keys():\n",
+    "           clip = storage.active[label]\n",
+    "           if label_confidence * 100 > initconf:\n",
+    "               clip.add_confidence(label_confidence,cur_frame)\n",
+    "               # total frames doesn't include first frame, so add 1.\n",
+    "               if clip.total_frames +1 >= initlen: \n",
+    "                   if verbose >= logging.INFO:\n",
+    "                       print(f\"At {cur_frame}, moved {clip} to registered\")\n",
+    "                   storage.registered[label] = clip\n",
+    "                   del storage.active[label]\n",
+    "               else:\n",
+    "                   if verbose >= logging.INFO:\n",
+    "                       print(f\"At frame {cur_frame}, saw {clip}\")\n",
+    "           else:   \n",
+    "               if verbose >= logging.INFO:\n",
+    "                   print(f\"{clip} seen at frame {cur_frame}, but confidence [ {label_confidence*100} < { initconf }]\" )\n",
+    "       elif label in storage.registered.keys():\n",
+    "           clip = storage.registered[label]\n",
+    "           # if above confidence for dropping, consider a new registered frame\n",
+    "           if label_confidence * 100 > dropconf:\n",
+    "                # allows frame to miss one and restart; duration calculated from start to current.\n",
+    "                clip.add_confidence(label_confidence,cur_frame)\n",
+    "           else:\n",
+    "               clip.add_missed(label_confidence)\n",
+    "       else:    \n",
+    "           # if label not in active list, nor registered.\n",
+    "           if label_confidence * 100 > initconf:\n",
+    "               clip = Clip( label, cur_frame, label_confidence )\n",
+    "               if verbose >= logging.INFO:\n",
+    "                   print(f\"* Added {clip} to active\")\n",
+    "               storage.active[label] = clip\n",
+    "        "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ed172bf-51fd-4a70-a81a-c4fe8ee1fa72",
+   "metadata": {},
+   "source": [
+    "Finally, we define our main loop, which simply identifies when we need to run the `process_new_frame` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5b192263-c3d8-4de3-87d2-d3ec26e3ee46",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# main loop over all detections.\n",
+    "def process(args,pf):\n",
+    "    args.verbose = True\n",
+    "    clip_store = ClipStorage()\n",
+    "    last_frame =0\n",
+    "    cur_frame = 0\n",
+    "    for idx,row in pf.iterrows():\n",
+    "        cur_frame = row['frame']\n",
+    "        label = row['label']\n",
+    "        if cur_frame > 155: # hard-coded cutoff: frames after 155 are not processed\n",
+    "            break\n",
+    "        if cur_frame != last_frame:\n",
+    "           if args.verbose >= logging.DEBUG:\n",
+    "               print(f\"Processing switch from {last_frame} to {cur_frame}\")\n",
+    "           process_new_frame(args.verbose,args.droplen, cur_frame, last_frame,clip_store)\n",
+    "\n",
+    "        # all old active and registered are dropped prior to this.\n",
+    "        process_row(args.verbose, args.initconf, args.initlen, args.dropconf,cur_frame,label,row['confidence'],clip_store)\n",
+    "\n",
+    "        last_frame = cur_frame\n",
+    "    # move all registered to finished.\n",
+    "    if args.verbose:\n",
+    "        print(\"Video complete, finishing clips\")\n",
+    "    for clip in clip_store.registered.values():\n",
+    "        clip_store.finished[ clip.as_finished() ] = clip\n",
+    "    return clip_store.finished\n",
+    "       \n",
+    "\n",
+    "    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c28a06c6-2d8a-471d-b1f5-de72bb4c7af0",
+   "metadata": {},
+   "source": [
+    "## Run the Clip Processing\n",
+    "Now that we've defined all our functions, we'll run them."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "af03f241-edad-4817-b07f-75de6565960d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "norman_finished = process(clip_opts,norman_detects)\n",
+    "#print(norman_finished)\n",
+    "for clip in norman_finished.values():\n",
+    "    print(clip.describe())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "51204c52-0821-4469-9614-9400529cf677",
+   "metadata": {},
+   "source": [
+    "We have some results, and we have a dog, a bicycle, a car and a person. Looks promising!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "98ed405d-1f6f-4a69-af95-85d3b7bf790a",
+   "metadata": {},
+   "source": [
+    "# Adding the Results to ApertureDB\n",
+    "Now we have some data that we can put into the database.\n",
+    "We'll make some functions to handle the different types of data we're adding.\n",
+    "\n",
+    "* `add_bboxes` - helper function; add all detections from a single frame as bounding boxes\n",
+    "* `add_detections` - add all detections as frames, calls add_bboxes.\n",
+    "* `add_video` - adds the video, then the detections and clips."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "abac7dc5-88bd-48d7-8795-1635243bd181",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Add Detections to Database\n",
+    "import uuid\n",
+    "import re\n",
+    "from tqdm import tqdm\n",
+    "\n",
+    "video_url = \"aperturedb://demos/video_clip/video/{0}\"\n",
+    "frame_url = \"video_clips://frame/{0}\"\n",
+    "detection_url = \"video_clips://detection/{0}\"\n",
+    "clip_url = \"video_clips://clip/{0}\"\n",
+    "\n",
+    "def run_query(db, query,blobs,action_desc):\n",
+    "    blobs = [] if blobs is None else blobs\n",
+    "    #print(query)\n",
+    "    result,_ = db.query(query,blobs)\n",
+    "    if not db.last_query_ok():\n",
+    "        raise Exception(f\"Failed Running Query for {action_desc}: {result}\")\n",
+    "    return result\n",
+    "\n",
+    "\n",
+    "def add_bboxes( db,frame_id, detections ):\n",
+    "    add_bboxes_query = [{\n",
+    "        \"FindFrame\": {\n",
+    "            \"_ref\":1,\n",
+    "            \"constraints\": {\n",
+    "                \"id\": [\"==\",str(frame_id)]\n",
+    "            }\n",
+    "        }\n",
+    "        }]\n",
+    "    #print(f\"Detections = {detections}\")\n",
+    "    #split = [det.split(\",\") for det in re.findall(\"\\[([^]]*)\\]\",detections)]\n",
+    "    for row in detections:\n",
+    "        #print(row)\n",
+    "        detection_id = uuid.uuid5( frame_id, detection_url.format( row[\"label\"] ))\n",
+    "        add_bboxes_query.append({\"AddBoundingBox\": {\n",
+    "            \"image_ref\":1,\n",
+    "            \"label\":row[\"label\"],\n",
+    "            \"rectangle\": {\n",
+    "                \"x\": row[\"left\"],\n",
+    "                \"y\": row[\"top\"],\n",
+    "                \"width\": row[\"width\"],\n",
+    "                \"height\": row[\"height\"]\n",
+    "            },\n",
+    "            \"properties\": {\n",
+    "                \"id\": str(detection_id),\n",
+    "                \"confidence\": row[\"confidence\"]\n",
+    "            }\n",
+    "        }})\n",
+    "                \n",
+    "            \n",
+    "    run_query(db, add_bboxes_query,None, \"Adding Bounding Boxes\")    \n",
+    "    \n",
+    "\n",
+    "def add_detections( db, video_id, detections, frame_width, frame_height ):\n",
+    "    det_df = pd.read_csv( detections )\n",
+    "    det_df.columns = [\"frame\",\"label\",\"confidence\",\"left\",\"top\",\"width\",\"height\" ]\n",
+    "    def format_detections( detections ):\n",
+    "        return \"\".join( f\"[{det.label},{det.confidence},{det.left},{det.top},{det.width},{det.height}]\" for det in detections )\n",
+    "\n",
+    "    frame_number = 0\n",
+    "    frame_detections = []\n",
+    "    def on_end_frame( detections ):\n",
+    "        frame_id = uuid.uuid5( video_id, frame_url.format( frame_number ))\n",
+    "        add_frame_query=[{\n",
+    "            \"FindVideo\": {\n",
+    "                \"constraints\": {\n",
+    "                    \"id\": [\"==\",str(video_id)]\n",
+    "                },\n",
+    "                \"_ref\":1\n",
+    "            }\n",
+    "        },{\n",
+    "            \"AddFrame\": {\n",
+    "                \"video_ref\":1,\n",
+    "                \"frame_number\": frame_number,\n",
+    "                \"properties\": {\n",
+    "                    \"detections\": format_detections( detections ),\n",
+    "                     \"frame_number\":frame_number,\n",
+    "                      \"id\": str(frame_id),\n",
+    "                      \"adb_image_width\":frame_width,\n",
+    "                      \"adb_image_height\":frame_height\n",
+    "                }\n",
+    "            }\n",
+    "        }]\n",
+    "        run_query(db,add_frame_query,None, \"Adding Frame\")\n",
+    "        add_bboxes(db,frame_id,detections)\n",
+    "\n",
+    "    pbar = tqdm(total=det_df.shape[0], desc=\"Inserting Detections\", unit=\" Frames\" )\n",
+    "    for row,data in det_df.iterrows():\n",
+    "        if data['frame'] != frame_number:\n",
+    "            on_end_frame(frame_detections)\n",
+    "            frame_detections = []\n",
+    "        pbar.update()\n",
+    "        frame_number = data['frame']\n",
+    "        frame_detections.append( data )\n",
+    "    # output last frame\n",
+    "    on_end_frame(frame_detections)\n",
+    "\n",
+    "def add_clips(db,video_id,clips):\n",
+    "    for clip in clips:\n",
+    "        add_clip_query=[{\n",
+    "            \"FindVideo\": {\n",
+    "                \"constraints\": {\n",
+    "                    \"id\": [\"==\",str(video_id)]\n",
+    "                },\n",
+    "                \"_ref\":1\n",
+    "            }\n",
+    "        },{\n",
+    "            \"AddClip\": {\n",
+    "                \"video_ref\":1,\n",
+    "                \"frame_number_range\":{\n",
+    "                    \"start\": clip.start_frame,\n",
+    "                    \"stop\": clip.start_frame + clip.total_frames\n",
+    "                },\n",
+    "                \"properties\": {\n",
+    "                    \"label\": clip.label,\n",
+    "                    \"id\": str(uuid.uuid5( video_id, clip_url.format( f\"{clip.label}_{clip.start_frame}\" )))\n",
+    "                },\n",
+    "            }\n",
+    "        }]\n",
+    "        print(run_query(db,add_clip_query,None, \"Add Clip\"))\n",
+    "        print(add_clip_query)\n",
+    "        acq2=[add_clip_query[0]]\n",
+    "        acq2[0][\"FindVideo\"][\"results\"] = {\"count\":True }\n",
+    "        print(run_query(db,acq2,None, \"AC Test\"))\n",
+    "\n",
+    "def add_video( db, video_path, detections_path, video_description ):\n",
+    "    video_id = uuid.uuid5( uuid.NAMESPACE_URL, video_url.format( video_path ))\n",
+    "    add_video_query=[{\n",
+    "        \"AddVideo\": {\n",
+    "            \"properties\": {\n",
+    "                \"source\": video_path,\n",
+    "                \"description\": video_description,\n",
+    "                \"id\": str( video_id )\n",
+    "            }\n",
+    "        }\n",
+    "        }]\n",
+    "    # read the raw video bytes to send as a blob\n",
+    "    with open( video_path, 'rb') as fd:\n",
+    "        video_data = fd.read()\n",
+    "\n",
+    "    info_video_query=[{\n",
+    "        \"FindVideo\":{\n",
+    "            \"constraints\": {\n",
+    "                \"id\": [\"==\",str(video_id)]\n",
+    "            },\n",
+    "            \"results\": {\n",
+    "                \"all_properties\":True\n",
+    "            }\n",
+    "        }\n",
+    "    }]\n",
+    "\n",
+    "    run_query(db,add_video_query,[video_data],\"Video Adding\")\n",
+    "    res = run_query(db,info_video_query,None,\"Video Find\")\n",
+    "    video_data= res[0]['FindVideo']['entities'][0]\n",
+    "    width = video_data['_frame_width']\n",
+    "    height = video_data['_frame_height']\n",
+    "    \n",
+    "    add_detections( db, video_id, detections_path, width,height )\n",
+    "    add_clips( db, video_id,  norman_finished.values() )\n",
+    "        "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "94e45efd-1d25-424e-94f4-5fc8fa5f9c87",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "try:\n",
+    "    u.remove_all_objects()\n",
+    "    add_video( c, \"norman.mp4\", \"output/norman/detections.csv\" , \"Norman the dog rides a bike with some help\")\n",
+    "    \n",
+    "    u.summary()\n",
+    "except Exception as e:\n",
+    "    print(\"Failed adding: \",e)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f9fe349b-c0bd-4d1d-b43c-ad3dd84fc41d",
+   "metadata": {},
+   "source": [
+    "Ok, our data has been added to the database!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "40efdaea-a2a8-4d36-85dc-c8d74bcead4a",
+   "metadata": {},
+   "source": [
+    "# Clip Verification\n",
+    "It looks like we found a lot of the things we wanted to find: a dog, a bicycle, and the lady who was helping the dog.\n",
+    "\n",
+    "It's odd that we don't see the dog until frame 136, though. What gives?\n",
+    "\n",
+    "Also, our output shows it first registering the dog at frame 120, but then losing it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6f1d2a10-557b-4fb2-9f37-a5fa3ca7a79e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# It's odd that the dog is not seen until frame 136; let's see why.\n",
+    "import importlib\n",
+    "import aperturedb\n",
+    "importlib.reload(aperturedb)\n",
+    "from aperturedb.Images import Images\n",
+    "from aperturedb.Images import Frames\n",
+    "from aperturedb.Query import ObjectType\n",
+    "\n",
+    "FRAME_NUMBER=120\n",
+    "VIDEO_FILE=\"norman.mp4\"\n",
+    "import uuid\n",
+    "\n",
+    "def get_frame_id(video,frame):\n",
+    "    video_uuid = uuid.uuid5( uuid.NAMESPACE_URL, video_url.format( video ))\n",
+    "    frame_uuid = uuid.uuid5( video_uuid, frame_url.format( frame ))\n",
+    "    return str(frame_uuid)\n",
+    "\n",
+    "\n",
+    "class Frames2(Images):\n",
+    "    db_object = ObjectType.FRAME\n",
+    "\n",
+    "    def __init__(self, client, batch_size=100, response=None, **kwargs):\n",
+    "        super().__init__(client,batch_size=batch_size, response=response, **kwargs)\n",
+    "\n",
+    "frms=Frames(c)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8ee272a0-2f00-4558-9bac-794bddd09c79",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import importlib\n",
+    "import Ack\n",
+    "importlib.reload(Ack)\n",
+    "from Ack import Frames\n",
+    "frms=Frames(c)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "870d74b2-f16b-47d7-9eff-a1eff9219da3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "q=[{\"FindFrame\": {\n",
+    "    \"_ref\":1,\n",
+    "    \"constraints\": {\n",
+    "        \"id\": [\"==\",fuuid]\n",
+    "    },\n",
+    "    \"results\": {\n",
+    "        \"list\": [\"id\"]\n",
+    "    }\n",
+    "}},{\n",
+    "    \"FindBoundingBox\":{\n",
+    "        \"image_ref\":1,\n",
+    "        \"results\": {\n",
+    "            \"list\":[\"id\"]\n",
+    "        }\n",
+    "    }\n",
+    "}]\n",
+    "\n",
+    "r,_ = c.query(q)\n",
+    "print(r)\n",
+    "    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bd469e3d-c84b-4ce6-ac53-6e24ff47c2f5",
+   "metadata": {},
+   "source": [
+    "## Issue - dropped dog at 120\n",
+    "\n",
+    "Now that we have code to retrieve an annotated frame from the database, let's explore.\n",
+    "\n",
+    "We saw that the dog registered at frame 120 and was dropped at frame 121, and that it wasn't detected consistently until frame 136.\n",
+    "\n",
+    "So let's check the images."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "52f52e53-2c85-48a4-9ff1-4a77e2d6b7e1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fuuid = get_frame_id(VIDEO_FILE,FRAME_NUMBER)\n",
+    "frms.search_by_property(prop_key=\"id\", prop_values=[fuuid])\n",
+    "frms.inspect()\n",
+    "frms.display(show_bboxes=True)\n",
+    "#dir(frms)\n",
+    "#frms.__retrieve_bounding_boxes(0,None)\n",
+    "#frms.rbb(0,None)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0c1acbf7-3427-413b-9f22-c72e0145b37d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(frms.images_bboxes)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c32aaec3-e446-42fa-b6ee-c82fda41e29f",
+   "metadata": {},
+   "source": [
+    "At 120 this is kind of what we would expect to see, lady helping the dog on the bike."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f25919e9-263d-4671-a209-345e07a437d7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fuuid = get_frame_id(VIDEO_FILE,121)\n",
+    "frms.search_by_property(prop_key=\"id\", prop_values=[fuuid])\n",
+    "frms.inspect()\n",
+    "frms.display(show_bboxes=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4a898868-9966-47c0-83d6-1fb3535e6347",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fuuid = get_frame_id(VIDEO_FILE,122)\n",
+    "frms.search_by_property(prop_key=\"id\", prop_values=[fuuid])\n",
+    "frms.inspect()\n",
+    "frms.display(show_bboxes=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a0337323-2a7c-47e2-847d-6a3877bf123b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fuuid = get_frame_id(VIDEO_FILE,123)\n",
+    "frms.search_by_property(prop_key=\"id\", prop_values=[fuuid])\n",
+    "frms.inspect()\n",
+    "frms.display(show_bboxes=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5860c873-947f-4b23-830e-f013515f9d81",
+   "metadata": {},
+   "source": [
+    "## Issue - why is a detected dog not seen?\n",
+    "Now we've seen that there is in fact a detected dog, so maybe confidence is the issue?\n",
+    "We can check that.\n",
+    "\n",
+    "We'll write a query which finds our target video, finds the frames from that video in the range we care about,\n",
+    "then finds the bounding boxes which are connected to those frames and also meet our criterion of being labeled \"dog\".\n",
+    "\n",
+    "We'll retrieve their confidence, and retrieve unique ids from the frames so we can tell which frame each detection came from."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "de75fa25-f114-41b3-9b61-d525922f4937",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "video_uuid = str(uuid.uuid5( uuid.NAMESPACE_URL, video_url.format( VIDEO_FILE )))\n",
+    "query=[{\n",
+    "    \"FindVideo\":{\n",
+    "        \"_ref\":1,\n",
+    "        \"constraints\":{\n",
+    "            \"id\": [ \"==\",video_uuid ]\n",
+    "        }\n",
+    "    }\n",
+    "},{\n",
+    "    \"FindFrame\": {\n",
+    "        \"_ref\":2,\n",
+    "        \"uniqueids\":True,\n",
+    "        \"is_connected_to\": {\n",
+    "            \"ref\":1\n",
+    "        },\n",
+    "        \"constraints\": {\n",
+    "            \"_frame_number\": [ \">\", 119 , \"<\",126 ]\n",
+    "        },\n",
+    "        \"results\": {\n",
+    "            \"list\":[\"_frame_number\"]\n",
+    "        }\n",
+    "    }\n",
+    "},{\n",
+    "    \"FindBoundingBox\": {\n",
+    "        \"is_connected_to\": {\n",
+    "            \"ref\":2\n",
+    "        },\n",
+    "        \"group_by_source\":True,\n",
+    "        \"constraints\": {\n",
+    "            \"_label\": [\"==\",\"dog\"]\n",
+    "        },\n",
+    "        \"results\": {\n",
+    "            \"list\":[\"confidence\"]\n",
+    "        }\n",
+    "    }\n",
+    "}]\n",
+    "\n",
+    "r,_=c.query(query)\n",
+    "print(r)\n",
+    "            "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8676500d-cded-4d9f-8a26-76c9b9ab9285",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Now we will map confidence by id.\n",
+    "dog_results=r[2][\"FindBoundingBox\"][\"entities\"]\n",
+    "frame_results=r[1][\"FindFrame\"][\"entities\"]\n",
+    "\n",
+    "frame_map = { fr['_uniqueid']: fr['_frame_number'] for fr in frame_results }\n",
+    "\n",
+    "for dog_detection in  dog_results.keys():\n",
+    "    print(f\"In Frame {frame_map[dog_detection]} dog confidence was {dog_results[dog_detection][0]['confidence']}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "710f58f2-bbc5-4fc2-bcab-dba1bbca01c8",
+   "metadata": {},
+   "source": [
+    "## Verification Solution\n",
+    "Using a single chained query, we can now tell that the dog confidence wasn't high enough to start a clip.\n",
+    "\n",
+    "We can then decide to fine-tune our model (for example, train it more on partially obscured dogs), lower the threshold, or accept our results.\n",
+    "\n",
+    "Either way, we now know why we got the results we did."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/notebooks/video_clips/yolo4.py b/notebooks/video_clips/yolo4.py
new file mode 100755
index 0000000..5ea0f2b
--- /dev/null
+++ b/notebooks/video_clips/yolo4.py
@@ -0,0 +1,201 @@
+import sys
+import cv2
+import argparse
+import random
+import time
+import os
+import os.path as osp
+
+import urllib3
+from pathlib import Path
+import hashlib
+from tqdm import tqdm
+
+# source: https://github.com/kingardor/YOLOv4-OpenCV-CUDA-DNN
+class YOLOv4:
+
+    def __init__(self,args=None):
+        """ Method called when object of this class is created. """
+
+        self.args = None
+        self.net = None
+        self.names = None
+
+        self.args = self.parse_arguments() if args is None else args
+        self.initialize_network()
+        self.run_inference()
+
+    def parse_arguments(self):
+        """ Method to parse arguments using argparser. """
+
+        parser = argparse.ArgumentParser(description='Object Detection using YOLOv4 and OpenCV4')
+        parser.add_argument('--image', type=str, default='', help='Path to use images')
+        parser.add_argument('--stream', type=str, default='', help='Path to use video stream')
+        parser.add_argument('--cfg', type=str, default='models/yolov4.cfg', help='Path to cfg to use')
+        parser.add_argument('--weights', type=str, default='models/yolov4.weights', help='Path to weights to use')
+        parser.add_argument('--namesfile', type=str, default='models/coco.names', help='Path to names to use')
+        parser.add_argument('--input_size', type=int, default=416, help='Input size')
+        parser.add_argument('--use_gpu', default=False, action='store_true', help='To use NVIDIA GPU or not')
+        parser.add_argument('--outdir', type=str, default="video", help='Location to put the output')
+        parser.add_argument('--no-squash-detections', action='store_true', help="Skip detection if the detections output already exists")
+
+        return parser.parse_args()
+
+    def initialize_network(self):
+        """ Method to initialize and load the model. """
+
+        self.net = cv2.dnn_DetectionModel(self.args.cfg, self.args.weights)
+        
+        if self.args.use_gpu:
+            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
+            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
+        else:
+            self.net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
+            self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
+            
+        if not self.args.input_size % 32 == 0:
+            print('[Error] Invalid input size! Make sure it is a multiple of 32. Exiting..')
+            sys.exit(1)
+        self.net.setInputSize(self.args.input_size, self.args.input_size)
+        self.net.setInputScale(1.0 / 255)
+        self.net.setInputSwapRB(True)
+        with open(self.args.namesfile, 'rt') as f:
+            self.names = f.read().rstrip('\n').split('\n')
+
+        if not osp.exists( self.args.outdir ):
+            os.makedirs( self.args.outdir )
+
+    def image_inf(self):
+        """ Method to run inference on image. """
+
+        frame = cv2.imread(self.args.image)
+
+        timer = time.time()
+        classes, confidences, boxes = self.net.detect(frame, confThreshold=0.1, nmsThreshold=0.4)
+        print('[Info] Time Taken: {}'.format(time.time() - timer), end='\r')
+        
+        if(not len(classes) == 0):
+            for classId, confidence, box in zip(classes.flatten(), confidences.flatten(), boxes):
+                label = '%s: %.2f' % (self.names[classId], confidence)
+                left, top, width, height = box
+                b = random.randint(0, 255)
+                g = random.randint(0, 255)
+                r = random.randint(0, 255)
+                cv2.rectangle(frame, box, color=(b, g, r), thickness=2)
+                cv2.rectangle(frame, (left, top), (left + len(label) * 20, top - 30), (b, g, r), cv2.FILLED)
+                cv2.putText(frame, label, (left, top), cv2.FONT_HERSHEY_COMPLEX, 1, (255 - b, 255 - g, 255 - r), 1, cv2.LINE_AA)
+
+        cv2.imwrite('result.jpg', frame)
+        cv2.imshow('Inference', frame)
+        if cv2.waitKey(0) & 0xFF == ord('q'):
+            return
+
+    def stream_inf(self):
+        """ Method to run inference on a stream. """
+
+        source = cv2.VideoCapture(0 if self.args.stream == 'webcam' else self.args.stream)
+
+        b = random.randint(0, 255)
+        g = random.randint(0, 255)
+        r = random.randint(0, 255)
+
+        i = 0
+        total_start = time.time()
+        detection_file = Path(self.args.outdir).joinpath('detections.csv')
+        if self.args.no_squash_detections and detection_file.exists():
+            print(f"Detections exists ({detection_file}), and arguments request no overwriting")
+            return
+        csv_file = open( detection_file, "w" )  # overwrite ("squash") any previous detections instead of appending to them
+        pbar = tqdm( unit=' Frames', desc="Detecting" )
+        while(source.isOpened()):
+            ret, frame = source.read()
+            if not ret and self.args.stream != 'webcam':
+                break
+            if ret:
+                timer = time.time()
+                classes, confidences, boxes = self.net.detect(frame, confThreshold=0.1, nmsThreshold=0.4)
+                pbar.set_description(desc="Detecting: Last Time = {:.2}s".format(time.time() - timer),refresh=False)
+                pbar.update()
+                
+
+                if(not len(classes) == 0):
+                    for classId, confidence, box in zip(classes.flatten(), confidences.flatten(), boxes):
+                        className = self.names[classId]
+                        left, top, width, height = box
+                        csv_file.write(f"{i},{className},{confidence},{left},{top},{width},{height}\n")
+
+                cv2.imwrite(osp.join( self.args.outdir,'video%d.jpg'%i),frame)
+                i = i + 1
+        total_end = time.time()
+        csv_file.close()
+        print("Done! %d frame%s, %f seconds" % ( i, "s" if i != 1 else "", total_end - total_start))
+
+    def run_inference(self):
+
+        if self.args.image == '' and self.args.stream == '':
+            print('[Error] Please provide a valid path for --image or --stream.')
+            sys.exit(1)
+
+        if not self.args.image == '':
+            self.image_inf()
+
+        elif not self.args.stream == '':
+            self.stream_inf()
+
+        #cv2.destroyAllWindows()
+
+
+
+if __name__== '__main__':
+
+    # Constructing the object parses the command-line arguments and runs inference.
+    yolo = YOLOv4()
+
+
+class RemoteYOLOv4(YOLOv4):
+    root="https://aperturedata-public.s3.us-west-2.amazonaws.com/aperturedb_applications/"
+    files={
+            "coco.names":"634a1132eb33f8091d60f2c346ababe8b905ae08387037aed883953b7329af84",
+            "yolov4-tiny.cfg": "6cbf5ece15235f66112e0bedebb324f37199b31aee385b7e18f0bbfb536b258e",
+            "yolov4-tiny.weights": "cf9fbfd0f6d4869b35762f56100f50ed05268084078805f0e7989efe5bb8ca87",
+            "yolov4.cfg": "a15524ec710005add4eb672140cf15cbfe46dea0561f1aea90cb1140b466073e",
+            "yolov4.weights": "8463fde6ee7130a947a73104ce73c6fa88618a9d9ecd4a65d0b38f07e17ec4e4"
+            }
+    chunk_size = 1024*128
+    output_path="./models"
+
+    def __init__(self,args=None):
+        print("RemoteYOLO v4")
+        self.http = urllib3.PoolManager()
+        for f in self.files.keys():
+            disk_path=Path(self.output_path).joinpath(f)
+            disk_path.absolute().parent.mkdir( exist_ok=True)
+            self.download_file( self.root+f, disk_path, self.files[f])
+        print("All YOLOv4 model files downloaded")
+        super().__init__(args)
+
+    def download_file(self, from_path, to_path, expected_sha256 ):
+        if to_path.exists():
+            with open( to_path,"rb") as hash_file:
+                sha = hashlib.sha256()
+                while True:
+                    data = hash_file.read(1024*64)
+                    if not data:
+                        break
+                    sha.update(data)
+                if sha.hexdigest() == expected_sha256:
+                    return
+                else:
+                    print(f"Digest = {sha.hexdigest()} vs {expected_sha256}: Hash failed, aborting!") # we could delete and redownload.
+                    sys.exit(1)
+
+        req = self.http.request('GET', from_path, preload_content=False)
+        with open( to_path, 'wb') as out:
+            pbar = tqdm( unit='B',unit_scale=True, desc=f"{to_path.name}", total=int( req.headers['Content-Length'] ))
+            while True:
+                data = req.read(self.chunk_size)
+                if not data:
+                    break
+                pbar.update(len(data))
+                out.write(data)
+        req.release_conn()

From c5395e592d8b8eaa0d73be327e94173c655a8ccf Mon Sep 17 00:00:00 2001
From: Drew Ogle <drew@aperturedata.io>
Date: Wed, 20 Nov 2024 23:26:48 -0500
Subject: [PATCH 2/2] remove debugging

---
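Note (illustrative only, not part of the patch): the comments added to the
last notebook cell describe mapping each dog detection back to its frame
through the database unique id. A minimal sketch of that mapping follows. The
response `r` here is a hand-written stand-in whose ids and confidence values
are invented, and taking the maximum confidence per frame is an assumption of
this sketch (the notebook reads the first detection only).

```python
# Hypothetical stand-in for the response returned by c.query(query).
# The ids and confidence values are invented for illustration.
r = [
    {"FindVideo": {"returned": 1, "status": 0}},
    {"FindFrame": {"entities": [
        {"_uniqueid": "2.0.100", "_frame_number": 120},
        {"_uniqueid": "2.1.100", "_frame_number": 121},
    ]}},
    # With group_by_source, entities is a dict keyed by the source frame's id.
    {"FindBoundingBox": {"entities": {
        "2.0.100": [{"confidence": 0.52}],
        "2.1.100": [{"confidence": 0.31}],
    }}},
]

frame_results = r[1]["FindFrame"]["entities"]
dog_results = r[2]["FindBoundingBox"]["entities"]

# Map each frame's unique id to its frame number.
frame_map = {fr["_uniqueid"]: fr["_frame_number"] for fr in frame_results}

# Keep the highest dog confidence seen for each frame number.
confidence_by_frame = {
    frame_map[source_id]: max(det["confidence"] for det in detections)
    for source_id, detections in dog_results.items()
}

for frame_number in sorted(confidence_by_frame):
    print(f"In frame {frame_number} dog confidence was {confidence_by_frame[frame_number]}")
```

Keying the result by frame number makes it easy to compare against whatever
threshold the clip logic uses to start a clip.
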
 notebooks/video_clips/VideoClips.ipynb | 111 +++++--------------------
 1 file changed, 21 insertions(+), 90 deletions(-)
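
Note (illustrative only, not part of the patch): the debugging cells kept by
this patch look frames up by an id derived with uuid5, so the same video name
and frame number always give the same id. A minimal sketch of that property,
assuming placeholder templates: the real `video_url` and `frame_url` are
defined elsewhere in the notebook, and the values below are hypothetical.

```python
import uuid

# Hypothetical templates; the notebook defines its own video_url and frame_url.
video_url = "http://example.com/videos/{}"
frame_url = "frame/{}"


def get_frame_id(video, frame):
    # The video id is a name-based UUID in the URL namespace.
    video_uuid = uuid.uuid5(uuid.NAMESPACE_URL, video_url.format(video))
    # The frame id is a name-based UUID in the video's own namespace.
    frame_uuid = uuid.uuid5(video_uuid, frame_url.format(frame))
    return str(frame_uuid)


first = get_frame_id("norman.mp4", 120)
again = get_frame_id("norman.mp4", 120)
other = get_frame_id("norman.mp4", 121)

print(first == again)  # same inputs give the same id
print(first == other)  # a different frame number gives a different id
```

Because the ids are derived rather than random, a frame can be found again
without first storing its id anywhere.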

diff --git a/notebooks/video_clips/VideoClips.ipynb b/notebooks/video_clips/VideoClips.ipynb
index 46101e9..c25e7b4 100644
--- a/notebooks/video_clips/VideoClips.ipynb
+++ b/notebooks/video_clips/VideoClips.ipynb
@@ -10,16 +10,6 @@
     "We use detections of labels over sequential frames to generate Clips which describe the existance of those objects within a specific portion of the video."
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "cb687190-a53b-45ca-a58c-3029c88dcecb",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "!pip install --upgrade --force-reinstall git+https://github.com/aperture-data/aperturedb-python.git@frames_images_fix"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "8835379a-1c02-44d3-930c-f321279ee0bb",
@@ -36,7 +26,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "#!pip install aperturedb tqdm 2>&1 >/dev/null\n",
+    "!pip install aperturedb tqdm 2>&1 >/dev/null\n",
     "from aperturedb import Utils\n",
     "c = Utils.create_connector()\n",
     "\n",
@@ -94,9 +84,6 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import importlib\n",
-    "import yolo4\n",
-    "importlib.reload(yolo4)\n",
     "from yolo4 import RemoteYOLOv4\n",
     "# options for detector\n",
     "class DetectorOptions:\n",
@@ -135,7 +122,6 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "#Now let's check detections\n",
     "import pandas as pd\n",
     "df = pd.read_csv(\"output/norman/detections.csv\")\n",
     "print(df)"
@@ -358,6 +344,7 @@
     "    def add_missed(self,missed_confidence):\n",
     "        self.missed_frames = self.missed_frames + 1\n",
     "\n",
+    "# a simple processing storage\n",
     "class ClipStorage:\n",
     "    def __init__(self):\n",
     "        self.active = {} # clips that have been seen, but not passed initializition count ( suppressed mis-identification )\n",
@@ -579,8 +566,6 @@
     "            }\n",
     "        }\n",
     "        }]\n",
-    "    #print(f\"Detections = {detections}\")\n",
-    "    #split = [det.split(\",\") for det in re.findall(\"\\[([^]]*)\\]\",detections)]\n",
     "    for row in detections:\n",
     "        #print(row)\n",
     "        detection_id = uuid.uuid5( frame_id, detection_url.format( row[\"label\"] ))\n",
@@ -746,7 +731,9 @@
     "\n",
     "It's odd that we don't see the dog until frame 136 though, what gives?\n",
     "\n",
-    "Also, our output shows it first registering the dog at frame 120, but then losing it."
+    "Also, our output shows it first registering the dog at frame 120, but then losing it.\n",
+    "\n",
+    "So we will use aperturedb's `Frames` class to help us debug:"
    ]
   },
   {
@@ -756,13 +743,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# It's odd that the dog is not seen until clip 136, lets see why?\n",
-    "import importlib\n",
-    "import aperturedb\n",
-    "importlib.reload(aperturedb)\n",
-    "from aperturedb.Images import Images\n",
     "from aperturedb.Images import Frames\n",
-    "from aperturedb.Query import ObjectType\n",
     "\n",
     "FRAME_NUMBER=120\n",
     "VIDEO_FILE=\"norman.mp4\"\n",
@@ -773,59 +754,9 @@
     "    frame_uuid = uuid.uuid5( video_uuid, frame_url.format( frame ))\n",
     "    return str(frame_uuid)\n",
     "\n",
-    "\n",
-    "class Frames2(Images):\n",
-    "    db_object = ObjectType.FRAME\n",
-    "\n",
-    "    def __init__(self, client, batch_size=100, response=None, **kwargs):\n",
-    "        super().__init__(client,batch_size=batch_size, response=response, **kwargs)\n",
-    "\n",
     "frms=Frames(c)"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8ee272a0-2f00-4558-9bac-794bddd09c79",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import importlib\n",
-    "import Ack\n",
-    "importlib.reload(Ack)\n",
-    "from Ack import Frames\n",
-    "frms=Frames(c)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "870d74b2-f16b-47d7-9eff-a1eff9219da3",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "q=[{\"FindFrame\": {\n",
-    "    \"_ref\":1,\n",
-    "    \"constraints\": {\n",
-    "        \"id\": [\"==\",fuuid]\n",
-    "    },\n",
-    "    \"results\": {\n",
-    "        \"list\": [\"id\"]\n",
-    "    }\n",
-    "}},{\n",
-    "    \"FindBoundingBox\":{\n",
-    "        \"image_ref\":1,\n",
-    "        \"results\": {\n",
-    "            \"list\":[\"id\"]\n",
-    "        }\n",
-    "    }\n",
-    "}]\n",
-    "\n",
-    "r,_ = c.query(q)\n",
-    "print(r)\n",
-    "    "
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "bd469e3d-c84b-4ce6-ac53-6e24ff47c2f5",
@@ -850,20 +781,7 @@
     "fuuid = get_frame_id(VIDEO_FILE,FRAME_NUMBER)\n",
     "frms.search_by_property(prop_key=\"id\", prop_values=[fuuid])\n",
     "frms.inspect()\n",
-    "frms.display(show_bboxes=True)\n",
-    "#dir(frms)\n",
-    "#frms.__retrieve_bounding_boxes(0,None)\n",
-    "#frms.rbb(0,None)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0c1acbf7-3427-413b-9f22-c72e0145b37d",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "print(frms.images_bboxes)"
+    "frms.display(show_bboxes=True)"
    ]
   },
   {
@@ -871,7 +789,9 @@
    "id": "c32aaec3-e446-42fa-b6ee-c82fda41e29f",
    "metadata": {},
    "source": [
-    "At 120 this is kind of what we would expect to see, lady helping the dog on the bike."
+    "At 120 this is kind of what we would expect to see, lady helping the dog on the bike.\n",
+    "\n",
+    "So lets check some more ..."
    ]
   },
   {
@@ -977,6 +897,14 @@
     "            "
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "33deeabb-bd05-48c0-91ee-e7a718f10c75",
+   "metadata": {},
+   "source": [
+    "That output is a bit hard to read, so let's pull out the information from the json in an easy to understand format."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -984,10 +912,13 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Now we will map confidence by id.\n",
+    "# reference the results from the bounding box.\n",
     "dog_results=r[2][\"FindBoundingBox\"][\"entities\"]\n",
+    "# reference the results from the frames.\n",
     "frame_results=r[1][\"FindFrame\"][\"entities\"]\n",
     "\n",
+    "# bounding boxes are referenced to their frame objects by the database unique id, so we'll create a map\n",
+    "# so we can get the frame data for each detection.\n",
     "frame_map = { fr['_uniqueid']: fr['_frame_number'] for fr in frame_results }\n",
     "\n",
     "for dog_detection in  dog_results.keys():\n",