
Commit 6a6d654 ("Chore: cleanup", 1 parent 50b34ae)

1 file changed: 45 additions, 58 deletions

projects/make-a-webcam-controlled-game-with-tensorflowjs/make-a-webcam-controlled-game-with-tensorflowjs.mdx

@@ -43,8 +43,10 @@ By the end of this tutorial, we’ll have a fully functional gesture-controlled
**Note:** The main focus of this tutorial is understanding how hand tracking works with TensorFlow.js, not building a complex game. The game mechanics are kept simple so we can focus on the machine learning concepts.

Let's start!

## How Does Hand Tracking Work?

Before we jump into code, we need to understand what's happening behind the scenes when we track hands using TensorFlow.js!

There are three main concepts I want to talk about: how machine learning differs from traditional computer programming, what the TensorFlow.js and MediaPipe Hands libraries are, and how the detection process works.
@@ -77,10 +79,9 @@ So, in summary, the idea of programming with machine learning is that the develo
So, instead of the developers writing the rules, the machine writes the logic for us. Super cool, right?

### What is TensorFlow.js?

[TensorFlow.js](https://www.tensorflow.org/js) is a JavaScript library that lets you run machine learning models directly in the browser.

This means:

@@ -96,10 +97,11 @@ This means:
<img
  src="https://firebasestorage.googleapis.com/v0/b/codedex-io.appspot.com/o/projects%2Fmake-a-webcam-controlled-game-with-tensorflowjs%2Fhand-tracking-3d.gif?alt=media&token=af65e486-8ba1-46d7-b782-30e86cc0d460"
  alt="MediaPipe model landmark demo"
  style={{ width: "250px" }}
/>

These 21 landmarks include:

- Wrist (landmark 0)
- Thumb joints (landmarks 1-4)
- Index finger joints (landmarks 5-8)
@@ -122,11 +124,11 @@ Here’s what happens every time a frame of the live video is processed (You can
Okay! Now that we have a general idea of what's happening behind the scenes, let's get to building it!

## Setup

Let's begin with the starter code by downloading the project files from this repository: [Air Juggler using TensorFlow.js](https://github.com/Goku-kun/air-juggler-using-tensorflowjs)

The folder should have the following structure:

```
air-juggler-with-tensorflowjs/
├── starter/ # Start here - incomplete code with TODOs
@@ -146,14 +148,12 @@ air-juggler-with-tensorflowjs/
We'll also load TensorFlow.js and MediaPipe Hands from CDNs, so no installation is required!

## Step 1: Add TensorFlow.js Libraries

Open the **starter/index.html** file. You'll notice there's a TODO comment near the bottom where we need to add the TensorFlow.js libraries.

Replace the TODO comment with these script tags:

```html
<!-- Load TensorFlow.js and MediaPipe Hands -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
@@ -184,15 +184,16 @@ What each library does:
- Abstracts away the complexity of working directly with tensors.

**Save the file.** You now have TensorFlow.js loaded in your browser!

## Step 2: Implement Loading Progress Indicator

While TensorFlow.js loads (~2-3 seconds the first time), we should show a loading indicator.

Open **starter/game.js** and find **Step 2** (bottom of the file).

Add this function before the `render()` call:

```javascript
// Check if TensorFlow.js is loaded
function checkTensorFlowLoaded() {
  if (typeof tf !== "undefined" && typeof handPoseDetection !== "undefined") {
@@ -212,7 +213,6 @@ How this works:
- If they're not defined, run the function again in another 100ms.

```javascript
// Start checking once DOM is loaded
if (document.readyState === "loading") {
  document.addEventListener("DOMContentLoaded", checkTensorFlowLoaded);
@@ -221,15 +221,13 @@ if (document.readyState === "loading") {
}
```

**How this works:**

Once the HTML has loaded in the page, the `checkTensorFlowLoaded` function is called. It keeps re-running, and keeps the loading state visible, until both libraries are available.
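For reference, here's a compact, self-contained sketch of the whole polling pattern. The `showLoading`/`hideLoading` helpers and the `loading` element ID are assumptions for illustration, not the starter's exact code:

```javascript
// Sketch of the load-polling pattern (helper names and element ID are assumed)
function showLoading() {
  document.getElementById("loading").style.display = "block";
}

function hideLoading() {
  document.getElementById("loading").style.display = "none";
}

function checkTensorFlowLoaded() {
  if (typeof tf !== "undefined" && typeof handPoseDetection !== "undefined") {
    hideLoading(); // both libraries are ready
  } else {
    setTimeout(checkTensorFlowLoaded, 100); // not ready yet; retry in 100ms
  }
}

showLoading();
if (document.readyState === "loading") {
  document.addEventListener("DOMContentLoaded", checkTensorFlowLoaded);
} else {
  checkTensorFlowLoaded();
}
```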

**Test it:** Open **starter/index.html** in your browser by double-clicking the file. You should see a loading spinner that disappears after 2-3 seconds!

## Step 3: Request Webcam Access

Now we'll access the user's webcam using the `getUserMedia` API.
@@ -238,11 +236,10 @@ Open **starter/handTracking.js** and find **Step 3a and Step 3b** in the `setupH
238236

239237
Add this code:

```javascript
// Request webcam access using getUserMedia
const stream = await navigator.mediaDevices.getUserMedia({
  video: { width: 640, height: 480 },
});
```
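Step 3b then attaches this stream to the `<video>` element (the `videoElement` parameter of `setupHandTracking`) so the model can read frames from it. A rough sketch of that wiring; the exact starter code may differ:

```javascript
// Attach the webcam stream to the video element (sketch)
videoElement.srcObject = stream;

// Wait for metadata so the video has real dimensions before detection starts
await new Promise((resolve) => {
  videoElement.onloadedmetadata = () => {
    videoElement.play();
    resolve();
  };
});
```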
@@ -278,19 +275,19 @@ Now we'll load the pre-trained MediaPipe Hands model. This model was trained by
278275

279276
At **Step 4**, configure and load the MediaPipe Hands model:
280277

281-
282278
```javascript
283279
// Load MediaPipe Hands model
284280
const model = window.handPoseDetection.SupportedModels.MediaPipeHands;
285281
const detectorConfig = {
286-
runtime: 'mediapipe',
287-
solutionPath: 'https://cdn.jsdelivr.net/npm/@mediapipe/hands',
288-
maxHands: 2,
289-
modelType: 'full'
282+
runtime: "mediapipe",
283+
solutionPath: "https://cdn.jsdelivr.net/npm/@mediapipe/hands",
284+
maxHands: 2,
285+
modelType: "full",
290286
};
291287
```
292288

293289
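With the model and config in hand, the detector itself comes from the library's `createDetector` factory. The starter's exact line may differ slightly, but per the hand-pose-detection API it looks like:

```javascript
// Create the detector from the chosen model and configuration
const detector = await window.handPoseDetection.createDetector(
  model,
  detectorConfig,
);
```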
**Configuration explained:**

- `runtime: 'mediapipe'` - Use the MediaPipe runtime instead of the plain TensorFlow.js one because it is more efficient
- `solutionPath` - CDN URL where the model files are hosted
- `maxHands: 2` - Detect up to 2 hands simultaneously
@@ -319,7 +316,6 @@ In **starter/handTracking.js**, find **Step 5** in the `detectHands` function.
Add this code:

```javascript
// Run hand detection on current video frame
const hands = await detector.estimateHands(video);
```
@@ -333,7 +329,6 @@ const hands = await detector.estimateHands(video);
- Each hand object contains the landmark data.
- It looks like this:

```javascript
[
  {
@@ -342,33 +337,30 @@ const hands = await detector.estimateHands(video);
      { x: 125.1, y: 235.7, z: -4.8, name: "index_finger_mcp" },
      // ... 19 more keypoints
    ],
    handedness: "Left", // or "Right"
    score: 0.98, // Confidence score (0-1)
  },
];
```
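Because `estimateHands` resolves to a plain array, you can process it like any other JavaScript data. A hypothetical snippet (not part of the starter code) that skips low-confidence detections:

```javascript
// Hypothetical usage: ignore detections the model isn't confident about
const confidentHands = hands.filter((hand) => hand.score > 0.8);

for (const hand of confidentHands) {
  const wrist = hand.keypoints[0]; // landmark 0 is the wrist
  console.log(`${hand.handedness} wrist at (${wrist.x}, ${wrist.y})`);
}
```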

## Step 6: Transform Hand Landmarks to Coordinates

The model gives us 21 detailed keypoints, but our game only needs one position per hand: the palm center. Let's transform this complex data into simple (x, y) coordinates.

At **Step 6a**, process the detected landmarks:

```javascript
// Transform hand landmarks to canvas coordinates
const handPositions = hands.map((hand) => {
  // Get palm center (average of the wrist and the four finger base joints)
  const palmBase = [0, 5, 9, 13, 17].map((i) => hand.keypoints[i]);
  const avgX = palmBase.reduce((sum, kp) => sum + kp.x, 0) / palmBase.length;
  const avgY = palmBase.reduce((sum, kp) => sum + kp.y, 0) / palmBase.length;

  return {
    x: 640 - avgX, // Mirror the x coordinate; 640 is the video width we requested when initializing the feed
    y: avgY,
  };
});
```
@@ -377,7 +369,6 @@ const handPositions = hands.map(hand => {
  alt="Hand landmark information with indexing"
/>

**Why these calculations?**

- We use the 5 keypoints (the wrist + the base of the index, middle, ring and pinky fingers) to find the approximate palm center. Look at points 0, 5, 9, 13 and 17 in the image above.
@@ -389,17 +380,18 @@ At **Step 6b**, call the function:
```javascript
// Call sendHandsCallback with new hand positions
if (sendHandsCallback) {
  sendHandsCallback(handPositions);
}
```

**Data Flow:**

1. It's important to understand how the whole process works together: where the data originates, how it's passed around, and how it's transformed into our final hand information.
2. A video frame from the webcam feed is passed to the model.
3. The model outputs 21 landmarks per hand.
4. The landmarks are used to calculate the palm center.
5. The palm center is mirrored to produce the final canvas coordinates.
6. These coordinates are passed to `sendHandsCallback`, which in turn updates the game state (see the loop sketch below).

**The callback pattern:** Roughly 30 times per second, we call the `sendHandsCallback` function with new hand positions, updating the game in real time!
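To make that timing concrete, here's a minimal sketch of what such a detection loop can look like. It is not the starter's exact code; `detector`, `video`, and `sendHandsCallback` are assumed to be set up as in the previous steps, and `transformHands` is a hypothetical helper wrapping the Step 6a math:

```javascript
// Sketch of a ~30 FPS detection loop (transformHands is a hypothetical
// helper doing the palm-center + mirroring math from Step 6a)
async function detectLoop() {
  const hands = await detector.estimateHands(video);
  const handPositions = transformHands(hands);

  if (sendHandsCallback) {
    sendHandsCallback(handPositions); // push fresh positions to the game
  }

  setTimeout(detectLoop, 1000 / 30); // schedule the next check in ~33ms
}

detectLoop();
```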

@@ -411,7 +403,6 @@ Open **starter/game.js** and find **Step 10** in the `startGame` function.
Add this code:

```javascript
// Initialize hand tracking if not already done
if (!window.handTrackingInitialized) {
@@ -427,7 +418,7 @@ if (!window.handTrackingInitialized) {
  const success = await window.handTracking.setupHandTracking(
    webcam,
    function receiveHands(hands) {
      gameState.hands = hands; // Update game state with detected hands
    },
  );

@@ -445,7 +436,6 @@ if (!window.handTrackingInitialized) {
}
```

That was a lot of code! Let's break it down and understand what's going on here:

### Understanding the Integration
@@ -476,15 +466,16 @@ loadingStatus.textContent = "Loading MediaPipe Hands model...";
```javascript
const success = await window.handTracking.setupHandTracking(
  webcam,
  function receiveHands(hands) {
    gameState.hands = hands; // This is the magic line!
    // It receives updated hand positions for the game loop to use,
    // e.g. [{x: 320, y: 240}, {x: 400, y: 300}] (an array of palm centers)
  },
);
```

Inside the **handTracking.js** file:

```javascript
async function setupHandTracking(videoElement, sendHands) {
  // Store the callback; it will be used to send updated hand positions
  // back to game.js, where the receiveHands function picks them up
  sendHandsCallback = sendHands;
@@ -510,11 +501,10 @@ if (!success) {
- If camera permission is denied, setup returns `false`.
- Shows a friendly error message.
- Prevents the game from starting without hand tracking.
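In code, the failure path looks roughly like this (a sketch; `success` and `loadingStatus` are the same names used in the snippets above):

```javascript
// Sketch of the failure path after setupHandTracking resolves
if (!success) {
  loadingStatus.textContent =
    "Could not start hand tracking. Please allow webcam access and reload.";
  return; // bail out; the game can't run without hand tracking
}
```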
<img
  src="https://firebasestorage.googleapis.com/v0/b/codedex-io.appspot.com/o/projects%2Fmake-a-webcam-controlled-game-with-tensorflowjs%2Fdemo.gif?alt=media&token=e72eaaa2-588a-45f3-aff2-63f7e6cda54a"
  alt="Final Project Output"
/>
### How the Game Works (Optional)
@@ -534,11 +524,12 @@ From the overall perspective of steps, this is what happens when the game is pla
2. Game calls the `setupHandTracking()` function.
3. Hand tracking initializes camera + model.
4. Hand detection loop starts (30 FPS).
5. Every frame: the model detects hands, then calls the `receiveHands` callback, which in turn updates `gameState.hands`.
6. Game loop (60 FPS) uses `gameState.hands` for rendering and collision detection (see the sketch below).
7. Result: smooth, responsive hand-controlled gameplay!
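The key design idea is that the two loops are decoupled and only share state through `gameState.hands`: detection writes at ~30 FPS, rendering reads at 60 FPS, so drawing never waits on the model. A minimal sketch of the 60 FPS side (`update` and `draw` are hypothetical helpers, not the starter's exact code):

```javascript
// 60 FPS game loop (sketch): reads gameState.hands, never blocks on the model
function gameLoop() {
  update(gameState.hands); // assumed helper: move objects, check collisions
  draw(); // assumed helper: render the frame to the canvas
  requestAnimationFrame(gameLoop); // browser calls back ~60 times per second
}

gameLoop();
```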
## Conclusion

Congratulations! You've built a gesture-controlled game using TensorFlow.js! 🎉

Let's recap what we learned:
@@ -554,7 +545,6 @@ Let's recap what we learned:
- Drawing video to HTML5 Canvas
- Extracting meaningful data from ML model outputs

Now that you understand hand tracking, you can build even more exciting projects, such as:

- **Gesture Recognition** - Detect specific hand gestures (peace sign, thumbs up, etc.).
@@ -569,6 +559,3 @@ Now that you understand hand tracking, you can build even more exciting projects
- [WebRTC getUserMedia](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia)

Share your creations on Twitter and tag [@gokukun_io](https://twitter.com/gokukun_io) and [@codedex_io](https://twitter.com/codedex_io)!
