Project 2 Comments

# Documentation
- Well organized.
- What is a **vlink**?  I presume that that's a link to a video recording.  It would make sense to use this part of your documentation to annotate any attributes whose names do not make their meanings obvious.
- Steps and figures ... hmmm ... OK, we probably should have had this conversation earlier, and it's really OK anyway, but ...  We do not really have _steps_ in ECD, we just have _figures_.  A _step_ is an ordinary walking step, often used (actually) with a count to ensure timing with the music, e.g., "Four changes of rights and lefts, four steps per change."  The normal assumption is that you take a _step_ on every beat of the music, if you are moving at all.  _Four changes of rights and lefts_ is an example of a _figure_.  Another is _up a double_ (and variants, e.g., _up a double and back_, _down a double_).  The website [Up a Double has a nice list of figures or figure categories](https://upadouble.info/ecdTutorial.php#Common).  Notice how short the list is!

# E-R Analysis
- I'm confused about the meanings of **step** and **figure** — following on the above remarks.  For you all, a _step_ has a sequence.  I would have said that a dance has a sequence of figures, and that a figure is a sequence of moves or steps.  But, anyway, I'm not sure how you model the sequence, or how you model a **move** of a **figure**.  I don't quite know what those words mean.  (Maybe this information belongs in your documentation.)
- Otherwise, good modeling of relationship sets in particular.

# Schema and SQL
- By the time I got to reviewing this, your **schema.sql** had mysteriously disappeared, so I looked at the one in P3.  This is slightly unfortunate, because I wanted to understand your schema better by seeing the data types, which (understandably) are not present in your textual representation of the schema.
- The **Figure** relation here makes sense, though specifying the **duration** would probably not work.  If the integer type **duration** represents the number of steps or beats, this makes a _back-to-back_ in 8 steps a completely different figure from one in 12 steps, which does not quite agree with most dancers' intuitions.  Moreover, most figures involve a sequence of steps and changes of direction.  For instance, in _four changes of rights and lefts_ in a 2-couple minor set, the dancers walk around the square boundary of their set in a direction that depends on their initial position: the first man and second woman go clockwise, the others counter-clockwise.  On each side of the square, the two dancers passing each other clasp right or left hands (generally, right on the first pass, left on the second, etc.).  Each side would require some number of steps, and the number is usually, but not necessarily, the same on each side.  So the duration of the whole figure does not give us enough information.
- The above remarks suggest that the list of possible figures should be small (see above), and that there be some other entity set, and therefore some other relation, that parametrizes the variant of that figure.
- I still do not know what a step (**Step**) is or why a step would refer back to a particular dance.  I'm baffled.
- The **FigureStep** relation looks interesting, but, again, I don't know what it is supposed to model, or what **place** is exactly.
- It's actually inefficient _for SQLite_ to declare your IDs as **autoincrement**.  If you leave that off and declare an attribute with type **integer** and status **primary key**, SQLite will automatically treat it as an alias for its own **rowid**, which it creates for every ordinary table and already assigns automatically.  This is a special feature of SQLite.  Practically everybody made this mistake, and it's my fault.  I myself did not learn about this feature until after the first few weeks of class.  I mentioned it, but probably not enough times.
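
To make the **rowid** point concrete, here is a minimal sketch using Python's built-in `sqlite3` module.  The `figure` table and its columns are made up for illustration; they are not taken from your schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: note there is no AUTOINCREMENT keyword.
# An INTEGER PRIMARY KEY column becomes an alias for SQLite's rowid.
conn.execute("CREATE TABLE figure (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# No id is supplied; SQLite assigns one on insert.
conn.execute("INSERT INTO figure (name) VALUES ('up a double')")
conn.execute("INSERT INTO figure (name) VALUES ('back-to-back')")

rows = conn.execute("SELECT rowid, id, name FROM figure ORDER BY id").fetchall()
for rowid, figure_id, name in rows:
    # rowid and id hold the same value on every row
    print(rowid, figure_id, name)
```

The one behavior you give up by dropping **autoincrement** is the guarantee that the ID of a deleted row is never reused, which I doubt matters for this project.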

# Code
You all put some extra effort into the deduplication algorithm, which is ingenious and super-cool.  Now that I've described a bit more of what a figure is, let's consider how else you might have defined the problem and then addressed it.  In this case, you have a fairly small and finite set of figures to start with, and you need to classify a large set of terms for those figures into that small set of classes.  This would actually be easier for AI: it is close to a traditional NLP classification problem, at which an LLM should do quite well.  Or you could use a pre-LLM technology, such as a simple minimum edit distance (Levenshtein distance), K-means clustering, or some other proximity-based method.  You would want the algorithm to _fail_ where the dissimilarity to any known class is too great, because that would enable you to identify special minority classes.  Then you could add those minority classes to your set of figures.
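
Here is a rough sketch of the edit-distance version of that idea, so you can see how little code it takes.  The class list, the `classify` function, and the `max_ratio` threshold are all my own assumptions for illustration; a real threshold would have to be tuned against your data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical, deliberately short list of known figure classes.
FIGURE_CLASSES = ["up a double", "back-to-back", "rights and lefts", "circle"]


def classify(term, classes=FIGURE_CLASSES, max_ratio=0.4):
    """Return the nearest known class, or None if nothing is close enough."""
    term = term.strip().lower()
    best = min(classes, key=lambda c: edit_distance(term, c))
    # Normalize by length so long and short names are judged comparably.
    ratio = edit_distance(term, best) / max(len(term), len(best))
    return best if ratio <= max_ratio else None


for raw in ["Up a Doubble", "back to back", "hey for three"]:
    print(repr(raw), "->", classify(raw))
```

The terms that come back as `None` are the interesting ones: they are your candidates for new minority classes.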

There is also a potentially interesting problem involved in separating out what I am calling the _figure class_ from the parameters that determine its specific variety.  For instance, there's a _circle left_ and a _circle right_.  Either one can go all the way, half way, or a quarter way.  (Moreover, a _single file_ actually is a circle, but without holding hands.  Should it be classified as a circle, and have hand-holding be a parameter?)  If the parameters are separated out from the figure classes, then how would they be modeled and stored?
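
One possible answer, purely as a sketch and certainly not the only reasonable design: keep a small table of figure classes and a second table of variants that carries the parameters.  All of the table and column names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE figure_class (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE figure_variant (
    id        INTEGER PRIMARY KEY,
    class_id  INTEGER NOT NULL REFERENCES figure_class(id),
    direction TEXT,     -- e.g. 'left' or 'right'; NULL where it does not apply
    extent    TEXT,     -- e.g. 'quarter', 'half', 'full'
    hands     INTEGER   -- 1 = holding hands, 0 = not
);
""")

conn.execute("INSERT INTO figure_class (name) VALUES ('circle')")
circle_id = conn.execute(
    "SELECT id FROM figure_class WHERE name = 'circle'"
).fetchone()[0]

conn.executemany(
    "INSERT INTO figure_variant (class_id, direction, extent, hands) "
    "VALUES (?, ?, ?, ?)",
    [
        (circle_id, "left", "half", 1),
        (circle_id, "right", "full", 1),
        (circle_id, "left", "full", 0),  # a single file, treated as a circle without hands
    ],
)

variants = conn.execute(
    "SELECT c.name, v.direction, v.extent, v.hands "
    "FROM figure_variant v JOIN figure_class c ON c.id = v.class_id "
    "ORDER BY v.id"
).fetchall()
for row in variants:
    print(row)
```

Fixed columns like these only work if most figure classes share the same few parameters.  If they do not, a separate parameter name/value table per variant might be the better trade-off.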

But hats off for your experiment in applying AI to this problem!

Great job overall, team!  😊

