Structured Typesetting (STS) generation by tirix · Pull Request #9 · metamath/metamath-exe

tirix · 2021-12-24T06:54:43Z

I've tried to make as few changes as possible. The changes are in the following source files:

metamath.c for the changes to the SHOW STATEMENT command and the new VERIFY STS command, as well as the output post-processing,
mmcmdl.c for the new HELP command options, and the new commands,
mmcmds.c for the changes to the SHOW STATEMENT command at top level (distinct and dummy variables, syntax hints),
mmdata.c for some utility functions,
mmhlpa.c and mmhlpb.c for the new built-in HELP options,
mmhtbl.c, a new file with a hash table implementation,
mmwsts.c, a new file with the main STS implementation,
mmwtex.c, for the main hook into the STS formula output and for in-line comment formulas.

This also adds a .gitignore file to ignore object files.

wlammen

Why did you add the makefile? The readme instructions note several ways to compile metamath, including gcc m*.c -o metamath, and using automake.

wlammen

metamath.c line 3032: duplicated code from line 2393

wlammen

Style, readability: for the command line option STS, Norm always used verbose forms; consider expanding it to something like STRUCTURED_TYPESETTING.

wlammen

built-in help does not cover new option

tirix · 2021-12-25T13:52:31Z

Thank you Wolf for your review!

built-in help does not cover new option

It actually does, see changes in mmhlpa.c and mmhlpb.c.

Style, readability: for the command line option STS, Norm always used verbose forms; consider expanding it to something like STRUCTURED_TYPESETTING.

That's very verbose, though. Maybe just "STRUCTURED" ?

Why did you add the makefile? The readme instructions note several ways to compile metamath, including gcc m*.c -o metamath, and using automake.

Sure, I can remove the makefile.

benjub · 2021-12-25T15:16:53Z

How do you render this "structured typesetting"? Is this MathML, MathJax or something like this ?

For texts, and in particular help texts, Norm used the double-space convention between sentences, and I admit I like that.

tirix · 2021-12-25T16:02:44Z

How do you render this "structured typesetting"? Is this MathML, MathJax or something like this ?

Yes, the set-mathml.mmts file contains instructions about how to display set.mm formulas as MathML. Then, the MathML result is sent to MathJax for rendering.
The actual MathJax command used is included in that file, in the $c ... $. instruction towards the end of the file. That instruction is read and executed by the metamath-exe program, with the generated input file sent to its standard input.

For texts, and in particular help texts, Norm used the double-space convention between sentences, and I admit I like that.

Your wish has come true with the last commit!

benjub · 2021-12-25T18:00:08Z

mmhlpb.c

+H("Syntax:  VERIFY STS <format>");
+H("");
+H("This command error-checks that the STS rules definition covers all syntax");
+H("defined in the Metamath source file loaded. It runs through all non");


Double space here! And I would write "non-definitional" with a hyphen, or even "nondefinitional", since "non" is not a word.

benjub · 2021-12-25T18:03:38Z

Thanks. So maybe you can use the option "/ MATHML" ? I'm fine with "/ STRUCTURED" too. I cannot approve this PR since I am not fluent in C :-(

wlammen · 2021-12-25T18:39:37Z

The option STS or /STRUCTURED is the second alternative to /HTML and /ALT_HTML. Can you roughly tell how these options differ just by looking at their names? If not, people will have to guess what the invocations produce. Do you expect this option to be typed by users of metamath directly, or will it likely be part of a rarely changing script? If you type the option frequently, a short name is handy; otherwise even a very long option name will hardly annoy anyone, and it is easily understood by casual readers of the script.
If I understand your code right, embedded snippets are translated via a sort of grammar file into true HTML. This process is then best expressed in your option tag.

I am still busy with Xmas, have just skimmed your PR. I currently cannot look into this further.

tirix · 2021-12-25T18:41:19Z

Thanks. So maybe you can use the option "/ MATHML" ? I'm fine with "/ STRUCTURED" too.

I wanted to keep it generic because one could generate anything with it, not just MathML.
The set-mathml.mmts just happens to contain rules to generate MathML, the Metamath.exe functionality is completely unaware of what it is (actually, I had written another file to generate LaTeX with the same method).

tirix · 2021-12-25T18:55:56Z

The option STS or /STRUCTURED is the second alternative to /HTML and /ALT_HTML.

Yes, and it is itself followed by the name of the production to use. A command would typically be:
SHOW STATEMENT syl / STS mathml
which instructs the program to read and execute the mathml instruction file (set-mathml.mmts).

If you had, say, a graphviz-dot instruction file, the command:
SHOW STATEMENT syl / STS graphviz-dot
would look for a set-graphviz-dot.mmts file and presumably output some graphs instead of formulas (that might be an interesting experiment!).
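A minimal sketch of how a format name could map to a rules file, following the two examples above. The function name stsRulesFileName is hypothetical, and deriving the "set-" prefix from the database file name (set.mm) is an assumption; metamath-exe's actual lookup may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<db base name>-<format>.mmts",
   e.g. "set.mm" + "mathml" -> "set-mathml.mmts".
   Returns 1 on success, 0 if the result does not fit in out. */
static int stsRulesFileName(const char *dbFile, const char *format,
                            char *out, size_t outSize) {
  const char *dot = strrchr(dbFile, '.');          /* strip the extension */
  size_t baseLen = dot ? (size_t)(dot - dbFile) : strlen(dbFile);
  int n = snprintf(out, outSize, "%.*s-%s.mmts", (int)baseLen, dbFile, format);
  return n >= 0 && (size_t)n < outSize;            /* detect truncation */
}
```

With this rule, SHOW STATEMENT syl / STS graphviz-dot would resolve to set-graphviz-dot.mmts without the program knowing anything about the output format itself.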

If I understand your code right, embedded snippets are translated via a sort of grammar file into true HTML. This process is then best expressed in your option tag.

Yes, that's roughly the process. So if I follow you, we could use e.g. /STS_HTML as an option tag?

wlammen · 2021-12-25T19:20:18Z

What about EXPAND_HTML? Or HTML_EXPAND?
Note that structured is an adjective (usually used to describe a state/property) while expand is a verb (describing a process/activity). If you prefer an adjective here, consider (EMBEDDED_)ENCODING.
I wonder whether the proposed technique is limited to typesetting. If not, the suggestions here are generic enough to even cover exotic applications.

benjub · 2021-12-25T22:01:03Z

When running metamath, it is enough to type any unambiguous prefix, so here MM> show statement syl /S mathml would suffice. In other words, a verbose option is not a big inconvenience. As for option names, I always thought the other two (HTML, ALT_HTML) were not optimal. I think more explicit choices would be /UNICODE and /GIF.
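The prefix rule can be sketched as follows. The option list used in the example is hypothetical (it is not SHOW STATEMENT's real option set), and metamath-exe's own parser may treat exact matches and ambiguity differently.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: resolve an abbreviated option name against a list,
   ignoring case. Returns the index of the unique option that starts with
   abbrev, -1 if none does, and -2 if more than one does. */
static int matchOption(const char *abbrev, const char *const *options, int count) {
  size_t len = strlen(abbrev);
  int found = -1;
  if (len == 0) return -1;
  for (int k = 0; k < count; k++) {
    size_t i = 0;
    while (i < len && options[k][i] != '\0' &&
           toupper((unsigned char)abbrev[i]) == toupper((unsigned char)options[k][i]))
      i++;
    if (i == len) {            /* abbrev is a prefix of options[k] */
      if (found >= 0) return -2;
      found = k;
    }
  }
  return found;
}
```

With options {"HTML", "ALT_HTML", "STRUCTURED_TYPESETTING", "TEX", "TIME"}, "S" selects STRUCTURED_TYPESETTING, while "T" is rejected as ambiguous; a long name therefore costs the interactive user nothing.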

wlammen

.gitignore: OK, see here (https://stackoverflow.com/questions/6626136/best-practice-for-adding-gitignore-to-repo)
Maybe one should put this info into the README?

wlammen

metamath.c line 60 suggest a version number and add a changelog entry.

wlammen

naming consistency mmwtex.h line 33: stsFlag (and possibly following) should be prefixed with g_ (g_stsFlag, matching g_altHtmlFlag for example), see changelog 0.187 metamath.c, line 88

wlammen

optimization metamath.c line 2381: The function switchPos("/ STS") is called three times more or less in succession. Consider evaluating the result once and use a local variable to recall the result within the if conditions.

wlammen

parsetSTSRules: why parset...? Looks like a typo.

wlammen

metamath.c line 2977: Suspicious value of i. The other options / TIME or / NO_VERSIONING add 2 to the number of options, seemingly counting the slash and the tag as different entries. Why is this not done for / STS? If the count is correct, IMO a comment should clarify the underlying logic.

to be continued...

tirix · 2021-12-26T14:42:43Z

@wlammen I'm going to copy your remarks directly into a code review, it creates a sub-thread per remark and I think it's much easier to follow like that.

tirix · 2021-12-26T15:22:43Z

metamath.c line 60 suggest a version number and add a changelog entry

Ok, I've updated the history and proposed a version 0.199.
This kind of history entry can only really be finalized at the merge time (imagine another PR comes before) and typically will create merge conflicts, though.

tirix · 2021-12-26T14:44:55Z

.gitignore

@@ -0,0 +1,2 @@
+*.o
+metamath


@wlammen writes:

OK, see here (https://stackoverflow.com/questions/6626136/best-practice-for-adding-gitignore-to-repo)
Maybe one should put this info into the README?

I'm not sure which info you would like to add to the README.
There is a standard .gitignore file for C projects here, we could give it a try.
Maybe better in a separate PR?

I added this comment because the metamath sources are also distributed as a tar archive, and obviously you won't need the .gitignore in that case. I think some kind of distribution how-to is finally called for. Those are my thoughts; I was curious why you added the file. A separate PR is fine with me.

tirix · 2021-12-26T14:49:03Z

mmwtex.h

+/* 19-Jul-2017 tar Added for STS/MathML output */
+extern flag stsFlag; /* STS output (for "structural typesetting") */
+extern vstring stsOutput; /* output mode chosen for STS (follows STS flag) */
+extern vstring postProcess; /* command to pipe the output into (used for MathJax prerendering) */


@wlammen wrote:

stsFlag (and possibly following) should be prefixed with g_ (g_stsFlag, matching g_altHtmlFlag for example), see changelog 0.187 metamath.c, line 88

Yes, this global variable naming convention was introduced after I programmed the STS module. As an improvement, I could retrofit the module to follow it too.

These are the current rules.

tirix · 2021-12-26T14:55:10Z

metamath.c

+        if (switchPos("/ ALT_HTML") != 0 || switchPos("/ STS") != 0 ) {
+          print2("?Please specify only one of / HTML , / ALT_HTML and / STS.\n");


@wlammen wrote:

optimization: The function switchPos("/ STS") is called three times more or less in succession. Consider evaluating the result once and use a local variable to recall the result within the if conditions.

I don't think this is called 3 times. There are 3 else..if branches, and it is called one time in each, to ensure only one HTML output formatting option is chosen.

You're right. It still looks awkward, since the call and its parameter are coded several times.
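A sketch of the suggested refactoring: look up each switch once, keep the results in locals, and count how many output formats were requested. switchPosStub is a simplified, hypothetical stand-in (a substring search over a fixed command line), not metamath-exe's switchPos implementation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical command line and stand-in for switchPos(): returns a nonzero
   position if the switch text occurs in the command line, 0 otherwise. */
static const char *g_cmdLine = "SHOW STATEMENT syl / HTML / STS mathml";

static long switchPosStub(const char *sw) {
  const char *p = strstr(g_cmdLine, sw);
  return p != NULL ? (long)(p - g_cmdLine) + 1 : 0;
}

/* Each switch is looked up once; the condition then reuses the locals.
   Returns 1 if at most one output format was requested, 0 otherwise. */
static int checkOutputFormat(void) {
  long htmlPos = switchPosStub("/ HTML");
  long altHtmlPos = switchPosStub("/ ALT_HTML");
  long stsPos = switchPosStub("/ STS");
  int requested = (htmlPos != 0) + (altHtmlPos != 0) + (stsPos != 0);
  if (requested > 1) {
    printf("?Please specify only one of / HTML , / ALT_HTML and / STS.\n");
    return 0;
  }
  return 1;
}
```

Counting the flags also replaces the three separate else-if tests with a single check.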

tirix · 2021-12-26T14:56:15Z

mmwsts.h

+#include "mmdata.h"
+
+/* Parse an STS file */
+int parsetSTSRules(vstring format);


@wlammen wrote:

parsetSTSRules: why parset...? Looks like a typo.

Yes, it's a typo. I'll fix that!

tirix · 2021-12-26T15:06:38Z

metamath.c

+      /* 7-Jul-2017 added MathML/STS */
+      if (switchPos("/ STS")) i = i + 1;


@wlammen wrote:

Suspicious value of i. The other options / TIME or / NO_VERSIONING add 2 to the number of options, seemingly counting the slash and the tag as different entries. Why is this not done for / STS? If the count is correct, IMO a comment should clarify the underlying logic.

Interesting. So, a typical SHOW STATEMENT command looks like this:
SHOW STATEMENT syl / HTML
That would be g_rawArgs = 5, which already includes 2 arguments for the / HTML.
As you correctly guessed, for / NO_VERSIONING and / TIME, there are two more arguments, therefore the +2.
In the case of STS, the / STS would already be accounted for in the 5 (taking the place of the / HTML), so that's not what the +1 is for. Rather, in the case of STS there is one more argument, namely the name of the output processing, e.g. mathml. That is what the +1 is for.

I'll add a comment to make that clearer.
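The counting rule described above, written out as a small function. expectedArgCount and its flag parameters are hypothetical; the source keeps a running count in i instead.

```c
/* Hypothetical helper mirroring the explanation above: how many raw
   arguments a SHOW STATEMENT command is expected to have. */
static int expectedArgCount(int hasSts, int hasTime, int hasNoVersioning) {
  int n = 5;                     /* SHOW STATEMENT <label> / <format> */
  if (hasSts) n += 1;            /* "/ STS" replaces "/ HTML"; only the format
                                    name (e.g. mathml) is extra */
  if (hasTime) n += 2;           /* "/" and "TIME" */
  if (hasNoVersioning) n += 2;   /* "/" and "NO_VERSIONING" */
  return n;
}
```

For example, SHOW STATEMENT syl / STS mathml has 6 raw arguments, one more than SHOW STATEMENT syl / HTML.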

I am on a learning curve as well while reviewing, with respect to both your source code and the review style.

wlammen · 2021-12-27T15:57:43Z

mmcmdl.c

+            if (lastArgMatches("STS")) {
+              i++;
+              if (strlen(stsOutput)) {
+              if (!getFullArg(i,cat("* Using which output mode <",stsOutput,">? ",NULL)))


Incorrect indentation

wlammen · 2021-12-27T16:42:09Z

mmcmdl.c

+          print2("?No source file has been read in.  Use READ first.\n");
+          goto pclbad;
+        }
+        if (strlen(stsOutput)) {


Duplicated code from line 586 (with i == 2)

wlammen · 2021-12-27T16:43:32Z

mmcmds.c

            let(&str2, "");
-            str2 = tokenToTex(g_MathToken[nmbrTmpPtr2[k]].tokenName, showStmt);
+            /* 27 Jul 2017 tar For MathML/STS */
+            if(stsFlag) str2 = stsToken(nmbrTmpPtr2[k], showStmt);


Duplicated code from line 357

wlammen · 2021-12-27T16:45:05Z

mmcmds.c

              let(&str2, "");
-              str2 = tokenToTex(g_MathToken[nmbrTmpPtr2[k]].tokenName, showStmt);
+              /* 27 Jul 2017 tar For MathML/STS */
+              if(stsFlag) str2 = stsToken(nmbrTmpPtr2[k], showStmt);


Duplicated code from line 377

wlammen · 2021-12-27T16:58:38Z

mmdata.c

+  long i;
+  long hash = 0;
+  i = -1;
+  while (i < 13 && s[i] != -1) {


Suspicious i in first loop: -1, which is at least out of bounds for salt, likely for s, too.

I'll answer this one now as it is interesting, useful for the rest of the review... and definitely deserves more information in the comments!

Actually, indices -1, -2 and -3 are valid and used in nmbrString:

Index -1 is the length of the number string. See nmbrLen (mmdata.c line 1023)

Index -2 is the allocated length, i.e. how many numbers are available totally (could be more than the actual current length of the string). See nmbrAllocLen (mmdata.c line 1032)

Index -3 is the location in memUsedPool (for memory management)

In this specific case, I wanted to include the length of the string in the hash, which I think makes sense but should have been explained.

In any case, you clearly spotted a mistake here because in the salt, index -1 is clearly invalid!

Any use of negative indices should be wrapped behind a macro to provide more safety and clarity. (That is, there should be a macro like #define nmbrLen(p) p[-1], where p has type nmbrString, which is a typedef for int* or what have you.)
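A minimal sketch of the macro approach for a string type whose header is stored at negative indices. The type and macro names are hypothetical and the layout is simplified (slot -1 holds the element count directly); it does not attempt to reproduce metamath-exe's exact nmbrString encoding. Parenthesizing (p) in the macros avoids precedence surprises when the argument is an expression.

```c
#include <stdlib.h>

/* Hypothetical simplified layout: three bookkeeping slots precede element 0. */
typedef long numStr;

#define NS_HEADER 3
#define NS_LEN(p)      ((p)[-1])  /* elements currently in use */
#define NS_ALLOCLEN(p) ((p)[-2])  /* elements allocated */
#define NS_POOLSLOT(p) ((p)[-3])  /* memory-pool bookkeeping (unused here) */

/* Allocate room for capacity elements plus the header; NULL on failure. */
static numStr *nsAlloc(long capacity) {
  numStr *block = malloc((size_t)(capacity + NS_HEADER) * sizeof *block);
  if (block == NULL) return NULL;
  numStr *p = block + NS_HEADER;   /* callers only ever see this pointer */
  NS_LEN(p) = 0;
  NS_ALLOCLEN(p) = capacity;
  NS_POOLSLOT(p) = -1;
  return p;
}

/* Append v if there is room; returns 1 on success, 0 if full. */
static int nsPush(numStr *p, numStr v) {
  if (NS_LEN(p) >= NS_ALLOCLEN(p)) return 0;
  p[NS_LEN(p)] = v;
  NS_LEN(p)++;
  return 1;
}

/* Free the whole block, header included. */
static void nsFree(numStr *p) {
  if (p != NULL) free(p - NS_HEADER);
}
```

Only nsAlloc and nsFree know about NS_HEADER; all other code reads the header through the named macros.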

wlammen · 2021-12-27T17:00:45Z

mmdata.c

+  static long salt[] = { 4938, 48977, 6897, 7293, 2663, 7925, 2999, 12238,
+      40033, 14038, 10699, 29746, 56108, 34526, 63576, 52053, 61949, 41177, 43740, 22822
+ };
+  long i;


Declaration is Initialization rule (section 4.5 in https://docs.oracle.com/cd/E17984_01/doc.898/e14699/variables_data_structs.htm) Initialize with value from line 1113

wlammen · 2021-12-27T17:03:06Z

mmdata.c

+/* This simply computes a XOR of the first numbers */
+int nmbrHash(nmbrString *s)
+{
+  static long salt[] = { 4938, 48977, 6897, 7293, 2663, 7925, 2999, 12238,


Only 13 (?) values actually needed. Remove/comment out unneeded ones.

Where is the hash algorithm explained? Add a comment/link.
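A sketch of a salted XOR hash in the spirit of nmbrHash that addresses the remarks above: the length is mixed in explicitly instead of reading s[-1], the salt table has exactly the entries that are used, and no index outside it is read. The names, the prefix length of 13, and multiplying each element by its salt before XOR-ing are assumptions, not a description of the code in this PR. Unsigned arithmetic makes overflow wrap instead of being undefined.

```c
#define HASH_PREFIX 13  /* number of leading elements that contribute */

/* salt[0] is for the length, salt[1..HASH_PREFIX] for the elements
   (values taken from the start of the PR's salt list). */
static const unsigned long salt[HASH_PREFIX + 1] = {
  4938, 48977, 6897, 7293, 2663, 7925, 2999, 12238,
  40033, 14038, 10699, 29746, 56108, 34526
};

/* Hypothetical sketch: hash the length and the first HASH_PREFIX elements. */
static unsigned long prefixHash(const long *s, long len) {
  unsigned long hash = (unsigned long)len * salt[0];
  for (long i = 0; i < len && i < HASH_PREFIX; i++)
    hash ^= (unsigned long)s[i] * salt[i + 1];
  return hash;
}
```

Strings that differ only after the 13th element collide by design; whether that is acceptable for the STS cache is exactly what the requested comment should explain.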

to be continued...

wlammen · 2021-12-28T06:02:11Z

mmdata.c

+  long i;
+  if (sstart - 1 + len > nmbrLen(s)
+   || tstart - 1 + len > nmbrLen(t)) return 0;
+  for (i = 0; s[sstart-1+i] == t[tstart-1+i] && i<len; i++);


Suspicious index in first loop: -1, if sstart == 0. (1) Out of bounds access to s and t, or (2) Parameters sstart and tstart must be > 0.

Either provide parameter checks (cf. line 1211), or (minimum) state limitations in comment line 1122

wlammen · 2021-12-28T06:13:44Z

mmdata.c

+long nmbrInstrN(long start_position, long occ, nmbrString *string1,
+  nmbrString *string2, long start2, long length2)
+{
+  if (start_position < 1) start_position = 1;


This means: If garbage is provided in parameter start_position, then I sanitize it to something more useful, hoping for the best. This kind of fault tolerance supports a sloppy programming on the caller's side. Better throw an exception, or call bug()

wlammen · 2021-12-28T06:18:15Z

mmdata.c

  return (sout);
 }

+/* Search for the nth occurrence of string2 in string1 */


Explain parameters (semantics and limitations) in a comment. For example, what does occ mean? n would match the function's name.

wlammen · 2021-12-28T06:25:25Z

mmdata.c

+  start_position--;
+  for(; occ > 0;occ--) {
+    long ls1, i, j;
+    ls1 = nmbrLen(string1);


Pull constant evaluations out of the loop.

wlammen · 2021-12-28T06:29:22Z

mmdata.c

+  if (start_position < 1) start_position = 1;
+  start_position--;
+  for(; occ > 0;occ--) {
+    long ls1, i, j;


Declare i and j in the for statements.

wlammen · 2021-12-28T06:30:35Z

mmdata.c

+        }
+      }
+      if (found) {
+	start_position = i+1;


incorrect indentation

wlammen · 2021-12-28T06:32:01Z

mmdata.c

+    for (i = start_position - 1; i <= ls1 - length2; i++) {
+      flag found = 1;
+      for (j = 0; j < length2; j++) {
+        if (string1[i+j] != string2[start2-1+j]) {


This condition is usually part of the for statement.
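Several of the remarks on nmbrInstrN can be combined into one sketch: reject invalid start positions instead of silently fixing them, document the parameters, compute loop-invariant values once, declare loop variables in the for statement, and keep the loop condition in the for header. findNth is hypothetical: it works on plain long arrays with explicit lengths instead of nmbrString, returns -1 where metamath-exe would rather call bug(), takes occ as int, and counts overlapping matches. All of these are assumptions rather than the PR's semantics.

```c
/* Hypothetical sketch.
   Finds the occ-th occurrence (occ >= 1) of needle[0..needleLen-1] in
   hay[0..hayLen-1], starting at the 1-based position startPos (>= 1).
   Overlapping occurrences are counted.
   Returns the 1-based position of that occurrence, 0 if there are fewer
   than occ occurrences, or -1 if a precondition is violated. */
static long findNth(const long *hay, long hayLen, long startPos, int occ,
                    const long *needle, long needleLen) {
  if (startPos < 1 || occ < 1 || needleLen < 1 || hayLen < 0) return -1;
  const long lastStart = hayLen - needleLen;  /* loop-invariant, computed once */
  int remaining = occ;
  for (long i = startPos - 1; i <= lastStart; i++) {
    long j = 0;
    while (j < needleLen && hay[i + j] == needle[j]) j++;
    if (j == needleLen && --remaining == 0) return i + 1;
  }
  return 0;
}
```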

wlammen · 2021-12-28T06:35:21Z

mmdata.c

+/* Add a single number to start of a nmbrString - faster than nmbrCat */
+nmbrString *nmbrUnshiftElement(nmbrString *g, long element)
+{
+  long length;


declaration should be initialization

wlammen · 2021-12-28T06:39:22Z

mmdata.c

 }


+/* Add a single number to start of a nmbrString - faster than nmbrCat */


The return value is pushed on an internal variable stack with an implicit memory management. This important detail is not mentioned in the comment.
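The point about documenting memory management can be illustrated with a header comment that states who owns the result. This hypothetical sketch uses malloc/free and plain long arrays instead of metamath-exe's internal stack of temporaries, so the ownership rule it documents is its own, not nmbrUnshiftElement's. It also follows the "declaration is initialization" remark.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch.
   Returns a new array of len + 1 elements: element, then a copy of
   g[0..len-1]. g is not modified.
   Ownership: the result is heap-allocated and must be released by the
   caller with free(). Returns NULL if allocation fails. */
static long *unshiftElement(const long *g, size_t len, long element) {
  long *out = malloc((len + 1) * sizeof *out);
  if (out == NULL) return NULL;
  out[0] = element;
  if (len > 0) memcpy(out + 1, g, len * sizeof *out);
  return out;
}
```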

wlammen · 2021-12-28T06:41:45Z

mmdata.h

 long nmbrAllocLen(nmbrString *s);
 void nmbrZapLen(nmbrString *s, long length);

+/* Search for the nth occurrence of string2 in string1 */


Should explain parameters and specify their limitations.

wlammen · 2021-12-28T09:30:13Z

mmhtbl.c

+#define NO_LINKEDITEM -1
+linked *linkedItems;
+int free_linkedItem;
+flag htinit_done = 0;


This is private to htinit() so declare it there as a static variable

wlammen · 2021-12-28T09:33:44Z

mmhtbl.c

+/* Static buffer for the linked lists */
+#define NB_LINKEDITEMS 50000L
+#define NO_LINKEDITEM -1
+linked *linkedItems;


It is safer to initialize this to NULL and 0
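A sketch of both suggestions: explicit NULL/0 initializers for the file-scope pool, and the "already initialized" flag moved into the init function as a local static. The names and the element type are hypothetical placeholders, not mmhtbl.c's; the pool pointer stays at file scope because other functions still use it.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NB_ITEMS 50000L

/* File-scope pool, shared with other functions, explicitly initialized. */
static long *itemPool = NULL;
static long freeItem = 0;

/* Returns true once the pool is available; safe to call repeatedly. */
static bool htinitSketch(void) {
  static bool initDone = false;   /* private to this function */
  if (initDone) return true;
  itemPool = malloc(NB_ITEMS * sizeof *itemPool);
  if (itemPool == NULL) return false;
  freeItem = 0;
  initDone = true;
  return true;
}
```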

wlammen · 2021-12-28T09:47:57Z

mmhtbl.c

+
+  /* Create and fill the structure */
+  hashtable ht;
+  ht.name = name;


The problem here is that you copy a pointer, not its value, from a parameter to the result. The caller handles the parameter's contents with a series of let() instructions, finally freeing it. All these operations are done without ht.name in mind. This opens up all kinds of memory allocation/access failures. Even if you know, this won't happen in the foreseeable future, you need a guarantee, or a contract, here to decouple caller and callee.
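One way to establish the contract asked for here is to let the table keep its own copy of the name, so the caller's later let() calls or frees cannot affect it. The struct and function names below are hypothetical. The alternative is to keep the pointer and document that the caller must keep the string alive and unchanged for the table's lifetime.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced hashtable: only the name field is shown. */
typedef struct {
  char *name;  /* owned by the table: a private copy of the caller's string */
} hashtableSketch;

/* Copy name into storage owned by ht. Returns 1 on success, 0 on failure. */
static int htSetName(hashtableSketch *ht, const char *name) {
  size_t n = strlen(name) + 1;     /* include the terminating NUL */
  char *copy = malloc(n);
  if (copy == NULL) return 0;
  memcpy(copy, name, n);
  ht->name = copy;
  return 1;
}

/* Release the copy; the caller's original string is never touched. */
static void htFreeName(hashtableSketch *ht) {
  free(ht->name);
  ht->name = NULL;
}
```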

wlammen · 2021-12-28T10:13:09Z

mmhtbl.c

+      /* Found it, free the object and remove it from the chain */
+      hashtable->freeFunc(&linkedItems[*pli].key, &linkedItems[*pli].object);
+      int old_free = free_linkedItem;
+      free_linkedItem = *pli;


DRY duplicated code from line 76
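The two places that free an object and return its slot to the free list could share one helper. This sketch uses a hypothetical fixed-size pool of linked items (pool, freeHead, releaseItem and removeKey are illustrative names, and clearing the key stands in for calling freeFunc). The chain walk with a pointer to the current link mirrors the *pli style of the quoted code.

```c
#define NO_ITEM (-1)

typedef struct {
  int next;   /* index of the next item in the chain, or NO_ITEM */
  long key;
} item;

static item pool[8];
static int freeHead = NO_ITEM;   /* head of the free list */

/* The shared step: release the object and push the slot onto the free list. */
static void releaseItem(int idx) {
  pool[idx].key = 0;             /* stands in for hashtable->freeFunc(...) */
  pool[idx].next = freeHead;
  freeHead = idx;
}

/* Unlink the first item with the given key from the chain at *head.
   Returns 1 if an item was removed, 0 otherwise. */
static int removeKey(int *head, long key) {
  for (int *pli = head; *pli != NO_ITEM; pli = &pool[*pli].next) {
    if (pool[*pli].key == key) {
      int victim = *pli;
      *pli = pool[victim].next;  /* bypass the victim in the chain */
      releaseItem(victim);
      return 1;
    }
  }
  return 0;
}
```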

to be continued...

tirix · 2021-12-30T05:47:54Z

@wlammen thank you very much for your careful review!
I think you are not quite done yet: please give me a heads up when you are, and I will try to address all remarks at once.

wlammen · 2021-12-30T12:27:25Z

New Year's Eve is closing in. I need some time for it.

wlammen · 2022-01-02T13:29:22Z

mmwsts.c

+/* The structure containing information about the STS variable tokens */
+struct stsVar_struct {
+  long stsType; /* type of the token in the STS (must be a constant tokenId) */
+  long stsSchemeId; /* number of the schemed in which this variable is defined + 1. */


"schemed" typo

wlammen · 2022-01-02T13:33:40Z

mmwsts.c

+};
+
+/* Current output format for STS */
+vstring stsFormat = "";


g_ prefix missing from global variables (here and elsewhere in this file)

wlammen · 2022-01-02T13:36:51Z

mmwsts.c

+/* Math symbol comparison for bsearch */
+/* Here, key is pointer to a character string. */
+/* Here we search only the global tokens, those
+ * which endStatement is the last statement */


English: where endStatement...

wlammen · 2022-01-02T13:39:46Z

mmwsts.c

+/* Here, key is pointer to a character string. */
+/* Here we search only the global tokens, those
+ * which endStatement is the last statement */
+int mathSrchGlbCmp(const void *key, const void *data)


Is there a reason to avoid specific C types in parameter declarations, and so dodge C type checking?
key: char const *
data: mathToken_struct const *
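bsearch itself requires a comparison function of type int (*)(const void *, const void *), so the parameters of mathSrchGlbCmp cannot simply be declared with specific types. A common compromise is to convert them to typed const pointers in the first lines of the function. The token table below is hypothetical and much simpler than g_MathToken (no duplicate names, no endStatement check).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token table; bsearch runs over an array of indices into it,
   sorted by token name. */
typedef struct { const char *tokenName; } tokenSketch;
static const tokenSketch tokens[] = { {"("}, {")"}, {"->"}, {"ph"}, {"ps"} };

/* The signature is fixed by bsearch, but both parameters are converted to
   typed const pointers immediately. Returns <0, 0 or >0, like strcmp:
   the key sorts before, equal to, or after the element. */
static int tokenCmp(const void *key, const void *data) {
  const char *name = key;
  const long *index = data;
  return strcmp(name, tokens[*index].tokenName);
}
```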

wlammen · 2022-01-02T13:52:22Z

mmwsts.c

+  if(g_MathToken[ *((long *)data) ].endStatement == g_statements) return 0;
+
+  /* Find the direction in which the target token is */
+  for(long *ptr = (long*)data; !strcmp(key, g_MathToken[ *ptr ].tokenName); ptr++)


Optimization: The first loop seems to check element data again, and the outcome is known to be 0. So I suggest to initialize with long* ptr = (long*) data + 1;

wlammen · 2022-01-02T14:24:47Z

mmwsts.c

+/* Cache to speed up conversions */
+hashtable stsCache;
+
+/* Math symbol comparison for bsearch */


The comment should explain the result, that is limited to -1, 0 and +1, and what is delivered when.
It seems possible that the token list contains the same key multiple times in succession. Since this is part of a binary search, data may point somewhere in the middle of such a series. Is it guaranteed to hold just one global element (there is a suspicious active flag in the structure, that may allow a disabled and enabled element in the same series)? Is it guaranteed there is always a global element present? If either assumption is missed, the binary search may fail.

Clarify the so-called pre-/postconditions in a comment.

wlammen · 2022-01-02T17:19:19Z

mmwsts.c

+  long i;
+  char *fbPtr;
+  long textLen, tokenLen_;
+  long *g_mathKeyPtr; /* bsearch returned value */


incorrect use of prefix g_

wlammen · 2022-01-02T17:26:08Z

mmwsts.c

+  /* Make sure that g_mathTokens has been initialized */
+  if (!g_mathTokens) bug(1717);
+
+  textLen = (long)strlen(text);


definition is initialization long textLen = (long) strlen(text); same for wrklen etc.

wlammen · 2022-01-02T17:29:57Z

mmwsts.c

+#include "mmvstr.h"
+#include "mmdata.h"
+#include "mminou.h"
+#include "mmpars.h" /* For rawSourceError and mathSrchCmp and lookupLabel */


and whiteSpaceLen

wlammen · 2022-01-02T17:54:47Z

mmwsts.c

+    wrkNmbrPtr[mathStringLen] = *g_mathKeyPtr;
+    mathStringLen++;
+    fbPtr = fbPtr + tokenLen_ + 1; /* Move on to next token */
+    if(fbPtr >= text + textLen) break;


This should be the while condition

wlammen · 2022-01-02T17:57:12Z

mmwsts.c

+      return NULL_NMBRSTRING;
+    }
+    wrkNmbrPtr[mathStringLen] = *g_mathKeyPtr;
+    mathStringLen++;


tip: wrkNmbrPtr[mathStringLen++] = ... saves the following line and can reduce code size
same for fbPtr += tokenLen_ + 1;
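Both suggestions applied to a simplified tokenizer loop: the exit test moves into the while condition, and the counter update is folded into the store with a post-increment. tokenOffsets is hypothetical; it assumes tokens are separated by exactly one space, which the original loop also appears to assume, since it advances by tokenLen_ + 1.

```c
#include <string.h>

/* Hypothetical sketch: record the start offset of each single-space
   separated token in text, up to maxOut tokens. Returns the token count. */
static long tokenOffsets(const char *text, long *out, long maxOut) {
  const char *p = text;
  const char *end = text + strlen(text);
  long n = 0;
  while (p < end && n < maxOut) {   /* exit test lives in the condition */
    size_t len = strcspn(p, " ");   /* length of the current token */
    out[n++] = (long)(p - text);    /* store and advance in one step */
    p += len + 1;                   /* skip the token and one space */
  }
  return n;
}
```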

wlammen · 2022-01-02T18:10:30Z

mmwsts.c

+  return mathString;
+}
+
+/* Store a couple key/object into the cache */


Document Pre/Postconditions

wlammen · 2022-01-03T12:17:56Z

mmwsts.c

+}
+
+/* Dump a couple key/object from the cache */
+void stsDumpCache(nmbrString *key, vstring object) {


suspicious cast to eqFunc in line 235: The signature of this function does not match that of eqFunc: int (eqFunc)(void *, void *)

wlammen · 2022-01-03T12:19:02Z

mmwsts.c

+  if(stsUseCache) {
+    stsCache = htcreate(format, STS_CACHE_BUCKETS, "", (hashFunc*)&nmbrHash, (eqFunc*)&nmbrEq,
+			(letFunc*)&stsStoreCache, (freeFunc*)&stsFreeCache,
+			(eqFunc*)&stsDumpCache);


suspicious cast: signature of stsDumpCache does not match eqFunc.
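In C, calling a function through a pointer whose type is not compatible with the function's actual type is undefined behavior, even when the cast compiles. A common fix is a small adapter with exactly the expected signature. The callback type dumpFunc below is a hypothetical dedicated type for the dump hook; reusing eqFunc for it, as in the quoted line, is what the review flags.

```c
#include <stdio.h>

/* Hypothetical dedicated callback type for dumping one cache entry. */
typedef void dumpFunc(void *key, void *object);

static int g_dumped = 0;

/* The typed function we actually want to call. */
static void dumpEntry(const long *key, const char *object) {
  printf("%ld -> %s\n", *key, object);
  g_dumped++;
}

/* Adapter whose type matches dumpFunc exactly, so it can be stored without
   a cast and called through the pointer with well-defined behavior. */
static void dumpEntryAdapter(void *key, void *object) {
  dumpEntry(key, object);
}
```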

wlammen · 2022-01-03T12:27:29Z

mmwsts.c

+}
+
+/* Parse a file containing the structured typesetting rules. */
+int parseSTSRules(vstring format) {


Coding style: This function is way too long. It covers more than 300 lines. This exceeds the recommended max length of 20 lines by far, and its code is easily broken down into steps that can be moved into helper functions. https://stackoverflow.com/questions/475675/when-is-a-function-too-long

document pre/postconditions: example: stsFormat is both an input and an output variable, but that is not easily seen.

wlammen · 2022-01-03T12:28:46Z

mmwsts.c

+  g_outputToString = 0;
+
+  /* If the same format was already parsed, nothing to do. */
+  if(strcmp(stsFormat, format) == 0) {


move the early out code to the beginning of the function where parameter checks usually take place
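A sketch of the suggested ordering for a function like parseSTSRules: parameter checks and the "already loaded" early out come first, and the header comment states that the function both reads and updates the cached format name. loadRules, g_loadedFormat and g_parseCount are hypothetical names, and the parsing itself is replaced by a counter.

```c
#include <string.h>

static char g_loadedFormat[64] = "";  /* input AND output of loadRules */
static int g_parseCount = 0;          /* counts real parses, for illustration */

/* Hypothetical sketch.
   Precondition: format is a non-empty string shorter than g_loadedFormat.
   Postcondition: on success, g_loadedFormat equals format.
   Returns 1 on success, 0 on failure. */
static int loadRules(const char *format) {
  if (format == NULL || format[0] == '\0') return 0;       /* parameter check */
  if (strlen(format) >= sizeof g_loadedFormat) return 0;
  if (strcmp(g_loadedFormat, format) == 0) return 1;       /* early out */
  g_parseCount++;                   /* the real parsing steps would go here */
  strcpy(g_loadedFormat, format);
  return 1;
}
```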

wlammen · 2022-01-05T08:59:40Z

mmhtbl.c

+/* Dumpts the whole table */
+void htdump(hashtable *hashtable) {
+  print2("Hashtable %s:\n", hashtable->name);
+  //for(int bucket=0;bucket<hashtable->bucket_count;bucket++) {


This type of comment is not ANSI C compatible (see section 3.1.9). Use /* ... */ instead

I thought we were agreed on C99 now? I don't mind seeing // creep in, even if we don't do any bulk conversions.

Have we? Can you point me to where the decision took place?

I'm thinking of #8 (comment) . In short: we already compile only on C99, even before taking into account the recent refactors. (I'm not opposed to pushing the minimum beyond C99 (i.e. C11), but I don't think we should consider ANSI C (C89) any more.)

In C99 this issue is not relevant and can be ignored.

wlammen · 2022-01-05T09:09:50Z

mmdata.c

+{
+  if (start_position < 1) start_position = 1;
+  start_position--;
+  for(; occ > 0;occ--) {


Optimization: Use a dedicated loop variable in for loops. A compiler will allocate a CPU register for that.
for (long o = occ + 1; --o > 0;) {... (replace occ with o in loop)...}
is how I would write it.

Info: There is a tiny semantic change in my example: if the machine uses a signed-magnitude integer representation instead of two's complement AND occ is LONG_MAX, then (occ + 1)-1 == occ may not hold. We can safely ignore this nowadays.

long is a signed type, and signed overflow is UB, so I think that this change is legal by the spec.

@digama0 Exactly. Because occ+1 is UB when occ == LONG_MAX, --o may contain anything, and the loop starts with a random value. This cannot happen with the original code. We can safely ignore this semantic difference here.
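A variant that keeps the dedicated loop variable but initializes it from occ directly, so occ + 1 is never computed and the LONG_MAX corner case disappears. runOcc is a hypothetical stand-in that only counts iterations of the search loop.

```c
/* Hypothetical stand-in: counts how often the search loop body would run. */
static long runOcc(long occ) {
  long iterations = 0;
  /* Dedicated loop variable, initialized directly from occ:
     no occ + 1, so no overflow even when occ == LONG_MAX. */
  for (long o = occ; o > 0; o--)
    iterations++;
  return iterations;
}
```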

wlammen · 2022-01-05T09:14:48Z

mmdata.c

 }

+/* Search for the nth occurrence of string2 in string1 */
+long nmbrInstrN(long start_position, long occ, nmbrString *string1,


why is occ defined as long? Do you really expect more than two billion occurrences (or even 32000 should int be only 16 bit wide) of a substring in a string?

wlammen · 2022-01-05T09:15:31Z

mmdata.c

+        }
+      }
+      if (found) {
+	start_position = i+1;


check indentation

wlammen · 2022-01-05T09:26:33Z

mmhtbl.c

+#define NO_LINKEDITEM -1
+linked *linkedItems;
+int free_linkedItem;
+flag htinit_done = 0;


make this a static variable within htinit(). It is completely private implementation detail in this function.

wlammen · 2022-01-05T09:28:29Z

mmwsts.c

+  return mmlLine;
+}
+
+


change this to
#if 0
test code
#endif
to comply with ANSI C

wlammen · 2022-01-05T09:36:00Z

I think there are already lots of ideas and issues for refactoring the source. In particular extracting code from long functions into auxiliary sub-function and documenting pre/postconditions can help with further review, since the source becomes a lot more readable then.

In addition merge conflicts have to be resolved.

tirix · 2022-01-08T03:52:08Z

Thank you very much @wlammen for your efforts reviewing my code!

Indeed now, before this can be merged, I'll have to solve the conflicts with all the refactoring that's going on.

digama0 · 2022-01-08T09:30:15Z

I took care of merging this with master. @tirix , you should double check the last commit, which fixes a few warnings I was getting with the original version.

tirix · 2022-01-08T11:25:45Z

Thank you Mario!
This looks good to me!

tirix added 3 commits December 24, 2021 14:53

Structured Typesetting (STS) generation

7742b5c

Merge branch 'master' into structured-typesetting

8af50bc

Rename global variables (to adapt to change 0.187 15-Aug-2020)

eedc22f

tirix mentioned this pull request Dec 24, 2021

Metamath software future directions #8

Closed

wlammen reviewed Dec 25, 2021

Delete makefile, formatting.

9202215

Double-space after periods in help texts.

991bd6c

benjub reviewed Dec 25, 2021

More help text formatting

1a66907

wlammen reviewed Dec 26, 2021

wlammen suggested changes Dec 26, 2021

wlammen reviewed Dec 26, 2021

wlammen suggested changes Dec 26, 2021

tirix added 2 commits December 26, 2021 23:10

Typo: parsetSTSRules -> parseSTSRules

f3ae0fa

Update comments in metamath.c

2a870d8

tirix commented Dec 26, 2021

wlammen suggested changes Dec 27, 2021

wlammen suggested changes Dec 28, 2021

wlammen mentioned this pull request Dec 28, 2021

A word of caution from Norm #13

Closed

wlammen suggested changes Jan 2, 2022

wlammen suggested changes Jan 3, 2022

digama0 added 2 commits January 4, 2022 11:22

Merge branch 'structured-typesetting' of https://github.com/tirix/metamath-exe into tirix-structured-typesetting

ea0bde7

Merge remote-tracking branch 'origin/master' into tirix-structured-typesetting

59c02a3

wlammen suggested changes Jan 5, 2022

digama0 added 3 commits January 8, 2022 04:02

Merge commit 'origin/master~' into tirix-structured-typesetting

6371dc7

Merge remote-tracking branch 'origin/master' into tirix-structured-typesetting

dcbc83a

fix compile errors/warnings

c5925cb

tabs -> spaces

e5f602f

wlammen mentioned this pull request Mar 13, 2022

doc cmdInput1 and dependent(1) #75

Merged

wlammen mentioned this pull request Dec 29, 2022

fatal error handling, part 9 #111

Merged

tirix mentioned this pull request Oct 31, 2023

Html pages don't work tirix/metamath-web#24

Closed

GinoGiotto mentioned this pull request Nov 3, 2023

Fixes link to structured typesetting version of the HTML pages metamath/set.mm#3611

Merged
