Documenting/improving ways to quickly check docbook xml syntax/correctness on editor save #72

@TysonAndre

Motivation

Rendering part of the DocBook manual takes around 10 seconds for me, even for a partial build, and requires prerequisite steps:

time phd --docbook doc-base/.manual.xml --package PHP --partial en/reference/simdjson --format xhtml

Some editors (e.g. vim) don't have XML validation built in and rely on plugins that call external programs such as xmllint (from libxml2-utils), so documenting ways to set up XML validation would save contributors time.

Related to php/doc-en#1148

Feature Request

Add example scripts and editor configs to doc-base/scripts for quickly checking the validity of individual XML files.

This could possibly be extended by hardcoding known entities and warning about unknown entities, XML tag names, etc. (or by actually configuring the proper DTD files when run in the doc-base folder).

(Other alternatives exist, but they usually require external programs, e.g. https://github.com/vim-syntastic/syntastic/blob/master/syntax_checkers/xml/xmllint.vim. I assume PHP documentation contributors would have PHP installed.)

" Example additions to vimrc to check xml tags match up
function! XMLsynCHK()
  let winnum = winnr() " remember the current window number
  silent make %
  cw 4 " open the error window if there are any errors
  " return to the original window, with the cursor on the line of the first error (if any)
  execute winnum . "wincmd w"
  :redraw!
endfunction
au! BufWritePost  *.xml    call XMLsynCHK()

au FileType xml,docbk setlocal makeprg=/path/to/doc-base/scripts/xmllint.php
au FileType xml,docbk setlocal errorformat=%m\ in\ %f\ on\ line\ %l

The xmllint.php script referenced by makeprg above:

#!/usr/bin/env php
<?php // xmllint.php

/** @return never */
function print_usage_and_exit() {
    global $argv;
    fprintf(STDERR, "Usage: %s path/to/file.xml\n", $argv[0]);
    exit(1);
}

call_user_func(function () {
    error_reporting(E_ALL);
    ini_set('display_errors', '1');
    global $argv;
    if (count($argv) !== 2) {
        print_usage_and_exit();
    }
    $file = $argv[1];
    if (!is_readable($file)) {
        fprintf(STDERR, "%s is not readable\n", var_export($file, true));
        print_usage_and_exit();
    }
    $contents = file_get_contents($file);
    if (!is_string($contents)) {
        fprintf(STDERR, "Could not read %s\n", var_export($file, true));
        print_usage_and_exit();
    }
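    // Collect libxml errors instead of emitting them as PHP warnings.
    libxml_use_internal_errors(true);
    try {
        (new DOMDocument())->loadXML($contents, LIBXML_PARSEHUGE|LIBXML_COMPACT);
    } catch (Exception $e) { }
    foreach (libxml_get_errors() as $error) {
        $message = trim($error->message);
        // Skip undefined-entity errors: a file checked on its own does not load the entity definitions.
        if (preg_match('/^Entity.*not defined$/', $message)) {
            continue;
        }

        // This output format must match the errorformat set in the vimrc.
        printf("%s in %s on line %d\n", $message, $file, $error->line);
    }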
});
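
The script above silently skips every "Entity ... not defined" message. The extension mentioned earlier (hardcoding known entities and warning about unknown ones) might look roughly like the sketch below. This is only an illustration under assumptions: `find_unknown_entities` is a hypothetical helper, the list of known names is passed in by the caller rather than read from doc-base, and the regex scan does not account for CDATA sections or comments.

```php
<?php
// Hypothetical helper, not part of doc-base: lists named entity references
// that are not in a caller-supplied allow list.

/**
 * @param string   $xml   contents of one XML file
 * @param string[] $known entity names to accept (illustrative only)
 * @return array<int, array{name: string, line: int}>
 */
function find_unknown_entities(string $xml, array $known): array
{
    // The five entities predefined by the XML specification are always accepted.
    $allowed = array_flip(array_merge(['amp', 'lt', 'gt', 'apos', 'quot'], $known));
    $unknown = [];
    // Named references only: numeric ones such as &#160; start with '#' and do not match.
    preg_match_all('/&([A-Za-z_:][\w.:-]*);/', $xml, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[1] as [$name, $offset]) {
        if (!isset($allowed[$name])) {
            // Line number = newlines before the reference, plus one.
            $line = substr_count($xml, "\n", 0, $offset) + 1;
            $unknown[] = ['name' => $name, 'line' => $line];
        }
    }
    return $unknown;
}

// Usage with made-up input; the allow list here is an assumption.
$sample = "<para>\n &reftitle.intro; &amp; &no.such.entity;\n</para>";
foreach (find_unknown_entities($sample, ['reftitle.intro']) as $hit) {
    // Same "%m in %f on line %l" shape as the errorformat in the vimrc.
    printf("Entity '%s' not defined in %s on line %d\n", $hit['name'], 'sample.xml', $hit['line']);
}
// prints: Entity 'no.such.entity' not defined in sample.xml on line 2
```

Scanning the source text instead of parsing libxml's messages keeps the check independent of the exact wording of libxml errors, at the cost of possible false positives inside CDATA or comments.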

Brainstorming other ideas

  • For DOMDocument::schemaValidate: as far as I can see, https://docbook.org/ns/docbook has no official schema. doc-base has RFC/schema for a proposed schema, but the commit from 2010 notes "PhD doesn't use any of this".
  • I'm not familiar with the implementation of the tools. Currently, it seems we have to generate the entire .manual.xml, covering the whole manual, to generate the HTML for even one page (the process described at http://doc.php.net/tutorial/local-setup.php).
  • I haven't yet looked into whether phd or configure.php can be changed to run in an error-tolerant way on a single file, without building the full manual.xml file containing every single page (or by using some method that is faster for decoding and retrieval than parsing an entire XML file, e.g. putting all the definitions in sqlite once, caching that, and later querying only the necessary rows when requested).
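
If a usable schema were available, the DOMDocument::schemaValidate idea from the first bullet could be wired into the same error format. The following is a minimal sketch, not a proposal for the actual schema: `schema_errors` is a hypothetical helper, and `$toyXsd` is a made-up one-element XSD used only so the example runs on its own. It uses `DOMDocument::schemaValidateSource`, the string-based sibling of `schemaValidate`.

```php
<?php
// Sketch only: validate an XML string against a caller-supplied XSD string and
// return messages in the "%m in %f on line %l" shape used by the vimrc.
function schema_errors(string $xml, string $xsd, string $label): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $doc = new DOMDocument();
    // Only validate when the document is well-formed; parse errors are collected either way.
    if ($doc->loadXML($xml)) {
        $doc->schemaValidateSource($xsd);
    }
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('%s in %s on line %d', trim($error->message), $label, $error->line);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $messages;
}

// Toy schema (an assumption for illustration): <para> may contain text only.
$toyXsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="para" type="xs:string"/>
</xs:schema>
XSD;

// A child element inside <para> violates the toy schema, so at least one message is printed.
foreach (schema_errors('<para><b>x</b></para>', $toyXsd, 'sample.xml') as $line) {
    echo $line, "\n";
}
```

Because the helper returns plain strings, the existing xmllint.php loop could print them unchanged and the vim errorformat would not need to be touched.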
