1 change: 0 additions & 1 deletion dune-project
Expand Up @@ -59,7 +59,6 @@ possible and does not make any assumptions about IO.
(ppx_expect (and (>= v0.15.0) :with-test))
(ocamlformat (and :with-test (= 0.24.1)))
(ocamlc-loc (and (>= 3.5.0) (< 3.7.0)))
(omd (and (>= 1.3.2) (< 2.0.0~alpha1)))
(octavius (>= 1.2.2))
(uutf (>= 1.0.2))
(pp (>= 1.1.2))
1 change: 0 additions & 1 deletion ocaml-lsp-server.opam
Expand Up @@ -35,7 +35,6 @@ depends: [
"ppx_expect" {>= "v0.15.0" & with-test}
"ocamlformat" {with-test & = "0.24.1"}
"ocamlc-loc" {>= "3.5.0" & < "3.7.0"}
"omd" {>= "1.3.2" & < "2.0.0~alpha1"}
"octavius" {>= "1.2.2"}
"uutf" {>= "1.0.2"}
"pp" {>= "1.1.2"}
241 changes: 241 additions & 0 deletions ocaml-lsp-server/src/omd/ABOUT.md
@@ -0,0 +1,241 @@
<!-- -*- coding: utf-8 -*- -->

About [OMD](https://github.com/pw374/omd/)
==========================================

The implementation of this library and command-line tool
is based on [DFMSD][].
That description doesn't define a grammar; it is more of a guide for
human users who are not trying to implement it. In other words,
it's ambiguous, which is a problem since there are no errors in the
Markdown language, whose design is mostly based on
email-writing experience: the meaning of a phrase is the meaning
a human would give it when reading the phrase as the contents of an email.
For instance, blank lines that contain spaces
(lines that look empty but actually contain some characters from
the computer's point of view, since spaces are represented by characters)
are invisible to the normal human reader, so they should be ignored.


Specificities
-------------

There follows a list of specificities of OMD.
This list is probably not exhaustive.

**Please note that OMD's semantics have changed over time, but they are becoming
more and more stable with time and new releases. The goal is to eventually
have a semantics that's as sane as it can possibly be for a Markdown parser.
Please [browse and open issues](https://github.com/pw374/omd/issues/)
if you find something that seems wrong.**

- Email addresses encoding: email addresses are not hex entity-encoded.

- `[foo]` is a short-cut for `[foo][]`, but if `foo` is not a reference
then `[foo]` is printed `[foo]`, not `[foo][]`.
*(Taken from GitHub Flavored Markdown.)*

- The Markdown to Markdown conversion may perform
some cleaning (some meaningless characters may disappear)
or spoiling (some meaningless characters may appear),
but both inputs and outputs should have the same semantics (otherwise
please do report the bug).

- A list containing at least one item which has at least one paragraph
is a list for which all items have paragraphs and/or blocks.
In HTML words, in practice, if an `li` of a `ul` or `ol` has a `p`,
then all other `li`s of that list have at least a `p` or a `pre`.

- It's not possible to emphasise a part of a word using underscores.
*(Taken from GitHub Flavored Markdown.)*

- A code section declared with at least 3 backquotes (`` ` ``) as the
first element on a line is a code block. The backquotes should be
followed by a language name (made of a-z characters) or by a newline.

- A code block starting with several backquotes (e.g., ```` ``` ````)
immediately followed by a word W made of a-z characters is a code block
for which the code language is W. (If you use other characters than
a-z, the semantics is currently undefined although it's deterministic
of course, because it may change in the near future.) Also, if you use
the command line tool `omd`, you can define programs to process code
blocks specifically to the languages that are declared for those code
blocks.

- Each tab character is converted by OMD to 4 spaces at the lexing
step, and the behaviour of the parser is undefined for tabs.
  - Note that this means that a document containing code written
    in the
    [Whitespace](http://en.wikipedia.org/wiki/Whitespace_(programming_language))
    language will not work very well. This might be fixed in the future,
    but unless you have a very good reason for OMD to support tabs,
    it probably will not be.

- Parentheses and square brackets are generally parsed in a way such that
`[a[b]](http://c/(d))` is the URL `http://c/(d)` with the text `a[b]`.
If you want a parenthesis or bracket not to count in the balanced parsing,
escape it with a backslash, such as in `[a\[b](http://c/\(d)`.
*This is typically something that's not defined in [DFMSD].*
  - Note about backslashes in URLs: some web browsers (e.g., Safari)
    automatically convert `\` to `/`. This is not the case with cURL.
    However, I assume it's safe to consider that backslashes are not
    to be used in URLs. Still, it's always possible to
    backslash-escape them anyway.

- HTML is somewhat a part of Markdown. OMD will partially parse HTML tags
and if you have a tag that isn't a known HTML tag, then it's possible
that OMD will not consider it as HTML. For instance, a document
containing just `<foo></foo>` will be converted to
`<p>&lt;foo&gt;&lt;/foo&gt;</p>`.
  - It's possible to ask `omd` to relax this constraint.

- Some additional features are available on the command line.
For more information, use the command `omd -help`.
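
The balanced parsing of square brackets and parentheses described in the list
above can be illustrated with a small stand-alone sketch. This is not OMD's
implementation and it does not use the OMD library: `find_closing` and
`split_link` are hypothetical helpers written only for this example, and they
assume the input is exactly one `[text](url)` link. Backslash-escaped
characters are skipped when counting delimiters and are left in the output
as-is.

```ocaml
(* Return the index of the delimiter that closes the one found at [start],
   counting nested pairs and skipping backslash-escaped characters. *)
let find_closing (s : string) (start : int) (opening : char) (closing : char)
  : int option =
  let n = String.length s in
  let rec go i depth =
    if i >= n then None
    else
      match s.[i] with
      | '\\' -> go (i + 2) depth (* skip the escaped character *)
      | c when c = opening -> go (i + 1) (depth + 1)
      | c when c = closing ->
        if depth = 1 then Some i else go (i + 1) (depth - 1)
      | _ -> go (i + 1) depth
  in
  if start < n && s.[start] = opening then go (start + 1) 1 else None

(* Split "[text](url)" into (text, url). Hypothetical helper: escapes are
   kept verbatim in both parts. *)
let split_link (s : string) : (string * string) option =
  match find_closing s 0 '[' ']' with
  | None -> None
  | Some rb ->
    (match find_closing s (rb + 1) '(' ')' with
     | None -> None
     | Some rp ->
       Some (String.sub s 1 (rb - 1), String.sub s (rb + 2) (rp - rb - 2)))
```

With this sketch, `split_link "[a[b]](http://c/(d))"` yields the text `a[b]`
and the URL `http://c/(d)`, matching the behaviour described above.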



[DFMSD]: http://daringfireball.net/projects/markdown/syntax
"John Gruber's description of the syntax of Markdown"

"DFMSD" is short for "Daring Fireball: Markdown Syntax Documentation",
which is the HTML title of the page located at
<http://daringfireball.net/projects/markdown/syntax>.

Extension mechanisms
--------------------

The parser is implemented using a big (very big) recursive function
(`Omd_parser.Make(Env).main_loop_rev`), with a set of auxiliary
functions. Some parts are easy to understand, some parts are
not. However, overall, it should be easy enough.


The parser has a double extension mechanism.

1. To use the first mechanism, you may define a set of functions in
   the module `Env` given to instantiate the functor `Omd_parser.Make`.
   * The value `Env.extensions` is a list of elements of
     type `Omd_representation.extension`, which is equal to
     `r -> p -> l -> (r * p * l) option` where
     * `r = Omd_representation.t`
       and represents the result of the parsing process,
     * `p = Omd_representation.tok list`
       and represents the tokens preceding `l`,
     * and `l = tok list` and is the list of tokens to parse.
   * The result, of type `(r * p * l) option`, is `None` if
     the extension has no effect (and the parser will continue
     doing its job with the state it had before using the
     extension), and is `Some(r,p,l)` when it gives a new set of
     data to the parser.
   * Each element of the list `Env.extensions` is applied in a fold left
     manner. (The first element of that list is applied first.)
   * And they are applied when a standard parsing rule fails.

2. The second extension mechanism lies in the representation of the lexemes
   (`Tag of string * extension`).
   It allows extensions to be inserted directly into the lexeme list.
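
The shape of the first mechanism can be sketched without the OMD library. In
the example below, `tok`, `elt`, `apply_extensions` and `todo_ext` are
stand-ins invented for illustration; the real types live in
`Omd_representation`. The text above only says that the first element of the
list is applied first, so the sketch assumes that the first extension
returning `Some` wins. That stopping rule is an assumption, not something
taken from OMD's source.

```ocaml
(* Stand-in types; the real ones are defined in Omd_representation. *)
type tok = Word of string | Space
type elt = Text of string | Bold of string

type r = elt list (* result of the parsing process so far *)
type p = tok list (* tokens preceding l *)
type l = tok list (* tokens still to parse *)

type extension = r -> p -> l -> (r * p * l) option

(* Try each extension in list order with a left fold.
   Assumption: the first extension that returns Some wins. *)
let apply_extensions (exts : extension list) (r : r) (p : p) (l : l)
  : (r * p * l) option =
  List.fold_left
    (fun acc ext -> match acc with Some _ -> acc | None -> ext r p l)
    None exts

(* Hypothetical extension: render the word "TODO" in bold. *)
let todo_ext : extension =
  fun r _p l ->
    match l with
    | Word "TODO" :: rest -> Some (Bold "TODO" :: r, [ Word "TODO" ], rest)
    | _ -> None
```

When every extension returns `None`, `apply_extensions` returns `None` too,
which corresponds to the parser carrying on with the state it already had.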

The Markdown representation also provides an extension mechanism,
which is useful if you want to insert “smart objects” (which are as
“smart” as smartphones). Those objects have four methods, two of which
are particularly useful: `to_html` and `to_t`, and implementing one
of them is necessary. They both return a `string option`, and a
dummy smart object can be defined as follows:

```ocaml
(* A smart object whose methods all return None, i.e. it renders nothing. *)
let dummy =
  X (object
    method name = "dummy"
    method to_html ?(indent=0) _ _ = None
    method to_sexpr _ _ = None
    method to_t _ = None
  end)
```
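
To show how such an object might be consumed, here is a self-contained sketch
that does not depend on OMD. The type `t` below is a reduced stand-in for
`Omd_representation.t` with only two constructors and two of the four
methods, and the renderer `to_html` and the object `hr` are hypothetical. How
OMD's own HTML backend calls `to_html` is not described above, so the calling
convention used here is an assumption.

```ocaml
(* Reduced stand-in for Omd_representation.t (assumption, not the real type). *)
type t = element list

and element =
  | Text of string
  | X of
      < name : string
      ; to_html : ?indent:int -> (t -> string) -> t -> string option >

(* Hypothetical renderer: ask each smart object for its HTML and emit
   nothing for it when the object answers None. *)
let rec to_html (doc : t) : string =
  match doc with
  | [] -> ""
  | Text s :: rest -> s ^ to_html rest
  | X o :: rest ->
    let own =
      match o#to_html ~indent:0 to_html rest with
      | Some html -> html
      | None -> ""
    in
    own ^ to_html rest

(* A smart object that renders as a horizontal rule. *)
let hr : element =
  X
    (object
       method name = "hr"

       method to_html ?(indent = 0) _render _rest =
         Some (String.make indent ' ' ^ "<hr/>")
    end)
```

In this sketch, an object whose `to_html` returns `None` simply contributes
no output, which is one way to read the role of the dummy object above.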



History
-------

OMD has been developed by [Philippe Wang](https://github.com/pw374/)
at [OCaml Labs](http://ocaml.io/) in [Cambridge](http://www.cl.cam.ac.uk),
with valuable feedback and [pull requests](https://github.com/pw374/omd/pulls)
(cf. next section).

Its development was motivated by at least these facts:

- We wanted an OCaml implementation of Markdown; some OCaml parsers of
Markdown existed before but they were incomplete. It's easier for an
OCaml project to depend on a pure-OCaml implementation of Markdown than
to depend on some interface to a library implemented using another language,
and this is even more important since [Opam](https://opam.ocaml.org) exists.

- We wanted to provide a way to make the contents of
the [OCaml.org](http://ocaml.org/) website be essentially in Markdown
instead of HTML. And we wanted this website to be implemented in
OCaml.

- Having an OCaml implementation of Markdown is virtually mandatory for
those who want to use a Markdown parser in
a [Mirage](http://www.openmirage.org) application.
Note that OMD has replaced the previous Markdown parser of
[COW](https://github.com/mirage/ocaml-cow), which has been developed
as part of the Mirage project.



Thanks
------

Thank you to
[Christophe Troestler](https://github.com/Chris00),
[Ashish Agarwal](https://github.com/agarwal),
[Sebastien Mondet](https://github.com/smondet),
[Thomas Gazagnaire](https://github.com/samoht),
[Daniel Bünzli](https://github.com/dbuenzli),
[Amir Chaudhry](https://github.com/amirmc),
[Anil Madhavapeddy](https://github.com/avsm/),
[David Sheets](https://github.com/dsheets/),
[Jeremy Yallop](https://github.com/yallop/),
and \<please insert your name here if you believe you've been forgotten\>
for their feedback and contributions to this project.



Miscellaneous notes
-------------------

- There's been absolutely no effort in making OMD fast, but it should be
amongst the fastest parsers of Markdown, just thanks to the fact that
it is implemented in OCaml. That being said, there's quite some room
for performance improvements. One way would be to make a multi-pass
parser with different intermediate representations (there are currently
only 2 representations: one for the lexing tokens and one for the parse
tree).

- The hardest part of implementing a parser of Markdown is the process
of understanding and unravelling the grammar of Markdown to turn it into
a program.

- OMD 1.0.0 will probably use some external libraries,
e.g., [UUNF](http://erratique.ch/software/uunf)
and perhaps [Xmlm](http://erratique.ch/software/xmlm/doc/Xmlm)


- "OMD" is the name of this library and command-line tool.
  - It might be written "Omd" or "omd" sometimes, but it should
    be written using capital letters because it should be read
    `əʊ ɛm diː` rather than `ə'md` or `ˌɒmd`.

- "`Omd`" is a module.
  - It's written using monospace font and it's capitalized.

- "`omd`" is a command-line tool.
  - It's written using monospace font and always in lowercase letters
    because, unless you have a case-insensitive file system, calling `Omd` on the
    command line is not just another way of calling `omd`.

- OMD was added to the rather long list of Markdown parsers
<https://github.com/markdown/markdown.github.com/wiki/Implementations>
on the 29th of January.

117 changes: 117 additions & 0 deletions ocaml-lsp-server/src/omd/CHANGES.md
@@ -0,0 +1,117 @@
# Document Title

1.3.2
------

- port from oasis to dune (#273, @tmattio)

1.3.x
-----

- might stop checking validity of HTML tag *names* and accept any XML-parsable
tag name.

1.2.5
-----

- only fixes a single bug (an ordered list could be transformed into an
unordered list)

1.2.4
-----

- only fixes a single bug (some spaces were wrongly handled in the HTML parsing)

1.2.2/3
-------

- fix a few issues with HTML parsing.

1.2.1
-----

- mainly fixes issues with HTML parsing.

1.2.0
-----

- introduces options `-w` and `-W`. Fixes mostly concern subtle uses of `\n`s in
HTML and Markdown outputs.

1.1.2
-----

- fix: some URL-related parsing issues.

1.1.0/1.1.1
-----------

- fix: some HTML-related issues.

1.0.1
-----

- fixes some parsing issues, improves output. (2014-10-02)

1.0.0
-----

- warning: this release is only partially compatible with previous versions.

- accept HTML blocks which directly follow each other

- fix: accept all XML-compatible attribute names for HTML
attributes

- fix backslash-escaping for hash-ending ATX-titles + fix Markdown output for
Html_block

- fix (HTML parsing) bugs introduced in 1.0.0.b and 1.0.0.c

- rewrite parser of block HTML to use the updated Omd.t

- rewrite parser of inline HTML to use the updated Omd.t

- upgrade Omd.t for HTML representation

There will not be any newer 0.9.x release although new bugs have been
discovered. Thus it's recommended to upgrade to the latest 1.x.y.

0.9.7
-----

- introduction of media:end + bug fixes.

If you need to have a version that still has `Tag of extension` instead of `Tag
of name * extension` and don't want to upgrade, you may use 0.9.3

0.9.6
-----

- fix a bug (concerning extensions) introduced by 0.9.4.

0.9.5
-----

- bug fix + `Tag of extension` changed to `Tag of name * extension`

0.9.4
-----

- fixes a bug for the new feature

0.9.3
-----

- new feature `media:type="text/omd"`. This version is recommended if you do
not use that new feature and want to use 0.9.x

0.9.2
-----

- not released...

older versions
--------------

- cf. [commit log](https://github.com/ocaml/omd/commits/master)
20 changes: 20 additions & 0 deletions ocaml-lsp-server/src/omd/Makefile
@@ -0,0 +1,20 @@
##
# Omd
#
# @file

.PHONY: test build fmt deps

build: deps
	dune build

deps:
	opam install . --deps-only --yes

test:
	dune build @gen --auto-promote
	dune runtest

fmt:
	dune build @fmt --auto-promote
# end