Update from upstream repo github/linguist#2

Open
backstroke-bot wants to merge 1545 commits into octocat:master from github-linguist:master

@backstroke-bot
Hello!
The remote github/linguist has some new changes that aren't in this fork.

So, here they are, ready to be merged! 🎉

If this pull request can be merged without conflict, you can publish your software
with these new changes. Otherwise, if you have merge conflicts, this
is the place to fix them.

Have fun!


Created by Backstroke. Oh yea, I'm a bot.

@ghost
ghost commented Oct 3, 2016

Hello!
The remote github/linguist has some new changes that aren't in this fork.

root = true

[*]
charset = utf-8

Gained a lot.

Hi, I'd like to open a pull request.

xiaq and others added 9 commits September 2, 2022 11:53
* Add Gemini language

* Remove .gemini extension

Co-authored-by: printfn <printfn@users.noreply.github.com>
Co-authored-by: Colin Seymour <colin@github.com>
* Add generic .tag to JSP

* Update lib/linguist/heuristics.yml

Co-authored-by: John Gardner <gardnerjohng@gmail.com>

Co-authored-by: John Gardner <gardnerjohng@gmail.com>
* Update all grammars

* Update all cached licenses

* Version 7.23.0

* Update all grammars

* Update cached licenses
This PR does not introduce any changes to installed packages, but it does change the filesystem by deleting the package caches.

- The `linux-headers` package is already in the base image, so adding it is not required.
- The removal of `build-base`, `libc-dev`, and `cmake` is handled by `apk del build_deps`, the virtual package we created. `linux-headers` cannot be deleted, as it is required by `libffi-dev` in the base image.
- `--no-cache` is removed from the second `apk add` because the package caches were already downloaded and retained by the first `apk add`, so there's no need to fetch them again.
- Removing the caches from `/var/cache/apk/*` saves some KBees 🐝
Add Brewfile for bootstrapping deps on Mac

Co-authored-by: Colin Seymour <colin@github.com>
* Repoint razor-plus grammar submodule at new repo

* Run list-grammars and update README

* Move razor-plus project into github-linguist org

* Update grammar README for last change
* Update linguist CLI to analyze specific revisions
* Update README to document new --rev option
Paranoid46 pushed a commit to Paranoid46/linguist that referenced this pull request Oct 6, 2022
Update from upstream repo github/linguist
Alhadis and others added 17 commits October 19, 2022 15:31
Co-authored-by: Colin Seymour <colin@github.com>
Co-authored-by: John Gardner <gardnerjohng@gmail.com>
* Feat/cypher (#3)

* add cypher grammar
* add samples
* add missing file extension sample
* upd ordering
* add extra sample
* making license type precise

* remove license update mistake

* more samples (#4)

* Chore/add more samples (#5)

* more samples

* trim examples

* Chore/add more samples (#6)

* trim examples

* Delete graph_alg.cql

* remove cql (#7)

* Update languages.yml

* remove cql

* typo

* new examples

Co-authored-by: benf <benf@local>

Co-authored-by: benf <benf@local>
adding support for .jsh extension

Co-authored-by: Colin Seymour <colin@github.com>
* Add SDC and XDC to TCL language

* Add better constraint samples

* Add aliases

Co-authored-by: Colin Seymour <colin@github.com>
Co-authored-by: Colin Seymour <colin@github.com>
* Add PDDL

* Update lib/linguist/languages.yml

Co-authored-by: Colin Seymour <colin@github.com>

* remove large examples

* Add PDDL to README

Co-authored-by: Colin Seymour <colin@github.com>
starlark: support recognition of WORKSPACE.bazel

Both WORKSPACE and WORKSPACE.bazel are valid names for the WORKSPACE
file. The latter takes precedence, even though it is an alias, because
other projects may have a similarly named file.

Alternatively, *.bazel could be added to extensions, but .bzl is
the recommended extension.

This adds a WORKSPACE.bazel file from
https://github.com/google/skia/blob/main/WORKSPACE.bazel
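The choice described in this commit message (match the exact filename rather than adding a `*.bazel` extension) can be illustrated with a minimal sketch. This is not Linguist's implementation: the `FILENAMES` and `EXTENSIONS` tables and the `detect` function are hypothetical names made up for the illustration; only the `WORKSPACE`, `WORKSPACE.bazel`, and `.bzl` entries come from the text above.

```python
import os

# Hypothetical lookup tables (assumption, not Linguist's data model).
# Exact filenames are listed explicitly; ".bazel" is deliberately NOT
# registered as an extension, mirroring the commit message.
FILENAMES = {
    "WORKSPACE": "Starlark",
    "WORKSPACE.bazel": "Starlark",
}
EXTENSIONS = {
    ".bzl": "Starlark",
}


def detect(path):
    """Return a language name for `path`, or None if nothing matches."""
    name = os.path.basename(path)
    # An exact filename match is checked first.
    if name in FILENAMES:
        return FILENAMES[name]
    # Otherwise fall back to the file extension.
    _, ext = os.path.splitext(name)
    return EXTENSIONS.get(ext)


print(detect("WORKSPACE.bazel"))  # Starlark
print(detect("notes.bazel"))      # None
```

With this layout, an unrelated file that merely ends in `.bazel` is left unclassified, while both spellings of the workspace file are recognized.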
* Add language: Just

* Submodules: Update

* Update Justfile

* Rename to justfile

* Add license snapshot

Source: https://github.com/skellock/vscode-just/commit/e781b35a3ca38d8a3c4a0650f6982b5712b23406\#diff-c693279643b8cd5d248172d9c22cb7cf4ed163a3c98c8a3f69c2717edd3eacb7

* Update lib/linguist/languages.yml

Co-authored-by: Casey Rodarmor <casey@rodarmor.com>

* Update grammars.yml

Co-authored-by: Casey Rodarmor <casey@rodarmor.com>

* Fix just language id

* Update license

* Remove extensions

* Rerun ./script/list-grammars

* Apply suggestions from code review

* Update grammars.yml

* Samples/Justfiles -> samples/just

* Rerun tests

* Fix order

Co-authored-by: Casey Rodarmor <casey@rodarmor.com>
Co-authored-by: Colin Seymour <colin@github.com>
* Add OASv2 and OASv3 languages

* Add test fixtures for OASv2 and v3

Co-authored-by: Colin Seymour <colin@github.com>
* Add Language: Imba

* Update lib/linguist/languages.yml

Co-authored-by: Colin Seymour <colin@github.com>

Co-authored-by: Colin Seymour <colin@github.com>
* Add Scenic language

* Update Scenic grammar

* forgot to update metadata

* Update Scenic grammar
* Add VB6

Adding the VB6 language and removing it as an alias of VBA.

* Remove .vb6 extension

No samples for .vb6 found on GitHub

* Add samples

* Update ids

* Change language name and adjust aliases

In response to requested change: #6124 (comment)

* Change .dsr to .Dsr

* Add additional sample for .Dsr

* Change folder name

* Fix order

* Add missing VB6 line

Co-authored-by: Colin Seymour <colin@github.com>
* Generate samples during bootstrap

* This isn't needed as the rake does it
DecimalTurn and others added 20 commits August 29, 2024 14:21
replace `language-mcfunction` -> `syntax-mcfunction`
* Change Cairo grammar repo to software-mansion-labs/cairo-tm-grammar

Signed-off-by: Marek Kaput <marek.kaput@swmansion.com>

* Rename `Cairo` to `Cairo 0` to reflect official language name change

Signed-off-by: Marek Kaput <marek.kaput@swmansion.com>

* Add Cairo language and heuristics to disambiguate it from Cairo 0

Signed-off-by: Marek Kaput <marek.kaput@swmansion.com>

* Add CASM samples to help classifier identify them as Cairo 0

Signed-off-by: Marek Kaput <marek.kaput@swmansion.com>

* Add more samples for Cairo and Cairo 0

* Remove Cairo 0 heuristics

This commit partially reverts a718fec

* Change language IDs for Cairo langs as asked in review, and group them

* Revert "Remove Cairo 0 heuristics"

This reverts commit 25dd32a.

* Add `ap++` sequence to Cairo 0 heuristic

* Assume Cairo if no Cairo 0 heuristic match

* Rename `Cairo 0` to `Cairo Zero`

This change has been suggested by the StarkWare Product Team,
so here it is.

---------

Signed-off-by: Marek Kaput <marek.kaput@swmansion.com>
* Update the references to the modern qsharp repository.

* Update yml files
* Add entry to language.yml and grammar

* Add samples

* Run script/update-ids

* Fix iCalendar's language.yml entry

* Add missing trailing newlines

* Update lib/linguist/languages.yml

Co-authored-by: John Gardner <gardnerjohng@gmail.com>

---------

Co-authored-by: John Gardner <gardnerjohng@gmail.com>
* Add vCard with sample

* Add sample + remove comment in yml

* Add vCard grammar

* Add id

* Edit aliases

* Add vcf to TSV + heuristics

* Add test
* Add language

* Add sample 1

https://github.com/AnywhereSoftware/B4X-Pleroma/blob/master/OAuth.bas

* Add sample 2

https://github.com/AnywhereSoftware/B4X-Pleroma/blob/master/RequestsManager.bas

* Language Id + Sample + Grammar

* Add heuristic

* Edit .bas heuristic test

* Edit heuristic

* Handle BOM issue with heuristic

* Limit search in the first 10 lines

* Simplify heuristic

* Simplify heuristic further

* Adjust heuristic

This commit moves the check for BOM at the start of the file and fixes a potential problem of compatibility with re2.
Note that `{3}?` in re2 is interpreted as matching the previous token exactly 3 times, while the Oniguruma engine interprets this as matching 3 or 0 times.

* Remove redundant `^`

* Use portable version
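The `{3}?` note in the "Adjust heuristic" commit above can be illustrated with a small sketch. Python's `re` module reads `{3}?` as a lazy "exactly 3" quantifier, which is the same reading the commit attributes to re2, so it can stand in for that side of the comparison. The patterns below are made-up examples, not the actual heuristic from `heuristics.yml`; the explicit optional group is one way to spell "3 or 0 times" without relying on how an engine parses `{3}?`.

```python
import re

# `{3}?` read as a lazy quantifier: exactly three repetitions, never zero.
lazy_three = re.compile(r"(?:ab){3}?")

# Explicit "three or zero times": the counted group is wrapped in an
# optional group, so the intent does not depend on the `{n}?` parsing rule.
three_or_zero = re.compile(r"(?:(?:ab){3})?")


def fullmatches(pattern, text):
    """True if `pattern` matches the whole of `text`."""
    return pattern.fullmatch(text) is not None


print(fullmatches(lazy_three, "ababab"))     # True
print(fullmatches(lazy_three, ""))           # False
print(fullmatches(three_or_zero, ""))        # True
print(fullmatches(three_or_zero, "ababab"))  # True
```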
revise: updates the WDL language grammar
* Update heuristics.yml

Fix .yy heuristic to account for changes in property name in GMStudio 2.3

* Add sample

* Fix generated detection (WIP)

* Relax the constraint that the property has to be on the 3rd line

* Targeting JSON's heuristic directly

* Remove outdated comment
* chore: update grammar

* Revert "chore: update grammar"

This reverts commit c756098.

* Re-replace grammar

---------

Co-authored-by: Colin Seymour <colin@symr.io>
* Use match?

* Remove double-negation
* Add uv.lock to languages.yml as a TOML file

* Use a smaller sample file
* Add initial support for carbon

* Apply custom language ID to carbon

* Carbon classes

* example window creation

* Removing the .cb file extension for Carbon

* Puts V/Go syntax in Carbon syntax highlighting

* thanks for the fix

Co-authored-by: John Gardner <gardnerjohng@gmail.com>

* Carbon in vendor/README.md

---------

Co-authored-by: John Gardner <gardnerjohng@gmail.com>
* Add support for `HOSTS.TXT` files

* Update license hash
* Add `.peggy` for PEG.js

* Swap `semver` sample for `abnfp` for peggy
* Add extra aliases for vimscript

* Update lib/linguist/languages.yml

---------

Co-authored-by: Colin Seymour <colin@github.com>
* New Centroid-based Classifier

Training:

* A fixed vocabulary is set to all tokens that appear in at least 2
  samples.
* All out-of-vocabulary tokens are discarded.
* For every token, we set its Inverse Class Frequency (ICF) to
`log(ct / cf) + 1` where `ct` is the total number of classes and `cf` is
the number of classes where the token occurs.
* Each sample is converted to a vector of `tf * icf` for every token in
the vocabulary. `tf` is `1 + log(freq)`, where `freq` is the
number of occurrences of the token in the given sample.
* Samples are L2-normalized.
* For each class (language), we compute the centroid of all its training
samples by averaging them and L2-normalizing the result.

Classification:

* For a new sample, we get the L2-normalized vector with `tf * icf`
terms for every known token, then classify the sample using the nearest
centroid. Cosine similarity is used as similarity measure for this.
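
The training and classification steps above can be sketched as follows. This is a minimal illustration written from the commit message alone, not the code in `lib/linguist/classifier.rb`: the function names, the sparse-dict vector representation, and the toy `SAMPLES` data are all assumptions made for the example, and tokenization is assumed to have happened already.

```python
import math
from collections import Counter, defaultdict


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def vectorize(tokens, icf):
    """tf * icf vector over known tokens, with tf = 1 + log(freq)."""
    freq = Counter(t for t in tokens if t in icf)  # drop out-of-vocabulary tokens
    return l2_normalize({t: (1 + math.log(n)) * icf[t] for t, n in freq.items()})


def train(samples):
    """samples: list of (language, tokens) pairs. Returns (icf, centroids)."""
    # Vocabulary: tokens that appear in at least 2 samples.
    sample_count = Counter()
    for _, tokens in samples:
        sample_count.update(set(tokens))
    vocab = {t for t, n in sample_count.items() if n >= 2}

    # ICF = log(ct / cf) + 1, with ct = number of classes and
    # cf = number of classes in which the token occurs.
    token_classes = defaultdict(set)
    for lang, tokens in samples:
        for t in set(tokens) & vocab:
            token_classes[t].add(lang)
    ct = len({lang for lang, _ in samples})
    icf = {t: math.log(ct / len(langs)) + 1 for t, langs in token_classes.items()}

    # Centroid per class: average of the normalized sample vectors,
    # then L2-normalized again.
    by_class = defaultdict(list)
    for lang, tokens in samples:
        by_class[lang].append(vectorize(tokens, icf))
    centroids = {}
    for lang, vecs in by_class.items():
        mean = defaultdict(float)
        for vec in vecs:
            for t, v in vec.items():
                mean[t] += v / len(vecs)
        centroids[lang] = l2_normalize(mean)
    return icf, centroids


def classify(tokens, icf, centroids):
    """Nearest centroid by cosine similarity (dot product of unit vectors)."""
    vec = vectorize(tokens, icf)

    def cosine(centroid):
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())

    return max(centroids, key=lambda lang: cosine(centroids[lang]))


# Toy data, invented for this example.
SAMPLES = [
    ("Ruby", ["def", "end", "puts", "end"]),
    ("Ruby", ["def", "end", "class", "end"]),
    ("Python", ["def", "import", "print", ":"]),
    ("Python", ["def", "import", "return", ":"]),
]
ICF, CENTROIDS = train(SAMPLES)
print(classify(["import", "def", ":"], ICF, CENTROIDS))  # Python
```

Because both the sample vector and the centroids are unit length, cosine similarity reduces to a sparse dot product, which keeps classification cheap even with a large vocabulary.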

* Fixture file is now detected as Raku

* Update lib/linguist/samples.rb

Co-authored-by: Colin Seymour <colin@github.com>

* Update test/test_classifier.rb

Co-authored-by: Colin Seymour <colin@github.com>

* Add exec bit

* Adjust acceptable errors

* Remove two useless samples

* Add a better R sample

* Remove fixmes

* Remove empty lines

---------

Co-authored-by: Colin Seymour <colin@github.com>
Co-authored-by: Colin Seymour <colin@symr.io>
Co-authored-by: John Gardner <gardnerjohng@gmail.com>
* Add the "LiveCode Script" language.

* Add examples for the `*.lc` extension

* Removing the ".lc" extension and its samples

* Update vendor/licenses/git_submodule/vscode-livecodescript.dep.yml

---------

Co-authored-by: Colin Seymour <colin@github.com>
* Switch PEG.js TM Scope to `source.peggy`

* Add missing license

* Re-gen grammar list

---------

Co-authored-by: Colin Seymour <colin@github.com>
* Add Dune

* Remove dune-file which only has one use

* Merge all Dune entries into the same languages

Since they all share the same grammar, they should just be considered as
one language. The grammar used also only defines one source.dune scope.

* Reduce scope to just dune-project

- `dune` is only used by a bit over 100 repositories (5 pages), the
  1.8k in the search results isn't what we're counting here
- The two workspace files have even fewer uses
Noordsestern and others added 4 commits August 31, 2024 09:26
* add .resource extension to robot

* add resource file example

* docs

* add heuristics for RF resource files

* fix typo

* add robotframework keywords heuristic

---------

Co-authored-by: Colin Seymour <colin@github.com>
* Update all grammars

* Update cached licenses

* v8.0.0

* Correct license type

* Update grammars

* Update cached licenses
* Update number of acceptable classification errors.

* Update number of acceptable errors when using --all
* Update Move grammar

* Update cached license

* v8.0.1
@lildude lildude deleted the branch octocat:master September 17, 2024 15:31
@lildude lildude deleted the master branch September 17, 2024 15:31

7 participants