Commit cb756ae

Add .gitattributes override mention when returning the strategy (#7600)
* Record gitattribute override as a strategy with unit and integration tests
* Force strategy detection when instrumenter is defined
* Improve command line docs and info on --strategies flag
* Adjust examples
* Add "confirmed by gitattributes"
1 parent e5e38c0 commit cb756ae

4 files changed (+285 -2 lines changed)

README.md

Lines changed: 106 additions & 1 deletion
@@ -70,9 +70,14 @@ project.languages #=> { "Ruby" => 119387 }
 
 ### Command line usage
 
+The `github-linguist` executable operates in two distinct modes:
+
+1. **[Git Repository mode](#git-repository)** - Analyzes an entire Git repository (when given a directory path or no path)
+2. **[Single file mode](#single-file)** - Analyzes a specific file (when given a file path)
+
 #### Git Repository
 
-A repository's languages stats can also be assessed from the command line using the `github-linguist` executable.
+A repository's languages stats can be assessed from the command line using the `github-linguist` executable.
 Without any options, `github-linguist` will output the language breakdown by percentage and file size.
 
 ```bash
@@ -151,6 +156,51 @@ lib/linguist.rb
 
 ```
 
+##### `--strategies`
+
+The `--strategies` or `-s` flag will show the language detection strategy used for each file. This is useful for understanding how Linguist determined the language of specific files. Note that unless the `--json` flag is specified, this flag will set the `--breakdown` flag implicitly.
+
+You can try running `github-linguist` on the root directory in this repository itself with the strategies flag:
+
+```console
+$ github-linguist --breakdown --strategies
+66.84% 264519 Ruby
+24.68% 97685 C
+6.57% 25999 Go
+1.29% 5098 Lex
+0.32% 1257 Shell
+0.31% 1212 Dockerfile
+
+Ruby:
+Gemfile [Filename]
+Rakefile [Filename]
+bin/git-linguist [Extension]
+bin/github-linguist [Extension]
+lib/linguist.rb [Extension]
+
+```
+
+If a file's language is affected by `.gitattributes`, the strategy will show the original detection method along with a note indicating whether the gitattributes setting changed the result or confirmed it.
+
+For instance, if you had the following .gitattributes overrides in your repo:
+
+```gitattributes
+
+*.ts linguist-language=JavaScript
+*.js linguist-language=JavaScript
+
+```
+
+the output of Linguist would be something like this:
+
+```console
+100.00% 217 JavaScript
+
+JavaScript:
+demo.ts [Heuristics (overridden by .gitattributes)]
+demo.js [Extension (confirmed by .gitattributes)]
+```
+
 ##### `--json`
 
 The `--json` or `-j` flag output the data into JSON format.
@@ -168,6 +218,8 @@ $ github-linguist --breakdown --json
 
 ```
 
+NB. The `--strategies` flag has no effect, when the `--json` flag is present.
+
 #### Single file
 
 Alternatively you can find stats for a single file using the `github-linguist` executable.
@@ -182,6 +234,59 @@ grammars.yml: 884 lines (884 sloc)
 language: YAML
 ```
 
+#### Additional options
+
+##### `--breakdown`
+
+This flag has no effect in *Single file* mode.
+
+##### `--strategies`
+
+When using the `--strategies` or `-s` flag with a single file, you can see which detection method was used:
+
+```console
+$ github-linguist --strategies lib/linguist.rb
+lib/linguist.rb: 105 lines (96 sloc)
+type: Text
+mime type: application/x-ruby
+language: Ruby
+strategy: Extension
+```
+
+If a file's language is affected by `.gitattributes`, the strategy will show whether the gitattributes setting changed the result or confirmed it:
+
+In this fictitious example, it says "confirmed by .gitattributes" since the detection process (using the Filename strategy) would have given the same output as the override:
+```console
+.devcontainer/devcontainer.json: 27 lines (27 sloc)
+type: Text
+mime type: application/json
+language: JSON with Comments
+strategy: Filename (confirmed by .gitattributes)
+```
+
+In this other fictitious example, it says "overridden by .gitattributes" since the gitattributes setting changes the detected language to something different:
+
+```console
+test.rb: 13 lines (11 sloc)
+type: Text
+mime type: application/x-ruby
+language: Java
+strategy: Extension (overridden by .gitattributes)
+```
+
+Here, the `.rb` file would normally be detected as Ruby by the Extension strategy, but `.gitattributes` overrides it to be detected as Java instead.
+
+##### `--json`
+
+Using the `--json` flag will give you the output for a single file in JSON format:
+
+```console
+$ github-linguist --strategies --json lib/linguist.rb
+{"lib/linguist.rb":{"lines":105,"sloc":96,"type":"Text","mime_type":"application/x-ruby","language":"Ruby","large":false,"generated":false,"vendored":false}}
+```
+
+NB. The `--strategies` has no effect, when the `--json` flag is present.
+
 #### Docker
 
 If you have Docker installed you can either build or use
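
The bracketed strategy annotations documented in the README hunk above follow a regular shape, so they can be picked apart mechanically. The following is only a sketch under that assumption: `STRATEGY_LINE` and `parse_strategy_line` are hypothetical names that do not exist in Linguist, and the pattern is derived solely from the sample output quoted in this diff.

```ruby
# Hypothetical helper, not part of Linguist: parses one line of the
# `--breakdown --strategies` listing, e.g.
#   demo.ts [Heuristics (overridden by .gitattributes)]
STRATEGY_LINE = /\A\s*(?<path>.+?) \[(?<strategy>[^\[\]]+?)(?: \((?<note>confirmed|overridden) by \.gitattributes\))?\]\s*\z/

def parse_strategy_line(line)
  match = STRATEGY_LINE.match(line)
  return nil unless match # percentage rows and "Ruby:" headers do not match

  {
    path: match[:path],
    strategy: match[:strategy],
    gitattributes: match[:note] # "confirmed", "overridden", or nil
  }
end

overridden = parse_strategy_line("demo.ts [Heuristics (overridden by .gitattributes)]")
confirmed  = parse_strategy_line("demo.js [Extension (confirmed by .gitattributes)]")
plain      = parse_strategy_line("Gemfile [Filename]")
header     = parse_strategy_line("JavaScript:")

p overridden, confirmed, plain, header
```

Lines without a bracketed suffix return `nil`, so a caller can feed the whole listing through the helper and keep only the non-nil results.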

lib/linguist/lazy_blob.rb

Lines changed: 19 additions & 1 deletion
@@ -73,7 +73,25 @@ def language
       return @language if defined?(@language)
 
       @language = if lang = git_attributes['linguist-language']
-        Language.find_by_alias(lang)
+        detected_language = Language.find_by_alias(lang)
+
+        # If strategies are being tracked, get the original strategy that would have been used
+        if detected_language && Linguist.instrumenter
+          # Get the original strategy by calling super (which calls Linguist.detect)
+          original_language = super
+          original_strategy_info = Linguist.instrumenter.detected_info[self.name]
+          original_strategy = original_strategy_info ? original_strategy_info[:strategy] : "Unknown"
+
+          if original_language == detected_language
+            strategy_name = "#{original_strategy} (confirmed by .gitattributes)"
+          else
+            strategy_name = "#{original_strategy} (overridden by .gitattributes)"
+          end
+
+          strategy = Struct.new(:name).new(strategy_name)
+          Linguist.instrument("linguist.detected", blob: self, strategy: strategy, language: detected_language)
+        end
+        detected_language
       else
         super
       end
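
The labeling rule in the hunk above can be isolated from Linguist for a quick check. This is a minimal sketch, assuming plain strings stand in for the `Language` objects and the instrumenter lookup; `override_strategy_label` is a hypothetical name, not a method in the commit.

```ruby
# Hypothetical stand-alone version of the labeling rule; plain strings
# replace Linguist's Language objects and the recorded strategy info.
def override_strategy_label(original_strategy, original_language, override_language)
  # The hunk falls back to "Unknown" when no strategy was recorded.
  base = original_strategy || "Unknown"

  if original_language == override_language
    "#{base} (confirmed by .gitattributes)"
  else
    "#{base} (overridden by .gitattributes)"
  end
end

puts override_strategy_label("Extension", "JavaScript", "JavaScript")
# prints: Extension (confirmed by .gitattributes)
puts override_strategy_label("Heuristics", "TypeScript", "JavaScript")
# prints: Heuristics (overridden by .gitattributes)
puts override_strategy_label(nil, nil, "Ruby")
# prints: Unknown (overridden by .gitattributes)
```

The third call covers the case the CLI test exercises with `test.special`: normal detection finds no language, so any override counts as "overridden".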

test/test_basic_instrumenter.rb

Lines changed: 21 additions & 0 deletions
@@ -81,4 +81,25 @@ def test_tracks_filename_strategy
     assert_equal "Filename", @instrumenter.detected_info[blob.name][:strategy]
     assert_equal "Dockerfile", @instrumenter.detected_info[blob.name][:language]
   end
+
+  def test_tracks_override_strategy
+    # Simulate a blob with a gitattributes override
+    blob = Linguist::FileBlob.new("Gemfile", "")
+    # Simulate detection with gitattributes strategy showing the override
+    strategy = Struct.new(:name).new("Filename (overridden by .gitattributes)")
+    language = Struct.new(:name).new("Java")
+    @instrumenter.instrument("linguist.detected", blob: blob, strategy: strategy, language: language) {}
+    assert @instrumenter.detected_info.key?(blob.name)
+    assert_match(/overridden by \.gitattributes/, @instrumenter.detected_info[blob.name][:strategy])
+    assert_equal "Java", @instrumenter.detected_info[blob.name][:language]
+  end
+end
+
+def test_override_strategy_is_recorded
+  # This file is overridden by .gitattributes to be detectable and language Markdown
+  blob = sample_blob("Markdown/tender.md")
+  Linguist.detect(blob)
+  assert @instrumenter.detected_info.key?(blob.name)
+  assert_includes ["GitAttributes"], @instrumenter.detected_info[blob.name][:strategy]
+  assert_equal "Markdown", @instrumenter.detected_info[blob.name][:language]
 end
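
The unit test above only relies on two observable behaviors of the instrumenter: it accepts a `"linguist.detected"` event and it stores the strategy and language names under the blob's name. A minimal sketch of an object with that behavior follows. `SketchInstrumenter`, `Blob`, and `Named` are hypothetical stand-ins written for illustration; this is not the implementation of Linguist's own instrumenter class, whose internals are not shown in this diff.

```ruby
# Hypothetical stand-in inferred from the assertions in the test above.
class SketchInstrumenter
  attr_reader :detected_info

  def initialize
    @detected_info = {}
  end

  # Records the strategy and language names for "linguist.detected" events,
  # then yields to the caller's block if one was given.
  def instrument(name, payload = {})
    if name == "linguist.detected" && payload[:blob]
      @detected_info[payload[:blob].name] = {
        strategy: payload[:strategy]&.name,
        language: payload[:language]&.name
      }
    end
    yield if block_given?
  end
end

Blob  = Struct.new(:name) # stand-in for a Linguist blob
Named = Struct.new(:name) # stand-in for strategy and language objects

inst = SketchInstrumenter.new
inst.instrument(
  "linguist.detected",
  blob: Blob.new("Gemfile"),
  strategy: Named.new("Filename (overridden by .gitattributes)"),
  language: Named.new("Java")
) {}

p inst.detected_info["Gemfile"]
```

Because only names are stored, the `Struct.new(:name)` objects built in `lazy_blob.rb` and in the test are enough to satisfy an interface like this one.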

test/test_cli_integration.rb

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
+require_relative "./helper"
+require 'tmpdir'
+require 'fileutils'
+require 'open3'
+
+class TestCLIIntegration < Minitest::Test
+  def setup
+    @temp_dir = Dir.mktmpdir('linguist_cli_test')
+    @original_dir = Dir.pwd
+    Dir.chdir(@temp_dir)
+
+    # Initialize a git repository
+    system("git init --quiet")
+    system("git config user.name 'Test User'")
+    system("git config user.email 'test@example.com'")
+  end
+
+  def teardown
+    Dir.chdir(@original_dir)
+    FileUtils.rm_rf(@temp_dir)
+  end
+
+  def test_strategies_flag_with_gitattributes_override
+    # Create a .gitattributes file that overrides language detection
+    File.write('.gitattributes', "*.special linguist-language=Ruby\n")
+
+    # Create a test file with a non-Ruby extension but Ruby content
+    File.write('test.special', "puts 'Hello, World!'\n")
+
+    # Stage and commit the files
+    system("git add .")
+    system("git commit -m 'Initial commit' --quiet")
+
+    # Run github-linguist with --strategies flag from the original directory but pointing to our test file
+    stdout, stderr, status = Open3.capture3(
+      "bundle", "exec", "github-linguist", File.join(@temp_dir, "test.special"), "--strategies",
+      chdir: @original_dir
+    )
+
+    assert status.success?, "CLI command failed: #{stderr}"
+    assert_match(/language:\s+Ruby/, stdout, "Should detect Ruby language")
+    assert_match(/strategy:\s+.*\(overridden by \.gitattributes\)/, stdout, "Should show override in strategy")
+  end
+
+  def test_strategies_flag_with_normal_detection
+    # Create a normal Ruby file
+    File.write('test.rb', "puts 'Hello, World!'\n")
+
+    # Stage and commit the file
+    system("git add .")
+    system("git commit -m 'Initial commit' --quiet")
+
+    # Run github-linguist with --strategies flag
+    stdout, stderr, status = Open3.capture3(
+      "bundle", "exec", "github-linguist", File.join(@temp_dir, "test.rb"), "--strategies",
+      chdir: @original_dir
+    )
+
+    assert status.success?, "CLI command failed: #{stderr}"
+    assert_match(/language:\s+Ruby/, stdout, "Should detect Ruby language")
+    assert_match(/strategy:\s+Extension/, stdout, "Should show Extension strategy")
+  end
+
+  def test_breakdown_with_gitattributes_strategies
+    # Create multiple files with different detection methods
+    File.write('.gitattributes', "*.special linguist-language=JavaScript\n")
+    File.write('override.special', "console.log('overridden');\n")
+    File.write('normal.js', "console.log('normal');\n")
+    File.write('Dockerfile', "FROM ubuntu\n")
+
+    # Stage and commit the files
+    system("git add .")
+    system("git commit -m 'Initial commit' --quiet")
+
+    # Run github-linguist with --breakdown --strategies flags on the test repository
+    stdout, stderr, status = Open3.capture3(
+      "bundle", "exec", "github-linguist", @temp_dir, "--breakdown", "--strategies",
+      chdir: @original_dir
+    )
+
+    assert status.success?, "CLI command failed: #{stderr}"
+
+    # Check that GitAttributes strategy appears for the overridden file
+    assert_match(/override\.special \[.* \(overridden by \.gitattributes\)\]/, stdout, "Should show override for overridden file")
+
+    # Check that normal detection strategies appear for other files
+    assert_match(/normal\.js \[Extension\]/, stdout, "Should show Extension strategy for .js file")
+    assert_match(/Dockerfile \[Filename\]/, stdout, "Should show Filename strategy for Dockerfile")
+  end
+
+  def test_json_output_preserves_functionality
+    # Create a simple test file
+    File.write('test.rb', "puts 'Hello, World!'\n")
+
+    # Stage and commit the file
+    system("git add .")
+    system("git commit -m 'Initial commit' --quiet")
+
+    # Run github-linguist with --json flag
+    stdout, stderr, status = Open3.capture3(
+      "bundle", "exec", "github-linguist", File.join(@temp_dir, "test.rb"), "--json",
+      chdir: @original_dir
+    )
+
+    assert status.success?, "CLI command failed: #{stderr}"
+
+    # Parse JSON output
+    require 'json'
+    result = JSON.parse(stdout)
+
+    test_file_key = File.join(@temp_dir, "test.rb")
+    assert_equal "Ruby", result[test_file_key]["language"], "JSON output should contain correct language"
+    assert_equal "Text", result[test_file_key]["type"], "JSON output should contain correct type"
+  end
+
+  def test_repository_scan_with_gitattributes
+    # Create a more complex repository structure
+    FileUtils.mkdir_p('src')
+    File.write('.gitattributes', "*.config linguist-language=JavaScript\n")
+    File.write('src/app.rb', "class App\nend\n")
+    File.write('config.config', "var x = 1;\n")
+
+    # Stage and commit the files
+    system("git add .")
+    system("git commit -m 'Initial commit' --quiet")
+
+    # Run github-linguist on the test repository
+    stdout, stderr, status = Open3.capture3(
+      "bundle", "exec", "github-linguist", @temp_dir, "--breakdown", "--strategies",
+      chdir: @original_dir
+    )
+
+    assert status.success?, "CLI command failed: #{stderr}"
+
+    # Verify that both normal and override detection work in repository scan
+    assert_match(/src\/app\.rb \[Extension\]/, stdout, "Should show Extension strategy for Ruby file")
+    assert_match(/config\.config \[.* \(overridden by \.gitattributes\)\]/, stdout, "Should show override for overridden file")
+  end
+end
