Skip to content

Commit b9670c1

Browse files
newrengitster
authored andcommitted
dir: fix checks on common prefix directory
Many years ago, the directory traversing logic had an optimization that would always recurse into any directory that was a common prefix of all the pathspecs without walking the leading directories to get down to the desired directory. Thus, git ls-files -o .git/ # case A would notice that .git/ was a common prefix of all pathspecs (since it is the only pathspec listed), and then traverse into it and start showing unknown files under that directory. Unfortunately, .git/ is not a directory we should be traversing into, which made this optimization problematic. This also affected cases like git ls-files -o --exclude-standard t/ # case B where t/ was in the .gitignore file and thus isn't interesting and shouldn't be recursed into. It also affected cases like git ls-files -o --directory untracked_dir/ # case C where untracked_dir/ is indeed untracked and thus interesting, but the --directory flag means we only want to show the directory itself, not recurse into it and start listing untracked files below it. The case B class of bugs were noted and fixed in commits 16e2cfa ("read_directory(): further split treat_path()", 2010-01-08) and 48ffef9 ("ls-files: fix overeager pathspec optimization", 2010-01-08), with the idea being that we first wanted to check whether the common prefix was interesting. The former patch noted that treat_path() couldn't be used when checking the common prefix because treat_path() requires a dir_entry() and we haven't read any directories at the point we are checking the common prefix. So, that patch split treat_one_path() out of treat_path(). The latter patch then created a new treat_leading_path() which duplicated by hand the bits of treat_path() that couldn't be broken out and then called treat_one_path() for the remainder. There were three problems with this approach: * The duplicated logic in treat_leading_path() accidentally missed the check for special paths (such as is_dot_or_dotdot and matching ".git"), causing case A types of bugs to continue to be an issue. * The treat_leading_path() logic assumed we should traverse into anything where path_treatment was not path_none, i.e. it perpetuated class C types of bugs. * It meant we had split logic that needed to kept in sync, running the risk that people introduced new inconsistencies (such as in commit be8a84c, which we reverted earlier in this series, or in commit df5bcdf which we'll fix in a subsequent commit) Fix most these problems by making treat_leading_path() not only loop over each leading path component, but calling treat_path() directly on each. To do so, we have to create a synthetic dir_entry, but that only takes a few lines. Then, pay attention to the path_treatment result we get from treat_path() and don't treat path_excluded, path_untracked, and path_recurse all the same as path_recurse. This leaves one remaining problem, the new inconsistency from commit df5bcdf. That will be addressed in a subsequent commit. Signed-off-by: Elijah Newren <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent c5c4edd commit b9670c1

File tree

2 files changed

+59
-14
lines changed

2 files changed

+59
-14
lines changed

dir.c

Lines changed: 56 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2102,37 +2102,82 @@ static int treat_leading_path(struct dir_struct *dir,
21022102
const struct pathspec *pathspec)
21032103
{
21042104
struct strbuf sb = STRBUF_INIT;
2105-
int baselen, rc = 0;
2105+
int prevlen, baselen;
21062106
const char *cp;
2107+
struct cached_dir cdir;
2108+
struct dirent *de;
2109+
enum path_treatment state = path_none;
2110+
2111+
/*
2112+
* For each directory component of path, we are going to check whether
2113+
* that path is relevant given the pathspec. For example, if path is
2114+
* foo/bar/baz/
2115+
* then we will ask treat_path() whether we should go into foo, then
2116+
* whether we should go into bar, then whether baz is relevant.
2117+
* Checking each is important because e.g. if path is
2118+
* .git/info/
2119+
* then we need to check .git to know we shouldn't traverse it.
2120+
* If the return from treat_path() is:
2121+
* * path_none, for any path, we return false.
2122+
* * path_recurse, for all path components, we return true
2123+
* * <anything else> for some intermediate component, we make sure
2124+
* to add that path to the relevant list but return false
2125+
* signifying that we shouldn't recurse into it.
2126+
*/
21072127

21082128
while (len && path[len - 1] == '/')
21092129
len--;
21102130
if (!len)
21112131
return 1;
2132+
2133+
/*
2134+
* We need a manufactured dirent with sufficient space to store a
2135+
* leading directory component of path in its d_name. Here, we
2136+
* assume that the dirent's d_name is either declared as
2137+
* char d_name[BIG_ENOUGH]
2138+
* or that it is declared at the end of the struct as
2139+
* char d_name[]
2140+
* For either case, padding with len+1 bytes at the end will ensure
2141+
* sufficient storage space.
2142+
*/
2143+
de = xcalloc(1, sizeof(struct dirent)+len+1);
2144+
memset(&cdir, 0, sizeof(cdir));
2145+
cdir.de = de;
2146+
#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
2147+
de->d_type = DT_DIR;
2148+
#endif
21122149
baselen = 0;
2150+
prevlen = 0;
21132151
while (1) {
2114-
cp = path + baselen + !!baselen;
2152+
prevlen = baselen + !!baselen;
2153+
cp = path + prevlen;
21152154
cp = memchr(cp, '/', path + len - cp);
21162155
if (!cp)
21172156
baselen = len;
21182157
else
21192158
baselen = cp - path;
2120-
strbuf_setlen(&sb, 0);
2159+
strbuf_reset(&sb);
21212160
strbuf_add(&sb, path, baselen);
21222161
if (!is_directory(sb.buf))
21232162
break;
2124-
if (simplify_away(sb.buf, sb.len, pathspec))
2125-
break;
2126-
if (treat_one_path(dir, NULL, istate, &sb, baselen, pathspec,
2127-
DT_DIR, NULL) == path_none)
2163+
strbuf_reset(&sb);
2164+
strbuf_add(&sb, path, prevlen);
2165+
memcpy(de->d_name, path+prevlen, baselen-prevlen);
2166+
de->d_name[baselen-prevlen] = '\0';
2167+
state = treat_path(dir, NULL, &cdir, istate, &sb, prevlen,
2168+
pathspec);
2169+
if (state != path_recurse)
21282170
break; /* do not recurse into it */
2129-
if (len <= baselen) {
2130-
rc = 1;
2171+
if (len <= baselen)
21312172
break; /* finished checking */
2132-
}
21332173
}
2174+
add_path_to_appropriate_result_list(dir, NULL, &cdir, istate,
2175+
&sb, baselen, pathspec,
2176+
state);
2177+
2178+
free(de);
21342179
strbuf_release(&sb);
2135-
return rc;
2180+
return state == path_recurse;
21362181
}
21372182

21382183
static const char *get_ident_string(void)

t/t3011-common-prefixes-and-directory-traversal.sh

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -74,7 +74,7 @@ test_expect_success 'git ls-files -o --directory untracked_dir does not recurse'
7474
test_cmp expect actual
7575
'
7676

77-
test_expect_failure 'git ls-files -o --directory untracked_dir/ does not recurse' '
77+
test_expect_success 'git ls-files -o --directory untracked_dir/ does not recurse' '
7878
echo untracked_dir/ >expect &&
7979
git ls-files -o --directory untracked_dir/ >actual &&
8080
test_cmp expect actual
@@ -86,7 +86,7 @@ test_expect_success 'git ls-files -o untracked_repo does not recurse' '
8686
test_cmp expect actual
8787
'
8888

89-
test_expect_failure 'git ls-files -o untracked_repo/ does not recurse' '
89+
test_expect_success 'git ls-files -o untracked_repo/ does not recurse' '
9090
echo untracked_repo/ >expect &&
9191
git ls-files -o untracked_repo/ >actual &&
9292
test_cmp expect actual
@@ -133,7 +133,7 @@ test_expect_success 'git ls-files -o .git shows nothing' '
133133
test_must_be_empty actual
134134
'
135135

136-
test_expect_failure 'git ls-files -o .git/ shows nothing' '
136+
test_expect_success 'git ls-files -o .git/ shows nothing' '
137137
git ls-files -o .git/ >actual &&
138138
test_must_be_empty actual
139139
'

0 commit comments

Comments
 (0)