Commit 642e7fd

Merge branch 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull removal of in-kernel calls to syscalls from Dominik Brodowski:

 "System calls are interaction points between userspace and the kernel.
  Therefore, system call functions such as sys_xyzzy() or
  compat_sys_xyzzy() should only be called from userspace via the
  syscall table, but not from elsewhere in the kernel.

  At least on 64-bit x86, it will likely be a hard requirement from
  v4.17 onwards to not call system call functions in the kernel: It is
  better to use a different calling convention for system calls there,
  where struct pt_regs is decoded on-the-fly in a syscall wrapper which
  then hands processing over to the actual syscall function. This means
  that only those parameters which are actually needed for a specific
  syscall are passed on during syscall entry, instead of filling in six
  CPU registers with random user space content all the time (which may
  cause serious trouble down the call chain). Those x86-specific patches
  will be pushed through the x86 tree in the near future.

  Moreover, rules on how data may be accessed may differ between kernel
  data and user data. This is another reason why calling sys_xyzzy() is
  generally a bad idea, and -- at most -- acceptable in arch-specific
  code.

  This patchset removes all in-kernel calls to syscall functions in the
  kernel with the exception of arch/. On top of this, it cleans up the
  three places where many syscalls are referenced or prototyped, namely
  kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"

* 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
  bpf: whitelist all syscalls for error injection
  kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
  kernel/sys_ni: sort cond_syscall() entries
  syscalls/x86: auto-create compat_sys_*() prototypes
  syscalls: sort syscall prototypes in include/linux/compat.h
  net: remove compat_sys_*() prototypes from net/compat.h
  syscalls: sort syscall prototypes in include/linux/syscalls.h
  kexec: move sys_kexec_load() prototype to syscalls.h
  x86/sigreturn: use SYSCALL_DEFINE0
  x86: fix sys_sigreturn() return type to be long, not unsigned long
  x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
  mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
  mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
  mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
  fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
  fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
  fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
  fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
  kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
  kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
  ...
2 parents 2103596 + c9a2119 commit 642e7fd
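
The calling convention described in the pull message can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct fake_regs`, `do_xyzzy()` and `wrapped_sys_xyzzy()` are hypothetical names chosen for illustration, and they do not reproduce the kernel's `struct pt_regs` or the macros used by the x86 patches.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pt_regs: six argument registers that
 * may hold arbitrary leftovers from user space. */
struct fake_regs {
	unsigned long arg[6];
};

/* The actual syscall body takes exactly the two parameters it needs. */
static long do_xyzzy(int fd, unsigned long len)
{
	return (long)fd + (long)len;
}

/* Wrapper in the style described above: it receives only a pointer to
 * the saved registers and decodes just the arguments this call uses.
 * arg[2]..arg[5] are never read, so stale content is not passed on. */
long wrapped_sys_xyzzy(const struct fake_regs *regs)
{
	return do_xyzzy((int)regs->arg[0], regs->arg[1]);
}

/* Usage: registers 2..5 contain garbage, and the result ignores them. */
void xyzzy_regs_demo(void)
{
	struct fake_regs regs = {
		{ 3, 4096, 0xdeadbeef, 0xdeadbeef, 0xdeadbeef, 0xdeadbeef }
	};

	assert(wrapped_sys_xyzzy(&regs) == 4099);
	printf("xyzzy -> %ld\n", wrapped_sys_xyzzy(&regs));
}
```

Because the wrapper's signature no longer matches the syscall's logical argument list, other kernel code cannot simply call it with plain arguments, which is one way to read the "hard requirement" mentioned in the message.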

105 files changed, 3129 insertions(+), 1868 deletions(-)

Documentation/process/adding-syscalls.rst

Lines changed: 33 additions & 1 deletion

@@ -222,7 +222,7 @@ your new syscall number may get adjusted to resolve conflicts.
 The file ``kernel/sys_ni.c`` provides a fallback stub implementation of each
 system call, returning ``-ENOSYS``. Add your new system call here too::

-    cond_syscall(sys_xyzzy);
+    COND_SYSCALL(xyzzy);

 Your new kernel functionality, and the system call that controls it, should
 normally be optional, so add a ``CONFIG`` option (typically to
@@ -487,6 +487,38 @@ patchset, for the convenience of reviewers.
 The man page should be cc'ed to linux-man@vger.kernel.org
 For more details, see https://www.kernel.org/doc/man-pages/patches.html

+
+Do not call System Calls in the Kernel
+--------------------------------------
+
+System calls are, as stated above, interaction points between userspace and
+the kernel. Therefore, system call functions such as ``sys_xyzzy()`` or
+``compat_sys_xyzzy()`` should only be called from userspace via the syscall
+table, but not from elsewhere in the kernel. If the syscall functionality is
+useful to be used within the kernel, needs to be shared between an old and a
+new syscall, or needs to be shared between a syscall and its compatibility
+variant, it should be implemented by means of a "helper" function (such as
+``kern_xyzzy()``). This kernel function may then be called within the
+syscall stub (``sys_xyzzy()``), the compatibility syscall stub
+(``compat_sys_xyzzy()``), and/or other kernel code.
+
+At least on 64-bit x86, it will be a hard requirement from v4.17 onwards to not
+call system call functions in the kernel. It uses a different calling
+convention for system calls where ``struct pt_regs`` is decoded on-the-fly in a
+syscall wrapper which then hands processing over to the actual syscall function.
+This means that only those parameters which are actually needed for a specific
+syscall are passed on during syscall entry, instead of filling in six CPU
+registers with random user space content all the time (which may cause serious
+trouble down the call chain).
+
+Moreover, rules on how data may be accessed may differ between kernel data and
+user data. This is another reason why calling ``sys_xyzzy()`` is generally a
+bad idea.
+
+Exceptions to this rule are only allowed in architecture-specific overrides,
+architecture-specific compatibility wrappers, or other code in arch/.
+
+
 References and Sources
 ----------------------
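
The helper pattern that the new documentation section recommends can be sketched in plain C. This is a user-space illustration, not kernel code: `kern_xyzzy()`, `sys_xyzzy()` and `compat_sys_xyzzy()` reuse the placeholder names from the documentation, while `some_driver_path()`, the argument lists and the validation logic are assumptions made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper holding the real logic. Any kernel code may call it. */
long kern_xyzzy(int fd, int64_t offset)
{
	if (fd < 0)
		return -EBADF;
	if (offset < 0)
		return -EINVAL;
	return 0;
}

/* Native syscall stub: forwards straight to the helper. */
long sys_xyzzy(int fd, int64_t offset)
{
	return kern_xyzzy(fd, offset);
}

/* Compat stub: a 32-bit caller passes the offset as two halves, so the
 * stub only reassembles the argument before calling the same helper. */
long compat_sys_xyzzy(int fd, uint32_t off_hi, uint32_t off_lo)
{
	return kern_xyzzy(fd, (int64_t)(((uint64_t)off_hi << 32) | off_lo));
}

/* Other in-kernel code (hypothetical) uses the helper, never the stubs. */
long some_driver_path(int fd)
{
	return kern_xyzzy(fd, 0);
}

/* Usage: all three entry points share one implementation. */
void xyzzy_helper_demo(void)
{
	assert(sys_xyzzy(3, 10) == 0);
	assert(compat_sys_xyzzy(3, 0, 10) == 0);
	assert(some_driver_path(-1) == -EBADF);
}
```

The `ksys_*()` functions introduced throughout this commit appear to play the role that `kern_xyzzy()` plays here.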

arch/alpha/kernel/osf_sys.c

Lines changed: 1 addition & 1 deletion

@@ -189,7 +189,7 @@ SYSCALL_DEFINE6(osf_mmap, unsigned long, addr, unsigned long, len,
 		goto out;
 	if (off & ~PAGE_MASK)
 		goto out;
-	ret = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+	ret = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
 out:
 	return ret;
 }

arch/arm/kernel/sys_arm.c

Lines changed: 1 addition & 1 deletion

@@ -35,5 +35,5 @@
 asmlinkage long sys_arm_fadvise64_64(int fd, int advice,
 				     loff_t offset, loff_t len)
 {
-	return sys_fadvise64_64(fd, offset, len, advice);
+	return ksys_fadvise64_64(fd, offset, len, advice);
 }

arch/arm64/kernel/sys.c

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
 	if (offset_in_page(off) != 0)
 		return -EINVAL;

-	return sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
 }

 SYSCALL_DEFINE1(arm64_personality, unsigned int, personality)

arch/ia64/kernel/sys_ia64.c

Lines changed: 2 additions & 2 deletions

@@ -139,7 +139,7 @@ int ia64_mmap_check(unsigned long addr, unsigned long len,
 asmlinkage unsigned long
 sys_mmap2 (unsigned long addr, unsigned long len, int prot, int flags, int fd, long pgoff)
 {
-	addr = sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+	addr = ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
 	if (!IS_ERR((void *) addr))
 		force_successful_syscall_return();
 	return addr;
@@ -151,7 +151,7 @@ sys_mmap (unsigned long addr, unsigned long len, int prot, int flags, int fd, lo
 	if (offset_in_page(off) != 0)
 		return -EINVAL;

-	addr = sys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
+	addr = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
 	if (!IS_ERR((void *) addr))
 		force_successful_syscall_return();
 	return addr;

arch/m68k/kernel/sys_m68k.c

Lines changed: 1 addition & 1 deletion

@@ -46,7 +46,7 @@ asmlinkage long sys_mmap2(unsigned long addr, unsigned long len,
 	 * so we need to shift the argument down by 1; m68k mmap64(3)
 	 * (in libc) expects the last argument of mmap2 in 4Kb units.
 	 */
-	return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff);
 }

 /* Convert virtual (user) address VADDR to physical address PADDR */

arch/microblaze/kernel/sys_microblaze.c

Lines changed: 3 additions & 3 deletions

@@ -40,7 +40,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 	if (pgoff & ~PAGE_MASK)
 		return -EINVAL;

-	return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff >> PAGE_SHIFT);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd, pgoff >> PAGE_SHIFT);
 }

 SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
@@ -50,6 +50,6 @@ SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
 	if (pgoff & (~PAGE_MASK >> 12))
 		return -EINVAL;

-	return sys_mmap_pgoff(addr, len, prot, flags, fd,
-			      pgoff >> (PAGE_SHIFT - 12));
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       pgoff >> (PAGE_SHIFT - 12));
 }
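
The offset arithmetic that these mmap wrappers perform before calling `ksys_mmap_pgoff()` can be checked in isolation. The sketch below is a user-space illustration that assumes 64 KiB pages. The `DEMO_*` macros and both function names are hypothetical, and real values of `PAGE_SHIFT` and `PAGE_MASK` come from the architecture's headers.

```c
#include <assert.h>
#include <errno.h>

/* Assumed for illustration only: 64 KiB pages. */
#define DEMO_PAGE_SHIFT 16
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* mmap()-style entry: the byte offset must be page aligned.
 * Returns the offset in pages, or -EINVAL. */
long demo_mmap_off_to_pgoff(unsigned long off)
{
	if (off & ~DEMO_PAGE_MASK)
		return -EINVAL;
	return (long)(off >> DEMO_PAGE_SHIFT);
}

/* mmap2()-style entry: the offset arrives in fixed 4 KiB units, so it
 * must be a multiple of (page size / 4 KiB) before being rescaled. */
long demo_mmap2_to_pgoff(unsigned long pgoff_4k)
{
	if (pgoff_4k & (~DEMO_PAGE_MASK >> 12))
		return -EINVAL;
	return (long)(pgoff_4k >> (DEMO_PAGE_SHIFT - 12));
}

/* Usage: 16 units of 4 KiB are exactly one 64 KiB page. */
void demo_pgoff_check(void)
{
	assert(demo_mmap_off_to_pgoff(0x30000UL) == 3);
	assert(demo_mmap2_to_pgoff(16) == 1);
	assert(demo_mmap2_to_pgoff(17) == -EINVAL);
}
```

With 4 KiB pages the shift `PAGE_SHIFT - 12` is zero and the mask is empty, so the mmap2 path passes the value through unchanged.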

arch/mips/kernel/linux32.c

Lines changed: 11 additions & 11 deletions

@@ -67,8 +67,8 @@ SYSCALL_DEFINE6(32_mmap2, unsigned long, addr, unsigned long, len,
 {
 	if (pgoff & (~PAGE_MASK >> 12))
 		return -EINVAL;
-	return sys_mmap_pgoff(addr, len, prot, flags, fd,
-			      pgoff >> (PAGE_SHIFT-12));
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       pgoff >> (PAGE_SHIFT-12));
 }

 #define RLIM_INFINITY32 0x7fffffff
@@ -82,13 +82,13 @@ struct rlimit32 {
 SYSCALL_DEFINE4(32_truncate64, const char __user *, path,
 	unsigned long, __dummy, unsigned long, a2, unsigned long, a3)
 {
-	return sys_truncate(path, merge_64(a2, a3));
+	return ksys_truncate(path, merge_64(a2, a3));
 }

 SYSCALL_DEFINE4(32_ftruncate64, unsigned long, fd, unsigned long, __dummy,
 	unsigned long, a2, unsigned long, a3)
 {
-	return sys_ftruncate(fd, merge_64(a2, a3));
+	return ksys_ftruncate(fd, merge_64(a2, a3));
 }

 SYSCALL_DEFINE5(32_llseek, unsigned int, fd, unsigned int, offset_high,
@@ -105,13 +105,13 @@ SYSCALL_DEFINE5(32_llseek, unsigned int, fd, unsigned int, offset_high,
 SYSCALL_DEFINE6(32_pread, unsigned long, fd, char __user *, buf, size_t, count,
 	unsigned long, unused, unsigned long, a4, unsigned long, a5)
 {
-	return sys_pread64(fd, buf, count, merge_64(a4, a5));
+	return ksys_pread64(fd, buf, count, merge_64(a4, a5));
 }

 SYSCALL_DEFINE6(32_pwrite, unsigned int, fd, const char __user *, buf,
 	size_t, count, u32, unused, u64, a4, u64, a5)
 {
-	return sys_pwrite64(fd, buf, count, merge_64(a4, a5));
+	return ksys_pwrite64(fd, buf, count, merge_64(a4, a5));
 }

 SYSCALL_DEFINE1(32_personality, unsigned long, personality)
@@ -131,15 +131,15 @@ SYSCALL_DEFINE1(32_personality, unsigned long, personality)
 asmlinkage ssize_t sys32_readahead(int fd, u32 pad0, u64 a2, u64 a3,
 				   size_t count)
 {
-	return sys_readahead(fd, merge_64(a2, a3), count);
+	return ksys_readahead(fd, merge_64(a2, a3), count);
 }

 asmlinkage long sys32_sync_file_range(int fd, int __pad,
 	unsigned long a2, unsigned long a3,
 	unsigned long a4, unsigned long a5,
 	int flags)
 {
-	return sys_sync_file_range(fd,
+	return ksys_sync_file_range(fd,
 			merge_64(a2, a3), merge_64(a4, a5),
 			flags);
 }
@@ -149,14 +149,14 @@ asmlinkage long sys32_fadvise64_64(int fd, int __pad,
 	unsigned long a4, unsigned long a5,
 	int flags)
 {
-	return sys_fadvise64_64(fd,
+	return ksys_fadvise64_64(fd,
 			merge_64(a2, a3), merge_64(a4, a5),
 			flags);
 }

 asmlinkage long sys32_fallocate(int fd, int mode, unsigned offset_a2,
 	unsigned offset_a3, unsigned len_a4, unsigned len_a5)
 {
-	return sys_fallocate(fd, mode, merge_64(offset_a2, offset_a3),
-			     merge_64(len_a4, len_a5));
+	return ksys_fallocate(fd, mode, merge_64(offset_a2, offset_a3),
+			      merge_64(len_a4, len_a5));
 }
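
Most of the wrappers in this file rebuild a 64-bit value from two 32-bit register arguments with `merge_64()` before handing it to a `ksys_*()` helper. The definition of `merge_64()` is not part of this diff, so the following is only a sketch of what such a merge may look like. The function names are hypothetical, and the assumption that the register order depends on endianness is stated here rather than taken from the source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed big-endian pairing: the first register holds the high half. */
uint64_t demo_merge_64_be(uint64_t r1, uint64_t r2)
{
	return ((r1 & 0xffffffffULL) << 32) | (r2 & 0xffffffffULL);
}

/* Assumed little-endian pairing: the first register holds the low half. */
uint64_t demo_merge_64_le(uint64_t r1, uint64_t r2)
{
	return ((r2 & 0xffffffffULL) << 32) | (r1 & 0xffffffffULL);
}

/* Usage: masking each half keeps sign-extension bits of a 32-bit value
 * held in a 64-bit register out of the merged result. */
void demo_merge_64_check(void)
{
	assert(demo_merge_64_be(0x1, 0x2) == 0x0000000100000002ULL);
	assert(demo_merge_64_le(0x1, 0x2) == 0x0000000200000001ULL);
	assert(demo_merge_64_be(0, 0xffffffff80000000ULL) == 0x80000000ULL);
}
```

The parisc wrappers later in this commit achieve the same effect with an explicit cast before the shift, for example `(loff_t)high << 32 | low`.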

arch/mips/kernel/syscall.c

Lines changed: 4 additions & 2 deletions

@@ -63,7 +63,8 @@ SYSCALL_DEFINE6(mips_mmap, unsigned long, addr, unsigned long, len,
 {
 	if (offset & ~PAGE_MASK)
 		return -EINVAL;
-	return sys_mmap_pgoff(addr, len, prot, flags, fd, offset >> PAGE_SHIFT);
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       offset >> PAGE_SHIFT);
 }

 SYSCALL_DEFINE6(mips_mmap2, unsigned long, addr, unsigned long, len,
@@ -73,7 +74,8 @@ SYSCALL_DEFINE6(mips_mmap2, unsigned long, addr, unsigned long, len,
 	if (pgoff & (~PAGE_MASK >> 12))
 		return -EINVAL;

-	return sys_mmap_pgoff(addr, len, prot, flags, fd, pgoff >> (PAGE_SHIFT-12));
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       pgoff >> (PAGE_SHIFT - 12));
 }

 save_static_function(sys_fork);

arch/parisc/kernel/sys_parisc.c

Lines changed: 15 additions & 15 deletions

@@ -270,16 +270,16 @@ asmlinkage unsigned long sys_mmap2(unsigned long addr, unsigned long len,
 {
 	/* Make sure the shift for mmap2 is constant (12), no matter what PAGE_SIZE
 	   we have. */
-	return sys_mmap_pgoff(addr, len, prot, flags, fd,
-			      pgoff >> (PAGE_SHIFT - 12));
+	return ksys_mmap_pgoff(addr, len, prot, flags, fd,
+			       pgoff >> (PAGE_SHIFT - 12));
 }

 asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len,
 		unsigned long prot, unsigned long flags, unsigned long fd,
 		unsigned long offset)
 {
 	if (!(offset & ~PAGE_MASK)) {
-		return sys_mmap_pgoff(addr, len, prot, flags, fd,
+		return ksys_mmap_pgoff(addr, len, prot, flags, fd,
 					offset >> PAGE_SHIFT);
 	} else {
 		return -EINVAL;
@@ -292,24 +292,24 @@ asmlinkage unsigned long sys_mmap(unsigned long addr, unsigned long len,
 asmlinkage long parisc_truncate64(const char __user * path,
 					unsigned int high, unsigned int low)
 {
-	return sys_truncate(path, (long)high << 32 | low);
+	return ksys_truncate(path, (long)high << 32 | low);
 }

 asmlinkage long parisc_ftruncate64(unsigned int fd,
 					unsigned int high, unsigned int low)
 {
-	return sys_ftruncate(fd, (long)high << 32 | low);
+	return ksys_ftruncate(fd, (long)high << 32 | low);
 }

 /* stubs for the benefit of the syscall_table since truncate64 and truncate
  * are identical on LP64 */
 asmlinkage long sys_truncate64(const char __user * path, unsigned long length)
 {
-	return sys_truncate(path, length);
+	return ksys_truncate(path, length);
 }
 asmlinkage long sys_ftruncate64(unsigned int fd, unsigned long length)
 {
-	return sys_ftruncate(fd, length);
+	return ksys_ftruncate(fd, length);
 }
 asmlinkage long sys_fcntl64(unsigned int fd, unsigned int cmd, unsigned long arg)
 {
@@ -320,7 +320,7 @@ asmlinkage long sys_fcntl64(unsigned int fd, unsigned int cmd, unsigned long arg
 asmlinkage long parisc_truncate64(const char __user * path,
 					unsigned int high, unsigned int low)
 {
-	return sys_truncate64(path, (loff_t)high << 32 | low);
+	return ksys_truncate(path, (loff_t)high << 32 | low);
 }

 asmlinkage long parisc_ftruncate64(unsigned int fd,
@@ -333,42 +333,42 @@ asmlinkage long parisc_ftruncate64(unsigned int fd,
 asmlinkage ssize_t parisc_pread64(unsigned int fd, char __user *buf, size_t count,
 					unsigned int high, unsigned int low)
 {
-	return sys_pread64(fd, buf, count, (loff_t)high << 32 | low);
+	return ksys_pread64(fd, buf, count, (loff_t)high << 32 | low);
 }

 asmlinkage ssize_t parisc_pwrite64(unsigned int fd, const char __user *buf,
 			size_t count, unsigned int high, unsigned int low)
 {
-	return sys_pwrite64(fd, buf, count, (loff_t)high << 32 | low);
+	return ksys_pwrite64(fd, buf, count, (loff_t)high << 32 | low);
 }

 asmlinkage ssize_t parisc_readahead(int fd, unsigned int high, unsigned int low,
 					size_t count)
 {
-	return sys_readahead(fd, (loff_t)high << 32 | low, count);
+	return ksys_readahead(fd, (loff_t)high << 32 | low, count);
 }

 asmlinkage long parisc_fadvise64_64(int fd,
 			unsigned int high_off, unsigned int low_off,
 			unsigned int high_len, unsigned int low_len, int advice)
 {
-	return sys_fadvise64_64(fd, (loff_t)high_off << 32 | low_off,
+	return ksys_fadvise64_64(fd, (loff_t)high_off << 32 | low_off,
 			(loff_t)high_len << 32 | low_len, advice);
 }

 asmlinkage long parisc_sync_file_range(int fd,
 			u32 hi_off, u32 lo_off, u32 hi_nbytes, u32 lo_nbytes,
 			unsigned int flags)
 {
-	return sys_sync_file_range(fd, (loff_t)hi_off << 32 | lo_off,
+	return ksys_sync_file_range(fd, (loff_t)hi_off << 32 | lo_off,
 			(loff_t)hi_nbytes << 32 | lo_nbytes, flags);
 }

 asmlinkage long parisc_fallocate(int fd, int mode, u32 offhi, u32 offlo,
 				u32 lenhi, u32 lenlo)
 {
-	return sys_fallocate(fd, mode, ((u64)offhi << 32) | offlo,
-			     ((u64)lenhi << 32) | lenlo);
+	return ksys_fallocate(fd, mode, ((u64)offhi << 32) | offlo,
+			      ((u64)lenhi << 32) | lenlo);
 }

 long parisc_personality(unsigned long personality)
