Commit e7cc4cb

PretrainedTokenizer::truncateHelper: prevent array_slice() error
Without an if-clause guarding the array_slice() call in PretrainedTokenizer::truncateHelper, certain input may result in the following error:

    array_slice(): Argument #1 ($array) must be of type array, null given

Two tests were added to show that the fix works:

1. SummarizationPipelineTest: Integration test that checks the behavior using a real model and some text extracted from a PDF. There is probably a better way to achieve the same test result, because this one test takes 10+ seconds locally.
2. PretrainedTokenizerTest: Unit test that checks PretrainedTokenizer::truncateHelper itself. The input is flawed by design and would trigger the error without the fix.
1 parent 115a397 commit e7cc4cb

4 files changed: +139 -0 lines changed

src/PretrainedTokenizers/PretrainedTokenizer.php

Lines changed: 4 additions & 0 deletions
@@ -468,6 +468,10 @@ function truncateHelper(array &$item, int $length): void
         // Setting .length to a lower value truncates the array in-place.
         // Note: In PHP, arrays automatically adjust their size, so we don't need to explicitly set the length.
         foreach (array_keys($item) as $key) {
+            if (false == $item[$key]) {
+                continue;
+            }
+
             $item[$key] = array_slice($item[$key], 0, $length);
         }
     }
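
Review note: the loose comparison `false == $item[$key]` skips `null`, but it also skips every other falsy value (empty arrays, `0`, `''`), and a truthy non-array value such as a non-empty string would still reach array_slice() and raise a TypeError. Whether such values can occur in practice is not stated in this commit. The following is a minimal sketch of a stricter guard, assuming the intent is "only slice real arrays". The function name `truncateHelperStrict` and the `baz` entry are hypothetical and exist only for illustration; this is a standalone function, not the library's method.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical standalone variant of truncateHelper, for illustration only.
 * Truncates every array value of $item to at most $length elements and
 * leaves non-array values (null, strings, ...) untouched.
 */
function truncateHelperStrict(array &$item, int $length): void
{
    foreach (array_keys($item) as $key) {
        // Skip anything array_slice() cannot handle, instead of relying on loose falsiness.
        if (!is_array($item[$key])) {
            continue;
        }

        $item[$key] = array_slice($item[$key], 0, $length);
    }
}

// Illustrative input: one real array, one null, one non-array string.
$item = ['foo' => [0, 1, 2], 'bar' => null, 'baz' => 'text'];
truncateHelperStrict($item, 2);
```

With this input, only `foo` is shortened; `bar` and `baz` keep their original values. The trade-off is that non-array values are silently ignored rather than reported, which may or may not be the desired behavior.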
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+<?php
+
+declare(strict_types=1);
+
+namespace Tests\Utils;
+
+use Codewithkyrian\Transformers\PretrainedTokenizers\PretrainedTokenizer;
+use Codewithkyrian\Transformers\Transformers;
+
+use function Codewithkyrian\Transformers\Pipelines\pipeline;
+
+beforeEach(function () {
+    Transformers::setup()
+        ->setCacheDir('tests/models')
+        ->apply();
+});
+
+/**
+ * TODO
+ */
+it('trigger array_slice error using test data', function () {
+    $generator = pipeline('summarization', 'Xenova/distilbart-cnn-6-6');
+    $text = file_get_contents(__DIR__.'/../test_files/extracted_text_pdf.txt');
+    $result = $generator($text);
+
+    expect($result[0]['summary_text'])->toContain('last comprehensive');
+});
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+<?php
+
+declare(strict_types=1);
+
+namespace Tests\Utils;
+
+use Codewithkyrian\Transformers\PretrainedTokenizers\PretrainedTokenizer;
+
+/**
+ * TODO
+ */
+it('truncateHelper ignores invalid array values', function () {
+    // build dummy variable to pass the constructor without raising an error
+    $tokenizerJSON = [
+        'model' => [
+            'type' => '__test',
+            'vocab' => [
+                '<s>' => 0,
+            ],
+        ],
+    ];
+
+    $subjectUnderTest = new PretrainedTokenizer($tokenizerJSON, []);
+
+    $itemArray = [
+        'foo' => [0, 1],
+        'bar' => null
+    ];
+
+    // without the fix, it would lead to the following error:
+    // array_slice(): Argument #1 ($array) must be of type array, null given
+    $subjectUnderTest->truncateHelper($itemArray, 1024);
+});
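
Review note: the failure mode this unit test guards against can be reproduced without the library. The sketch below runs the unguarded loop body on the same `$itemArray` and captures the resulting TypeError. It is a plain-PHP illustration and assumes PHP 8, where passing `null` to array_slice() raises a TypeError; the variable `$message` is introduced here only for the demonstration.

```php
<?php

declare(strict_types=1);

// Same deliberately flawed input as in the unit test above.
$itemArray = [
    'foo' => [0, 1],
    'bar' => null,
];

$message = null;

try {
    // Unguarded loop, mirroring the pre-fix body of truncateHelper.
    foreach (array_keys($itemArray) as $key) {
        $itemArray[$key] = array_slice($itemArray[$key], 0, 1024);
    }
} catch (TypeError $e) {
    // 'foo' is processed first; the error is raised when the loop reaches 'bar'.
    $message = $e->getMessage();
}
```

After the loop, `$message` holds the TypeError text, while `foo` has already been processed and `bar` is still `null`.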
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+OWL Reasoners still useable in 2023
+Konrad Abicht
+
+13.09.2023
+Abstract
+In a systematic literature and software review over 100 OWL reasoners/systems were analyzed to
+see if they would still be usable in 2023. This has never been done in this capacity. OWL reasoners
+still play an important role in knowledge organisation and management, but the last comprehensive
+surveys/studies are more than 8 years old. The result of this work is a comprehensive list of 95
+standalone OWL reasoners and systems using an OWL reasoner. For each item, information on
+project pages, source code repositories and related documentation was gathered. The raw research
+data is provided in a Github repository for anyone to use.
+1 Introduction
+There are many surveys and studies concerning OWL reasoners. Some examine the underlying methods
+and functionality, others compare performance metrics. One might think that the field of OWL reasoners is
+well established and that there is software for each relevant application. But this is not the case. Instead I
+have noticed that well known reasoners have hardly been updated in the last 10 years (e.g. HermiT). Some
+are still usable, mostly as Prot´eg´e plugins, but it raises the question whether new (research or commercial)
+projects should rely on them. How are they maintained? Are bugs detected and dealt with? Do projects
+maintain their software dependencies? People interested in OWL reasoners today face many obstacles. To
+get a neutral view on the software landscape, I conducted a survey between May and July 2023. You hold
+the results of this work in your hands.
+This paper is structured as follows: Section 2 contains short summary of required background knowl-
+edge. Section 3 then summarises related work. Section 4 describes my methodology and the section 5
+presents results of my research. Finally, in section 6, I draw my conclusions and in section 7, I provide
+further starting points for future work.
+1.1 Publicly available research data
+All research data is publicly available via a Github repository. It contains a CSV file with a list of analyzed
+OWL reasoners as well a CSV file with systems using a foreign OWL reasoner. For each entry there is
+metadata about installation, usability and references such as source code repository. All this data is
+available at the following URL:
+https://github.com/k00ni/owl-reasoner-list
+I invite everyone to contribute. The repository is designed in a way to support further research and
+additions, so that others can continue the work in the years to come without having to start from scratch
+each time.
+1
+
+
+Figure 1Figure 2
+2 Reader background
+You should have an extended knowledge of Semantic Web technologies and concepts such as RDF, RDFS,
+OWL 1/2 and OWL reasoning. There are many programming/software environments used to develop OWL
+reasoners, so basic knowledge in compiling and executing programs is recommended. Basic knowledge of
+software development using distributed version control systems, such as Git, is helpful. Below is a brief
+summary of the most widely used systems.
+2.1 Prot´eg´e
+Prot´eg´e[73] is an ontology editor well known to ontologists and Semantic Web developers. It has been
+developed by Stanford University
+1
+. It provides tools for developing and maintaining OWL ontologies.
+There are many plugins available, for instance to use an OWL reasoner. Prot´eg´e is written in Java and
+runs on Windows 10/11 as well as Ubuntu Linux.
+2.2 OWL API
+OWL-API [24] is written in Java and provides an Application Programming Interface for managing OWL
+ontologies. In addition to parsing and manipulating OWL ontologies, it also allows the use of reasoners.
+It also includes validators for different OWL profiles, for instance OWL 2 QL
+2
+, OWL 2 EL
+3
+or OWL 2
+RL
+4
+. Further information and source code can be found on the project page
+5
+.
+3 Related work
+Since the publication of OWL in 2001, there have been many benchmarks and surveys comparing and
+evaluating OWL reasoners. In the following only the most recent and relevant ones are presented.
+The most recent and relevant publication [30] is from 2023. The authors evaluated the performance of
+six prominent OWL 2 DL compliant reasoners (such as Pellet, FaCT++ and Hermit) on various reasoning
+tasks. One of their findings was that many projects are no longer actively maintained. This supports my
+results and observations, even though their metrics differ from the ones used in this paper (they used a
+wider range for activity: last 10 years).
+1
+https://protege.stanford.edu/
