Error while re-indexing pdfs #1844

@arrlee

I am running caUtils reindex-pdfs on my installation. After it has indexed all PDF documents, it always throws an exception (see below). Full-text search works and returns results. However, some terms that appear in the documents processed at the end of the reindexing run are missing.

I am using Providence and Pawtucket 2.0.9. This is not a fresh installation but a migration from 1.7.17.

93^MRe...99.0% 316/318 ETC: 44s. Elapsed: 01h:56m [▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩| ]
94^MR...100.0% 317/318 ETC: 21s. Elapsed: 01h:56m [▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩| ]
92^MRei...100.0% 318/318 ETC: -. Elapsed: 01h:56m [▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩▩]
tput: No value for $TERM and no -T specified

PHP Fatal error:  Uncaught DatabaseException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 4 in /var/www/providence/app/lib/Db/mysqli.php:358
Stack trace:
#0 /var/www/providence/app/lib/Db/DbStatement.php(150): Db_mysqli->execute()
#1 /var/www/providence/app/lib/Db.php(259): DbStatement->executeWithParamsAsArray()
#2 /var/www/providence/app/lib/Utils/CLIUtils/Media.php(649): Db->query()
#3 /var/www/providence/support/bin/caUtils(171): CLIUtils::reindex_pdfs()
#4 {main}
  thrown in /var/www/providence/app/lib/Db/mysqli.php on line 358

To find the exact statement, I edited mysqli.php to print every SQL statement it executes:

...
56, 514, 'accumulateurs', 'a:1:{i:0;a:5:{s:1:\"p\";i:6;s:2:\"x1\";s:7:\"313.254\";s:2:\"y1\";s:7:\"165.062\";s:2:\"x2\";s:7:\"371.322\";s:2:\"y2\";s:7:\"178.320\";}}')
INSERT INTO ca_media_content_locations (table_num, row_id, content, loc) VALUES (56, 514, 'faible', 'a:1:{i:0;a:5:{s:1:\"p\";i:6;s:2:\"x1\";s:7:\"423.332\";s:2:\"y1\";s:7:\"156.999\";s:2:\"x2\";s:7:\"445.873\";s:2:\"y2\";s:7:\"170.993\";}}')
INSERT INTO ca_media_content_locations (table_num, row_id, content, loc) VALUES (56, 514, 'radio', 'a:1:{i:0;a:5:{s:1:\"p\";i:6;s:2:\"x1\";s:7:\"461.718\";s:2:\"y1\";s:7:\"156.999\";s:2:\"x2\";s:7:\"481.806\";s:2:\"y2\";s:7:\"170.993\";}}')

SELECT *
FROM ca_metadata_elements cme

PHP Warning:  Invalid argument supplied for foreach() in /var/www/providence/app/helpers/utilityHelpers.php on line 2535

SELECT count(*) c
FROM ca_attribute_values
WHERE
        element_id in ()

PHP Fatal error:  Uncaught DatabaseException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ')' at line 4 in /var/www/providence/app/lib/Db/mysqli.php:317
Stack trace:
#0 /var/www/providence/app/lib/Db/DbStatement.php(151): Db_mysqli->execute()
#1 /var/www/providence/app/lib/Db.php(259): DbStatement->executeWithParamsAsArray()
#2 /var/www/providence/app/lib/Utils/CLIUtils.php(1117): Db->query()
#3 /var/www/providence/support/bin/caUtils(167): CLIUtils::reindex_pdfs()
#4 {main}
  thrown in /var/www/providence/app/lib/Db/mysqli.php on line 317

The statement is built in providence/app/lib/Utils/CLIUtils/Media.php, lines 647-655:

    $va_elements = ca_metadata_elements::getElementsAsList(false, null, null, true, false, true, array(16)); // 16=media
    echo 'va_kinds='.$va_kinds.', va_elements='.print_r($va_elements, true); // debug output

    // The array bound to "in (?)" is expanded into the IN list; an empty array yields "in ()"
    $qr_c = $o_db->query("
            SELECT count(*) c
            FROM ca_attribute_values
            WHERE
                    element_id in (?)
    ", array(caExtractValuesFromArrayList($va_elements, 'element_id', array('preserveKeys' => false))));

If $va_elements is empty, the query is syntactically invalid, because element_id in (?) expands to element_id in (), as seen in the log above. $va_elements is populated from ca_metadata_elements and filtered by the media type (16), which I assume corresponds to the datatype column of that table. My table has no rows with datatype 16, which explains the exception.
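A minimal sketch of a guard, assuming the right behaviour is simply to skip the count (treat it as 0) when no media elements exist. buildElementCountSql() is a hypothetical helper in plain PHP, not CollectiveAccess code; it builds one placeholder per ID instead of relying on the Db layer's expansion of an array bound to a single "?", which is what produced the "in ()" above. It also tolerates a null list, since the foreach() warning suggests the element list may not always be an array.

```php
<?php
// Hypothetical helper (not part of CollectiveAccess): builds the element count
// query only when at least one media element ID exists. Returns null when there
// is nothing to count, so the caller can skip the query instead of sending "IN ()".
function buildElementCountSql(?array $elementIds): ?array
{
    // Tolerate a null/empty list and keep only positive integer IDs.
    $ids = array_values(array_filter(
        array_map('intval', $elementIds ?? []),
        fn(int $id): bool => $id > 0
    ));

    if (count($ids) === 0) {
        // "element_id IN ()" is invalid SQL, so don't build it at all.
        return null;
    }

    // One placeholder per ID, e.g. "?, ?" for two IDs.
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));

    return [
        'sql'    => "SELECT count(*) c FROM ca_attribute_values WHERE element_id IN ({$placeholders})",
        'params' => $ids,
    ];
}

// Usage: an empty list (no datatype 16 elements) vs. two hypothetical element IDs.
foreach ([[], [3, 7]] as $elementIds) {
    $query = buildElementCountSql($elementIds);
    echo $query === null
        ? "no media elements: skip the count\n"
        : $query['sql'] . ' with ' . json_encode($query['params']) . "\n";
}
```

Returning null rather than a dummy query such as IN (NULL) keeps the "no media elements" case explicit for the caller, which can then decide whether that is an error or a valid configuration.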

mysql> select distinct datatype from ca_metadata_elements;
+----------+
| datatype |
+----------+
|        1 |
|        3 |
|        2 |
|        4 |
|        0 |
|       13 |
|        5 |
|       14 |
|        8 |
|        6 |
|       10 |
+----------+
11 rows in set (0.00 sec)

I stopped here for the time being. It would be interesting to know what populates this table and when, and whether the missing datatype 16 is an error in itself or a valid case.

Thanks a lot!
