Continue autograding if some cells are duplicated #18
…release feedback (cherry picked from commit e4801f5)

The failed checks seem to be mostly because of the changed
Great explanation! And great work figuring stuff out. I think your analysis looks good, and the main question is what to do to make it reusable for everyone. The question of updating the global metadata is a good one - I don't know how hard that is to do or how disruptive it would be. I find it interesting how nbgrader tries to handle the error of duplicated cells, but that just produces a different error, with a worse error message.

Now my biggest thought, long-term: what's the best way to handle this? Print a warning or fail completely? If it removes metadata for all later cells, will that fail autograding? Or is it that people are adding too much metadata that isn't needed (e.g. giving cells an ID when it's only some supporting stuff that may as well be split)? Makes me wonder: should we recommend not giving cells an ID or any nbgrader metadata when there is no purpose to do so? Should it be configurable, with a command line option to fail or not? Should it look and see if it is an important cell type (autograded solution, etc.) and fail if this is duplicated, but not if it is an unimportant type? These are all questions that the bigger nbgrader community could answer, too...
Just putting this here to have it in the repo as well. This is not necessarily complete. I can also make this PR towards another dev branch, one that is numbered `dev700` or something instead of `dev999`.

## Changes

- `deduplicateids.py`: upon seeing a duplicated id, converts the cell into an "unimportant" cell that nbgrader can ignore, but leaves additional cell metadata under `"nbgrader_local"`.
- `checkduplicateflag.py`: a postprocessor that is called after autograding, to allow logging a `DuplicateCellError` once autograding is complete.
- `Converter` base class: catches `DuplicateCellError`s for each notebook, instead of for each assignment.
- `checkcellmetadata.py`: gives a more informative message (i.e. when validating).

## Long Explanation
Currently, encountering multiple cells with the same `grade_id` (which cell duplication results in) leads to nbgrader deleting the content of the nbgrader metadata of one of the cells (I believe the one that occurs first in the notebook). The problem is that the deletion empties the dictionary in place (effectively `cell.metadata["nbgrader"] = {}`) instead of removing the key entirely (`del cell.metadata["nbgrader"]`).
This then creates a problem in `CheckCellMetadata`, which sees the `"nbgrader"` dictionary being empty and interprets that as corrupted metadata.

Thus, the initial idea is to delete the `"nbgrader"` dictionary completely. The problem is that we would then have no idea that something might have gone wrong, like a potential duplication.

Another complication is that we cannot take much action in the preprocessing stage, so as not to interrupt the autograding of the rest of the assignment (for a given person).
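The difference between emptying and deleting the metadata can be sketched as follows. This is a minimal illustration, not nbgrader's actual code; `looks_corrupted` is a hypothetical stand-in for the check in `CheckCellMetadata`:

```python
def empty_nbgrader_metadata(cell):
    # What currently happens on a duplicate grade_id:
    # the dict is emptied but left in place.
    cell["metadata"]["nbgrader"] = {}

def delete_nbgrader_metadata(cell):
    # The alternative: remove the dict entirely.
    cell["metadata"].pop("nbgrader", None)

def looks_corrupted(cell):
    # In the spirit of CheckCellMetadata: a present-but-empty
    # "nbgrader" dict is treated as corrupted metadata.
    meta = cell["metadata"].get("nbgrader")
    return meta is not None and not meta

cell = {"metadata": {"nbgrader": {"grade_id": "q1", "grade": True}}}

empty_nbgrader_metadata(cell)
print(looks_corrupted(cell))   # True: the empty dict trips the check

delete_nbgrader_metadata(cell)
print(looks_corrupted(cell))   # False: an absent dict is simply ignored
```

So deleting the dictionary would silence the spurious corruption error, at the cost of losing any trace of the duplication.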
Thus, I decided to change the cell into one that nbgrader can ignore (without deleting the basic metadata that differentiates it from other cells; this part might not be strictly necessary), and to add additional metadata under `"nbgrader_local"` that we can detect later in post-processing.

This `"duplicate"` flag could be incorporated into the `"nbgrader"` dictionary as well, but that would require changing the schema in `nbgraderformat`, so it might create future merging headaches unless upstreamed.

With this change, autograding can continue without any problems, and `duplicate` flags can be detected after each notebook is graded. `CheckDuplicateFlag` just looks for the flag added by `DeduplicateIds`, and if it is present, removes it and raises an error for `Converter` to log.

One significant change is that the general catching of errors during autograding happens at the assignment level; see `converters/base.py::convert_notebooks`. In order not to skip the rest of the assignment upon encountering a duplication error, I added a second `try`/`except` block inside the `for` loop over notebooks that catches only `DuplicateCellError`.

The changes under `nbgraderformat` are there so that the required metadata for "unimportant" cells is not hard-coded into `DeduplicateIds`, making it easier to update if the format ever changes.
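The overall flow described above can be sketched roughly like this. This is a simplified, hypothetical illustration, not the actual `DeduplicateIds`/`CheckDuplicateFlag` implementations; plain dicts stand in for notebook cell nodes, and the "unimportant" metadata keys are assumptions:

```python
class DuplicateCellError(Exception):
    pass

def deduplicate_ids(cells):
    """On a repeated grade_id, downgrade the cell to an 'unimportant'
    one and stash a duplicate flag under "nbgrader_local"."""
    seen = set()
    for cell in cells:
        grade_id = cell["metadata"].get("nbgrader", {}).get("grade_id")
        if grade_id is None:
            continue
        if grade_id in seen:
            # Keep only basic keys so nbgrader treats the cell as ignorable...
            cell["metadata"]["nbgrader"] = {
                "grade": False, "solution": False, "locked": False,
            }
            # ...and record what happened for post-processing.
            cell["metadata"]["nbgrader_local"] = {"duplicate": True}
        else:
            seen.add(grade_id)

def check_duplicate_flag(cells):
    """Post-autograding: remove the flag and raise so the converter can log."""
    flagged = [
        c for c in cells
        if c["metadata"].pop("nbgrader_local", {}).get("duplicate")
    ]
    if flagged:
        raise DuplicateCellError(f"{len(flagged)} duplicated cell(s) detected")

notebooks = {
    "nb1": [{"metadata": {"nbgrader": {"grade_id": "q1"}}},
            {"metadata": {"nbgrader": {"grade_id": "q1"}}}],
    "nb2": [{"metadata": {"nbgrader": {"grade_id": "q2"}}}],
}

# The per-notebook try/except: a DuplicateCellError in one notebook is
# logged without skipping the remaining notebooks of the assignment.
for name, cells in notebooks.items():
    deduplicate_ids(cells)
    try:
        check_duplicate_flag(cells)
    except DuplicateCellError as err:
        print(f"{name}: {err}")
print("all notebooks processed")
```

The key property is that the preprocessor never raises during autograding; the error surfaces only in the postprocessing step, once the notebook has been graded.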