Replies: 17 comments
-
@hoxbro and I were just discussing this and we both felt that there is a huge risk of scope creep here. The explorer is great functionality and fits in well in hvPlot itself because it is simply offering a UI around functionality we already provide. However, as we get into the application and the CLI around it, we start having to build out a lot of other functionality, including the CLI itself, the data loaders, a nice Panel template and application, and lots more. My feeling is that this should be shipped as a separate package entirely.
-
I also believe this should be a separate project, and to gain adoption I'd say it'd need to be distributed as a standalone application as well (e.g. an .exe on Windows). For sure that'd be an interesting project, i.e. picking one scientific domain and building a tool that solves practitioners' needs in that field. This is very likely out of scope for hvPlot though.
-
I still think this is important. Personally, when I used to post-process model output, it was nice to do a quick check to ensure the data looked right. This meant I had to navigate to the data dir, e.g.
So, I imagine a very thin CLI wrapper around hvplot, the kind defaulting to explorer if undefined:
This internally invokes
import xarray as xr
import hvplot.xarray
ds = xr.open_dataset("test.nc")
ds.hvplot.explorer("lon", "lat", c="air", groupby="time").show()
Or:
import pandas as pd
import hvplot.pandas
df = pd.read_csv("test.csv")
df.hvplot.line("time", "temp")
There'd be a mapping of extensions to file readers, e.g.
EXTENSIONS_TO_FILE_READER = {
    ".nc": (xr.open_dataset, {}),
    ".csv": (pd.read_csv, {}),
    ".parquet": (pd.read_parquet, {}),
    ".grib": (xr.open_dataset, {"engine": "cfgrib"}),
    ...
}
If there's an unrecognized file extension, like
Although this isn't 100% comprehensive, I think it could cover at least 65% of scientists' needs, which is enough to gain traction. I don't think it needs to be a standalone app, but that'd be a nice thing to have.
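For illustration, the extension-to-reader mapping could be sketched roughly as below. This is not code from the PR: the helper names `resolve_reader` and `open_data` are hypothetical, storing module and function names as strings (so that xarray and pandas are only imported when a matching file is opened) is a design choice of this sketch, and raising an error for an unknown extension is an assumption.

```python
import importlib
from pathlib import Path

# Hypothetical registry: extension -> (module name, function name, kwargs).
# Names are stored as strings so that xarray and pandas stay unimported
# until a file with a matching extension is actually opened.
EXTENSIONS_TO_FILE_READER = {
    ".nc": ("xarray", "open_dataset", {}),
    ".csv": ("pandas", "read_csv", {}),
    ".parquet": ("pandas", "read_parquet", {}),
    ".grib": ("xarray", "open_dataset", {"engine": "cfgrib"}),
}


def resolve_reader(path):
    """Return the (module, function, kwargs) entry for a path's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSIONS_TO_FILE_READER[suffix]
    except KeyError:
        # Assumption of this sketch: unknown extensions are an error.
        known = ", ".join(sorted(EXTENSIONS_TO_FILE_READER))
        raise ValueError(
            f"Unrecognized extension {suffix!r}; known extensions: {known}"
        ) from None


def open_data(path):
    """Import the reader's library on demand and load the file."""
    module_name, function_name, kwargs = resolve_reader(path)
    reader = getattr(importlib.import_module(module_name), function_name)
    return reader(path, **kwargs)
```

With this layout, `open_data("test.nc")` would import xarray only at call time, and a `.csv` path would never touch xarray at all.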
-
@ahuang11's description sounds very reasonable to me. I'd argue that the file-reader type guessing isn't specific to the CLI reader; it's a valuable function that could be provided to open hvplottable data in general, letting people focus on loading and plotting some data file without necessarily having to understand that Xarray is what you should read NetCDF into and Pandas is what you read Parquet into. Seems useful for people getting started who may know about one data API (typically Pandas) but not others, and doesn't need to be tied to the CLI. Also seems useful for people writing code for working with data, so that they only have to write a switch statement to deal with the various distinct data objects they get back, which is vastly smaller than the number of file formats involved. So ignoring the file reading, I think I'm agreeing with @ahuang11 that the rest can be a very thin wrapper around hvPlot's Python API. In fact, like @philippjfr says of the Explorer:
I'd think we could say the same of the proposed CLI:
I.e., isn't the proposed CLI just another interface, same as the Explorer? Seems to me like it's much less heavyweight than the Explorer.
I'm not sure what Panel template and application would be needed here. Isn't the Explorer already servable as it is? I think this is coming down to @hoxbro, @maximlt, and @philippjfr imagining this becoming a full-fledged standalone image-plotting application with widgets and functionality of its own, which I agree would be a separate project in its own repository and potentially expansive in scope. That's a great project for someone else to do, based on hvPlot! But it's not what I think @ahuang11 is proposing and what I'm imagining, which is a very lightweight CLI-based way of invoking hvPlot plotting and the hvPlot Explorer to do whatever they already do. Maybe the best approach here is to make a PR with an MVP of the proposed CLI along with a list of desired but unimplemented features and a list of non-features (things explicitly not considered in scope). My guess is that such a PR won't be big and the list of unimplemented but desired features won't be long, and that it should be clear whether this indeed can be a simple CLI for the hvPlot Python functionality or if it's in danger of becoming some standalone GUI application that belongs elsewhere.
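To give a sense of how small such a wrapper could be, here is a minimal sketch of the argument handling only. It is not the PR's code: the program name `hvplot-cli`, the `--kind` flag, and the helpers `build_parser` and `plot_call` are hypothetical; `-x` and `-y` are the only options mentioned in this thread. The plot kind falls back to the explorer when it is not given, as proposed above.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical thin CLI around hvPlot."""
    parser = argparse.ArgumentParser(
        prog="hvplot-cli",  # hypothetical program name
        description="Quick look at a data file via hvPlot.",
    )
    parser.add_argument("path", help="data file to open")
    # Hypothetical flag: the plot kind defaults to the explorer if undefined.
    parser.add_argument("-k", "--kind", default="explorer")
    parser.add_argument("-x", help="name of the x dimension or column")
    parser.add_argument("-y", help="name of the y dimension or column")
    return parser


def plot_call(args):
    """Translate parsed arguments into (method name, keyword arguments)."""
    kwargs = {}
    if args.x is not None:
        kwargs["x"] = args.x
    if args.y is not None:
        kwargs["y"] = args.y
    return args.kind, kwargs


# After loading `data` with whichever reader matches the file, the wrapper
# would only need something along the lines of:
#     kind, kwargs = plot_call(build_parser().parse_args())
#     plot = getattr(data.hvplot, kind)(**kwargs)
```

Everything beyond this (loading the file, serving or showing the result) would be delegated to what hvPlot and the data libraries already do.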
-
Yes, but I seriously doubt that this is going to replace, in scientists' workflows, the more specialized and easier-to-install tools Andrew mentioned in his first post (there wasn't much reaction on the Discourse post).
I'd guess the opposite, as this interface is going to be very generic and limiting for really exploring data (let me filter / transform the data I have in this CSV file before plotting it, let me see the original data in a table, etc.). I'd be more convinced this feature was needed in hvPlot if users were showing more interest (other people commenting, likes on the issue, etc.), if we'd find other places (Bokeh / Plotly / Pandas / Xarray / ggplot2 / etc.) where users asked for this feature (IIRC the interactive interface came from a discussion in Xarray, showing clear interest in it), if we'd find a similar tool that has some good adoption, etc.
That's maybe true. If someone embarks on working on this PR I'd just like to say that it'd have to come with documentation (reference + mentions where needed) and tests. I've also recently read on some forums that users find hvPlot not lightweight at all, and they're right. Having a new dependency won't improve that, so it'd be best if it could be avoided.
I think that would be a much more useful application with more chance to gain adoption. Maybe it could be based on Lumen, as Lumen does I/O and not hvPlot, and it makes it easier to add filters/transformations and custom views.
-
I think what you're imagining and what I'm imagining are way different. From personal experience, most scientists are satisfied with a glance at their data to ensure sensible model output, which the PR provides; it's quite lightweight IMO.
-
I think you're right that Lumen would be a good base for really addressing the needs of a scientist who is comfortable using the command line but not comfortable with Python and not wanting to work in Jupyter. I was such a person before coming to Python, in fact! My Master's and PhD were largely written in that way -- elaborate shell scripts that invoked commands, each written in various languages, all patched together with shell scripting rather than Python. It wouldn't be crazy to exploit Lumen's declarative interface to build a full-featured data-exploration and handling tool that would fit well into a scientist's or engineer's workflow like that. But because I've been fully bought into "just use Python for everything" for the 20 years since my PhD, and the world finally seems to be leaning into that too, I wouldn't be the one to push for such a project. If someone external to these projects sees that potential and wants to go for it, I'd be happy to encourage and advise them; there's tons of cool functionality they could get that way. Meanwhile, I'm very happy to keep this CLI focused squarely on exposing "whatever hvPlot Explorer already does" rather than trying to make the CLI be a complete alternative to writing Python.
-
I honestly have no idea what you are talking about :) My point was to highlight that a Lumen app would be a much better approach to build a good data explorer. The hvPlot explorer doesn't allow filtering out data, which I think would be the first thing I'd implement if I had to build a data explorer.
-
I'm agreeing with you. :-) Yes, Lumen's functionality is needed to build a full-featured data explorer, as hvPlot's functionality is too limited to cover what a scientist or engineer needs. But I'm also trying to make a distinction between a CLI-focused explorer and Lumen itself, because Lumen itself (at least when using the Lumen Builder) already is such an explorer, or at least I think that's a fully valid use of Lumen Builder. But making Lumen Builder be a great data explorer isn't the same as making a full-featured CLI-based interface to Lumen (not GUI, not YAML, and not Python) that exposes its power in a shell-scriptable way. And I'm saying that doing so, putting all the power of Lumen into a CLI and not just the power of hvPlot, is a viable project but not one that I personally would want to undertake. CLI for quick plots only; Python for the rest!
-
I want to make two points here:
Lumen would have to be generalized to support xarray data and some other data formats first anyway, but I agree with the point that Lumen + hvPlot is a much more sensible starting point because it encapsulates the basic building blocks that you need, which are "data loaders" + "views" + "UI". Instead, we are now putting data loading code into hvPlot.
-
I can appreciate the immediate scope creep and data loading shouldn't be in hvplot. On the other hand, since it's already in a mostly working state (even though it doesn't work for all scenarios), I was thinking maybe it can live in holoviz-dev (or even my personal account) to prevent it from becoming completely wasted / thrown away.
Minor change made it work: [screenshot]
-
I don't think that's consistent with a multi-stakeholder OSS project; there can't be a requirement for prior review. Making a small PR like this makes it clear what it is and what it does, and it can then be accepted or not accepted.
-
In terms of code it's a small PR, but it certainly was multiple hours of effort on Andrew's part. Anyone can of course propose any change they want, and as an external contributor you take the risk that a PR isn't merged if it doesn't align with the maintainers' goals; usually, however, you would seek agreement from the maintainers before you make such an effort. To ignore that is of course everyone's prerogative, but here we had a situation where all three maintainers were at minimum skeptical if not opposed and the contribution isn't external, so it's not an issue of requesting prior review but one of team allocation and not wasting effort. Whatever happens, now that it exists we will find a home for it.
-
I agree; for Pandata projects the logical place for data loading code is intake and/or fsspec. Not being able to divide up responsibilities cleanly in that way is an ongoing issue beyond this PR.
-
Running:
I get this error:
Happy to try the right version.
-
Oh I haven't pushed the updates yet.
-
Okay I just updated it; missed it earlier because it was not a comment on the PR.
-
With hvplot explorer soon supporting xarray/gridded datasets, I think the next logical step for hvplot explorer is a CLI (in addition to ideas from #1149).
From my experience, scientists call ncview or panoply in the terminal to do a quick validation on their datasets. This is useful and convenient because they don't have to:
Plus, these tools often support most legacy file formats.
The edge that hvplot has over these tools is probably:
(pip install hvplot geoviews)
I think it's valuable to wrap a CLI around hvplot explorer: not just a simple argparse one, but one that's super user-friendly, with features like auto-complete, so that new users are able to immediately jump in and start using it. Imagine if the auto-complete could complete the desired -x and -y from the file.
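As a rough illustration of completing -x and -y from the file, the sketch below reads only the header row of a CSV and filters it by the prefix typed so far. The function names are hypothetical, it covers CSV only (gridded formats would need their own way of listing variables and dimensions), and wiring it into a shell's completion mechanism is left out.

```python
import csv


def header_names(lines):
    """Return the column names from the first row of CSV text lines."""
    # csv.reader accepts any iterable of strings, including an open file.
    return next(csv.reader(lines), [])


def complete_columns(names, prefix=""):
    """Return the names that start with what the user has typed so far."""
    return [name for name in names if name.startswith(prefix)]


def complete_from_csv(path, prefix=""):
    """Hypothetical helper: completion candidates for -x / -y from a CSV."""
    with open(path, newline="") as handle:
        return complete_columns(header_names(handle), prefix)
```

Only the first row is read, so the candidates stay cheap to compute even for large files.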
Additional discussion here: https://discourse.pangeo.io/t/do-you-use-panoply-ncview-other-command-line-viz-tool/3693/2