Hello, I am working on a Quarto document using exclusively Python code cells. I work with the VS Code Quarto extension, so I interactively run my code cells as I go along. I then usually render to HTML in an external terminal (or in the VS Code internal terminal, whichever). I am now getting an error midway through the rendering saying that a variable is not defined, while it has clearly been defined in a previous code cell. I am 100% sure there is no typo, and just clicking (run cell) sequentially in VS Code does not raise the same error. I even noticed the following. Let's say that the variable in question, "x", is defined in cell 12/33. Initially I got the error at cell 13/33, which tries to re-use the variable. I then merged the two code cells, and now the error comes with the next separate code cell that tries to use the variable again. Any ideas what is going on here? Thanks.
Without a way to reproduce the problem (the steps, the setup, and a document or repository), there is virtually no way we can answer or help. You can share a Quarto document using the following syntax, i.e., using more backticks than you have in your document (usually four):
````qmd
---
title: "Reproducible Quarto Document"
format: html
---
This is a reproducible Quarto document using `format: html`.
It is written in Markdown and contains embedded R code.
When you run the code, it will produce a plot.
```{r}
plot(cars)
```
The end.
````
From this comment I am guessing you are using the Jupyter engine with Quarto. It also seems you are using a project with a specific Python environment managed by poetry. Anyhow, as you see, we are currently trying to guess the situation and the context in which you are using Quarto when this error happens. It would be really helpful if you shared as much information as possible so that we can understand the context that leads to this situation. Quarto is a broad tool that can do a lot of things, with different technologies for computation, so some details are required to help us help you.
When using the Jupyter engine, we rely on some of the Jupyter toolchain (nbclient, I believe) to get the results from the executed notebook. Maybe some limit is hit there 🤔 Anyhow, you did not share the error, so it is hard to know for sure, but it seems to me the error you get may not come directly from Quarto but from Jupyter or another tool in the stack. Again, more details would help.
Thanks for following up! Here's an attempt to share more details. In the meantime, I will look into your suggestion regarding the Jupyter toolchain.
About the environment
Indeed, I am using the Jupyter engine, as the code in the .qmd file is exclusively Python. And indeed, my environment is managed by poetry. Here are the contents of the
More context
First, let me re-iterate that with VS Code and the Quarto extension, I can click
To render the document on the command line I used the following command:
Usually it takes ~3 minutes to get going (so no output whatsoever on the CLI; I never figured out why, and it is not the issue I want to discuss here) and then this is the stdout:
This is the header of the
Below is the content of cells 23 and 24. In the former I have included the comment
Load the DIA-NN data, using the *gg_matrix* file. The dataframe is manipulated to transform intensities using log2, and add columns with the mean intensity for each gene and whether they have missing values:
```{python}:
# Read the diann precursors matrix
def load_diann_gg_matrix(infile):
    df = pl.read_csv(
        infile,
        separator="\t",
    )
    return df

# Set the paths
dataset2_gg_path = "results/report.gg_matrix.tsv"
dataset3_gg_path = "results/report.gg_matrix.tsv"

# Load to polars dataframes
dataset2_gg_df = load_diann_gg_matrix(
    dataset2_gg_path,
)
dataset3_gg_df = load_diann_gg_matrix(
    dataset3_gg_path,
)

# Melt dataframe based on genes
def melt_gg_matrix(df):
    # Use the argument, not the global dataset2_gg_df
    return df.melt(
        id_vars="Genes",
        variable_name="Samples",
        value_name="Intensities"
    )

dataset2_gg_melted = melt_gg_matrix(dataset2_gg_df)
dataset3_gg_melted = melt_gg_matrix(dataset3_gg_df)

# Transform the intensities to log2
def log2_transform(df):
    return df.with_columns(
        pl.col("Intensities").log(base=2)
    )

dataset2_gg_melted = log2_transform(dataset2_gg_melted)
dataset3_gg_melted = log2_transform(dataset3_gg_melted)

# Groupby gene and aggregate adding mean intensities and number of missing values
def groupby_gene(df, groupby_col="Genes"):
    return (
        df
        .groupby(groupby_col)
        .agg(
            pl.mean("Intensities").alias("avg_intensity"),
            pl.col("Intensities").null_count().alias("null_count"),
            pl.col("Samples").count().alias("Count"),
        )
    )

dataset2_genes = groupby_gene(dataset2_gg_melted)
dataset3_genes = groupby_gene(dataset3_gg_melted)

# Add column with true/false for missingness
def add_missingness_column(df):
    return df.with_columns((pl.col("null_count") > 0).alias("has_missing"))

dataset2_genes = add_missingness_column(dataset2_genes)
dataset3_genes = add_missingness_column(dataset3_genes)

### SPLIT WAS HERE
def plot_missing_intensities(df, dataset_name):
    plt.figure()
    sns.set_theme(style="whitegrid")
    ax = sns.kdeplot(
        data=df,
        x="avg_intensity",
        hue="has_missing",
        common_norm=False,
    )
    ax.set_title(
        f"Density plot for mean intensity for\n"
        f"genes with and without missing values,\n"
        f"{dataset_name}"
    )
    ax.set_xlabel("Mean log2-intensity")
    # Change the legend title to "Missing values"
    sns.move_legend(
        ax,
        "upper right",
        title="Missing values",
    )
    return

plot_missing_intensities(dataset2_genes, "dataset 2") ### ERROR WAS HERE
plot_missing_intensities(dataset3_genes, "dataset 3")
```
We can also plot the number of missing values and percentage of missingness in a linear regression model.
```{python}
def plot_missing_values_reg(df, dataset_name):
    plt.figure()
    sns.set_theme(style="darkgrid")
    x_axis_label = "Missingness (%)"
    y_axis_label = "Mean log2 intensity"
    # Prepare df
    df = df.with_columns(
        ((pl.col("null_count") / pl.col("Count")) * 100).alias(x_axis_label),
    ).rename(
        {"avg_intensity": y_axis_label}
    ).to_pandas()
    g = sns.JointGrid()
    x, y = df[x_axis_label], df[y_axis_label]
    # Regression of mean intensity on missingness, drawn on the joint axes
    k = sns.regplot(
        x=x,
        y=y,
        truncate=True,
        color="steelblue",
        marker=".",
        scatter_kws={"alpha": 0.7, "edgecolors": "steelblue", "linewidths": 0.5},
        line_kws={"linewidth": 0.8},
        ax=g.ax_joint,
    )
    plt.setp(k.collections[1], alpha=0.6, color="white")
    # Marginal distributions
    sns.histplot(
        x=x,
        kde=True,
        color="steelblue",
        ax=g.ax_marg_x,
    )
    sns.histplot(
        y=y,
        kde=True,
        color="steelblue",
        ax=g.ax_marg_y,
    )
    g.figure.set_layout_engine(layout="tight")
    g.figure.suptitle(
        f"Linear regression for missingness and mean log2-intensity\n"
        f"{dataset_name}"
    )
    return

plot_missing_values_reg(dataset2_genes, "dataset 2") ### ERROR IS NOW HERE
plot_missing_values_reg(dataset3_genes, "dataset 3")
```
How is it obvious that the environment is the same in the VS Code UI and when running
Did you try to write the file as a Jupyter notebook directly?
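One way to check this directly, rather than inferring it from the terminal, might be to print the interpreter details from a code cell and compare what an interactive run shows with what ends up in the rendered output. The following is only a minimal sketch using the standard library; the helper name `describe_environment` and the default package list are illustrative assumptions, not taken from the document discussed here.

```python
import importlib.util
import sys


def describe_environment(packages=("polars", "seaborn")):
    """Collect facts that identify the interpreter running this cell.

    `packages` is a hypothetical list of top-level modules to locate;
    adjust it to whatever the document actually imports.
    """
    info = {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": sys.version.split()[0],
    }
    for name in packages:
        # find_spec returns None when the module cannot be found
        spec = importlib.util.find_spec(name)
        info[f"{name}_location"] = spec.origin if spec else None
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the `executable` and package locations differ between the two runs, the environments are not the same; if they match, the environment is unlikely to be the cause.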
Glad your problem was resolved!
Because I can see in VS Code's internal terminal that upon opening this workspace it activates the same environment:
source /home/username/.cache/pypoetry/virtualenvs/nmd-project-7CCkg3Km-py3.10/bin/activate
I can also see the correct environment activated at the top of the interactive python window:
Also, I am using packages that are not installed anywhere else outside of this environment, so I would be getting errors related to that.
Finally, I would say it really does not seem that this issue is related to environment configuration, right? I have rendered the document other times when it was shorter, and it used to work.
Resolved
This is embarrassing. Somehow I had a colon (`:`) right afte…
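For anyone landing here with a similar symptom: assuming the stray colon is the one visible on the opening fence of cell 23 in the listing above (`{python}:`), a small script can flag cell headers that have anything after the closing brace. This is only a sketch with the standard library; the function name `find_suspicious_fences` and the regular expression are assumptions about what a well-formed header looks like, and how Quarto itself treats such a header is not confirmed in this thread.

```python
import re

# Three or more backticks, a braced header such as {python} or {{python}},
# then whatever follows on the same line (captured for inspection).
FENCE_WITH_BRACES = re.compile(r"^\s*`{3,}\s*\{+[^{}]*\}+(.*)$")


def find_suspicious_fences(qmd_text):
    """Return (line number, line, trailing text) for headers with trailing text."""
    problems = []
    for lineno, line in enumerate(qmd_text.splitlines(), start=1):
        match = FENCE_WITH_BRACES.match(line)
        if match is None:
            continue
        trailing = match.group(1).strip()
        if trailing:
            problems.append((lineno, line.strip(), trailing))
    return problems


# Build the sample without literal fences so it can be pasted anywhere.
FENCE = "`" * 3
sample = "\n".join([
    FENCE + "{python}",
    "x = 1",
    FENCE,
    "",
    FENCE + "{python}:",  # header with a stray colon
    "print(x)",
    FENCE,
])

for lineno, line, trailing in find_suspicious_fences(sample):
    print(f"line {lineno}: unexpected {trailing!r} after the cell header: {line}")
```

To check a real document, read it with `pathlib.Path("document.qmd").read_text()` (the file name is a placeholder) and pass the text to `find_suspicious_fences`.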