Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
to create new columns (row entries) we always use `DataFrame::withEntry(string $entryName, ScalarFunction|WindowFunction $ref)` method.
To create new columns (row entries) we always use `DataFrame::withEntry(string $entryName, ScalarFunction|WindowFunction $ref)` method.
Provide a unique `$entryName` to create a new entry; if an entry with that name already exists, it is replaced.

As the second argument we can provide a static value or a function that will be evaluated for each row.
20 changes: 8 additions & 12 deletions examples/topics/data_frame/data_frame/code.php
@@ -3,24 +3,20 @@
declare(strict_types=1);

use function Flow\ETL\DSL\{
array_expand,
data_frame,
from_rows,
int_entry,
json_entry,
ref,
row,
rows,
from_array,
to_stream};

data_frame()
->read(
from_rows(
rows(
row(int_entry('id', 1), json_entry('array', ['a' => 1, 'b' => 2, 'c' => 3])),
)
from_array(
[
['id' => 1, 'array' => ['a' => 1, 'b' => 2, 'c' => 3]],
['id' => 2, 'array' => ['a' => 4, 'b' => 5, 'c' => 6]],
['id' => 3, 'array' => ['a' => 7, 'b' => 8, 'c' => 9]],
],
)
)
->withEntry('expanded', array_expand(ref('array')))
->collect()
->write(to_stream(__DIR__ . '/output.txt', truncate: false))
->run();
11 changes: 10 additions & 1 deletion examples/topics/data_frame/data_frame/description.md
@@ -1 +1,10 @@
Simple example of reading from rows and writing to stdout.
A Data Frame is a structured collection of tabular data, similar to a spreadsheet.
It organizes information into rows and columns, making it easy to understand, filter, and transform.
Using a Data Frame, you can quickly merge, clean, or modify data for your ETL processes,
which lets developers focus on transformations rather than low-level data handling.

Rather than loading an entire dataset at once, a Data Frame processes information in smaller, manageable chunks.
As it moves through the data, it only keeps a limited number of rows in memory at any given time.
This approach helps avoid running out of memory, making it efficient and scalable for handling large datasets.

A simple example of reading from a PHP array and writing to stdout.
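
The chunked processing described in this file can be illustrated with a plain PHP generator. This is only a sketch of the general idea: `in_batches` is a hypothetical helper, not part of Flow, and Flow's actual extractors and batching logic are not shown in this diff.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Flow): yields rows in fixed-size batches,
 * so only one batch is held in memory at a time.
 *
 * @param iterable<array<string, mixed>> $rows
 *
 * @return \Generator<int, list<array<string, mixed>>>
 */
function in_batches(iterable $rows, int $batchSize) : \Generator
{
    if ($batchSize < 1) {
        throw new \InvalidArgumentException('Batch size must be at least 1.');
    }

    $batch = [];

    foreach ($rows as $row) {
        $batch[] = $row;

        // Hand the batch to the consumer as soon as it is full, then start over.
        if (\count($batch) === $batchSize) {
            yield $batch;
            $batch = [];
        }
    }

    // Emit the final, possibly smaller, batch.
    if ($batch !== []) {
        yield $batch;
    }
}

// Same sample rows as in code.php above.
$rows = [
    ['id' => 1, 'array' => ['a' => 1, 'b' => 2, 'c' => 3]],
    ['id' => 2, 'array' => ['a' => 4, 'b' => 5, 'c' => 6]],
    ['id' => 3, 'array' => ['a' => 7, 'b' => 8, 'c' => 9]],
];

$batchSizes = [];

foreach (in_batches($rows, 2) as $batch) {
    $batchSizes[] = \count($batch);
}

// $batchSizes is now [2, 1]
```

Because `$rows` may be any iterable, the source could itself be a generator that reads a file line by line, in which case the full dataset never has to be in memory at once.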
14 changes: 7 additions & 7 deletions examples/topics/data_frame/data_frame/output.txt
@@ -1,8 +1,8 @@
+----+---------------------+----------+
| id | array | expanded |
+----+---------------------+----------+
| 1 | {"a":1,"b":2,"c":3} | 1 |
| 1 | {"a":1,"b":2,"c":3} | 2 |
| 1 | {"a":1,"b":2,"c":3} | 3 |
+----+---------------------+----------+
+----+---------------------+
| id | array |
+----+---------------------+
| 1 | {"a":1,"b":2,"c":3} |
| 2 | {"a":4,"b":5,"c":6} |
| 3 | {"a":7,"b":8,"c":9} |
+----+---------------------+
3 rows
2 changes: 1 addition & 1 deletion examples/topics/data_frame/priority.txt
@@ -1 +1 @@
1
3
1 change: 0 additions & 1 deletion examples/topics/data_source/csv/priority.txt

This file was deleted.

1 change: 1 addition & 0 deletions examples/topics/data_writing/array/priority.txt
@@ -0,0 +1 @@
1
@@ -24,7 +24,7 @@
* @param null|Schema $schema - @deprecated use $loader->withSchema() instead
*/
#[DocumentationDSL(module: Module::CSV, type: DSLType::EXTRACTOR)]
#[DocumentationExample(topic: 'data_source', example: 'csv')]
#[DocumentationExample(topic: 'data_reading', example: 'csv')]
function from_csv(
string|Path $path,
bool $with_header = true,
@@ -15,7 +15,7 @@
* @param ?Schema $schema - enforce schema on the extracted data - @deprecate use withSchema method instead
*/
#[DocumentationDSL(module: Module::JSON, type: Type::EXTRACTOR)]
#[DocumentationExample(topic: 'data_source', example: 'json')]
#[DocumentationExample(topic: 'data_reading', example: 'json')]
function from_json(
string|Path $path,
?string $pointer = null,
@@ -22,7 +22,7 @@
* @param null|int $offset - @deprecated use `withOffset` method instead
*/
#[DocumentationDSL(module: Module::PARQUET, type: DSLType::EXTRACTOR)]
#[DocumentationExample(topic: 'data_source', example: 'parquet')]
#[DocumentationExample(topic: 'data_reading', example: 'parquet')]
function from_parquet(
string|Path $path,
array $columns = [],
@@ -52,7 +52,7 @@ function from_parquet(
* @param null|Schema $schema - @deprecated use `withSchema` method instead
*/
#[DocumentationDSL(module: Module::PARQUET, type: DSLType::LOADER)]
#[DocumentationExample(topic: 'data_sink', example: 'parquet')]
#[DocumentationExample(topic: 'data_writing', example: 'parquet')]
function to_parquet(
string|Path $path,
?Options $options = null,
@@ -30,7 +30,7 @@
* @param string $xml_node_path - @deprecated use `from_xml($file)->withXMLNodePath($xmlNodePath)` method instead
*/
#[DocumentationDSL(module: Module::XML, type: DSLType::EXTRACTOR)]
#[DocumentationExample(topic: 'data_source', example: 'xml')]
#[DocumentationExample(topic: 'data_reading', example: 'xml')]
function from_xml(
Path|string $path,
string $xml_node_path = '',
11 changes: 6 additions & 5 deletions src/core/etl/src/Flow/ETL/DSL/functions.php
@@ -167,7 +167,8 @@ function from_path_partitions(Path|string $path) : Extractor\PathPartitionsExtra
* @param null|Schema $schema - @deprecated use withSchema() method instead
*/
#[DocumentationDSL(module: Module::CORE, type: DSLType::EXTRACTOR)]
#[DocumentationExample(topic: 'data_source', example: 'array')]
#[DocumentationExample(topic: 'data_reading', example: 'array')]
#[DocumentationExample(topic: 'data_frame', example: 'data_frame')]
function from_array(iterable $array, ?Schema $schema = null) : ArrayExtractor
{
$extractor = new ArrayExtractor($array);
@@ -287,7 +288,7 @@ function to_memory(Memory $memory) : MemoryLoader
* @param-out array<array<mixed>> $array
*/
#[DocumentationDSL(module: Module::CORE, type: DSLType::LOADER)]
#[DocumentationExample(topic: 'data_sink', example: 'array')]
#[DocumentationExample(topic: 'data_writing', example: 'array')]
function to_array(array &$array) : ArrayLoader
{
return new ArrayLoader($array);
@@ -658,14 +659,14 @@ function col(string $entry) : EntryReference
* An alias for `ref`.
*/
#[DocumentationDSL(module: Module::CORE, type: DSLType::SCALAR_FUNCTION)]
#[DocumentationExample(topic: 'data_frame', example: 'create_entries')]
#[DocumentationExample(topic: 'data_frame', example: 'create_columns')]
function entry(string $entry) : EntryReference
{
return new EntryReference($entry);
}

#[DocumentationDSL(module: Module::CORE, type: DSLType::SCALAR_FUNCTION)]
#[DocumentationExample(topic: 'data_frame', example: 'create_entries')]
#[DocumentationExample(topic: 'data_frame', example: 'create_columns')]
function ref(string $entry) : EntryReference
{
return new EntryReference($entry);
@@ -696,7 +697,7 @@ function optional(ScalarFunction $function) : Optional
}

#[DocumentationDSL(module: Module::CORE, type: DSLType::SCALAR_FUNCTION)]
#[DocumentationExample(topic: 'data_frame', example: 'create_entries')]
#[DocumentationExample(topic: 'data_frame', example: 'create_columns')]
function lit(mixed $value) : Literal
{
return new Literal($value);
2 changes: 1 addition & 1 deletion web/landing/resources/dsl.json

Large diffs are not rendered by default.

23 changes: 23 additions & 0 deletions web/landing/src/Flow/Website/Controller/ExamplesController.php
@@ -19,6 +19,22 @@ public function __construct(
#[Route('/{topic}/{example}/', name: 'example', priority: -2)]
public function example(string $topic, string $example) : Response
{
switch (\mb_strtolower($topic)) {
case 'data_sink':
return $this->redirectToRoute('example', ['topic' => 'data_writing', 'example' => $example], 301);
case 'data_source':
return $this->redirectToRoute('example', ['topic' => 'data_reading', 'example' => $example], 301);
case 'data_frame':
switch (\mb_strtolower($example)) {
case 'create_entries':
return $this->redirectToRoute('example', ['topic' => 'data_frame', 'example' => 'create_columns'], 301);
case 'rename_entries':
return $this->redirectToRoute('example', ['topic' => 'data_frame', 'example' => 'rename_columns'], 301);
case 'reorder_entries':
return $this->redirectToRoute('example', ['topic' => 'data_frame', 'example' => 'reorder_columns'], 301);
}
}

$topics = $this->examples->topics();
$currentTopic = $topic;

@@ -39,6 +55,13 @@ public function example(string $topic, string $example) : Response
#[Route('/{topic}/', name: 'topic', priority: -1)]
public function topic(string $topic) : Response
{
switch (\mb_strtolower($topic)) {
case 'data_sink':
return $this->redirectToRoute('topic', ['topic' => 'data_writing'], 301);
case 'data_source':
return $this->redirectToRoute('topic', ['topic' => 'data_reading'], 301);
}

$topics = $this->examples->topics();
$currentTopic = $topic;

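The redirect rules added to `example()` and `topic()` amount to a lookup from legacy slugs to renamed ones. A table-driven sketch of the same mapping, in plain PHP without Symfony, might look like the following. `resolve_legacy_example` and both constants are hypothetical names used only for illustration; the controller in this change uses `switch` statements with `redirectToRoute` and a 301 status, and calls `mb_strtolower` where this sketch uses `strtolower` on the assumption that slugs are ASCII.

```php
<?php

declare(strict_types=1);

// Legacy topic slugs and their replacements, as renamed in this change.
const TOPIC_RENAMES = [
    'data_sink' => 'data_writing',
    'data_source' => 'data_reading',
];

// Legacy example slugs within the data_frame topic.
const DATA_FRAME_EXAMPLE_RENAMES = [
    'create_entries' => 'create_columns',
    'rename_entries' => 'rename_columns',
    'reorder_entries' => 'reorder_columns',
];

/**
 * Hypothetical helper: returns the [topic, example] pair to redirect to,
 * or null when the requested pair needs no redirect.
 *
 * @return null|array{0: string, 1: string}
 */
function resolve_legacy_example(string $topic, string $example) : ?array
{
    $topicKey = \strtolower($topic);

    // A renamed topic keeps the requested example name unchanged.
    $newTopic = TOPIC_RENAMES[$topicKey] ?? null;

    if ($newTopic !== null) {
        return [$newTopic, $example];
    }

    // Inside data_frame only the listed examples were renamed.
    if ($topicKey === 'data_frame') {
        $newExample = DATA_FRAME_EXAMPLE_RENAMES[\strtolower($example)] ?? null;

        if ($newExample !== null) {
            return ['data_frame', $newExample];
        }
    }

    return null;
}
```

A controller could call such a helper first and issue a permanent redirect only when it returns a non-null pair, which keeps the rename list in one place if more slugs change later.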
26 changes: 14 additions & 12 deletions web/landing/templates/example/index.html.twig
@@ -9,7 +9,7 @@
{%- endblock -%}

{% block main %}
<div class="py-10 px-2 sm:px-4 mx-auto max-w-screen-xl" data-hx-boost="true">
<div class="py-10 px-2 sm:px-4 mx-auto max-w-screen-xl">
<h2 class="mb-4 text-2xl font-semibold tracking-wide">Examples:</h2>
<nav class="font-medium text-center bg-orange-100 rounded">
<ul class="flex whitespace-nowrap overflow-auto justify-between">
@@ -49,27 +49,29 @@
{% endif %}

{% apply spaceless %}
<h2 class="text-xl mt-5 mb-5">Code</h2>
<pre class="rounded p-4 overflow-auto shadow-2xl shadow-gray rounded border-gray border-2 relative">
<button class="absolute top-0 right-0 bg-orange-100 rounded px-4 leading-9 [&.copied]:before:content-['Copied!'] before:absolute before:-translate-x-24" title="copy code" data-clipboard-target="#code" {{ stimulus_controller('clipboard') }}>
<img src="{{ asset('images/icons/copy.svg') }}" alt="copy code" width="20" height="20" class="inline">
</button>
<h2 class="text-xl mt-5 mb-5">Code</h2>
<div id="code" class="relative">
<button class="absolute top-[12px] right-[12px] bg-orange-100 rounded px-4 leading-9 [&.copied]:before:content-['Copied!'] before:absolute before:-translate-x-20" data-clipboard-target="#code" {{ stimulus_controller('clipboard') }}>
<img src="{{ asset('images/icons/copy.svg') }}" alt="copy code" width="20" height="20" class="inline">
</button>
<pre class="rounded p-4 overflow-auto shadow-2xl shadow-gray rounded border-gray border-2">
<code id="code" class="language-php" data-controller="syntax-highlight" >
{{- code | escape('html') -}}
</code>
</pre>
</div>
{% endapply %}
</div>

{% if output %}
<h2 class="text-xl mt-5 mb-5">Output</h2>
<div id="output">
<div id="output" class="relative">
{% apply spaceless %}
<pre class="rounded p-4 overflow-auto shadow-2xl shadow-gray rounded border-gray border-2 relative">
<button class="absolute top-0 right-0 bg-orange-100 rounded px-4 leading-9 [&.copied]:before:content-['Copied!'] before:absolute before:-translate-x-24" title="copy code" data-clipboard-target="#output" {{ stimulus_controller('clipboard') }}>
<img src="{{ asset('images/icons/copy.svg') }}" alt="copy code" width="20" height="20" class="inline">
</button>
<code id="output" class="language-bash" {{ stimulus_controller('syntax_highlight') }}>
<button class="absolute top-[12px] right-[12px] bg-orange-100 rounded px-4 leading-9 [&.copied]:before:content-['Copied!'] before:absolute before:-translate-x-20" data-clipboard-target="#output" {{ stimulus_controller('clipboard') }}>
<img src="{{ asset('images/icons/copy.svg') }}" alt="copy code" width="20" height="20" class="inline">
</button>
<pre class="rounded p-4 shadow-2xl shadow-gray rounded border-gray border-2">
<code id="output" class="language-bash overflow-auto" {{ stimulus_controller('syntax_highlight') }}>
{{- output | escape('html') -}}
</code>
</pre>