[Data Liberation] Add XML API, Stream API, WXR URL Rewriter API (#1952)
Part of #1894. Follows up on #1893.

This PR brings in a few more PHP APIs that were initially explored
outside of Playground so that they can be incubated in Playground. See
the linked descriptions for more details about each API:

* XML Processor from WordPress/wordpress-develop#6713
* Stream chain from adamziel/wxr-normalize#1
* A draft of a WXR URL Rewriter class capable of rewriting URLs in WXR files

## Testing instructions

* Confirm the PHPUnit tests pass in CI
* Confirm the test suite looks reasonable
* That's it for now! This is all new code that isn't used anywhere in Playground yet. I just want to merge it so I can keep iterating and improving.
adamziel authored Oct 28, 2024
1 parent e5813df commit 4ecf6cc
Showing 25 changed files with 6,400 additions and 172 deletions.
41 changes: 18 additions & 23 deletions CHANGELOG.md
@@ -4,65 +4,60 @@ All notable changes to this project are documented in this file by a CI job
that runs on every NPM release. The file follows the [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
format.

## [v1.0.7] (2024-10-28)




## [v1.0.6] (2024-10-28)

### Website

- Query API: Preserve multiple ?plugin= query params. ([#1947](https://github.com/WordPress/wordpress-playground/pull/1947))
- [Remote] Enable releasing @wp-playground/remote by making it public. ([#1948](https://github.com/WordPress/wordpress-playground/pull/1948))

### Contributors

The following contributors merged PRs in this release:

@adamziel @bgrgicak


## [v1.0.5] (2024-10-25)

### Enhancements

- [CORS Proxy] Rate-limits IPv6 requests based on /64 subnets, not specific addresses. ([#1923](https://github.com/WordPress/wordpress-playground/pull/1923))

### Blueprints

- Reload after autologin to set login cookies during boot. ([#1914](https://github.com/WordPress/wordpress-playground/pull/1914))
- Skip empty lines in the runSql step. ([#1939](https://github.com/WordPress/wordpress-playground/pull/1939))

### Documentation

- Clarified wp beta to also include rc version. ([#1936](https://github.com/WordPress/wordpress-playground/pull/1936))

### PHP WebAssembly

- Enable CURL in Playground Web. ([#1935](https://github.com/WordPress/wordpress-playground/pull/1935))
- PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch(). ([#1926](https://github.com/WordPress/wordpress-playground/pull/1926))

### Website

- Hide Settings menu after clicking "Restore from .zip". ([#1904](https://github.com/WordPress/wordpress-playground/pull/1904))
- Publish @wp-playground/remote (types only). ([#1924](https://github.com/WordPress/wordpress-playground/pull/1924))

### Bug Fixes

- CORS Proxy: Index update_at column because it is used for lookup. ([#1931](https://github.com/WordPress/wordpress-playground/pull/1931))
- CORS Proxy: Reject targeting self. ([#1932](https://github.com/WordPress/wordpress-playground/pull/1932))
- Docs: Fix typo. ([#1934](https://github.com/WordPress/wordpress-playground/pull/1934))
- Explicitly request no-cache to discourage WP Cloud from edge caching CORS proxy results. ([#1930](https://github.com/WordPress/wordpress-playground/pull/1930))
- Remove test code added in #1914. ([#1928](https://github.com/WordPress/wordpress-playground/pull/1928))

### Contributors

The following contributors merged PRs in this release:

@adamziel @ajotka @bgrgicak @bph @brandonpayton @ockham @psrpinto


## [v1.0.4] (2024-10-21)

### Enhancements
41 changes: 18 additions & 23 deletions packages/docs/site/docs/main/changelog.md
@@ -9,65 +9,60 @@ All notable changes to this project are documented in this file by a CI job
that runs on every NPM release. The file follows the [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
format.

## [v1.0.7] (2024-10-28)




## [v1.0.6] (2024-10-28)

### Website

- Query API: Preserve multiple ?plugin= query params. ([#1947](https://github.com/WordPress/wordpress-playground/pull/1947))
- [Remote] Enable releasing @wp-playground/remote by making it public. ([#1948](https://github.com/WordPress/wordpress-playground/pull/1948))

### Contributors

The following contributors merged PRs in this release:

@adamziel @bgrgicak


## [v1.0.5] (2024-10-25)

### Enhancements

- [CORS Proxy] Rate-limits IPv6 requests based on /64 subnets, not specific addresses. ([#1923](https://github.com/WordPress/wordpress-playground/pull/1923))

### Blueprints

- Reload after autologin to set login cookies during boot. ([#1914](https://github.com/WordPress/wordpress-playground/pull/1914))
- Skip empty lines in the runSql step. ([#1939](https://github.com/WordPress/wordpress-playground/pull/1939))

### Documentation

- Clarified wp beta to also include rc version. ([#1936](https://github.com/WordPress/wordpress-playground/pull/1936))

### PHP WebAssembly

- Enable CURL in Playground Web. ([#1935](https://github.com/WordPress/wordpress-playground/pull/1935))
- PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch(). ([#1926](https://github.com/WordPress/wordpress-playground/pull/1926))

### Website

- Hide Settings menu after clicking "Restore from .zip". ([#1904](https://github.com/WordPress/wordpress-playground/pull/1904))
- Publish @wp-playground/remote (types only). ([#1924](https://github.com/WordPress/wordpress-playground/pull/1924))

### Bug Fixes

- CORS Proxy: Index update_at column because it is used for lookup. ([#1931](https://github.com/WordPress/wordpress-playground/pull/1931))
- CORS Proxy: Reject targeting self. ([#1932](https://github.com/WordPress/wordpress-playground/pull/1932))
- Docs: Fix typo. ([#1934](https://github.com/WordPress/wordpress-playground/pull/1934))
- Explicitly request no-cache to discourage WP Cloud from edge caching CORS proxy results. ([#1930](https://github.com/WordPress/wordpress-playground/pull/1930))
- Remove test code added in #1914. ([#1928](https://github.com/WordPress/wordpress-playground/pull/1928))

### Contributors

The following contributors merged PRs in this release:

@adamziel @ajotka @bgrgicak @bph @brandonpayton @ockham @psrpinto


## [v1.0.4] (2024-10-21)

### Enhancements
14 changes: 14 additions & 0 deletions packages/playground/data-liberation/bootstrap.php
@@ -1,5 +1,13 @@
<?php

require_once __DIR__ . '/src/stream-api/WP_Stream_Processor.php';
require_once __DIR__ . '/src/stream-api/WP_Byte_Stream_State.php';
require_once __DIR__ . '/src/stream-api/WP_Byte_Stream.php';
require_once __DIR__ . '/src/stream-api/WP_Processor_Byte_Stream.php';
require_once __DIR__ . '/src/stream-api/WP_File_Byte_Stream.php';
require_once __DIR__ . '/src/stream-api/WP_Stream_Paused_State.php';
require_once __DIR__ . '/src/stream-api/WP_Stream_Chain.php';

require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-token.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-span.php";
require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-text-replacement.php";
@@ -20,6 +28,12 @@
require_once __DIR__ . '/src/WP_Block_Markup_Url_Processor.php';
require_once __DIR__ . '/src/WP_URL_In_Text_Processor.php';
require_once __DIR__ . '/src/WP_URL.php';

require_once __DIR__ . '/src/xml-api/WP_XML_Decoder.php';
require_once __DIR__ . '/src/xml-api/WP_XML_Tag_Processor.php';
require_once __DIR__ . '/src/xml-api/WP_XML_Processor.php';
require_once __DIR__ . '/src/WP_WXR_URL_Rewrite_Processor.php';

require_once __DIR__ . '/vendor/autoload.php';


5 changes: 4 additions & 1 deletion packages/playground/data-liberation/phpunit.xml
@@ -2,12 +2,15 @@
<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" bootstrap="bootstrap.php" colors="true" xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/10.0/phpunit.xsd" cacheDirectory=".phpunit.cache">
	<testsuites>
		<testsuite name="Application Test Suite">
			<file>tests/WPWXRURLRewriterTests.php</file>
			<file>tests/WPRewriteUrlsTests.php</file>
			<file>tests/WPURLInTextProcessorTests.php</file>
			<file>tests/WPBlockMarkupProcessorTests.php</file>
			<file>tests/WPBlockMarkupUrlProcessorTests.php</file>
			<file>tests/URLParserWHATWGComplianceTests.php</file>
			<file>tests/WPXMLProcessorTests.php</file>
			<file>tests/WPXMLTagProcessorTests.php</file>
			<file>tests/UrldecodeNTests.php</file>
		</testsuite>
	</testsuites>
</phpunit>
@@ -233,7 +233,7 @@ public function next_url() {
}

$tld = strtolower( substr( $parsed_url->hostname, $last_dot_position + 1 ) );
-if ( empty( self::$public_suffix_list[ $tld ] ) ) {
+if ( empty( self::$public_suffix_list[ $tld ] ) && $tld !== 'internal' ) {
// This TLD is not in the public suffix list. It's not a valid domain name.
continue;
}
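With this change, URLs whose hostname ends in `.internal` are no longer skipped, even though `internal` is not in the public suffix list (presumably so that private-network hostnames, such as local development sites, are still detected). The standalone sketch below shows the resulting rule. `is_accepted_tld()` and the toy suffix list are hypothetical stand-ins for the processor's internals. Returning `false` for a hostname without a dot is also an assumption, because the hunk does not show how `next_url()` handles that case.

```php
<?php
/**
 * Hypothetical helper mirroring the TLD check in next_url():
 * a hostname is accepted if its TLD is in the public suffix list,
 * or if the TLD is exactly "internal" (compared case-insensitively).
 *
 * @param string $hostname           Parsed hostname, e.g. "wp.internal".
 * @param array  $public_suffix_list Map of TLD => truthy value (toy data here).
 */
function is_accepted_tld( string $hostname, array $public_suffix_list ): bool {
	$last_dot_position = strrpos( $hostname, '.' );
	if ( false === $last_dot_position ) {
		// Assumption: a dotless hostname has no TLD to check, so reject it.
		return false;
	}
	$tld = strtolower( substr( $hostname, $last_dot_position + 1 ) );
	// Same condition as the diff, negated: keep the URL unless the TLD is
	// unknown AND not "internal".
	return ! empty( $public_suffix_list[ $tld ] ) || 'internal' === $tld;
}

// Toy stand-in for the real public suffix list.
$toy_suffix_list = array( 'com' => 1, 'org' => 1 );

var_dump( is_accepted_tld( 'example.com', $toy_suffix_list ) ); // bool(true)
var_dump( is_accepted_tld( 'wp.internal', $toy_suffix_list ) ); // bool(true)
var_dump( is_accepted_tld( 'foo.notatld', $toy_suffix_list ) ); // bool(false)
```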
@@ -0,0 +1,47 @@
<?php

class WP_WXR_URL_Rewrite_Processor {


	public static function stream( $current_site_url, $new_site_url ) {
		return WP_XML_Processor::stream(
			function ( $processor ) use ( $current_site_url, $new_site_url ) {
				if ( static::is_wxr_content_node( $processor ) ) {
					$text = $processor->get_modifiable_text();
					$updated_text = wp_rewrite_urls(
						array(
							'block_markup' => $text,
							'current-site-url' => $current_site_url,
							'new-site-url' => $new_site_url,
						)
					);
					if ( $updated_text !== $text ) {
						$processor->set_modifiable_text( $updated_text );
					}
				}
			}
		);
	}

	private static function is_wxr_content_node( WP_XML_Processor $processor ) {
		$breadcrumbs = $processor->get_breadcrumbs();
		if (
			! in_array( 'excerpt:encoded', $breadcrumbs, true ) &&
			! in_array( 'content:encoded', $breadcrumbs, true ) &&
			! in_array( 'guid', $breadcrumbs, true ) &&
			! in_array( 'link', $breadcrumbs, true ) &&
			! in_array( 'wp:attachment_url', $breadcrumbs, true ) &&
			! in_array( 'wp:comment_content', $breadcrumbs, true ) &&
			! in_array( 'wp:base_site_url', $breadcrumbs, true ) &&
			! in_array( 'wp:base_blog_url', $breadcrumbs, true )
			// Meta values are not supported yet. We'll need to support
			// WordPress core options that may be saved as JSON, serialized PHP, or XML,
			// and then provide extension points for plugin authors to support
			// their own options.
			// !in_array('wp:postmeta', $processor->get_breadcrumbs())
		) {
			return false;
		}
		return true;
	}
}
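`is_wxr_content_node()` returns `true` when any of the listed WXR element names appears anywhere in the current breadcrumbs, so text nested inside, for example, `<item><content:encoded>` gets rewritten. The chain of `! in_array()` checks can be expressed equivalently as a set intersection over a single list of tag names. The sketch below is standalone: `is_wxr_content_breadcrumb()` is a hypothetical function that takes the breadcrumb array directly instead of a `WP_XML_Processor`, and the exact breadcrumb shape used in the examples (`rss`, `channel`, `item`, ...) is an assumption.

```php
<?php
/**
 * Hypothetical standalone version of the breadcrumb test in
 * WP_WXR_URL_Rewrite_Processor::is_wxr_content_node(). It takes the
 * breadcrumbs as a plain array of tag names instead of a processor.
 */
function is_wxr_content_breadcrumb( array $breadcrumbs ): bool {
	// Same element list as the class above; wp:postmeta is still excluded.
	$url_bearing_tags = array(
		'excerpt:encoded',
		'content:encoded',
		'guid',
		'link',
		'wp:attachment_url',
		'wp:comment_content',
		'wp:base_site_url',
		'wp:base_blog_url',
	);
	// True if at least one URL-bearing tag appears anywhere in the path.
	return array() !== array_intersect( $url_bearing_tags, $breadcrumbs );
}

var_dump( is_wxr_content_breadcrumb( array( 'rss', 'channel', 'item', 'content:encoded' ) ) ); // bool(true)
var_dump( is_wxr_content_breadcrumb( array( 'rss', 'channel', 'item', 'title' ) ) );           // bool(false)
```

Keeping the tag names in one array means that adding support for another element later, such as `wp:postmeta`, becomes a one-line change.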
@@ -0,0 +1,80 @@
<?php

abstract class WP_Byte_Stream {

	protected $state;

	public function __construct() {
		$this->state = new WP_Byte_Stream_State();
	}

	public function is_eof(): bool {
		return ! $this->state->output_bytes && $this->state->state === WP_Byte_Stream_State::STATE_FINISHED;
	}

	public function get_file_id() {
		return $this->state->file_id;
	}

	public function skip_file(): void {
		$this->state->last_skipped_file = $this->state->file_id;
	}

	public function is_skipped_file() {
		return $this->state->file_id === $this->state->last_skipped_file;
	}

	public function get_chunk_type() {
		if ( $this->get_last_error() ) {
			return '#error';
		}

		if ( $this->is_eof() ) {
			return '#eof';
		}

		return '#bytes';
	}

	public function append_eof() {
		$this->state->input_eof = true;
	}

	public function append_bytes( string $bytes, $context = null ) {
		$this->state->input_bytes .= $bytes;
		$this->state->input_context = $context;
	}

	public function get_bytes() {
		return $this->state->output_bytes;
	}

	public function next_bytes() {
		$this->state->reset_output();
		if ( $this->is_eof() ) {
			return false;
		}

		// Process any remaining buffered input:
		if ( $this->generate_next_chunk() ) {
			return ! $this->is_skipped_file();
		}

		if ( ! $this->state->input_bytes ) {
			if ( $this->state->input_eof ) {
				$this->state->finish();
			}
			return false;
		}

		$produced_bytes = $this->generate_next_chunk();

		return $produced_bytes && ! $this->is_skipped_file();
	}

	abstract protected function generate_next_chunk(): bool;

	public function get_last_error(): string|null {
		return $this->state->last_error;
	}
}
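`WP_Byte_Stream` is driven from two sides. A producer pushes input with `append_bytes()` and `append_eof()`. A consumer repeatedly calls `next_bytes()` and reads each chunk with `get_bytes()`, presumably until `is_eof()` becomes true. Each subclass only implements `generate_next_chunk()`. The self-contained model below makes that protocol concrete, with several simplifications. `Mini_Byte_Stream` and `Uppercase_Stream` are hypothetical names, the state object is folded into the stream, and file IDs, skipping, errors, and input context are left out. Unlike the real `next_bytes()`, this version does not call `generate_next_chunk()` before checking for new input, so it cannot flush data that a subclass buffered internally.

```php
<?php
/**
 * Hypothetical, simplified model of the WP_Byte_Stream protocol.
 * Not the real classes: no file IDs, skipping, errors, or input context.
 */
abstract class Mini_Byte_Stream {
	protected ?string $input_bytes = null;
	protected bool $input_eof = false;
	protected ?string $output_bytes = null;
	protected bool $finished = false;

	// Producer side: push more input.
	public function append_bytes( string $bytes ): void {
		$this->input_bytes = ( $this->input_bytes ?? '' ) . $bytes;
	}

	// Producer side: signal that no more input will arrive.
	public function append_eof(): void {
		$this->input_eof = true;
	}

	// Consumer side: true once all input is consumed and EOF was seen.
	public function is_eof(): bool {
		return null === $this->output_bytes && $this->finished;
	}

	// Consumer side: the chunk produced by the last next_bytes() call.
	public function get_bytes(): ?string {
		return $this->output_bytes;
	}

	// Consumer side: try to produce the next output chunk.
	public function next_bytes(): bool {
		$this->output_bytes = null;
		if ( $this->is_eof() ) {
			return false;
		}
		// Compare explicitly so the one-byte chunk "0" is not treated as empty.
		if ( null === $this->input_bytes || '' === $this->input_bytes ) {
			if ( $this->input_eof ) {
				$this->finished = true;
			}
			return false;
		}
		return $this->generate_next_chunk();
	}

	// Hand the buffered input to the subclass and clear the buffer.
	protected function consume_input_bytes(): ?string {
		$bytes = $this->input_bytes;
		$this->input_bytes = null;
		return $bytes;
	}

	// Subclasses transform consumed input into $this->output_bytes.
	abstract protected function generate_next_chunk(): bool;
}

// Hypothetical subclass: upper-cases every chunk it receives.
class Uppercase_Stream extends Mini_Byte_Stream {
	protected function generate_next_chunk(): bool {
		$bytes = $this->consume_input_bytes();
		if ( null === $bytes || '' === $bytes ) {
			return false;
		}
		$this->output_bytes = strtoupper( $bytes );
		return true;
	}
}

$stream = new Uppercase_Stream();
$stream->append_bytes( 'hello, ' );
$stream->append_bytes( 'world' );
$stream->next_bytes();
echo $stream->get_bytes(), "\n"; // HELLO, WORLD
$stream->append_eof();
$stream->next_bytes();           // returns false and marks the stream finished
var_dump( $stream->is_eof() );   // bool(true)
```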
@@ -0,0 +1,40 @@
<?php

/**
 * This class describes the state of standalone streams, but it can also be
 * used to describe the state of a stream processor like WP_XML_Processor.
 *
 * In this prototype there are no separate pipes, streams, or processors.
 * There are only byte streams that can be chained together with the
 * WP_Stream_Chain class.
 */
class WP_Byte_Stream_State {
	const STATE_STREAMING = '#streaming';
	const STATE_FINISHED = '#finished';

	public $input_eof = false;
	public $input_bytes = null;
	public $output_bytes = null;
	public $state = self::STATE_STREAMING;
	public $last_error = null;
	public $input_context = null;

	public $file_id;
	public $last_skipped_file;

	public function reset_output() {
		$this->output_bytes = null;
		$this->file_id = 'default';
		$this->last_error = null;
	}

	public function consume_input_bytes() {
		$bytes = $this->input_bytes;
		$this->input_bytes = null;
		return $bytes;
	}

	public function finish() {
		$this->state = self::STATE_FINISHED;
	}
}
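The docblock above describes byte streams that are chained so that each stream's output feeds the next one. The sketch below illustrates that idea at the language level only. It does not use the class loaded from `WP_Stream_Chain.php`, whose API is not shown in this diff. Instead it uses PHP generators, and `chain_stages()` and both stages are hypothetical.

```php
<?php
/**
 * Hypothetical sketch of stream chaining with generators: each stage
 * receives the previous stage's chunks and yields transformed chunks.
 * Nothing runs until the consumer starts pulling from the chain.
 */
function chain_stages( iterable $chunks, callable ...$stages ): Generator {
	foreach ( $stages as $stage ) {
		// Wrap the current iterable in the next stage (lazy; no work yet).
		$chunks = $stage( $chunks );
	}
	yield from $chunks;
}

// Stage 1: drop empty chunks.
$skip_empty = function ( iterable $chunks ): Generator {
	foreach ( $chunks as $chunk ) {
		if ( '' !== $chunk ) {
			yield $chunk;
		}
	}
};

// Stage 2: upper-case whatever reaches it.
$uppercase = function ( iterable $chunks ): Generator {
	foreach ( $chunks as $chunk ) {
		yield strtoupper( $chunk );
	}
};

$output = '';
foreach ( chain_stages( array( 'abc', '', 'def' ), $skip_empty, $uppercase ) as $chunk ) {
	$output .= $chunk;
}
echo $output, "\n"; // ABCDEF
```

Both stages were chosen so that processing each chunk independently is safe. A URL rewriter cannot work that way, because a URL may be split across a chunk boundary. Handling that case requires a stage that buffers input across calls, as the `input_bytes` buffer in `WP_Byte_Stream_State` does.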