gh-119182: Add PyUnicodeWriter C API #119184

Merged on Jun 17, 2024 (15 commits)
Doc/c-api/unicode.rst (80 additions, 0 deletions)
@@ -1502,3 +1502,83 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
:c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string
object that has been interned, or a new ("owned") reference to an earlier
interned string object with the same value.

PyUnicodeWriter
^^^^^^^^^^^^^^^

The :c:type:`PyUnicodeWriter` API can be used to create a Python :class:`str`
object.

.. versionadded:: 3.14

.. c:type:: PyUnicodeWriter

A Unicode writer instance.

The instance must be destroyed by :c:func:`PyUnicodeWriter_Finish` on
success, or :c:func:`PyUnicodeWriter_Discard` on error.

.. c:function:: PyUnicodeWriter* PyUnicodeWriter_Create(Py_ssize_t length)

Create a Unicode writer instance.

Set an exception and return ``NULL`` on error.

.. c:function:: PyObject* PyUnicodeWriter_Finish(PyUnicodeWriter *writer)

Return the final Python :class:`str` object and destroy the writer instance.

Set an exception and return ``NULL`` on error.

.. c:function:: void PyUnicodeWriter_Discard(PyUnicodeWriter *writer)

Discard the internal Unicode buffer and destroy the writer instance.

.. c:function:: int PyUnicodeWriter_WriteChar(PyUnicodeWriter *writer, Py_UCS4 ch)

Write the single Unicode character *ch* into *writer*.

On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.

.. c:function:: int PyUnicodeWriter_WriteUTF8(PyUnicodeWriter *writer, const char *str, Py_ssize_t size)

Decode the string *str* from UTF-8 in strict mode and write the output into *writer*.

*size* is the string length in bytes. If *size* is equal to ``-1``, call
``strlen(str)`` to get the string length.

On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.
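
The *size* rule for ``PyUnicodeWriter_WriteUTF8`` can be modelled in plain C. This is only a sketch of the documented convention, not CPython's implementation: ``effective_utf8_size`` is a hypothetical helper, and ``ptrdiff_t`` stands in for ``Py_ssize_t`` so the snippet compiles without ``Python.h``.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the PR): resolve the byte count that
   a call such as PyUnicodeWriter_WriteUTF8(writer, str, size) would
   decode, following the documented rule that size == -1 means
   "use strlen(str)". ptrdiff_t stands in for Py_ssize_t. */
static ptrdiff_t
effective_utf8_size(const char *str, ptrdiff_t size)
{
    if (size == -1) {
        /* Length in bytes, not in code points. */
        return (ptrdiff_t)strlen(str);
    }
    return size;
}
```

Because the length is counted in bytes, a two-byte UTF-8 sequence such as ``"\xc3\xa9"`` (U+00E9) resolves to 2 even though it decodes to a single character.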

.. c:function:: int PyUnicodeWriter_WriteStr(PyUnicodeWriter *writer, PyObject *obj)

Call :c:func:`PyObject_Str` on *obj* and write the output into *writer*.

On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.

.. c:function:: int PyUnicodeWriter_WriteRepr(PyUnicodeWriter *writer, PyObject *obj)

Call :c:func:`PyObject_Repr` on *obj* and write the output into *writer*.

On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.

.. c:function:: int PyUnicodeWriter_WriteSubstring(PyUnicodeWriter *writer, PyObject *str, Py_ssize_t start, Py_ssize_t end)

Write the substring ``str[start:end]`` into *writer*.

*str* must be a Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to the length of *str*.
Comment on lines +1576 to +1578 (Contributor):

Nit; I prefer to use SemBr for paragraphs like this.

Suggested change:
- *str* must be Python :class:`str` object. *start* must be greater than or
- equal to 0, and less than or equal to *end*. *end* must be less than or
- equal to *str* length.
+ *str* must be Python :class:`str` object.
+ *start* must be greater than or equal to 0,
+ and less than or equal to *end*.
+ *end* must be less than or equal to *str* length.

Member:

TIL that this is called SemBr!

Breaking on comma may be too much, but I prefer to break at the sentence boundary.

Contributor:

Yeah. You don't need to break at comma, but I often do to minimise future diffs.

Alternative suggestion:

Suggested change:
- *str* must be Python :class:`str` object. *start* must be greater than or
- equal to 0, and less than or equal to *end*. *end* must be less than or
- equal to *str* length.
+ *str* must be Python :class:`str` object.
+ *start* must be greater than or equal to 0, and less than or equal to *end*.
+ *end* must be less than or equal to *str* length.


On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.
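
The index constraints for ``PyUnicodeWriter_WriteSubstring`` reduce to ``0 <= start <= end <= len(str)``. A minimal sketch of that check in plain C follows. ``substring_range_is_valid`` is a hypothetical helper, not something the PR adds, and ``ptrdiff_t`` stands in for ``Py_ssize_t``.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of the PR): check the documented
   preconditions on (start, end) for a string of `length` characters.
   ptrdiff_t stands in for Py_ssize_t. */
static bool
substring_range_is_valid(ptrdiff_t start, ptrdiff_t end, ptrdiff_t length)
{
    return 0 <= start && start <= end && end <= length;
}
```

Under these rules an empty range (``start == end``) is valid, including at the very end of the string, while negative indices are not, unlike Python-level slicing.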

.. c:function:: int PyUnicodeWriter_Format(PyUnicodeWriter *writer, const char *format, ...)

Similar to :c:func:`PyUnicode_FromFormat`, but write the output directly into *writer*.

On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.
Doc/whatsnew/3.14.rst (15 additions, 0 deletions)
@@ -258,6 +258,21 @@ New Features
* Add :c:func:`PyLong_GetSign` function to get the sign of :class:`int` objects.
(Contributed by Sergey B Kirpichev in :gh:`116560`.)

* Add a new :c:type:`PyUnicodeWriter` API to create a Python :class:`str`
object:

* :c:func:`PyUnicodeWriter_Create`.
* :c:func:`PyUnicodeWriter_Discard`.
* :c:func:`PyUnicodeWriter_Finish`.
* :c:func:`PyUnicodeWriter_WriteChar`.
* :c:func:`PyUnicodeWriter_WriteUTF8`.
* :c:func:`PyUnicodeWriter_WriteStr`.
* :c:func:`PyUnicodeWriter_WriteRepr`.
* :c:func:`PyUnicodeWriter_WriteSubstring`.
* :c:func:`PyUnicodeWriter_Format`.

(Contributed by Victor Stinner in :gh:`119182`.)

Porting to Python 3.14
----------------------

Include/cpython/unicodeobject.h (35 additions, 2 deletions)
@@ -444,7 +444,40 @@ PyAPI_FUNC(PyObject*) PyUnicode_FromKindAndData(
Py_ssize_t size);


-/* --- _PyUnicodeWriter API ----------------------------------------------- */
+/* --- Public PyUnicodeWriter API ----------------------------------------- */

typedef struct PyUnicodeWriter PyUnicodeWriter;

PyAPI_FUNC(PyUnicodeWriter*) PyUnicodeWriter_Create(Py_ssize_t length);
PyAPI_FUNC(void) PyUnicodeWriter_Discard(PyUnicodeWriter *writer);
PyAPI_FUNC(PyObject*) PyUnicodeWriter_Finish(PyUnicodeWriter *writer);

PyAPI_FUNC(int) PyUnicodeWriter_WriteChar(
    PyUnicodeWriter *writer,
    Py_UCS4 ch);
PyAPI_FUNC(int) PyUnicodeWriter_WriteUTF8(
    PyUnicodeWriter *writer,
    const char *str,
    Py_ssize_t size);

PyAPI_FUNC(int) PyUnicodeWriter_WriteStr(
    PyUnicodeWriter *writer,
    PyObject *obj);
PyAPI_FUNC(int) PyUnicodeWriter_WriteRepr(
    PyUnicodeWriter *writer,
    PyObject *obj);
PyAPI_FUNC(int) PyUnicodeWriter_WriteSubstring(
    PyUnicodeWriter *writer,
    PyObject *str,
    Py_ssize_t start,
    Py_ssize_t end);
PyAPI_FUNC(int) PyUnicodeWriter_Format(
    PyUnicodeWriter *writer,
    const char *format,
    ...);


/* --- Private _PyUnicodeWriter API --------------------------------------- */

typedef struct {
PyObject *buffer;
@@ -466,7 +499,7 @@ typedef struct {
/* If readonly is 1, buffer is a shared string (cannot be modified)
and size is set to 0. */
unsigned char readonly;
-} _PyUnicodeWriter ;
+} _PyUnicodeWriter;

// Initialize a Unicode writer.
//
@@ -0,0 +1,13 @@
Add a new :c:type:`PyUnicodeWriter` API to create a Python :class:`str` object:

* :c:func:`PyUnicodeWriter_Create`.
* :c:func:`PyUnicodeWriter_Discard`.
* :c:func:`PyUnicodeWriter_Finish`.
* :c:func:`PyUnicodeWriter_WriteChar`.
* :c:func:`PyUnicodeWriter_WriteUTF8`.
* :c:func:`PyUnicodeWriter_WriteStr`.
* :c:func:`PyUnicodeWriter_WriteRepr`.
* :c:func:`PyUnicodeWriter_WriteSubstring`.
* :c:func:`PyUnicodeWriter_Format`.

Patch by Victor Stinner.