Converting C Strings to Python
Last Updated: 02 Apr, 2019
For a C string represented as a char *, int pair, you must decide whether the string should be presented to Python as a raw byte string or as a Unicode string.
Bytes objects can be built using Py_BuildValue() as follows:
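    #define PY_SSIZE_T_CLEAN   // "#" formats then take a Py_ssize_t length
    #include <Python.h>

    // Pointer to C string data
    char *s;
    // Length of data (Py_ssize_t, as Python 3.10+ requires for "#" formats)
    Py_ssize_t len;

    // Make a bytes object
    PyObject *obj = Py_BuildValue("y#", s, len);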
To create a Unicode string when s is known to point to UTF-8 encoded data, the following code can be used:
    PyObject *obj = Py_BuildValue("s#", s, len);
If s is encoded in some other known encoding, a string can be created using PyUnicode_Decode():
    PyObject *obj = PyUnicode_Decode(s, len, "encoding", "errors");

    // Examples
    obj = PyUnicode_Decode(s, len, "latin-1", "strict");
    obj = PyUnicode_Decode(s, len, "ascii", "ignore");
If a wide string is represented as a wchar_t *, len pair, there are a few options, as shown below:
    // Wide character string
    wchar_t *w;
    // Length (in wide characters)
    Py_ssize_t len;

    // Option 1 - use Py_BuildValue()
    PyObject *obj1 = Py_BuildValue("u#", w, len);

    // Option 2 - use PyUnicode_FromWideChar()
    PyObject *obj2 = PyUnicode_FromWideChar(w, len);
- The data from C must be explicitly decoded into a string according to some codec.
- Common encodings include ASCII, Latin-1, and UTF-8.
- If the encoding is not known, it is best to represent the string as bytes instead.
- Python always copies the string data it is given when creating an object.
- Also, for better reliability, strings should be created using both a pointer and a size rather than relying on NULL-terminated data.