When a character's code point is U+10000 or above, I have to use the 32-bit \U literal, not the 16-bit \u one.
>>> '\u1f4a9'
'Ὂ9'
>>> '\U0001f4a9'
'💩'
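The first result has two characters because \u consumes exactly four hex digits, so the trailing 9 is left over as an ordinary digit. A small check (the variable names are only for illustration):

```python
# \u takes exactly 4 hex digits: this is U+1F4A followed by a literal '9'.
short = '\u1f4a9'
# \U takes exactly 8 hex digits: this is the single code point U+1F4A9.
full = '\U0001f4a9'

print([hex(ord(c)) for c in short])  # ['0x1f4a', '0x39']
print([hex(ord(c)) for c in full])   # ['0x1f4a9']
```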
If I try to input '💩' directly in Jupyter console on Windows cmd, it aborts with the error:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\Scripts\jupyter-console-script.py", line 10, in <module>
    sys.exit(main())
  File "C:\ProgramData\Anaconda3\lib\site-packages\jupyter_core\application.py", line 267, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\traitlets\config\application.py", line 658, in launch_instance
    app.start()
  File "C:\ProgramData\Anaconda3\lib\site-packages\jupyter_console\app.py", line 155, in start
    self.shell.mainloop()
  File "C:\ProgramData\Anaconda3\lib\site-packages\jupyter_console\ptshell.py", line 508, in mainloop
    self.interact()
  File "C:\ProgramData\Anaconda3\lib\site-packages\jupyter_console\ptshell.py", line 492, in interact
    code = self.prompt_for_code()
  File "C:\ProgramData\Anaconda3\lib\site-packages\jupyter_console\ptshell.py", line 440, in prompt_for_code
    reset_current_buffer=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\prompt_toolkit\interface.py", line 415, in run
    self.eventloop.run(self.input, self.create_eventloop_callbacks())
  File "C:\ProgramData\Anaconda3\lib\site-packages\prompt_toolkit\eventloop\win32.py", line 80, in run
    for k in keys:
  File "C:\ProgramData\Anaconda3\lib\site-packages\prompt_toolkit\terminal\win32_input.py", line 143, in read
    all_keys = list(self._get_keys(read, input_records))
  File "C:\ProgramData\Anaconda3\lib\site-packages\prompt_toolkit\terminal\win32_input.py", line 186, in _get_keys
    for key_press in self._event_to_key_presses(ev):
  File "C:\ProgramData\Anaconda3\lib\site-packages\prompt_toolkit\terminal\win32_input.py", line 225, in _event_to_key_presses
    ascii_char = u_char.encode('utf-8')
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 0: surrogates not allowed
This is because ev.uChar.UnicodeChar holds only one half of a surrogate pair, \ud83d, rather than the complete character \U0001F4A9. This follows from the Windows KEY_EVENT_RECORD structure (https://docs.microsoft.com/en-us/windows/console/key-event-record-str), whose UnicodeChar field is a single 16-bit WCHAR.
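The failure can be reproduced without a Windows console. The sketch below splits U+1F4A9 into its UTF-16 code units, shows that encoding one half on its own raises the same error as in the traceback, and shows that the two halves together can be turned back into the real character. The variable names are mine and are not part of prompt_toolkit.

```python
import struct

ch = '\U0001F4A9'

# Split the character into its two UTF-16 code units (the surrogate pair).
high, low = struct.unpack('<2H', ch.encode('utf-16-le'))
print(hex(high), hex(low))  # 0xd83d 0xdca9

# Encoding a lone surrogate as UTF-8 fails, as in the traceback above.
try:
    chr(high).encode('utf-8')
    error = None
except UnicodeEncodeError as exc:
    error = str(exc)
print(error)

# Both halves together can be merged back into one code point by
# round-tripping through UTF-16 with the 'surrogatepass' error handler.
pair = chr(high) + chr(low)
combined = pair.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
print(combined == ch)  # True
```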
It seems _event_to_key_presses(ev) is called once for each half of a surrogate pair. So we should check whether a key press is one half of a surrogate pair and, if it is, store it until the other half arrives.
def _get_keys(self, read, input_records):
    """
    Generator that yields `KeyPress` objects from the input records.
    """
    for i in range(read.value):
        ir = input_records[i]

        # Get the right EventType from the EVENT_RECORD.
        # (For some reason the Windows console application 'cmder'
        # [http://gooseberrycreative.com/cmder/] can return '0' for
        # ir.EventType. -- Just ignore that.)
        if ir.EventType in EventTypes:
            ev = getattr(ir.Event, EventTypes[ir.EventType])

            # Process if this is a key event. (We also have mouse, menu and
            # focus events.)
            if type(ev) == KEY_EVENT_RECORD and ev.KeyDown:
                for key_press in self._event_to_key_presses(ev):
                    yield key_press

            elif type(ev) == MOUSE_EVENT_RECORD:
                for key_press in self._handle_mouse(ev):
                    yield key_press
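As a rough illustration of the "store one half" idea, here is a standalone sketch that is independent of prompt_toolkit. combine_surrogates is a hypothetical helper, not an existing API, and it assumes the characters arrive one UTF-16 code unit at a time, in the order the console delivers them. How this would be wired into _get_keys is left open.

```python
def combine_surrogates(chars):
    """Yield complete characters from a stream of single UTF-16 code units.

    Hypothetical helper: `chars` is any iterable of one-character strings,
    such as the successive values of ev.uChar.UnicodeChar.
    """
    pending_high = None
    for ch in chars:
        if '\ud800' <= ch <= '\udbff':
            # High surrogate: hold it until the low half arrives.
            pending_high = ch
        elif '\udc00' <= ch <= '\udfff':
            if pending_high is not None:
                # Low surrogate completing a pair: compute the code point.
                code = (0x10000
                        + ((ord(pending_high) - 0xD800) << 10)
                        + (ord(ch) - 0xDC00))
                pending_high = None
                yield chr(code)
            # A low surrogate with no stored high half is dropped here.
        else:
            # Ordinary character; an unpaired stored high half is discarded.
            pending_high = None
            yield ch


print(list(combine_surrogates(['a', '\ud83d', '\udca9', 'b'])))
```

Dropping unpaired halves is one possible policy. It at least avoids passing a lone surrogate on to encode('utf-8').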