From patchwork Thu Apr 16 16:56:53 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Simon Glass
X-Patchwork-Id: 2199
From: Simon Glass
To: U-Boot Concept
Date: Thu, 16 Apr 2026 10:56:53 -0600
Message-ID: <20260416165733.2923423-4-sjg@u-boot.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260416165733.2923423-1-sjg@u-boot.org>
References: <20260416165733.2923423-1-sjg@u-boot.org>
X-MailFrom: sjg@u-boot.org
CC: Simon Glass
Subject: [Concept] [PATCH 03/21] linux: Add NLS stubs with UTF-16 to UTF-8 conversion
List-Id: Discussion and patches related to U-Boot Concept

From: Simon Glass

The Linux kernel's NLS (National
Language Support) subsystem provides character set conversion for
filesystems with Unicode support. Add minimal stubs for struct nls_table,
load_nls(), unload_nls(), and a basic utf16s_to_utf8s() implementation.
These are needed by the isofs Joliet extension for Unicode filename
support.

Signed-off-by: Simon Glass
---
 include/charset.h   | 20 ++++++++++++++++++
 include/linux/nls.h | 50 +++++++++++++++++++++++++++++++++++++++++++++
 lib/charset.c       | 29 ++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)
 create mode 100644 include/linux/nls.h

diff --git a/include/charset.h b/include/charset.h
index 348bad5883a..2d1964287d8 100644
--- a/include/charset.h
+++ b/include/charset.h
@@ -303,6 +303,26 @@ size_t u16_strlcat(u16 *dest, const u16 *src, size_t count);
  */
 uint8_t *utf16_to_utf8(uint8_t *dest, const uint16_t *src, size_t size);
 
+enum utf16_endian;
+
+/**
+ * utf16s_to_utf8s() - convert a UTF-16 string to UTF-8 with explicit endianness
+ *
+ * Linux NLS-compatible interface that wraps utf16_to_utf8(). Converts at
+ * most @inlen UTF-16 code units from @pwcs to UTF-8, stopping at a null
+ * character or when @maxout bytes have been written. Surrogate pairs are
+ * handled by the underlying utf16_to_utf8() implementation.
+ *
+ * @pwcs: source UTF-16 string
+ * @inlen: number of UTF-16 code units to convert
+ * @endian: byte order of the source string (UTF16_BIG_ENDIAN, etc.)
+ * @s: destination buffer for UTF-8 output
+ * @maxout: size of the destination buffer in bytes
+ * Return: number of bytes written to @s
+ */
+int utf16s_to_utf8s(const u16 *pwcs, int inlen, enum utf16_endian endian,
+		    u8 *s, int maxout);
+
 /**
  * utf_to_cp() - translate Unicode code point to 8bit codepage
  *
diff --git a/include/linux/nls.h b/include/linux/nls.h
new file mode 100644
index 00000000000..939dc1afaaa
--- /dev/null
+++ b/include/linux/nls.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Minimal NLS (National Language Support) stubs for U-Boot
+ *
+ * Based on the interface from Linux's include/linux/nls.h but heavily
+ * simplified: struct nls_table is trimmed to the fields used by isofs,
+ * load_nls() and unload_nls() are no-ops, and utf16s_to_utf8s() is a
+ * basic implementation wrapping utf16_to_utf8() in lib/charset.c.
+ *
+ * Joliet support requires NLS for character set conversion. These stubs
+ * allow the code to compile without full NLS infrastructure.
+ */
+#ifndef _LINUX_NLS_H
+#define _LINUX_NLS_H
+
+#include
+
+#define NLS_MAX_CHARSET_SIZE 6
+
+/* UTF-16 byte order */
+enum utf16_endian {
+	UTF16_HOST_ENDIAN,
+	UTF16_LITTLE_ENDIAN,
+	UTF16_BIG_ENDIAN,
+};
+
+struct nls_table {
+	const char *charset;
+	int (*uni2char)(wchar_t uni, unsigned char *out, int boundlen);
+	int (*char2uni)(const unsigned char *rawstring, int boundlen,
+			wchar_t *uni);
+};
+
+static inline struct nls_table *load_nls(const char *charset)
+{
+	return NULL;
+}
+
+static inline struct nls_table *load_nls_default(void)
+{
+	return NULL;
+}
+
+static inline void unload_nls(struct nls_table *nls)
+{
+}
+
+#include
+
+#endif /* _LINUX_NLS_H */
diff --git a/lib/charset.c b/lib/charset.c
index 182c92a50c4..911659199d7 100644
--- a/lib/charset.c
+++ b/lib/charset.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 
 /**
@@ -311,6 +312,34 @@ int utf16_utf8_strncpy(char **dst, const u16 *src, size_t count)
 	return 0;
 }
 
+int utf16s_to_utf8s(const u16 *pwcs, int inlen, enum utf16_endian endian,
+		    u8 *s, int maxout)
+{
+	u16 *tmp;
+	u8 *start = s;
+	int i;
+
+	tmp = malloc(inlen * sizeof(u16));
+	if (!tmp)
+		return 0;
+
+	for (i = 0; i < inlen; i++) {
+		if (endian == UTF16_BIG_ENDIAN)
+			tmp[i] = __be16_to_cpu(pwcs[i]);
+		else
+			tmp[i] = __le16_to_cpu(pwcs[i]);
+		if (!tmp[i]) {
+			inlen = i;
+			break;
+		}
+	}
+
+	s = utf16_to_utf8(s, tmp, inlen);
+	free(tmp);
+
+	return min((int)(s - start), maxout);
+}
+
 s32 utf_to_lower(const s32 code)
 {
 	struct capitalization_table *pos = capitalization_table;
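
The kernel-doc for utf16s_to_utf8s() describes three behaviours: explicit byte-order selection, stopping at a null code unit, and stopping once @maxout bytes have been written. A minimal standalone sketch of a conversion with those semantics is shown here for illustration. It is not the code from this patch and does not call U-Boot's utf16_to_utf8(). The names sketch_utf16s_to_utf8s(), enum sketch_endian and load16() are hypothetical, the source is taken as raw bytes rather than u16, and it assumes hosted C with <stdint.h>. Replacing unpaired surrogates with U+FFFD is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for enum utf16_endian; not the U-Boot definition. */
enum sketch_endian { SKETCH_LE, SKETCH_BE };

/* Read one UTF-16 code unit from raw bytes in the requested byte order. */
static uint16_t load16(const uint8_t *p, enum sketch_endian e)
{
	return e == SKETCH_BE ? (uint16_t)((p[0] << 8) | p[1])
			      : (uint16_t)((p[1] << 8) | p[0]);
}

/*
 * Convert up to inlen code units to UTF-8 without ever writing more than
 * maxout bytes. Stops at a null code unit, and stops early if the next
 * whole character would not fit. Returns the number of bytes written.
 */
static int sketch_utf16s_to_utf8s(const uint8_t *src, int inlen,
				  enum sketch_endian e, uint8_t *out,
				  int maxout)
{
	int i = 0, n = 0;

	assert(src && out && inlen >= 0 && maxout >= 0);

	while (i < inlen) {
		uint32_t c = load16(src + 2 * i, e);
		int len;

		i++;
		if (!c)
			break;

		/* Combine a high surrogate with a following low surrogate */
		if (c >= 0xd800 && c <= 0xdbff && i < inlen) {
			uint32_t lo = load16(src + 2 * i, e);

			if (lo >= 0xdc00 && lo <= 0xdfff) {
				c = 0x10000 + ((c - 0xd800) << 10) +
				    (lo - 0xdc00);
				i++;
			}
		}
		if (c >= 0xd800 && c <= 0xdfff)
			c = 0xfffd;	/* unpaired surrogate (sketch's choice) */

		len = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
		if (n + len > maxout)
			break;

		if (len == 1) {
			out[n++] = (uint8_t)c;
		} else if (len == 2) {
			out[n++] = (uint8_t)(0xc0 | (c >> 6));
			out[n++] = (uint8_t)(0x80 | (c & 0x3f));
		} else if (len == 3) {
			out[n++] = (uint8_t)(0xe0 | (c >> 12));
			out[n++] = (uint8_t)(0x80 | ((c >> 6) & 0x3f));
			out[n++] = (uint8_t)(0x80 | (c & 0x3f));
		} else {
			out[n++] = (uint8_t)(0xf0 | (c >> 18));
			out[n++] = (uint8_t)(0x80 | ((c >> 12) & 0x3f));
			out[n++] = (uint8_t)(0x80 | ((c >> 6) & 0x3f));
			out[n++] = (uint8_t)(0x80 | (c & 0x3f));
		}
	}

	return n;
}
```

Joliet stores names big-endian, so a caller in that setting would pass SKETCH_BE. A destination of three bytes per input code unit is always large enough, since a BMP character needs at most three bytes and a surrogate pair uses two code units for four bytes.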