Troubleshooting an Error Involving Undertow and RxNetty

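Undertow is a fast, high-performance web container. But after we integrated Undertow into Spring Boot as the embedded container, testing the serialization extension of our company's RPC framework failed. The exception reported on the client only showed that the server had answered 400 Bad Request, with no other useful information.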

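The first step was to determine whether the problem was on the client or on the server. Starting from the client side, I compared the call made by the unit test with the same call made from Postman. The unit test failed while Postman succeeded, and since both hit the same server, the problem had to be in the request message itself.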

I captured both requests. The one generated by Postman looks like this:

Frame 33: 458 bytes on wire (3664 bits), 458 bytes captured (3664 bits)
Null/Loopback
Internet Protocol Version 6, Src: ::1, Dst: ::1
Transmission Control Protocol, Src Port: 65459, Dst Port: 8080, Seq: 1530, Ack: 601, Len: 382
Hypertext Transfer Protocol
POST /soatest/provider/testJson HTTP/1.1\r\n
Content-Type: application/json\r\n
cache-control: no-cache\r\n
Postman-Token: c87d017d-0cef-4ac3-b365-a0c95acc28e4\r\n
User-Agent: PostmanRuntime/7.3.0\r\n
Accept: */*\r\n
Host: localhost:8080\r\n
cookie: DIYSERVERS=1; JSESSIONID=pjff2tnyhgmqa7zeyoury11f\r\n
accept-encoding: gzip, deflate\r\n
content-length: 24\r\n
Connection: keep-alive\r\n
\r\n
[Full request URI: http://localhost:8080/soatest/provider/testJson]
[HTTP request 5/5]
[Prev request in frame: 27]
[Response in frame: 37]
File Data: 24 bytes
JavaScript Object Notation: application/json

The one generated by the framework in the unit test looks like this:

Frame 95: 363 bytes on wire (2904 bits), 363 bytes captured (2904 bits)
Null/Loopback
Internet Protocol Version 4, Src: 10.10.10.100, Dst: 10.10.10.100
Transmission Control Protocol, Src Port: 65476, Dst Port: 8080, Seq: 1, Ack: 1, Len: 307
Hypertext Transfer Protocol
POST /soatest/provider/testJson HTTP/1.1\r\n
content-type: application/json\r\n
cli-ver: soa-unit-test\r\n
api-sig: /soatest/provider/testJson:POST\r\n
req-host: 10.10.10.243\r\n
host: 10.10.10.243\r\n
User-Agent: RxNetty Client\r\n
transfer-encoding: chunked\r\n
transfer-encoding: chunked\r\n
\r\n
[Full request URI: http://10.10.10.100/soatest/provider/testJson]
[HTTP request 1/1]
[Response in frame: 97]
HTTP chunked response
File Data: 26 bytes
JavaScript Object Notation: application/json

At a glance, the Transfer-Encoding header looked like a likely culprit. Oddly, transfer-encoding: chunked appears twice, and the two lines are identical.
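
Spotting a repeated header by eye is easy to miss in a longer capture. As a rough sketch of how the comparison could be automated, the snippet below groups header lines by lower-cased name and reports the names that occur more than once. DuplicateHeaderFinder and findDuplicates are hypothetical names made up for this illustration; they are not part of Undertow, RxNetty, or the RPC framework, and the input is assumed to be header lines only (no request line).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only.
class DuplicateHeaderFinder {

    // Groups values by lower-cased header name (header names are case-insensitive)
    // and keeps only the names that appear on more than one line.
    static Map<String, List<String>> findDuplicates(List<String> headerLines) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "name: value" line
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            byName.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        byName.entrySet().removeIf(e -> e.getValue().size() < 2);
        return byName;
    }

    public static void main(String[] args) {
        // Header lines copied from the unit-test capture above.
        List<String> captured = Arrays.asList(
                "content-type: application/json",
                "cli-ver: soa-unit-test",
                "api-sig: /soatest/provider/testJson:POST",
                "req-host: 10.10.10.243",
                "host: 10.10.10.243",
                "User-Agent: RxNetty Client",
                "transfer-encoding: chunked",
                "transfer-encoding: chunked");
        System.out.println(findDuplicates(captured));
    }
}
```

For the unit-test capture, the only name reported is transfer-encoding, with the value chunked listed twice.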

I downloaded the Undertow source to take a look. When Undertow processes a request, the HttpReadListener class calls this method:

public void handleEvent(final ConduitStreamSourceChannel channel) {
    while (requestStateUpdater.get(this) != 0) {
        //if the CAS fails it is because another thread is in the process of changing state
        //we just immediately retry
        if (requestStateUpdater.compareAndSet(this, 1, 2)) {
            try {
                channel.suspendReads();
            } finally {
                requestStateUpdater.set(this, 1);
            }
            return;
        }
    }
    handleEventWithNoRunningRequest(channel);
}

The actual processing happens in handleEventWithNoRunningRequest(channel), which contains this code:

if(!Connectors.areRequestHeadersValid(httpServerExchange.getRequestHeaders())) {
    sendBadRequestAndClose(connection.getChannel(), UndertowMessages.MESSAGES.invalidHeaders());
    return;
}

So this is where the Bad Request comes from. Next, a look into the validation logic:

public static boolean areRequestHeadersValid(HeaderMap headers) {
    HeaderValues te = headers.get(Headers.TRANSFER_ENCODING);
    HeaderValues cl = headers.get(Headers.CONTENT_LENGTH);
    if(te != null && cl != null) {
        return false;
    } else if(te != null && te.size() > 1) {
        return false;
    } else if(cl != null && cl.size() > 1) {
        return false;
    }
    return true;
}

The problem is now clear: a request must not carry more than one transfer-encoding value, and the unit-test request carries two.
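
To make the three rejection rules concrete, here is a minimal sketch that mirrors the check with a plain Map instead of Undertow's HeaderMap. HeaderCheckSketch is a hypothetical name, and the map keys are assumed to be lower-cased already. It is a stand-in for illustration, not Undertow's actual implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Undertow check; a plain Map replaces HeaderMap.
class HeaderCheckSketch {

    // Same three rules as the excerpt: reject when both headers are present,
    // or when either one carries more than one value.
    static boolean areRequestHeadersValid(Map<String, List<String>> headers) {
        List<String> te = headers.get("transfer-encoding");
        List<String> cl = headers.get("content-length");
        if (te != null && cl != null) {
            return false;
        } else if (te != null && te.size() > 1) {
            return false;
        } else if (cl != null && cl.size() > 1) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Postman sent a single content-length header.
        Map<String, List<String>> postman = new HashMap<>();
        postman.put("content-length", Collections.singletonList("24"));

        // The unit test sent transfer-encoding: chunked twice.
        Map<String, List<String>> unitTest = new HashMap<>();
        unitTest.put("transfer-encoding", Arrays.asList("chunked", "chunked"));

        System.out.println("Postman request valid:   " + areRequestHeadersValid(postman));
        System.out.println("Unit-test request valid: " + areRequestHeadersValid(unitTest));
    }
}
```

With these inputs the Postman-style headers pass and the unit-test headers fail on the second rule, which matches the 400 seen on the client.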

So why are there two transfer-encoding headers? The SOA framework's client is built on RxNetty, so the RxNetty code is the next thing to read.

In RxNetty, the request is written out by the write method of ClientRequestResponseConverter, which contains this section:

switch (rxRequest.getContentSourceType()) {
    case Raw:
        if (!rxRequest.getHeaders().isContentLengthSet()) {
            rxRequest.getHeaders().add(HttpHeaders.Names.TRANSFER_ENCODING, HttpHeaders.Values.CHUNKED);
        }
        contentSource = rxRequest.getRawContentSource();
        break;
    case Typed:
        if (!rxRequest.getHeaders().isContentLengthSet()) {
            rxRequest.getHeaders().add(HttpHeaders.Names.TRANSFER_ENCODING, HttpHeaders.Values.CHUNKED);
        }
        contentSource = rxRequest.getContentSource();
        break;
    case Absent:
        if (!rxRequest.getHeaders().isContentLengthSet() && rxRequest.getMethod() != HttpMethod.GET) {
            rxRequest.getHeaders().set(HttpHeaders.Names.CONTENT_LENGTH, 0);
        }
        break;
}
 
writeHttpHeaders(ctx, rxRequest, allWritesListener); // In all cases, write headers first.
 
if (null != contentSource) { // If content present then write Last Content after all content is written.
    if (!rxRequest.getHeaders().isContentLengthSet()) {
        rxRequest.getHeaders().add(HttpHeaders.Names.TRANSFER_ENCODING, HttpHeaders.Values.CHUNKED);
    }
    writeContent(ctx, allWritesListener, contentSource, promise, rxRequest, stateToUse);
}

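As you can see, once the content source type is determined to be Raw, the request is marked as chunked if no content-length has been set. The strange part is that after the type switch, the same check runs again: if there is no content-length, add chunked. In effect, this line is executed twice:
rxRequest.getHeaders().add(HttpHeaderNames.TRANSFER_ENCODING, HttpHeaderValues.CHUNKED);
I don't really understand why the author adds transfer-encoding twice. I checked the Git history and found that the commit message for this change is "fix issue 169", but after reading
Rxnetty-issue-169 I still could not see the reason.
In any case, it is enough to pull the code down and patch it. The second block that adds chunked can be changed to: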

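if (null != contentSource) { // If content present then write Last Content after all
                             // content is written.
    // Add chunked only if neither content-length nor transfer-encoding is already set.
    if (!rxRequest.getHeaders().isContentLengthSet()
            && !rxRequest.getHeaders().contains(HttpHeaderNames.TRANSFER_ENCODING)) {
        rxRequest.getHeaders().add(HttpHeaderNames.TRANSFER_ENCODING,
                HttpHeaderValues.CHUNKED);
    }
    writeContent(ctx, allWritesListener, contentSource, promise, rxRequest, stateToUse);
}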
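
The duplicate comes from the difference between add and set on a multi-valued header container: add appends another value under the same name, while set replaces whatever is there. The sketch below models that difference with a hypothetical HeaderBag class written for this illustration. It is not RxNetty's or Netty's header class, and whether replacing add with set would be an acceptable alternative to the guard in the real code is an assumption, not something verified against RxNetty.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Demo entry point; prints the values stored after each sequence of calls.
class AddVersusSetDemo {
    public static void main(String[] args) {
        HeaderBag unguarded = new HeaderBag();
        unguarded.add("transfer-encoding", "chunked");
        unguarded.add("transfer-encoding", "chunked"); // second, unguarded add
        System.out.println("add twice:      " + unguarded.getAll("transfer-encoding"));

        HeaderBag guarded = new HeaderBag();
        guarded.add("transfer-encoding", "chunked");
        if (!guarded.contains("transfer-encoding")) { // same idea as the guard in the patch
            guarded.add("transfer-encoding", "chunked");
        }
        System.out.println("add with guard: " + guarded.getAll("transfer-encoding"));

        HeaderBag replaced = new HeaderBag();
        replaced.set("transfer-encoding", "chunked");
        replaced.set("transfer-encoding", "chunked"); // set overwrites instead of appending
        System.out.println("set twice:      " + replaced.getAll("transfer-encoding"));
    }
}

// Hypothetical minimal multi-valued header container, for illustration only.
class HeaderBag {
    private final Map<String, List<String>> values = new HashMap<>();

    // Appends a value; earlier values under the same name are kept.
    void add(String name, String value) {
        values.computeIfAbsent(key(name), k -> new ArrayList<>()).add(value);
    }

    // Replaces all values under the name with a single value.
    void set(String name, String value) {
        List<String> single = new ArrayList<>();
        single.add(value);
        values.put(key(name), single);
    }

    boolean contains(String name) {
        return values.containsKey(key(name));
    }

    List<String> getAll(String name) {
        return new ArrayList<>(values.getOrDefault(key(name), new ArrayList<>()));
    }

    private static String key(String name) {
        return name.toLowerCase(Locale.ROOT);
    }
}
```

Only the first sequence ends up with two chunked values; the guarded add and the set variant both leave a single value, which is what Undertow's check accepts.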