Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZA̡͊͠͝LGΌ causes "Invalid MD5 checksum on messages" #2362

Closed
pauldraper opened this issue May 8, 2021 · 10 comments · Fixed by #2381
Closed

ZA̡͊͠͝LGΌ causes "Invalid MD5 checksum on messages" #2362

pauldraper opened this issue May 8, 2021 · 10 comments · Fixed by #2381
Assignees
Labels
bug This issue is a bug.

Comments

@pauldraper
Copy link
Contributor

pauldraper commented May 8, 2021

The unholy child weeps the blood of virgins, the tainted souls are summoned into the realm of the living, dooming humanity to an eternity of dread torture, transporting the programmer's consciousness into a world of ceaseless screaming. You have summoned Zalgo.

const decodeEscapedXML = (str: string) =>
  str
    .replace(/&/g, "&")
    .replace(/'/g, "'")
    .replace(/"/g, '"')
    .replace(/>/g, ">")
    .replace(/&lt;/g, "<");

https://github.com/aws/aws-sdk-js-v3/blob/v3.14.0/clients/client-sqs/protocols/Aws_query.ts#L3133-L3139

Maybe try using an XML parser instead?


There are multiple obvious problems with this:

  1. &#xD; decodes to &#xD; (wrong) not carriage return (correct)
  2. &amp;lt; decodes to < (wrong) not &lt; (correct)

Thanks to (1), all my HL7 messages in my SQS FIFO queue won't process because carriage returns broke the client. I haven't run into (2), but I'm sure I will eventually.

Such is the life of the foolish souls who try parsing XML with 5 lines of regex.

(Silver lining is thanks to the MD5 checksum, this bug only causes unavailability instead of corruption.)

@pauldraper pauldraper added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 8, 2021
pauldraper added a commit to pauldraper/aws-sdk-js-v3 that referenced this issue May 8, 2021
@alexforsyth
Copy link
Contributor

Love the description. Lemme take a look

@alexforsyth
Copy link
Contributor

We're gonna do an internal review later this morning. When we have some consensus we'll open it up to public discussion/PRs

@alexforsyth alexforsyth self-assigned this May 10, 2021
@alexforsyth alexforsyth added this to To do in High Priority May 10, 2021
@trivikr
Copy link
Member

trivikr commented May 10, 2021

Maybe try using an XML parser instead?

We use fast-xml-parser for processing XML as seen in the code just below the decodeEscapedXML function

const decodeEscapedXML = (str: string) =>
str
.replace(/&amp;/g, "&")
.replace(/&apos;/g, "'")
.replace(/&quot;/g, '"')
.replace(/&gt;/g, ">")
.replace(/&lt;/g, "<");
const parseBody = (streamBody: any, context: __SerdeContext): any =>
collectBodyString(streamBody, context).then((encoded) => {
if (encoded.length) {
const parsedObj = xmlParse(encoded, {
attributeNamePrefix: "",
ignoreAttributes: false,
parseNodeValue: false,
trimValues: false,
tagValueProcessor: (val, tagName) => (val.trim() === "" ? "" : decodeEscapedXML(val)),
});

The decodeEscapedXML function is a post-processor for tag values inside fast-xml-parser.

@trivikr trivikr assigned trivikr and unassigned alexforsyth May 10, 2021
@alexforsyth
Copy link
Contributor

@trivikr is going to do a deeper dive here and come up with a fix

@trivikr

This comment has been minimized.

@trivikr
Copy link
Member

trivikr commented May 10, 2021

I'm able to reproduce issue with SQS message message hello\r\nworld

$ node -v
v14.16.1
package.json
    "@aws-sdk/client-sqs": "3.15.0",
    "md5": "2.3.0"
Code
import { SQS } from "@aws-sdk/client-sqs";
import md5 from "md5";

const region = "us-west-2";
const client = new SQS({ region });

const QueueName = `test-${Date.now()}`;
const { QueueUrl } = await client.createQueue({ QueueName });

const MessageBody = `hello\r\nworld`;
console.log(MessageBody);
const { MD5OfMessageBody } = await client.sendMessage({
  MessageBody,
  QueueUrl,
});

console.log(
  `\nmd5 check for sent message: ${md5(MessageBody) === MD5OfMessageBody}`
);

const response = await client.receiveMessage({ QueueUrl });
console.log(
  `md5 check for received message: ${
    md5(MessageBody) === md5(response.Messages[0].Body)
  }`
);
console.log(
  `\nStrict Equality Comparison: ${MessageBody === response.Messages[0].Body}`
);

console.log(response.Messages[0].Body);

await client.deleteQueue({ QueueUrl });
Output
$ node sqs.mjs
hello
world

md5 check for sent message: true
/Users/trivikr/workspace/test/node_modules/@aws-sdk/middleware-sdk-sqs/dist/cjs/receive-message.js:21
            throw new Error("Invalid MD5 checksum on messages: " + messageIds.join(", "));
                  ^

Error: Invalid MD5 checksum on messages: 6ddadd53-e4ce-44ea-abb8-a2e5ca1dcfe0
    at /Users/trivikr/workspace/test/node_modules/@aws-sdk/middleware-sdk-sqs/dist/cjs/receive-message.js:21:19
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async /Users/trivikr/workspace/test/node_modules/@aws-sdk/middleware-logger/dist/cjs/loggerMiddleware.js:6:22
    at async file:///Users/trivikr/workspace/test/sqs.mjs:21:18

This issue is not reproducible with AWS SDK for JavaScript (v2), tested with v2.897.0.

@trivikr
Copy link
Member

trivikr commented May 11, 2021

This is an issue with upstream dependency fast-xml-parser which is being discussed at NaturalIntelligence/fast-xml-parser#342

While the fix is available, we'll provide a workaround in AWS SDK for JavaScript (v3).

@trivikr
Copy link
Member

trivikr commented May 11, 2021

The upstream fast-xml-parser used to have an option called decodeHTMLchar in the past.
It was removed in favor of tagValueProcessor in NaturalIntelligence/fast-xml-parser@5ac57a5

trivikr pushed a commit to trivikr/aws-sdk-js-v3 that referenced this issue May 11, 2021
@trivikr trivikr removed the needs-triage This issue or PR still needs to be triaged. label May 11, 2021
trivikr pushed a commit to trivikr/aws-sdk-js-v3 that referenced this issue May 11, 2021
@pauldraper
Copy link
Contributor Author

I don't believe NaturalIntelligence/fast-xml-parser#342 is an issue. Or at least, it's an intentional issue. fast-xml-parser docs show using he to decode entities as implemented in #2381.

(Personally, I think it's crazy it doesn't do that, but someone really wanted a FAST-xml-parser.)

High Priority automation moved this from To do to Done May 14, 2021
@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 29, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue is a bug.
Projects
3 participants