Using HTMLPurifier to Defend Against XSS (Cross-Site Scripting) in Laravel 5

Note

This article is an update of the earlier post "Laravel 4 XSS Solution: HTMLPurifier for Laravel 4".

The Problem

XSS has long been a major topic in web security. For background, see the IBM library article "Cross-Site Scripting Attacks: An In-Depth Analysis".

This community runs on PHPHub, a forum application driven largely by UGC (User Generated Content), so it is constantly exposed to XSS. Writing content in Markdown reduces the risk, but problems like the following can still occur:

[some text](javascript:alert('xss'))

For more on Markdown and XSS, see "Markdown and XSS".
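Why the link above is dangerous: a Markdown renderer that does no URL filtering will typically turn it into `<a href="javascript:alert('xss')">`, and clicking it runs the script. A common mitigation is to whitelist the URL schemes allowed in link targets (HTMLPurifier has a URI.AllowedSchemes directive for this). The sketch below illustrates that idea in plain PHP. `isSafeLinkUrl` is a hypothetical helper written for this example, not part of PHPHub or HTMLPurifier, and the default scheme list is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHPHub or HTMLPurifier): returns true when
 * $url is relative or uses one of the allowed schemes.
 */
function isSafeLinkUrl(string $url, array $allowedSchemes = ['http', 'https', 'mailto']): bool
{
    // Decode entities such as "&#106;" or "&colon;" that can disguise "javascript:".
    $decoded = html_entity_decode($url, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Browsers strip leading spaces/control chars and drop tabs and newlines
    // inside URLs; removing all of them here errs on the cautious side.
    $compact = (string) preg_replace('/[\x00-\x20\x7F]+/', '', $decoded);
    $colon = strpos($compact, ':');
    if ($colon === false) {
        return true; // no scheme at all: a relative URL
    }
    // A colon that appears after "/", "?" or "#" is part of the path or query.
    if (strcspn($compact, '/?#') < $colon) {
        return true;
    }
    $scheme = strtolower(substr($compact, 0, $colon));
    return in_array($scheme, $allowedSchemes, true);
}
```

For example, `isSafeLinkUrl("javascript:alert('xss')")` returns false, while `isSafeLinkUrl('https://example.com')` returns true. A whitelist is used rather than a blacklist of "bad" schemes because new dangerous schemes and encodings are easy to miss.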

There is no such thing as absolute security. This article describes how PHPHub uses HTMLPurifier for Laravel 5 to reduce the damage XSS can do.

The Solution

HTMLPurifier

HTMLPurifier is a standalone project that filters XSS out of HTML using a whitelist mechanism.

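"Whitelist" here means that the configuration defines the allowed HTML tags, tag attributes, and CSS properties. When content is purified (HTMLPurifier's purify() method, exposed as clean() by the Laravel wrapper), only elements that appear in the configured whitelist are let through. Everything else is filtered out.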

For example, given this configuration:

'HTML.Allowed' => 'div,em,a[href|title|style],ul,ol,li,p[style],br',
'CSS.AllowedProperties'    => 'font,font-size,font-weight,font-style,font-family',

and this user submission:

<a someproperty="somevalue" href="http://example.com" style="color:#ccc;font-size:16px">
    文章内容<script>alert('Alerted')</script>
</a>

the output is:

<a href="http://example.com" style="font-size:16px;">
    文章内容
</a>

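The following were filtered out because they are not in the whitelist:

someproperty: an HTML attribute that is not whitelisted
color: a CSS property that is not whitelisted
script: an HTML tag that is not whitelisted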
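To make the whitelist idea concrete, here is a toy re-implementation in plain PHP using the bundled DOM extension. It is not how HTMLPurifier works internally and is not safe for production: it skips nesting rules, CSS value validation, full URI checks, and markup repair. The function names (`toyPurify`, `parseAllowed`, `filterStyle`, `filterChildren`) are hypothetical. It understands only the simple `tag[attr|attr]` form of HTML.Allowed, removes non-whitelisted tags (keeping their text), attributes, and CSS properties, and drops script/style together with their content, mirroring the example above. The `<?xml encoding="UTF-8">` prefix is a common workaround to make libxml read UTF-8 input and its behaviour may vary between libxml versions.

```php
<?php
declare(strict_types=1);
// Toy whitelist filter for illustration only: NOT HTMLPurifier, not production-safe.

/** Parse a subset of the HTML.Allowed syntax, e.g. "a[href|title],p,br". */
function parseAllowed(string $spec): array
{
    $allowed = [];
    foreach (explode(',', $spec) as $item) {
        if (preg_match('/^\s*([a-z0-9]+)(?:\[([^\]]*)\])?\s*$/i', $item, $m)) {
            $attrs = isset($m[2]) && $m[2] !== '' ? explode('|', strtolower($m[2])) : [];
            $allowed[strtolower($m[1])] = $attrs;
        }
    }
    return $allowed;
}

/** Keep only whitelisted CSS declarations from a style attribute. */
function filterStyle(string $style, array $allowedCss): string
{
    $kept = [];
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue;
        }
        $property = strtolower(trim($parts[0]));
        $value = trim($parts[1]);
        // Crude guard against url()/expression() payloads hidden in values.
        if ($value === '' || preg_match('/url\s*\(|expression|javascript:/i', $value)) {
            continue;
        }
        if (in_array($property, $allowedCss, true)) {
            $kept[] = $property . ':' . $value;
        }
    }
    return implode(';', $kept);
}

/** Recursively filter the children of $node in place. */
function filterChildren(DOMNode $node, array $allowed, array $allowedCss): void
{
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMComment) {
            $node->removeChild($child);
            continue;
        }
        if (!$child instanceof DOMElement) {
            continue; // text is kept; saveHTML() re-escapes it on output
        }
        $tag = strtolower($child->tagName);
        if ($tag === 'script' || $tag === 'style') {
            $node->removeChild($child); // drop the tag together with its content
            continue;
        }
        filterChildren($child, $allowed, $allowedCss); // children first
        if (!isset($allowed[$tag])) {
            // Tag not whitelisted: hoist its filtered children, then drop the tag.
            while ($child->firstChild !== null) {
                $node->insertBefore($child->firstChild, $child);
            }
            $node->removeChild($child);
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $name = $attr->nodeName;
            $lower = strtolower($name);
            if (!in_array($lower, $allowed[$tag], true)) {
                $child->removeAttribute($name);
            } elseif ($lower === 'style') {
                $child->setAttribute($name, filterStyle($attr->value, $allowedCss));
            } elseif ($lower === 'href' || $lower === 'src') {
                // Strip spaces/control chars (browsers ignore some of them) before the scheme check.
                $compact = (string) preg_replace('/[\x00-\x20\x7F]+/', '', $attr->value);
                if (preg_match('/^(?!https?:|mailto:)[a-z][a-z0-9+.-]*:/i', $compact)) {
                    $child->removeAttribute($name);
                }
            }
        }
    }
}

/** Filter $html against an HTML.Allowed-style spec and a CSS property list. */
function toyPurify(string $html, string $allowedSpec, string $cssSpec): string
{
    $doc = new DOMDocument();
    // Common workaround so libxml reads the input as UTF-8 (may vary by libxml version).
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>', LIBXML_NOERROR | LIBXML_NOWARNING);
    $root = $doc->getElementsByTagName('div')->item(0); // our wrapper <div>
    $allowedCss = array_map('trim', explode(',', strtolower($cssSpec)));
    filterChildren($root, parseAllowed($allowedSpec), $allowedCss);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling `toyPurify()` with the configuration and (ASCII) input from the example should leave an `<a>` that keeps only `href` and `style="font-size:16px"`, with the script and its content gone. Children are filtered before a disallowed parent is unwrapped, so nothing escapes the filter by being nested inside an unknown tag. Unlike HTMLPurifier, this toy leaves an empty `style=""` behind when no CSS property survives.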

HTMLPurifier for Laravel 5

HTMLPurifier for Laravel is a wrapper that adapts HTMLPurifier to the Laravel framework.

Installing HTMLPurifier for Laravel 5

Install it with Composer:

composer require mews/purifier

Add the following to the providers array in config/app.php:

Mews\Purifier\PurifierServiceProvider::class,

Configuring HTMLPurifier for Laravel 5

Run the following on the command line:

$ php artisan vendor:publish --provider="Mews\Purifier\PurifierServiceProvider"

Open config/purifier.php. The default configuration looks like this:


<?php

return [
    'encoding'      => 'UTF-8',
    'finalize'      => true,
    'cachePath'     => storage_path('app/purifier'),
    'cacheFileMode' => 0755,
    'settings'      => [
        'default' => [
            'HTML.Doctype'             => 'HTML 4.01 Transitional',
            'HTML.Allowed'             => 'div,b,strong,i,em,u,a[href|title],ul,ol,li,p[style],br,span[style],img[width|height|alt|src]',
            'CSS.AllowedProperties'    => 'font,font-size,font-weight,font-style,font-family,text-decoration,padding-left,color,background-color,text-align',
            'AutoFormat.AutoParagraph' => true,
            'AutoFormat.RemoveEmpty'   => true,
        ],
        'test'    => [
            'Attr.EnableID' => 'true',
        ],
        "youtube" => [
            "HTML.SafeIframe"      => 'true',
            "URI.SafeIframeRegexp" => "%^(http://|https://|//)(www.youtube.com/embed/|player.vimeo.com/video/)%",
        ],
        'custom_definition' => [
            'id'  => 'html5-definitions',
            'rev' => 1,
            'debug' => false,
            'elements' => [
                // http://developers.whatwg.org/sections.html
                ['section', 'Block', 'Flow', 'Common'],
                ['nav',     'Block', 'Flow', 'Common'],
                ['article', 'Block', 'Flow', 'Common'],
                ['aside',   'Block', 'Flow', 'Common'],
                ['header',  'Block', 'Flow', 'Common'],
                ['footer',  'Block', 'Flow', 'Common'],

                // Content model actually excludes several tags, not modelled here
                ['address', 'Block', 'Flow', 'Common'],
                ['hgroup', 'Block', 'Required: h1 | h2 | h3 | h4 | h5 | h6', 'Common'],

                // http://developers.whatwg.org/grouping-content.html
                ['figure', 'Block', 'Optional: (figcaption, Flow) | (Flow, figcaption) | Flow', 'Common'],
                ['figcaption', 'Inline', 'Flow', 'Common'],

                // http://developers.whatwg.org/the-video-element.html#the-video-element
                ['video', 'Block', 'Optional: (source, Flow) | (Flow, source) | Flow', 'Common', [
                    'src' => 'URI',
                    'type' => 'Text',
                    'width' => 'Length',
                    'height' => 'Length',
                    'poster' => 'URI',
                    'preload' => 'Enum#auto,metadata,none',
                    'controls' => 'Bool',
                ]],
                ['source', 'Block', 'Flow', 'Common', [
                    'src' => 'URI',
                    'type' => 'Text',
                ]],

                // http://developers.whatwg.org/text-level-semantics.html
                ['s',    'Inline', 'Inline', 'Common'],
                ['var',  'Inline', 'Inline', 'Common'],
                ['sub',  'Inline', 'Inline', 'Common'],
                ['sup',  'Inline', 'Inline', 'Common'],
                ['mark', 'Inline', 'Inline', 'Common'],
                ['wbr',  'Inline', 'Empty', 'Core'],

                // http://developers.whatwg.org/edits.html
                ['ins', 'Block', 'Flow', 'Common', ['cite' => 'URI', 'datetime' => 'CDATA']],
                ['del', 'Block', 'Flow', 'Common', ['cite' => 'URI', 'datetime' => 'CDATA']],
            ],
            'attributes' => [
                ['iframe', 'allowfullscreen', 'Bool'],
                ['table', 'height', 'Text'],
                ['td', 'border', 'Text'],
                ['th', 'border', 'Text'],
                ['tr', 'width', 'Text'],
                ['tr', 'height', 'Text'],
                ['tr', 'border', 'Text'],
            ],
        ],
        'custom_attributes' => [
            ['a', 'target', 'Enum#_blank,_self,_target,_top'],
        ],
        'custom_elements' => [
            ['u', 'Inline', 'Inline', 'Common'],
        ],
    ],

];

You can now filter input with a call like this:

clean(Input::get('inputname'));

Extending the settings

To make the configuration easier to extend, I changed the config file to the following:

<?php

return [
    'encoding'      => 'UTF-8',
    'finalize'      => true,
    'cachePath'     => storage_path('app/purifier'),
    'cacheFileMode' => 0755,
    'settings'      => [
        'default' => [
            'HTML.Doctype'             => 'HTML 4.01 Transitional',
            'HTML.Allowed'             => 'div,b,strong,i,em,u,a[href|title],ul,ol,li,p[style],br,span[style],img[width|height|alt|src]',
            'CSS.AllowedProperties'    => 'font,font-size,font-weight,font-style,font-family,text-decoration,padding-left,color,background-color,text-align',
            'AutoFormat.AutoParagraph' => true,
            'AutoFormat.RemoveEmpty'   => true,
        ],
        'test'    => [
            'Attr.EnableID' => 'true',
        ],
        // ... (many more lines omitted) ...
        'user_topic_body' => [
            'HTML.Doctype'             => 'XHTML 1.0 Strict',
            'HTML.Allowed'             => 'div,b,strong,i,em,a[href|title],ul,ol,li,p[style],br,span[style],img[width|height|alt|src],pre,code',
            'CSS.AllowedProperties'    => 'font,font-size,font-weight,font-style,font-family,text-decoration,padding-left,color,background-color,text-align',
            'AutoFormat.AutoParagraph' => true,
            'AutoFormat.RemoveEmpty'   => true,
        ],
    ],

];

Note the extra user_topic_body entry. It lets me select that profile explicitly by passing its name as the second argument:

clean($html_data, 'user_topic_body');

--- EOF ---

This work is licensed under a CC license. Reposts must credit the author and link to this article.

Set aside worldly restlessness; pursue technical excellence.

This post was marked as featured by Summer 8 years ago.

Comments: 31

overtrue

Administrator · 1.5k reputation / PHP @ Tencent

Awesome!

Commented 8 years ago

滕勇志

L5.6 translator · 267 reputation / co-founder @ xsha labs

The more I learn about XSS, the harder it seems to defend against :fearful:

Rekkles

Course reader · 109 reputation

In the end, only a small fraction of attacks can be blocked.

Summer

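Site owner · 11.3k reputation / Maintainer @ LearnKu.com

@Rekkles
@轻色年华 I'd like to hear more.

Anyone with deep experience in XSS defense is welcome to share their views: what do you think HTMLPurifier's weaknesses are?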

Destiny

Course reader · 796 reputation / PHP, Go, React, Nextjs @ Freelance

Very detailed!

JokerLinly

Tutorial VIP · 1.4k reputation

Pretty awesome.

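379 reputation

Nice.

some text

I didn't even know you could write it like that...

I wouldn't say I understand XSS well; I only know a few of its tricks, and honestly I can't defend against them right now :joy: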

NauxLiu

L5.3 translator · 270 reputation

If you only need to support modern browsers, try CSP (Content Security Policy).

黄海林HL

0 reputation

This works.

MehrLicht

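14 reputation

I can't bookmark this, and sharing to WeChat fails with a bundleID error. When I tapped "More actions" nothing responded, and I accidentally tapped "Report". Sorry, I didn't mean to.

@MehrLicht You're probably using the old app. Uninstall it and install it again.

$ php artisan config:publish mews/purifier --provider="Mews\Purifier\PurifierServiceProvider"
This command doesn't seem to work in 5.4. After switching to
$ php artisan vendor:publish --provider="Mews\Purifier\PurifierServiceProvider"
it worked.

Also, how do you post code here? :scream: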

$ php artisan config:publish mews/purifier --provider="Mews\Purifier\PurifierServiceProvider"

@生活无限好 Fixed, thanks.

iVanilla

2 reputation

When I wanted to use it before, I found that HTML Purifier didn't support PHP 7.

rovast

Course reader · 218 reputation / Technical manager @ Nanjing

:thumbsup:

悲剧不上演

Course reader · 278 reputation

longqq

You also need to add 'Purifier' => Mews\Purifier\Facades\Purifier::class to the aliases array in config/app.php before clean() will work.

xufu

39 reputation

Wouldn't it be enough to just escape everything so nothing can execute?

charles

1 reputation

Have you considered filtering XSS directly with an nginx module, such as lua-waf?

Payne

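120 reputation

This doesn't seem to prevent image-based XSS. For example, if someone uploads an avatar with XSS code embedded in the image, won't it still execute when the image loads?

@Payne Just try it.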

Johnson16

Course reader · 92 reputation / Programmer @ Peritix

When I use this package with Markdown, everything gets wrapped in p tags. Any advice?

Commented 7 years ago

hustnzj

Moderator · 2.2k reputation

Article content

jaak

133 reputation / PHP developer, PHP full-stack developer @ Freelancer

When I run plain text through clean(), it wraps the result in a p tag. What's going on???

GuanJie

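Course reader · 253 reputation / phper @ usitrip

@Summer @overtrue

Question: how should I handle the situation shown below?

Problem:

The <script>...</script> inside a code block, which I want displayed as code text, gets filtered out too.

[Screenshot]

Also, could you explain how to prevent this attack:

[XSS](javascript:alert(1))

Commented 6 years ago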

xinkong2000

@iVanilla May I ask how you solved it in the end?

Cosmos

Course reader · 13 reputation

XSS comes in many types. Can this defend against all of them?

Commented 3 years ago
