[Tutorial 1] Building a search: Laravel Scout, Elasticsearch, and the ik analyzer
This is a long post; if you prefer, go straight to the code:
https://github.com/lijinma/laravel-scout-e...
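Over the Chinese New Year holiday I built a small site at home called “笑来搜” (Xiaolai Search). The process went like this:
- I started with tntsearch, which is small, has few dependencies, and which I liked a lot.
- After using it for a while, I found that tntsearch has no matching Chinese tokenizer. Someone had written one, but it was far from complete.
- In the end I chose ElasticSearch, even though it is heavier than tntsearch. Its ik analysis plugin is simple to use, and extending its dictionary is very easy.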
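After 笑来搜 went live, several friends asked how to build a similar search site with little effort, so I made time to put together a comparable demo. The code is at https://github.com/lijinma/laravel-scout-e... ; if it helps you, please star it. The demo has at least two advantages:
- Every installation step is written out as clearly as possible; I assume you are a beginner.
- The demo runs on my own server, so you can try it right away: http://scout.lijinma.com/search
Here is the full tutorial:
First: what are we building?
The goal is simple: pull down the articles of a WeChat official account, then make the title and content of every article searchable. For this project I chose 50 articles from Li Xiaolai's account “学习学习再学习”.
Here is what the result looks like: http://scout.lijinma.com/search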
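Step 1: Install Laravel 5.4
Whether you use homestead, valet, docker, or a hand-built local environment, the first step is to get a Laravel 5.4 project running so that you can see the welcome page.
A note on how I develop: for a simple Laravel project that I work on alone, I don't use homestead, valet, or docker. I run it directly on my Mac, which only needs mysql installed, and during development and debugging I just use php artisan serve. Overall this is efficient and quick to set up.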
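Step 2: Configuration
Configure the database. Create it first, and make sure DB_DATABASE in your .env uses the same name: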
create database laravel_scout_elastic_demo;
Install the ElasticSearch Scout Engine package
$ composer require tamayo/laravel-scout-elastic
Installing this package also pulls in Laravel Scout. Next, publish the Scout config:
$ php artisan vendor:publish --provider="Laravel\Scout\ScoutServiceProvider"
Add the corresponding ServiceProviders to the providers array:
// config/app.php
...
Laravel\Scout\ScoutServiceProvider::class,
ScoutEngines\Elasticsearch\ElasticsearchProvider::class,
...
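The InitEs command in Step 4 reads config('scout.elasticsearch.hosts') and config('scout.elasticsearch.index'), which the freshly published config/scout.php does not define. Below is a sketch of the entries that would satisfy it, roughly following what the laravel-scout-elastic README suggests. Treat the env variable names and default values as assumptions, and note that the host carries no port because InitEs appends ':9200/' itself.

```php
<?php
// config/scout.php (excerpt; keep the other published keys as they are)

return [
    // Tell Scout to use the Elasticsearch engine instead of the default driver.
    'driver' => env('SCOUT_DRIVER', 'elasticsearch'),

    // ... other published options ...

    'elasticsearch' => [
        // Index name used by the engine and by InitEs (assumed default).
        'index' => env('ELASTICSEARCH_INDEX', 'laravel'),
        // No port and no trailing slash: InitEs adds ':9200/' on its own.
        'hosts' => [
            env('ELASTICSEARCH_HOST', 'http://localhost'),
        ],
    ],
];
```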
Install Goutte Client
We need to scrape the title and content of each article from its URL, so install this library:
$ composer require fabpot/goutte
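Step 3: Install ElasticSearch
We are going to use the ik plugin, and installing it by hand can cost a lot of effort.
So we use this project directly: https://github.com/medcl/elasticsearch-rtf
At the time of writing it ships Elasticsearch 5.1.1 with the ik plugin bundled.
Install ElasticSearch, start the service, and check that it is running correctly: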
$ curl http://localhost:9200
{
  "name" : "Rkx3vzo",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "Ww9KIfqSRA-9qnmj1TcnHQ",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}
If the information above is printed, ElasticSearch is installed correctly.
Next, check that the ik plugin is installed (run this inside your ElasticSearch directory):
$ ./bin/elasticsearch-plugin list
analysis-ik
If analysis-ik is listed, ik is installed.
Step 4: Write the code
Add an InitEs command to initialize some ES data
$ php artisan make:command InitEs
The code for InitEs.php follows. It does two things:
- Creates the index
- Creates a template; the link below explains what an index template is
https://www.elastic.co/guide/en/elasticsea...
<?php

namespace App\Console\Commands;

use GuzzleHttp\Client;
use Illuminate\Console\Command;

class InitEs extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'es:init';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Init es to create index';

    /**
     * Create a new command instance.
     *
     */
    public function __construct()
    {
        parent::__construct();
    }

    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        $client = new Client();
        // The template must exist before the index so that it applies to it.
        $this->createTemplate($client);
        $this->createIndex($client);
    }

    /**
     * Create the index that Scout writes to.
     */
    protected function createIndex(Client $client)
    {
        $url = config('scout.elasticsearch.hosts')[0] . ':9200/' . config('scout.elasticsearch.index');
        $client->put($url, [
            'json' => [
                'settings' => [
                    'refresh_interval' => '5s',
                    'number_of_shards' => 1,
                    'number_of_replicas' => 0,
                ],
                'mappings' => [
                    '_default_' => [
                        '_all' => [
                            'enabled' => false
                        ]
                    ]
                ]
            ]
        ]);
    }

    /**
     * Create an index template that analyzes every string field with ik_smart.
     */
    protected function createTemplate(Client $client)
    {
        $url = config('scout.elasticsearch.hosts')[0] . ':9200/' . '_template/rtf';
        $client->put($url, [
            'json' => [
                'template' => '*',
                'settings' => [
                    'number_of_shards' => 1
                ],
                'mappings' => [
                    '_default_' => [
                        '_all' => [
                            'enabled' => true
                        ],
                        'dynamic_templates' => [
                            [
                                'strings' => [
                                    'match_mapping_type' => 'string',
                                    'mapping' => [
                                        'type' => 'text',
                                        'analyzer' => 'ik_smart',
                                        'fields' => [
                                            'keyword' => [
                                                'type' => 'keyword',
                                                // ignore_above is a keyword option, not a text option
                                                'ignore_above' => 256
                                            ]
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ]);
    }
}
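Both methods above build their URL by appending ':9200/' to the first configured host, which only works when that host has no port and no trailing slash. A small helper could make this more forgiving. This is only a sketch and not part of the original demo: esUrl and its $defaultPort parameter are hypothetical names, and the code assumes PHP 7 or newer.

```php
<?php
// Hypothetical helper, not part of the original demo.
// Builds "scheme://host:port/path" from a configured host and a path.
function esUrl(string $host, string $path, int $defaultPort = 9200): string
{
    // Accept bare hosts such as "localhost" by assuming http.
    if (strpos($host, '://') === false) {
        $host = 'http://' . $host;
    }

    $parts = parse_url($host);
    $scheme = $parts['scheme'] ?? 'http';
    $name = $parts['host'] ?? 'localhost';
    // Keep an explicit port if one was configured, otherwise use the default.
    $port = $parts['port'] ?? $defaultPort;

    return sprintf('%s://%s:%d/%s', $scheme, $name, $port, ltrim($path, '/'));
}

echo esUrl('http://localhost', '_template/rtf'), PHP_EOL;
```

Inside the command, a call such as esUrl(config('scout.elasticsearch.hosts')[0], config('scout.elasticsearch.index')) could then replace the string concatenation.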
Create the posts table to store the account's articles
$ php artisan make:migration create_posts_table
Code:
<?php

use Illuminate\Support\Facades\Schema;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Database\Migrations\Migration;

class CreatePostsTable extends Migration
{
    /**
     * Run the migrations.
     *
     * @return void
     */
    public function up()
    {
        Schema::create('posts', function (Blueprint $table) {
            $table->increments('id');
            $table->text('url');
            $table->string('author', 64)->nullable()->default(null);
            $table->text('title');
            $table->longText('content');
            $table->dateTime('post_date')->nullable()->default(null);
            $table->timestamps();
        });
    }

    /**
     * Reverse the migrations.
     *
     * @return void
     */
    public function down()
    {
        Schema::dropIfExists('posts');
    }
}
Create the table in the database:
$ php artisan migrate
Add the Post model:
$ php artisan make:model Post
Code:
<?php

namespace App;

use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

/**
 * Class Post
 * @package App
 * @property string $url
 * @property string $author
 * @property string $content
 * @property string $title
 * @property string $post_date
 * @property string $created_at
 * @property string $updated_at
 */
class Post extends Model
{
    use Searchable;

    protected $table = 'posts';

    protected $fillable = [
        'url',
        'author',
        'title',
        'content',
        'post_date'
    ];

    // Only the title and content are sent to the search index.
    public function toSearchableArray()
    {
        return [
            'title' => $this->title,
            'content' => $this->content
        ];
    }
}
Add an ImportPosts command that scrapes the articles and imports them into the posts table.
$ php artisan make:command ImportPosts
Code:
<?php

namespace App\Console\Commands;

use App\Libraries\WechatPostSpider;
use App\Post;
use Goutte\Client;
use Illuminate\Console\Command;

class ImportPosts extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'posts:import';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Import posts!';

    /**
     * Create a new command instance.
     *
     */
    public function __construct()
    {
        parent::__construct();
    }

    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        $client = new Client();
        foreach (config('post-urls') as $url) {
            /**
             * The url column may need an index here, but url makes a poor
             * unique identifier because the index would be too large.
             */
            if (Post::where('url', $url)->exists()) {
                continue;
            }
            $wechatPostSpider = new WechatPostSpider($client, $url);
            $this->savePost($wechatPostSpider);
            $this->info('create one post!');
        }
    }

    protected function savePost(WechatPostSpider $wechatPostSpider)
    {
        Post::create([
            'url' => $wechatPostSpider->getUrl(),
            'author' => $wechatPostSpider->getAuthor(),
            'title' => $wechatPostSpider->getTitle(),
            'content' => $wechatPostSpider->getContent(),
            'post_date' => $wechatPostSpider->getPostDate(),
        ]);
    }
}
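The comment in handle() points out that url is a poor unique key because indexing a long text column is expensive. One common workaround is to store a fixed-length hash of the URL in its own indexed column and look posts up by that. A minimal sketch of the idea follows; urlKey and the url_hash column it would feed are hypothetical and do not exist in the demo, and dropping the fragment before hashing is my own assumption about which URLs should count as the same article.

```php
<?php
// Hypothetical helper, not part of the original demo.
// Returns a 40-character key suitable for a unique char(40) column.
function urlKey(string $url): string
{
    // Drop the fragment (for example "#wechat_redirect"): browsers never
    // send it to the server, so it does not identify a different article.
    $withoutFragment = explode('#', $url, 2)[0];

    return sha1($withoutFragment);
}

echo urlKey('http://mp.weixin.qq.com/s?mid=1#wechat_redirect'), PHP_EOL;
```

With such a column in place, the existence check could query the hash, for example Post::where('url_hash', urlKey($url))->exists(), instead of comparing the full URL.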
At this point the command depends on two more files: app/Libraries/WechatPostSpider.php and the config file config/post-urls.php.
WechatPostSpider.php is responsible for scraping the data
<?php namespace App\Libraries;

use Goutte\Client;
use Symfony\Component\DomCrawler\Crawler;

/**
 * Created by PhpStorm.
 * User: lijinma
 * Date: 04/03/2017
 * Time: 9:05 PM
 */
class WechatPostSpider
{
    /**
     * @var Crawler|null
     */
    protected $crawler;

    /**
     * @var string
     */
    protected $url;

    /**
     * WechatPostSpider constructor.
     * @param Client $client
     * @param $url
     */
    public function __construct(Client $client, $url)
    {
        $this->url = $url;
        $this->crawler = $client->request('GET', $url);
    }

    /**
     * @return string
     */
    public function getTitle()
    {
        return trim($this->crawler->filter('title')->text());
    }

    /**
     * @return string
     */
    public function getContent()
    {
        return trim($this->crawler->filter('.rich_media_content')->text());
    }

    /**
     * @return string
     */
    public function getAuthor()
    {
        return trim($this->crawler->filter('#post-date')->nextAll()->text());
    }

    /**
     * @return string
     */
    public function getPostDate()
    {
        return $this->crawler->filter('#post-date')->text();
    }

    /**
     * @return string
     */
    public function getUrl()
    {
        return $this->url;
    }
}
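The spider relies on three selectors: title, .rich_media_content, and #post-date with its next sibling. If you want to check such selectors offline before hitting real pages, the same extraction can be approximated with PHP's built-in DOM extension. Note that this swaps Goutte for DOMDocument and DOMXPath, and that the sample markup below is invented for illustration; real WeChat pages are larger and may be structured differently.

```php
<?php
// Hypothetical sample markup, not a real WeChat page.
$html = <<<'HTML'
<html>
<head><title> Sample post </title></head>
<body>
<em id="post-date">2017-03-04</em>
<a class="rich_media_meta">Sample author</a>
<div class="rich_media_content "> Hello search </div>
</body>
</html>
HTML;

function extractPost(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely strict; collect libxml warnings silently.
    libxml_use_internal_errors(true);
    // Caveat: without a declared charset, loadHTML does not assume UTF-8,
    // so pages with Chinese text need their encoding handled first.
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Return the trimmed text of the first match, or '' when nothing matches.
    $first = function (string $query) use ($xpath): string {
        $nodes = $xpath->query($query);
        return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
    };

    return [
        'title' => $first('//title'),
        'content' => $first("//*[contains(concat(' ', normalize-space(@class), ' '), ' rich_media_content ')]"),
        'post_date' => $first("//*[@id='post-date']"),
        // Mirrors filter('#post-date')->nextAll()->text(): first following sibling.
        'author' => $first("//*[@id='post-date']/following-sibling::*[1]"),
    ];
}

$post = extractPost($html);
print_r($post);
```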
post-urls.php stores the URLs of the articles to scrape; only one is listed here
<?php

return [
    "http://mp.weixin.qq.com/s?__biz=MzAxNzI4MTMwMw==&mid=2651630953&idx=1&sn=9c4d8f2b4df2605fdaa1338303acc908&chksm=801ff511b7687c07303220a0c105d979f1a4a5db45689c95111a6c6ec2f5a6c0c6cecea88ba0&scene=4#wechat_redirect",
];
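One step that is easy to miss: as far as I know, Laravel 5.4 does not discover console commands automatically, so es:init and posts:import have to be registered before artisan can run them. A sketch of the registration, assuming both classes live in app/Console/Commands where make:command puts them:

```php
<?php
// app/Console/Kernel.php (excerpt; leave the rest of the generated class as-is)

namespace App\Console;

use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    /**
     * The Artisan commands provided by the application.
     *
     * @var array
     */
    protected $commands = [
        Commands\InitEs::class,
        Commands\ImportPosts::class,
    ];
}
```

After that, running php artisan es:init first and php artisan posts:import second is the safer order: the Searchable trait pushes each saved Post to the index, so the template and index should already exist when the import starts.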
Add PostController
$ php artisan make:controller PostController
Code for PostController.php:
<?php

namespace App\Http\Controllers;

use App\Post;
use Illuminate\Http\Request;

class PostController extends Controller
{
    public function search(Request $request)
    {
        $q = $request->get('q');
        // Stays an empty array when there is no query; the view only
        // calls paginator methods when $q is set.
        $paginator = [];
        if ($q) {
            $paginator = Post::search($q)->paginate();
        }

        return view('search', compact('paginator', 'q'));
    }
}
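The controller also needs a route before /search responds. The tutorial text does not show one, so the line below is an assumption based on the form action and the demo URL; it uses the string-style controller reference that Laravel 5.4 supports.

```php
<?php
// routes/web.php (assumed location for web routes in Laravel 5.4)

// GET /search?q=... is handled by PostController::search().
Route::get('/search', 'PostController@search');
```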
PostController.php depends on view files, so we create two: resources/views/layouts/main.blade.php and resources/views/search.blade.php
Code for resources/views/layouts/main.blade.php:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" id="viewport"
          content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1"/>
    <!-- CSRF Token -->
    <meta name="csrf-token" content="{{ csrf_token() }}">
    <title>{{ config('app.name', 'Laravel') }}</title>
    <!-- Styles -->
    <link href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">
    <link href="/css/main.css" rel="stylesheet">
    <!-- Scripts -->
    <script>
        window.Laravel = {!! json_encode([
            'csrfToken' => csrf_token(),
        ]) !!};
    </script>
</head>
<body>
<div id="app">
    <div class="container">
        <div class="row">
            <div class="col-md-12">
                <nav class="navbar navbar-default">
                    <div class="container-fluid">
                        <!-- Brand and toggle get grouped for better mobile display -->
                        <div class="navbar-header">
                            <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1" aria-expanded="false">
                                <span class="sr-only">Toggle navigation</span>
                                <span class="icon-bar"></span>
                                <span class="icon-bar"></span>
                                <span class="icon-bar"></span>
                            </button>
                            <a class="navbar-brand" href="/">Laravel Scout Elastic Demo</a>
                        </div>
                    </div><!-- /.container-fluid -->
                </nav>
            </div>
        </div>
        @yield('content')
    </div>
</div>
<!-- Scripts -->
<script src="http://cdn.bootcss.com/jquery/1.12.4/jquery.min.js"></script>
<script src="http://cdn.bootcss.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
</body>
</html>
Code for resources/views/search.blade.php:
@extends('layouts.main')
@section('content')
    <div class="row">
        <div class="col-md-12">
            <form action="/search">
                <div class="input-group">
                    <input type="text" class="form-control h50" name="q" placeholder="Keyword..." value="{{ $q }}">
                    <span class="input-group-btn"><button class="btn btn-default h50" type="submit"><span class="glyphicon glyphicon-search"></span></button></span>
                </div>
            </form>
        </div>
    </div>
    @if($q)
        <div class="row">
            <div class="col-md-12">
                <div class="panel panel-default list-panel search-results">
                    <div class="panel-heading">
                        <h3 class="panel-title ">
                            <i class="fa fa-search"></i> Search results for “<span class="highlight">{{ $q }}</span>”, {{ $paginator->total() }} in total
                        </h3>
                    </div>
                    <div class="panel-body ">
                        @foreach($paginator as $post)
                            <div class="result">
                                <h2 class="title">
                                    <a href="{{ $post->url }}" target="_blank">
                                        {{ $post->title }}
                                    </a>
                                </h2>
                                <div class="info">
                                </div>
                                <div class="desc">
                                    {{ mb_substr($post->content, 0, 150) }}......
                                </div>
                                <hr>
                            </div>
                        @endforeach
                    </div>
                    {{ $paginator->links() }}
                </div>
            </div>
        </div>
    @else
        <div class="row text-center">
            <div class="col-md-12">
                <br>
                <h2>What can you search for?</h2>
                <br>
                <p>Every article from the 学习学习再学习 official account</p>
            </div>
        </div>
    @endif
@endsection
Our code is now complete, but one feature is still missing: how do we highlight the search results?
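As a stopgap, matches can be highlighted on the PHP side by wrapping literal occurrences of the query in the highlight class that the view already uses. This is only a naive sketch: highlightTerm is a hypothetical helper, it matches the query as one exact string, and it knows nothing about how ik split the query into terms, so it is not the highlighting that Elasticsearch itself can provide.

```php
<?php
// Hypothetical helper, not part of the original demo.
// Escapes the text for HTML and wraps every literal occurrence of $term.
function highlightTerm(string $text, string $term): string
{
    $escape = function (string $s): string {
        return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    };

    // Nothing to highlight; explode() also rejects an empty delimiter.
    if ($term === '') {
        return $escape($text);
    }

    // Split on the raw term first and escape each piece afterwards, so the
    // term can never match inside an HTML entity such as "&amp;".
    $pieces = array_map($escape, explode($term, $text));
    $wrapped = '<span class="highlight">' . $escape($term) . '</span>';

    return implode($wrapped, $pieces);
}

echo highlightTerm('a <b> search', 'search'), PHP_EOL;
```

Because the result already contains escaped text plus trusted markup, a Blade template would print it with the unescaped syntax, for example {!! highlightTerm(mb_substr($post->content, 0, 150), $q) !!}.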
This work is released under a CC license; reposts must credit the author and link to this article.