Solr Sharding and Replication

Without SolrCloud, you can set up sharding and replication following the architecture diagram below:

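For details, see the "Combining Distribution and Replication" section of the official Solr Reference Guide. In short, each shard has one master and two slaves. I believe the slaves stay in sync by pulling snapshots from the master.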

Snapshot: A directory containing hard links to the data files of an index. Snapshots are distributed from the master nodes when the slaves pull them, “smart copying” any segments the slave node does not have in snapshot directory that contains the hard links to the most recent index data files.
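To make the "smart copying" idea concrete, here is a toy Python model. The slave compares its commit generation with the master's. If the two differ, it copies only the files it does not already have. The `IndexState` type and the file names are hypothetical. Solr's real ReplicationHandler does considerably more bookkeeping, and this is not its code.

```python
# Toy model of master/slave "smart copying": the slave fetches only the
# index files it lacks. Hypothetical types; NOT Solr's implementation.
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexState:
    generation: int     # commit generation of the index
    files: frozenset    # file names belonging to that commit


def files_to_fetch(master: IndexState, slave: IndexState) -> set:
    """Return the files the slave must copy to match the master's commit."""
    if master.generation == slave.generation:
        return set()  # already in sync, nothing to copy
    # Segments both sides share are reused; only missing files are copied.
    return set(master.files - slave.files)


master = IndexState(5, frozenset({"_0.cfs", "_1.cfs", "_2.cfs", "segments_5"}))
slave = IndexState(4, frozenset({"_0.cfs", "_1.cfs", "segments_4"}))
print(sorted(files_to_fetch(master, slave)))  # ['_2.cfs', 'segments_5']
```

Because Lucene segments are immutable once written, copying only the missing files is enough to reproduce the master's commit. That property is what makes this approach cheap.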

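In SolrCloud, this mechanism is used mainly for backups. Personally, I think snapshot-based copying is meant for fast backups rather than for keeping leader and replica shards in sync. Replicas more likely stay in sync by pulling index files from the leader.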
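In SolrCloud, a backup is usually requested through the Collections API `BACKUP` action. The minimal sketch below only builds the request URL and does not send it. The host, collection name, backup name and location are hypothetical. The location should be a path that every node can reach, such as a shared drive.

```python
from urllib.parse import urlencode


def backup_url(solr_base: str, collection: str, backup_name: str, location: str) -> str:
    """Build a Collections API BACKUP request URL (built only, not sent)."""
    params = {
        "action": "BACKUP",
        "name": backup_name,       # name under which the backup is stored
        "collection": collection,  # collection to back up
        "location": location,      # shared path reachable by all nodes
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


# Hypothetical host, collection and path
print(backup_url("http://localhost:8983/solr", "products", "nightly", "/mnt/solr-backups"))
```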

SolrCloud provides three replica types:

NRT: This is the default. A NRT replica (NRT = NearRealTime) maintains a transaction log and writes new documents to its indexes locally. Any replica of this type is eligible to become a leader. Traditionally, this was the only type supported by Solr.

TLOG: This type of replica maintains a transaction log but does not index document changes locally. This type helps speed up indexing since no commits need to occur in the replicas. When this type of replica needs to update its index, it does so by replicating the index from the leader. This type of replica is also eligible to become a shard leader; it would do so by first processing its transaction log. If it does become a leader, it will behave the same as if it was a NRT type of replica.

PULL: This type of replica does not maintain a transaction log nor index document changes locally. It only replicates the index from the shard leader. It is not eligible to become a shard leader and doesn’t participate in shard leader election at all.

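The main differences are as follows. An NRT replica can become the leader and supports near-real-time indexing (soft commits). It stays in sync because the leader forwards every update to it. A TLOG replica can also become the leader, and as leader it behaves exactly like an NRT replica. It does not support near-real-time indexing, and when it needs to catch up it only copies index files from the leader. A PULL replica can never become the leader and only copies index files from the leader.
When shards are created, their replicas use the NRT type by default.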
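The comparison above can be condensed into a small lookup table. The following Python sketch encodes the properties described in the text and derives which replicas of a shard can take part in leader election. The field names are this sketch's own and are not part of any Solr API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaType:
    keeps_tlog: bool        # maintains a transaction log
    indexes_locally: bool   # applies each update to its own index
    near_real_time: bool    # supports soft commits (NRT search)
    can_lead: bool          # eligible in shard leader election
    sync: str               # how a non-leader replica stays current


REPLICA_TYPES = {
    "NRT": ReplicaType(True, True, True, True, "leader forwards every update"),
    # A TLOG replica that wins the election replays its tlog and then
    # behaves like an NRT replica; this row describes it as a follower.
    "TLOG": ReplicaType(True, False, False, True, "copies index files from leader"),
    "PULL": ReplicaType(False, False, False, False, "copies index files from leader"),
}


def leader_candidates(replicas: list[str]) -> list[str]:
    """Return the replicas of a shard that may become its leader."""
    return [r for r in replicas if REPLICA_TYPES[r].can_lead]


print(leader_candidates(["TLOG", "PULL", "PULL"]))  # ['TLOG']
```

For example, a shard with one TLOG and two PULL replicas has exactly one leader candidate. This is why the TLOG + PULL layout needs at least one TLOG replica per shard.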

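Recommended replica combinations in SolrCloud:

1. All NRT: small to medium clusters, or large clusters whose update throughput is not very high.

2. All TLOG: real-time indexing is not needed, each shard has many replicas, and every replica must be able to become the shard leader.

3. TLOG + PULL: real-time indexing is not needed, each shard has many replicas, query capacity needs to go up, and briefly stale results are acceptable.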
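Since Solr 7, the Collections API `CREATE` action accepts `nrtReplicas`, `tlogReplicas` and `pullReplicas` parameters. Each combination above therefore maps to a set of per-shard replica counts. The sketch below builds the request URLs without sending them. The counts, host, collection and config names are hypothetical. `nrtReplicas` is set to 0 explicitly in the TLOG layouts so that the intent is unambiguous.

```python
from urllib.parse import urlencode

# Hypothetical per-shard replica counts for the three recommended layouts.
LAYOUTS = {
    "all_nrt": {"nrtReplicas": 3},
    "all_tlog": {"nrtReplicas": 0, "tlogReplicas": 3},
    "tlog_pull": {"nrtReplicas": 0, "tlogReplicas": 2, "pullReplicas": 2},
}


def create_url(solr_base: str, name: str, config_name: str,
               num_shards: int, layout: str) -> str:
    """Build a Collections API CREATE URL for one of the layouts above."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "collection.configName": config_name,
    }
    params.update(LAYOUTS[layout])
    return f"{solr_base}/admin/collections?{urlencode(params)}"


for layout in LAYOUTS:
    print(create_url("http://localhost:8983/solr", "docs", "myconf", 16, layout))
```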

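We ran a test with the following setup: Solr 7.3, 16 physical nodes with 3 Solr instances each, 20 GB per instance, 100 million documents to index, 160 Spark executors, and one leader plus two replicas per shard. (Note: we do not need NRT features, because we index in a nightly batch.)

1. All NRT: 20 minutes, roughly 6 to 8 million documents per minute.

2. All TLOG: 10 minutes, roughly 10 million documents per minute.

3. TLOG + PULL: not tested.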
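As a quick sanity check on these figures, the average rate is the total number of documents divided by the wall-clock minutes. The sketch below uses only the numbers reported above. The per-executor figure assumes the load was spread evenly across the 160 executors, which was not measured.

```python
TOTAL_DOCS = 100_000_000                       # documents indexed in the test
EXECUTORS = 160                                # Spark executors used
RUN_MINUTES = {"all NRT": 20, "all TLOG": 10}  # wall-clock time per run


def avg_rate(docs: int, minutes: float) -> float:
    """Average documents per minute over the whole run."""
    return docs / minutes


for layout, minutes in RUN_MINUTES.items():
    rate = avg_rate(TOTAL_DOCS, minutes)
    # Even split across executors is an assumption, not a measurement.
    per_executor = rate / EXECUTORS
    print(f"{layout}: {rate:,.0f} docs/min, ~{per_executor:,.0f} per executor")

speedup = RUN_MINUTES["all NRT"] / RUN_MINUTES["all TLOG"]
print(f"TLOG speedup: {speedup:.1f}x")
```

The NRT average (5 million/min) comes out below the quoted 6 to 8 million per minute. Presumably, the quoted rate reflects steady-state throughput, while the average also covers ramp-up and final commits.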

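PS: To create a collection, first upload the configuration files with the upconfig command of the ZooKeeper script bundled with Solr. Then create the collection with the CREATE action of Solr's Collections API.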
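The two steps can be sketched in Python as follows. The sketch builds the `zkcli.sh -cmd upconfig` command line and the `CREATE` URL, but runs neither. The ZooKeeper hosts, config name, directory, collection name and shard/replica counts are hypothetical. The script path assumes a standard Solr install layout. Newer releases also offer `bin/solr zk upconfig` for the first step.

```python
import shlex
from urllib.parse import urlencode

ZKCLI = "server/scripts/cloud-scripts/zkcli.sh"  # path inside a Solr install


def upconfig_argv(zk_host: str, conf_name: str, conf_dir: str) -> list[str]:
    """Step 1: upload a config directory to ZooKeeper under conf_name."""
    return [ZKCLI, "-zkhost", zk_host, "-cmd", "upconfig",
            "-confname", conf_name, "-confdir", conf_dir]


def create_url(solr_base: str, name: str, conf_name: str,
               num_shards: int, replication_factor: int) -> str:
    """Step 2: create a collection that uses the uploaded config."""
    params = {"action": "CREATE", "name": name, "numShards": num_shards,
              "replicationFactor": replication_factor,
              "collection.configName": conf_name}
    return f"{solr_base}/admin/collections?{urlencode(params)}"


argv = upconfig_argv("zk1:2181,zk2:2181,zk3:2181", "myconf", "./myconf/conf")
print(shlex.join(argv))  # pass argv to subprocess.run(...) to actually run it
print(create_url("http://localhost:8983/solr", "mycoll", "myconf", 16, 3))
```

The order matters. CREATE refers to the config by the name it was uploaded under, so the upload has to happen first.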

A reference:

Original slides for the article referenced below: https://berlinbuzzwords.de/sites/berlinbuzzwords.de/files/media/documents/replicatypes-berlinbuzzwords.pdf

One figure from it that I found particularly insightful: