社交领域:Facebook, Twitter,Linkedin用它来管理社交关系,实现好友推荐
图数据库neo4j安装:
- 下载镜像:docker pull neo4j:3.5.0
- 运行容器:docker run -d -p 7474:7474 -p 7687:7687 --name neo4j-3.5.0 neo4j:3.5.0
- 停止容器:docker stop neo4j-3.5.0
- 启动容器:docker start neo4j-3.5.0
- 浏览器 http://localhost:7474/ 访问 neo4j 管理后台,初始账号/密码 neo4j/neo4j,会要求修改初始化密码,我们修改为 neo4j/123456
neo4j中可以使用Cypher查询语言(CQL)进行图形数据库的查询
①、添加节点
CREATE (:) 创建不含有属性节点
节点名称node-name和标签名称lable-name:标签名称相当于关系型数据库中的表名,而节点名称则代指这一条数据
而创建包含属性的节点时,可以在标签名称后面追加一个描绘属性的json字符串
CREATE (索尔:Person)
CREATE (洛基:Person {name:"洛基",title:"诡计之神"})
②、查询节点
MATCH (:) 查询已存在的节点及属性的数据
MATCH命令在后面配合RETURN、DELETE等命令使用,执行具体的返回或删除等操作
MATCH (p:Person) RETURN p
可以看到上面添加的两个节点,分别是不包含属性的空节点和包含属性的节点,并且所有节点会有一个默认生成的id作为唯一标识
③、删除节点
MATCH (p:Person) WHERE id§=100
DELETE p
在这条删除语句中,额外使用了WHERE过滤条件,它与SQL中的WHERE非常相似,命令中通过节点的id进行了过滤。 删除完成后,再次执行查询操作,可以看到只保留了洛基这一个节点
④、添加关联
再创建一个节点作为关系的两端:CREATE (p:Person {name:“索尔”,title:“雷神”})
创建关系的基本语法如下:
CREATE (:)
- [:]
-> (:)
也可以利用已经存在的节点创建关系,下面我们借助MATCH先进行查询,再将结果进行关联,创建两个节点之间的关联关系:
MATCH (m:Person),(n:Person)
WHERE m.name='索尔' and n.name='洛基'
CREATE (m)-[r:BROTHER {relation:"无血缘兄弟"}]->(n)
RETURN r
添加完成后,可以通过关系查询符合条件的节点及关系:
MATCH (m:Person)-[re:BROTHER]->(n:Person)
RETURN m,re,n
如果节点被添加了关联关系后,单纯删除节点的话会报错,:
Neo.ClientError.Schema.ConstraintValidationFailed
Cannot delete node<85>, because it still has relationships. To delete this node, you must first delete its relationships.
这时,需要在删除节点时同时删除关联关系:
MATCH (m:Person)-[r:BROTHER]->(n:Person)
DELETE m,r
SpringBoot整合neo4j
一、依赖
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>springboot-demo</artifactId>
<groupId>com.et</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>neo4j</artifactId>
<properties>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-autoconfigure</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-neo4j</artifactId>
</dependency>
<dependency>
<groupId>com.hankcs</groupId>
<artifactId>hanlp</artifactId>
<version>portable-1.2.4</version>
</dependency>
<dependency>
<groupId>edu.stanford.nlp</groupId>
<artifactId>stanford-parser</artifactId>
<version>3.3.1</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</dependency>
</dependencies>
</project>
二、属性文件和启动类
server:
port: 8088
spring:
data:
neo4j:
uri: bolt://127.0.0.1:7687
username: neo4j
password: 123456
三、文本SPO抽取
借助Git上一个现成的工具类,来进行文本的语义分析和SPO三元组的抽取工作
项目地址:https://github.com/hankcs/MainPartExtracto
//提取主谓宾
public class MainPartExtractor{
private static final Logger LOG = LoggerFactory.getLogger(MainPartExtractor.class);
private static LexicalizedParser lp;//加载模型
private static GrammaticalStructureFactory gsf;
static{
//模型
String models = "models/chineseFactored.ser";
LOG.info("载入文法模型:" + models);
lp = LexicalizedParser.loadModel(models);
//汉语
TreebankLanguagePack tlp = new ChineseTreebankLanguagePack();
gsf = tlp.grammaticalStructureFactory();
}
//获取句子的主谓宾
public static MainPart getMainPart(String sentence){
// 去掉不可见字符
sentence = sentence.replace("\\s+", "");
// 分词,用空格隔开
List<Word> wordList = seg(sentence);
return getMainPart(wordList);
}
public static MainPart getMainPart(List<Word> words)
{
MainPart mainPart = new MainPart();
if (words == null || words.size() == 0) return mainPart;
Tree tree = lp.apply(words);
LOG.info("句法树:{}", tree.pennString());
// 根据整个句子的语法类型来采用不同的策略提取主干
switch (tree.firstChild().label().toString())
{
case "NP":
// 名词短语,认为只有主语,将所有短NP拼起来作为主语即可
mainPart = getNPPhraseMainPart(tree);
break;
default:
GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);
Collection<TypedDependency> tdls = gs.typedDependenciesCCprocessed(true);
LOG.info("依存关系:{}", tdls);
TreeGraphNode rootNode = getRootNode(tdls);
if (rootNode == null)
{
return getNPPhraseMainPart(tree);
}
LOG.info("中心词语:", rootNode);
mainPart = new MainPart(rootNode);
for (TypedDependency td : tdls)
{
// 依存关系的出发节点,依存关系,以及结束节点
TreeGraphNode gov = td.gov();
GrammaticalRelation reln = td.reln();
String shortName = reln.getShortName();
TreeGraphNode dep = td.dep();
if (gov == rootNode)
{
switch (shortName)
{
case "nsubjpass":
case "dobj":
case "attr":
mainPart.object = dep;
break;
case "nsubj":
case "top":
mainPart.subject = dep;
break;
}
}
if (mainPart.object != null && mainPart.subject != null)
{
break;
}
}
// 尝试合并主语和谓语中的名词性短语
combineNN(tdls, mainPart.subject);
combineNN(tdls, mainPart.object);
if (!mainPart.isDone()) mainPart.done();
}
return mainPart;
}
private static MainPart getNPPhraseMainPart(Tree tree)
{
MainPart mainPart = new MainPart();
StringBuilder sbResult = new StringBuilder();
List<String> phraseList = getPhraseList("NP", tree);
for (String phrase : phraseList)
{
sbResult.append(phrase);
}
mainPart.result = sbResult.toString();
return mainPart;
}
//从句子中提取最小粒度的短语
public static List<String> getPhraseList(String type, String sentence)
{
return getPhraseList(type, lp.apply(seg(sentence)));
}
private static List<String> getPhraseList(String type, Tree tree)
{
List<String> phraseList = new LinkedList<String>();
for (Tree subtree : tree)
{
if(subtree.isPrePreTerminal() && subtree.label().value().equals(type))
{
StringBuilder sbResult = new StringBuilder();
for (Tree leaf : subtree.getLeaves())
{
sbResult.append(leaf.value());
}
phraseList.add(sbResult.toString());
}
}
return phraseList;
}
//合并名词性短语为一个节点
private static void combineNN(Collection<TypedDependency> tdls, TreeGraphNode target)
{
if (target == null) return;
for (TypedDependency td : tdls)
{
// 依存关系的出发节点,依存关系,以及结束节点
TreeGraphNode gov = td.gov();
GrammaticalRelation reln = td.reln();
String shortName = reln.getShortName();
TreeGraphNode dep = td.dep();
if (gov == target)
{
switch (shortName)
{
case "nn":
target.setValue(dep.toString("value") + target.value());
return;
}
}
}
}
private static TreeGraphNode getRootNode(Collection<TypedDependency> tdls)
{
for (TypedDependency td : tdls)
{
if (td.reln() == GrammaticalRelation.ROOT)
{
return td.dep();
}
}
return null;
}
//分词
private static List<Word> seg(String sentence)
{
//分词
LOG.info("正在对短句进行分词:" + sentence);
List<Word> wordList = new LinkedList<>();
List<Term> terms = HanLP.segment(sentence);
StringBuffer sbLogInfo = new StringBuffer();
for (Term term : terms)
{
Word word = new Word(term.word);
wordList.add(word);
sbLogInfo.append(word);
sbLogInfo.append(' ');
}
LOG.info("分词结果为:" + sbLogInfo);
return wordList;
}
public static MainPart getMainPart(String sentence, String delimiter)
{
List<Word> wordList = new LinkedList<>();
for (String word : sentence.split(delimiter))
{
wordList.add(new Word(word));
}
return getMainPart(wordList);
}
public static void main(String[] args)
{
/* String[] testCaseArray = {
"我一直很喜欢你",
"你被我喜欢",
"美丽又善良的你被卑微的我深深的喜欢着……",
"只有自信的程序员才能把握未来",
"主干识别可以提高检索系统的智能",
"这个项目的作者是hankcs",
"hankcs是一个无门无派的浪人",
"搜索hankcs可以找到我的博客",
"静安区体育局2013年部门决算情况说明",
"这类算法在有限的一段时间内终止",
};
for (String testCase : testCaseArray)
{
MainPart mp = MainPartExtractor.getMainPart(testCase);
System.out.printf("%s\t%s\n", testCase, mp);
}*/
mpTest();
}
public static void mpTest(){
String[] testCaseArray = {
"我一直很喜欢你",
"你被我喜欢",
"美丽又善良的你被卑微的我深深的喜欢着……",
"小米公司主要生产智能手机",
"他送给了我一份礼物",
"这类算法在有限的一段时间内终止",
"如果大海能够带走我的哀愁",
"天青色等烟雨,而我在等你",
"我昨天看见了一个非常可爱的小孩"
};
for (String testCase : testCaseArray) {
MainPart mp = MainPartExtractor.getMainPart(testCase);
System.out.printf("%s %s %s \n",
GraphUtil.getNodeValue(mp.getSubject()),
GraphUtil.getNodeValue(mp.getPredicate()),
GraphUtil.getNodeValue(mp.getObject()));
}
}
}
四、动态构建知识图谱
新建一个NodeServiceImpl,其中实现两个关键方法parseAndBind和addNode 首先是根据句子中抽取的主语或宾语在neo4j中创建节点的方法,这里根据节点的name判断是否为已存在的节点,如果存在则直接返回,不存在则添加:
@Service
@AllArgsConstructor
public class NodeServiceImpl implements NodeService {
private final NodeRepository nodeRepository;
private final RelationRepository relationRepository;
@Override
public Node save(Node node) {
Node save = nodeRepository.save(node);
return save;
}
@Override
public void bind(String name1, String name2, String relationName) {
Node start = nodeRepository.findByName(name1);
Node end = nodeRepository.findByName(name2);
Relation relation =new Relation();
relation.setStartNode(start);
relation.setEndNode(end);
relation.setRelation(relationName);
relationRepository.save(relation);
}
private Node addNode(TreeGraphNode treeGraphNode){
String nodeName = GraphUtil.getNodeValue(treeGraphNode);
Node existNode = nodeRepository.findByName(nodeName);
if (Objects.nonNull(existNode))
return existNode;
Node node =new Node();
node.setName(nodeName);
return nodeRepository.save(node);
}
@Override
public List<Relation> parseAndBind(String sentence) {
MainPart mp = MainPartExtractor.getMainPart(sentence);
TreeGraphNode subject = mp.getSubject(); //主语
TreeGraphNode predicate = mp.getPredicate();//谓语
TreeGraphNode object = mp.getObject(); //宾语
if (Objects.isNull(subject) || Objects.isNull(object))
return null;
Node startNode = addNode(subject);
Node endNode = addNode(object);
String relationName = GraphUtil.getNodeValue(predicate);//关系词
List<Relation> oldRelation = relationRepository
.findRelation(startNode, endNode,relationName);
if (!oldRelation.isEmpty())
return oldRelation;
Relation botRelation=new Relation();
botRelation.setStartNode(startNode);
botRelation.setEndNode(endNode);
botRelation.setRelation(relationName);
Relation relation = relationRepository.save(botRelation);
return Arrays.asList(relation);
}
}
测试:启动java应用,输入以下地址
http://127.0.0.1:8088/parse?sentence=海拉又被称为死亡女神
http://127.0.0.1:8088/parse?sentence= 死亡女神捏碎了雷神之锤
http://127.0.0.1:8088/parse?sentence=雷神之锤属于索尔
在图数据库neo4j里面查询:
MATCH (p:Person) RETURN p