Druid 0.17 Getting Started (4): A Complete Guide to Querying Data

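This article covers the ways to query data in Druid. First, make sure the data has been loaded successfully.

Druid queries are sent over HTTP. The web console provides a Query view that formats the results.

Druid supports three ways to query: SQL, native JSON, and cURL (plain HTTP). Querying through client libraries is covered at the end.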

1. SQL Queries

We use the wikipedia sample data as an example.

Query the 10 most-edited pages:

SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 10

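Run the query in the Query view of the web console.

The editor offers autocomplete hints as you type.

With Smart query limit enabled, the console automatically limits the number of rows returned.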

Druid also ships with a command-line SQL client, dsql. Start it with bin/dsql:

Welcome to dsql, the command-line client for Druid SQL.
Type "\h" for help.
dsql>

Submit a query:

dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌──────────────────────────────────────────────────────────┬───────┐
│ page                                                     │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung                             │    33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │    28 │
│ Jeremy Corbyn                                            │    27 │
│ Wikipedia:Administrators' noticeboard/Incidents          │    21 │
│ Flavia Pennetta                                          │    20 │
│ Total Drama Presents: The Ridonculous Race               │    18 │
│ User talk:Dudeperson176123                               │    18 │
│ Wikipédia:Le Bistro/12 septembre 2015                    │    18 │
│ Wikipedia:In the news/Candidates                         │    17 │
│ Wikipedia:Requests for page protection                   │    17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.

You can also send SQL over HTTP. The request body is a JSON object whose "query" field holds the SQL statement:

curl -X "POST" -H "Content-Type:application/json" -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql

This returns the following result:

[
  {
    "page": "Wikipedia:Vandalismusmeldung",
    "Edits": 33
  },
  {
    "page": "User:Cyde/List of candidates for speedy deletion/Subpage",
    "Edits": 28
  },
  {
    "page": "Jeremy Corbyn",
    "Edits": 27
  },
  {
    "page": "Wikipedia:Administrators' noticeboard/Incidents",
    "Edits": 21
  },
  {
    "page": "Flavia Pennetta",
    "Edits": 20
  },
  {
    "page": "Total Drama Presents: The Ridonculous Race",
    "Edits": 18
  },
  {
    "page": "User talk:Dudeperson176123",
    "Edits": 18
  },
  {
    "page": "Wikipédia:Le Bistro/12 septembre 2015",
    "Edits": 18
  },
  {
    "page": "Wikipedia:In the news/Candidates",
    "Edits": 17
  },
  {
    "page": "Wikipedia:Requests for page protection",
    "Edits": 17
  }
]
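The same HTTP call can be made from Python with only the standard library. This is a minimal sketch, assuming the quickstart Router at http://localhost:8888 (the address used in the curl command); the helper names build_sql_request and run_sql are hypothetical. The body carries the SQL text in its "query" field, and with the default result format the response is a JSON array of row objects like the one above.

```python
import json
import urllib.request

# Assumption: quickstart Router address, as in the curl example.
DRUID_URL = "http://localhost:8888"

TOP_PAGES_SQL = """
SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 10
"""


def build_sql_request(sql: str, base_url: str = DRUID_URL) -> urllib.request.Request:
    """Build a POST request for Druid's SQL HTTP endpoint (/druid/v2/sql)."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(sql: str, base_url: str = DRUID_URL) -> list:
    """Send the query and decode the JSON array of row objects."""
    with urllib.request.urlopen(build_sql_request(sql, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage against a running cluster (not executed here):
# for row in run_sql(TOP_PAGES_SQL):
#     print(row["page"], row["Edits"])
```

Keeping the request builder separate from run_sql lets you inspect the exact request without a running server.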

More SQL examples

Time-based query (hourly buckets)

SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY 1
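To make the FLOOR(__time TO HOUR) bucketing concrete, here is an in-memory Python illustration of what the query computes: each timestamp is truncated to the start of its hour and `deleted` is summed per bucket (GROUP BY 1 refers to the first column in the SELECT list, HourTime). It is only a sketch of the semantics, not how Druid executes the query, and the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime


def floor_to_hour(ts: datetime) -> datetime:
    """Mimic FLOOR(__time TO HOUR): drop minutes, seconds and microseconds."""
    return ts.replace(minute=0, second=0, microsecond=0)


def deleted_per_hour(rows):
    """Sum `deleted` per hourly bucket for (timestamp, deleted) pairs, like GROUP BY 1."""
    totals = defaultdict(int)
    for ts, deleted in rows:
        totals[floor_to_hour(ts)] += deleted
    return dict(sorted(totals.items()))


# Made-up sample rows: (event time, deleted)
sample = [
    (datetime(2015, 9, 12, 0, 5), 3),
    (datetime(2015, 9, 12, 0, 59, 59), 2),
    (datetime(2015, 9, 12, 1, 0), 7),
]
print(deleted_per_hour(sample))
```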

GroupBy query

SELECT channel, page, SUM(added)
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY channel, page
ORDER BY SUM(added) DESC

Querying raw data

SELECT user, page
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
LIMIT 5

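Explaining a query plan

EXPLAIN PLAN FOR shows the native query that Druid SQL translates a statement into; in the output below, the top-pages query becomes a native topN query. It can also be run in dsql: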

dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;

DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}])
Retrieved 1 row in 0.03s.

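2. Native JSON Queries

Druid also supports native queries written in JSON. This topN query returns the same top 10 pages as the SQL example: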

{
  "queryType" : "topN",
  "dataSource" : "wikipedia",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "count",
  "threshold" : 10,
  "aggregations" : [
    {
      "type" : "count",
      "name" : "count"
    }
  ]
}

Paste the JSON into the Query view, in JSON query mode.
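Because native queries are plain JSON, they are easy to generate from code. Here is a small Python sketch that builds the topN query above; top_n_query is a hypothetical helper, and its comments describe the fields as they are used in this query.

```python
import json


def top_n_query(datasource, dimension, interval, threshold=10):
    """Build a native topN query ranking `dimension` values by row count."""
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": "all",  # a single time bucket covering the whole interval
        "dimension": dimension,  # the column whose values are ranked
        "metric": "count",  # rank by the aggregator named "count"
        "threshold": threshold,  # how many top values to return
        "aggregations": [{"type": "count", "name": "count"}],
    }


query = top_n_query("wikipedia", "page", "2015-09-12/2015-09-13")
print(json.dumps(query, indent=2))
```

Druid documents topN results as approximate: each data process ranks its own top values and returns only those to the Broker, which merges them.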

Native JSON queries are run by sending an HTTP POST request to a Router or Broker:

curl -X POST "<queryable_host>:<port>/druid/v2/?pretty" -H "Content-Type:application/json" -H "Accept:application/json" -d @<query_json_file>
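Here is the same curl call written with Python's standard library, as a minimal sketch. The host and port default to the quickstart Router (localhost:8888, an assumption; a Broker listens on 8082 by default), and build_native_request / run_native_query are hypothetical helper names.

```python
import json
import urllib.request


def build_native_request(query, host="localhost", port=8888):
    """Build the POST request that the curl command sends to /druid/v2/?pretty."""
    return urllib.request.Request(
        f"http://{host}:{port}/druid/v2/?pretty",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_native_query(query, host="localhost", port=8888):
    """Send a native JSON query and decode the JSON response."""
    with urllib.request.urlopen(build_native_request(query, host, port)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage against a running cluster (not executed here):
# result = run_native_query({"queryType": "dataSourceMetadata", "dataSource": "wikipedia"})
```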

Druid provides a rich set of native query types.

Aggregation queries

Timeseries query
{
  "queryType": "timeseries",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "descending": "true",
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
      { "type": "or",
        "fields": [
          { "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
          { "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
        ]
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
    { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
  ],
  "postAggregations": [
    { "type": "arithmetic",
      "name": "sample_divide",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" },
        { "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
}
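Nested filters like the one above are easier to write with small helper functions. The Python sketch below uses hypothetical helpers selector, and_ and or_ to rebuild the same filter, and adds a simplified matches evaluator to show how the and/or/selector types combine for a single row. The evaluator handles only these three types and ignores Druid's null-handling details.

```python
def selector(dimension, value):
    """A selector filter: the dimension must equal the value exactly."""
    return {"type": "selector", "dimension": dimension, "value": value}


def and_(*fields):
    """All child filters must match."""
    return {"type": "and", "fields": list(fields)}


def or_(*fields):
    """At least one child filter must match."""
    return {"type": "or", "fields": list(fields)}


def matches(flt, row):
    """Simplified evaluation of a filter against one row (dict of dimension -> value)."""
    kind = flt["type"]
    if kind == "selector":
        return row.get(flt["dimension"]) == flt["value"]
    if kind == "and":
        return all(matches(f, row) for f in flt["fields"])
    if kind == "or":
        return any(matches(f, row) for f in flt["fields"])
    raise ValueError(f"unsupported filter type: {kind}")


sample_filter = and_(
    selector("sample_dimension1", "sample_value1"),
    or_(
        selector("sample_dimension2", "sample_value2"),
        selector("sample_dimension3", "sample_value3"),
    ),
)
```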
TopN query
{
  "queryType": "topN",
  "dataSource": "sample_data",
  "dimension": "sample_dim",
  "threshold": 5,
  "metric": "count",
  "granularity": "all",
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "dim1",
        "value": "some_value"
      },
      {
        "type": "selector",
        "dimension": "dim2",
        "value": "some_other_val"
      }
    ]
  },
  "aggregations": [
    {
      "type": "longSum",
      "name": "count",
      "fieldName": "count"
    },
    {
      "type": "doubleSum",
      "name": "some_metric",
      "fieldName": "some_metric"
    }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "average",
      "fn": "/",
      "fields": [
        {
          "type": "fieldAccess",
          "name": "some_metric",
          "fieldName": "some_metric"
        },
        {
          "type": "fieldAccess",
          "name": "count",
          "fieldName": "count"
        }
      ]
    }
  ],
  "intervals": [
    "2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"
  ]
}
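The "average" post-aggregation is evaluated for each result row after the aggregations are computed. The Python sketch below mirrors that step on made-up rows. It relies on the documented behaviour that the arithmetic post-aggregator's "/" returns 0 when the denominator is 0 (the "quotient" function does plain floating-point division instead). The function names are hypothetical.

```python
def druid_divide(numerator, denominator):
    """Arithmetic post-aggregator with fn "/": division by zero yields 0."""
    if denominator == 0:
        return 0.0
    return numerator / denominator


def add_average(rows):
    """Attach average = some_metric / count to each row without mutating the input."""
    return [
        dict(row, average=druid_divide(row["some_metric"], row["count"]))
        for row in rows
    ]


# Made-up aggregated rows
rows = [
    {"sample_dim": "a", "count": 4, "some_metric": 10.0},
    {"sample_dim": "b", "count": 0, "some_metric": 5.0},
]
print(add_average(rows))
```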
GroupBy query
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["country", "device"],
  "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] },
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "carrier", "value": "AT&T" },
      { "type": "or",
        "fields": [
          { "type": "selector", "dimension": "make", "value": "Apple" },
          { "type": "selector", "dimension": "make", "value": "Samsung" }
        ]
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "postAggregations": [
    { "type": "arithmetic",
      "name": "avg_usage",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "fieldName": "data_transfer" },
        { "type": "fieldAccess", "fieldName": "total_usage" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "having": {
    "type": "greaterThan",
    "aggregation": "total_usage",
    "value": 100
  }
}
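The having and limitSpec clauses act on the grouped rows, much like SQL's HAVING followed by ORDER BY ... LIMIT. The following is a rough in-memory Python sketch of that post-processing on made-up rows. It keeps groups whose total_usage is greater than 100, sorts ascending by country and then data_transfer (column names given as plain strings in limitSpec sort ascending), and returns at most `limit` rows. It illustrates the semantics only, and apply_having_and_limit is a hypothetical name.

```python
def apply_having_and_limit(rows, min_usage=100, limit=5000):
    """Mimic the example's having (greaterThan) and default limitSpec on grouped rows."""
    # having: {"type": "greaterThan", "aggregation": "total_usage", "value": 100}
    kept = [row for row in rows if row["total_usage"] > min_usage]
    # limitSpec columns ["country", "data_transfer"], both ascending
    kept.sort(key=lambda row: (row["country"], row["data_transfer"]))
    return kept[:limit]


# Made-up grouped rows (one day, grouped by country and device)
grouped = [
    {"country": "US", "device": "phone", "total_usage": 150, "data_transfer": 3.0},
    {"country": "CN", "device": "laptop", "total_usage": 90, "data_transfer": 9.0},
    {"country": "CN", "device": "phone", "total_usage": 200, "data_transfer": 1.5},
    {"country": "CN", "device": "tablet", "total_usage": 120, "data_transfer": 0.5},
]
print(apply_having_and_limit(grouped, limit=2))
```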

Metadata queries

TimeBoundary query ("bound" is optional and may be "maxTime" or "minTime"; if it is omitted, both timestamps are returned. An optional "filter" may also be added.)
{
    "queryType" : "timeBoundary",
    "dataSource": "sample_datasource",
    "bound"     : "maxTime"
}
SegmentMetadata query
{
  "queryType":"segmentMetadata",
  "dataSource":"sample_datasource",
  "intervals":["2013-01-01/2014-01-01"]
}
DataSourceMetadata query
{
    "queryType" : "dataSourceMetadata",
    "dataSource": "sample_datasource"
}

Search query

{
  "queryType": "search",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "searchDimensions": [
    "dim1",
    "dim2"
  ],
  "query": {
    "type": "insensitive_contains",
    "value": "Ke"
  },
  "sort" : {
    "type": "lexicographic"
  },
  "intervals": [
    "2013-01-01T00:00:00.000/2013-01-03T00:00:00.000"
  ]
}
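A search query returns the dimension values that match the search spec. The following Python sketch imitates insensitive_contains over made-up rows: a value matches when it contains "Ke" regardless of case, and matches are counted per (dimension, value) and sorted by value. The dimension/value/count entry layout follows Druid's search results. The sketch ignores time granularity, and Python's default string ordering stands in for the lexicographic sort. search_rows is a hypothetical name.

```python
from collections import Counter


def search_rows(rows, dimensions, needle):
    """Case-insensitive substring search over the given dimensions."""
    needle = needle.lower()
    counts = Counter()
    for row in rows:
        for dim in dimensions:
            value = row.get(dim)
            if isinstance(value, str) and needle in value.lower():
                counts[(dim, value)] += 1
    # Sort by value, then dimension, as a stand-in for Druid's lexicographic sort.
    ordered = sorted(counts.items(), key=lambda item: (item[0][1], item[0][0]))
    return [{"dimension": d, "value": v, "count": c} for (d, v), c in ordered]


# Made-up rows
rows = [
    {"dim1": "Ke$ha", "dim2": "other"},
    {"dim1": "Ke$ha", "dim2": "kentucky"},
    {"dim1": "Bob", "dim2": "POKER"},
]
print(search_rows(rows, ["dim1", "dim2"], "Ke"))
```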

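Query tips

Prefer Timeseries and TopN queries over GroupBy when they fit the question. Druid optimizes them for those specific cases, so they are usually much faster.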

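Cancelling a query

A running query can be cancelled by its ID with a DELETE request to the Broker or Router: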

DELETE /druid/v2/{queryId}
curl -X DELETE "http://host:port/druid/v2/abc123"
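To cancel a query, you need its ID up front. One way to get it, sketched below in Python, is to set the queryId query-context parameter yourself and later send the DELETE request shown above. The Router address and the helper names with_query_id and build_cancel_request are assumptions for illustration.

```python
import urllib.request
import uuid

# Assumption: quickstart Router; a Broker address works the same way.
BASE_URL = "http://localhost:8888"


def with_query_id(query, query_id=None):
    """Return (copy of query with context.queryId set, the id used)."""
    query_id = query_id or str(uuid.uuid4())
    context = dict(query.get("context", {}), queryId=query_id)
    return dict(query, context=context), query_id


def build_cancel_request(query_id, base_url=BASE_URL):
    """DELETE /druid/v2/{queryId} cancels the running query with that id."""
    return urllib.request.Request(f"{base_url}/druid/v2/{query_id}", method="DELETE")


# Usage (not executed here): submit the query from one thread, then call
# urllib.request.urlopen(build_cancel_request(query_id)) to cancel it.
```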

Query failures

When a query fails, Druid responds with an error object like this:

{
  "error" : "Query timeout",
  "errorMessage" : "Timeout waiting for task.",
  "errorClass" : "java.util.concurrent.TimeoutException",
  "host" : "druid1.example.com:8083"
}
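A client can turn this error body into an exception. The Python sketch below assumes, as the example suggests, that a failed query comes back as a non-2xx HTTP response whose body is a JSON object with error, errorMessage, errorClass and host fields. DruidQueryError, parse_error and post_query are hypothetical names.

```python
import json
import urllib.error
import urllib.request


class DruidQueryError(Exception):
    """Hypothetical exception carrying the fields of a Druid error response."""

    def __init__(self, payload):
        self.error = payload.get("error")
        self.error_message = payload.get("errorMessage")
        self.error_class = payload.get("errorClass")
        self.host = payload.get("host")
        super().__init__(f"{self.error}: {self.error_message} (host={self.host})")


def parse_error(body: bytes) -> DruidQueryError:
    """Decode an error response body into a DruidQueryError."""
    return DruidQueryError(json.loads(body.decode("utf-8")))


def post_query(request: urllib.request.Request):
    """Send a prepared query request; raise DruidQueryError on an HTTP error status."""
    try:
        with urllib.request.urlopen(request) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        raise parse_error(exc.read()) from exc
```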

3. cURL

Native JSON queries can also be submitted over HTTP with curl:

curl -X "POST" -H "Content-Type:application/json" -d @quickstart/tutorial/wikipedia-top-pages.json "http://localhost:8888/druid/v2?pretty"

4. Client Libraries

Client libraries also build and send native JSON queries.

See https://druid.apache.org/libraries.html for the available libraries.

For example, pydruid for Python:

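# Assumptions: pydruid, pandas and matplotlib are installed, and DRUID_URL points
# at a Router/Broker serving the "twitterstream" datasource from the pydruid example.
import matplotlib.pyplot as plt
from pydruid.client import PyDruid
from pydruid.utils.aggregators import doublesum
from pydruid.utils.filters import Dimension
from pydruid.utils.postaggregator import Field

DRUID_URL = "http://localhost:8888"  # replace with your Druid URL

query = PyDruid(DRUID_URL, "druid/v2")

# Daily timeseries over 4 weeks from 2014-02-02, restricted to #sochi2014 tweets
ts = query.timeseries(
    datasource="twitterstream",
    granularity="day",
    intervals="2014-02-02/p4w",
    aggregations={"length": doublesum("tweet_length"), "count": doublesum("count")},
    post_aggregations={"avg_tweet_length": (Field("length") / Field("count"))},
    filter=Dimension("first_hashtag") == "sochi2014",
)
df = query.export_pandas()  # result of the last query as a pandas DataFrame
df["timestamp"] = df["timestamp"].map(lambda x: x.split("T")[0])  # keep only the date
df.plot(x="timestamp", y="avg_tweet_length", ylim=(80, 140), rot=20,
        title="Sochi 2014")
plt.ylabel("avg tweet length (chars)")
plt.show()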

This Druid getting-started series is compiled by 实时流式计算 and is updated continually.
