<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Influxdb</title><link>https://jwheel.org/tags/influxdb/</link><description>Homepage of Justin Wheeler, an Open Source contributor and Free Software advocate from Georgia, USA.</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><managingEditor>Justin Wheeler</managingEditor><lastBuildDate>Tue, 15 Aug 2017 00:00:00 +0000</lastBuildDate><atom:link href="https://jwheel.org/rss/tags/influxdb/index.xml" rel="self" type="application/rss+xml"/><item><title>Introducing InfluxDB: Time-series database stack</title><link>https://jwheel.org/blog/2017/08/influxdb-time-series-database/</link><pubDate>Tue, 15 Aug 2017 00:00:00 +0000</pubDate><guid>https://jwheel.org/blog/2017/08/influxdb-time-series-database/</guid><description><![CDATA[<p><a href="https://opensource.com/article/17/8/influxdb-time-series-database-stack"><em>Article originally published on Opensource.com.</em></a></p>
<hr>
<p>The needs and demands of infrastructure environments change every year. With time, systems become more complex and involved. But when infrastructure grows and becomes more complex, that growth is meaningless if we don&rsquo;t understand what&rsquo;s happening in our environment. This is why monitoring tools and software are often used in these environments, so operators and administrators see problems and fix them in real-time. But what if we want to predict problems before they happen? Collecting metrics and data about our environment gives us a window into how our infrastructure is performing and lets us make predictions based on data. When we know and understand what&rsquo;s happening, we can prevent problems before they happen.</p>
<p>But how do we collect and store this data? For example, if we want to collect data on the CPU usage of 100 machines every ten seconds, we&rsquo;re generating a lot of data. On top of that, what if each machine is running fifteen containers? What if you want to generate data about each of those individual containers too? What about each process? This is where time-series data becomes helpful. Time-series databases store time-series data. But what does that mean? We&rsquo;ll explain all of this and more and introduce you to InfluxDB, an open source time-series database. By the end of this article, you will understand…</p>
<ul>
<li>What time-series data / databases are</li>
<li>Quick introduction to InfluxDB and the TICK stack</li>
<li>How to install InfluxDB and other tools</li>
</ul>
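<p>To put the CPU example from the introduction into numbers, here is a rough back-of-the-envelope sketch. It assumes exactly one data point per source per sample, which is a simplification; real collectors usually report several fields at once. The function name <code>points_per_day</code> is made up for illustration.</p>

```python
# Back-of-the-envelope data volume for the CPU example in the introduction.
# Assumption: every machine or container reports one data point per sample.
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_day(sources: int, interval_seconds: int) -> int:
    """Number of data points written per day at a fixed sampling interval."""
    samples_per_source = SECONDS_PER_DAY // interval_seconds
    return sources * samples_per_source


machines = 100
containers_per_machine = 15

machine_points = points_per_day(machines, 10)
container_points = points_per_day(machines * containers_per_machine, 10)

print(f"machines only:   {machine_points:,} points/day")
print(f"with containers: {machine_points + container_points:,} points/day")
```

<p>Even this small setup lands in the millions of points per day once containers are included, which is the kind of write load the rest of the article is concerned with.</p>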

<h2 id="introducing-time-series-concepts">Introducing time-series concepts&nbsp;<a class="hanchor" href="#introducing-time-series-concepts" aria-label="Anchor link for: Introducing time-series concepts">🔗</a></h2>
<p>
<figure>
  <img src="/blog/2017/07/rbdms-table-example.gif" alt="Example of table, or how a RDBMS like MySQL stores data" loading="lazy">
  <figcaption>Example of table, or how a RDBMS like MySQL stores data. Image from DevShed (<a href="http://www.devshed.com/c/a/php/using-the-active-record-pattern-with-php-and-mysql/" class="bare">http://www.devshed.com/c/a/php/using-the-active-record-pattern-with-php-and-mysql/</a>).</figcaption>
</figure>
</p>
<p>If you&rsquo;re familiar with relational database management systems (RDBMS), like MySQL, <a href="http://www.informit.com/articles/article.aspx?p=377067&amp;seqNum=3">tables, columns, and primary keys</a> are familiar terms. Everything is like a spreadsheet, with columns and rows. Some data might be unique, other parts might be the same as other rows. RDBMSs like MySQL are widely used and are great for <strong>reliable transactions</strong> that follow <a href="https://en.wikipedia.org/wiki/ACID">ACID</a> (Atomicity, Consistency, Isolation, Durability) compliance.</p>
<p>With relational database software, you&rsquo;re usually working with data that you could model in a table. You might update certain data by overwriting and replacing it. But what if you&rsquo;re collecting data on something that generates a lot of it, and you want to watch how it changes over time? Take a self-driving car. The car is constantly collecting information about its environment. It takes this data and analyzes changes over time to behave correctly. The amount of data might be tens of gigabytes an hour. While you could use a relational database to collect this data, relational databases are not built for this. When it comes to scaling and usability of the data you&rsquo;re collecting, an RDBMS isn&rsquo;t the best tool for the job.</p>

<h4 id="why-time-series-is-a-good-fit">Why time-series is a good fit&nbsp;<a class="hanchor" href="#why-time-series-is-a-good-fit" aria-label="Anchor link for: Why time-series is a good fit">🔗</a></h4>
<p>And this is where time-series data makes sense. Let&rsquo;s say you&rsquo;re collecting data about city traffic, temperature from farming equipment, or the production rate of an assembly line. Instead of going into a table with rows and columns, imagine pushing multiple rows of data that are uniquely sorted by a timestamp. This visual might help make more sense of this.</p>
<p>
<figure>
  <img src="/blog/2017/07/picture-the-cloud.gif" alt="Imagine rows and rows of data, uniquely sorted by timestamps" loading="lazy">
  <figcaption>Imagine rows and rows of data, uniquely sorted by timestamps. Image from Timescale (<a href="https://blog.timescale.com/what-the-heck-is-time-series-data-and-why-do-i-need-a-time-series-database-dcf3b1b18563" class="bare">https://blog.timescale.com/what-the-heck-is-time-series-data-and-why-do-i-need-a-time-series-database-dcf3b1b18563</a>).</figcaption>
</figure>
</p>
<p>Having the data in this format makes it easier to track and watch change over time. When data accumulates, you can see how something behaved in the past, how it&rsquo;s behaving now, and how it might behave in the future. Your options to make smarter data decisions expand!</p>
<p>Curious how the data is stored and formatted? It depends on the time-series database (TSDB) you use. You write data to InfluxDB in the <a href="https://docs.influxdata.com/influxdb/v1.3/write_protocols/line_protocol_tutorial/">Line Protocol</a> format. <a href="https://docs.influxdata.com/influxdb/v1.3/tools/api/#query">Queries</a> return the data in JSON.</p>
<p>
<figure>
  <img src="/blog/2017/07/influxdb-data-format.jpg" alt="How InfluxDB stores time-series data in JSON" loading="lazy">
  <figcaption>How InfluxDB stores time-series data in Line Protocol (<a href="https://docs.influxdata.com/influxdb/v1.3/write_protocols/line_protocol_tutorial/" class="bare">https://docs.influxdata.com/influxdb/v1.3/write_protocols/line_protocol_tutorial/</a>). Image from Roberto Gaudenzi (<a href="https://www.slideshare.net/RobertoGaudenzi1/introduction-to-influx-db" class="bare">https://www.slideshare.net/RobertoGaudenzi1/introduction-to-influx-db</a>).</figcaption>
</figure>
</p>
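<p>As a rough illustration of the Line Protocol layout (<code>measurement,tag=value field=value timestamp</code>), here is a minimal Python sketch that formats one point. The helper name <code>to_line_protocol</code>, the <code>cpu</code> measurement, and the tag and field names are all made up for this example, and the sketch deliberately skips the escaping rules for spaces, commas, and quotes that the Line Protocol documentation describes.</p>

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp.

    Simplified sketch: special characters are NOT escaped here.
    """
    tag_part = "".join(f",{key}={value}" for key, value in sorted(tags.items()))

    field_items = []
    for key, value in fields.items():
        if isinstance(value, bool):      # check bool first: bool is a subclass of int
            field_items.append(f"{key}={'true' if value else 'false'}")
        elif isinstance(value, int):     # integers carry a trailing "i"
            field_items.append(f"{key}={value}i")
        elif isinstance(value, float):
            field_items.append(f"{key}={value}")
        else:                            # everything else is written as a quoted string
            field_items.append(f'{key}="{value}"')

    return f"{measurement}{tag_part} {','.join(field_items)} {timestamp_ns}"


# Hypothetical point: idle CPU percentage on one host, nanosecond timestamp.
line = to_line_protocol(
    "cpu",
    {"host": "server01", "region": "us-west"},
    {"usage_idle": 92.5},
    1502755200000000000,
)
print(line)
```

<p>Tags are sorted by key here only to keep the output stable; treat that as a choice of this sketch rather than a requirement.</p>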
<p>If you&rsquo;re still confused or trying to understand time-series data or why you would want to use it over another solution, you can read an excellent, in-depth explanation from <a href="https://blog.timescale.com/what-the-heck-is-time-series-data-and-why-do-i-need-a-time-series-database-dcf3b1b18563">Timescale&rsquo;s blog</a> or <a href="https://www.influxdata.com/modern-time-series-platform/">InfluxData&rsquo;s blog</a>.</p>

<h2 id="influxdb-a-time-series-database">InfluxDB: A time-series database&nbsp;<a class="hanchor" href="#influxdb-a-time-series-database" aria-label="Anchor link for: InfluxDB: A time-series database">🔗</a></h2>
<p><a href="https://www.influxdata.com/time-series-platform/influxdb/">InfluxDB</a> is an open source time-series database developed by <a href="https://www.influxdata.com/">InfluxData</a>. It&rsquo;s written in Go (a compiled language), which means you can start using it without installing any dependencies. It supports multiple data ingestion protocols, such as <a href="https://www.influxdata.com/time-series-platform/telegraf/">Telegraf</a> (also from InfluxData), <a href="https://graphiteapp.org/">Graphite</a>, <a href="https://collectd.org/">collectd</a>, and <a href="http://opentsdb.net/">OpenTSDB</a>. This leaves you with flexible options for how you want to collect data and where you&rsquo;re pulling it from. It&rsquo;s also one of the <a href="https://db-engines.com/en/ranking/time&#43;series&#43;dbms">fastest-growing</a> time-series databases available. You can find the source code for InfluxDB on <a href="https://github.com/influxdata/influxdb">GitHub</a>.</p>
<p>This article will focus on three tools in InfluxData&rsquo;s TICK stack for how you can build a time-series database and begin collecting and processing data.</p>

<h4 id="tick-stack">TICK stack&nbsp;<a class="hanchor" href="#tick-stack" aria-label="Anchor link for: TICK stack">🔗</a></h4>
<p>InfluxData creates a platform based on four open source projects that work and play well with each other for time-series data. When used together, you can collect, store, process, and view the data easily. The four pieces of the platform are known as the <a href="https://www.influxdata.com/time-series-platform/">TICK stack</a>. This stands for…</p>
<ul>
<li><strong><em>T</em>elegraf</strong>: Plugin-driven server agent for collecting / reporting metrics</li>
<li><strong><em>I</em>nfluxDB</strong>: Scalable data store for metrics, events, and real-time analytics</li>
<li><strong><em>C</em>hronograf</strong>: Monitoring / visualization UI for TICK stack (not covered in this article)</li>
<li><strong><em>K</em>apacitor</strong>: Framework for processing, monitoring, and alerting on time-series data</li>
</ul>
<p>These tools work and integrate well with the other pieces by design. However, it&rsquo;s also easy to substitute one piece out for another tool of your choice. For this article, we&rsquo;ll explore three parts of the TICK stack: InfluxDB, Telegraf, and Kapacitor.</p>
<p>
<figure>
  <img src="/blog/2017/07/tick-stack-diagram.png" alt="Diagram of how the different components of the InfluxDB TICK stack connect with each other" loading="lazy">
  <figcaption>Diagram of how the different components of the TICK stack connect with each other. From influxdata.com (<a href="https://www.influxdata.com/time-series-platform/" class="bare">https://www.influxdata.com/time-series-platform/</a>).</figcaption>
</figure>
</p>

<h4 id="influxdb"><a href="https://docs.influxdata.com/influxdb/">InfluxDB</a>&nbsp;<a class="hanchor" href="#influxdb" aria-label="Anchor link for: InfluxDB">🔗</a></h4>
<p>As mentioned before, InfluxDB is the time-series database (TSDB) of the TICK stack. Data collected from your environment is stored in InfluxDB. A few things set InfluxDB apart from other time-series databases.</p>

<h6 id="emphasis-on-performance">Emphasis on performance&nbsp;<a class="hanchor" href="#emphasis-on-performance" aria-label="Anchor link for: Emphasis on performance">🔗</a></h6>
<p>InfluxDB is designed with performance as one of the top priorities. This allows you to use data quickly and easily, even under heavy loads. To do this, InfluxDB focuses on quickly ingesting the data and using compression to keep it manageable. To query and write data, it uses an HTTP(S) API.</p>
<p>This performance is noteworthy given the amount of data InfluxDB is capable of handling. It can handle up to a million points of data per second, with timestamps precise down to the nanosecond.</p>
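<p>To make the HTTP API a little more concrete, here is a hedged sketch that only <em>builds</em> a write request and a query request with Python&rsquo;s standard library, without sending them. It assumes an InfluxDB 1.x server on <code>localhost</code> at the default port 8086 and a database named <code>mydb</code>; both values, and the helper names, are placeholders for illustration.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8086"  # assumption: local InfluxDB 1.x, default port
DATABASE = "mydb"                   # hypothetical database name


def build_write_request(line: str) -> Request:
    """POST one Line Protocol line to /write, with nanosecond timestamps."""
    params = urlencode({"db": DATABASE, "precision": "ns"})
    return Request(f"{BASE_URL}/write?{params}", data=line.encode("utf-8"), method="POST")


def build_query_request(influxql: str) -> Request:
    """GET /query with the InfluxQL statement in the q parameter."""
    params = urlencode({"db": DATABASE, "q": influxql})
    return Request(f"{BASE_URL}/query?{params}", method="GET")


write_req = build_write_request("cpu,host=server01 usage_idle=92.5 1502755200000000000")
query_req = build_query_request("SELECT * FROM cpu LIMIT 1")

print(write_req.get_method(), write_req.full_url)
print(query_req.get_method(), query_req.full_url)
# With a server running, pass either request to urllib.request.urlopen() to send it.
```

<p>Keeping request construction separate from sending makes the sketch safe to run on a machine that has no InfluxDB instance yet.</p>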

<h6 id="sql-like-queries">SQL-like queries&nbsp;<a class="hanchor" href="#sql-like-queries" aria-label="Anchor link for: SQL-like queries">🔗</a></h6>
<p>If you&rsquo;re familiar with SQL-like syntax, querying data from InfluxDB will feel familiar. It uses its own SQL-like syntax, <a href="https://docs.influxdata.com/influxdb/v1.3/query_language">InfluxQL</a>, for queries. As an example, imagine you&rsquo;re collecting data on used disk space on a machine. If you wanted to see that data, you could write a query that might look like this.</p>
<pre tabindex="0"><code>SELECT mean(diskspace_used) AS mean_disk_used
FROM disk_stats
WHERE time &gt;= now() - 90d
GROUP BY time(10d)
</code></pre><p>If you&rsquo;re familiar with SQL syntax, this won&rsquo;t feel too different. The above statement pulls the mean values of used disk space from roughly the last three months (90 days, since InfluxQL durations have no month unit) and groups them into ten-day intervals.</p>
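<p>If the grouping step is unclear, the following Python sketch mimics what a mean grouped by ten-day windows does to a handful of points. It is an illustration of the idea only, not how InfluxDB implements it; the readings are invented, and the function name <code>mean_by_window</code> is hypothetical.</p>

```python
from collections import defaultdict
from statistics import mean

DAY = 24 * 60 * 60  # seconds in a day


def mean_by_window(points, window_seconds):
    """Group (timestamp, value) pairs into fixed windows and average each window."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Invented readings: (seconds since start, used disk space in GB).
readings = [
    (0 * DAY, 40.0),
    (5 * DAY, 42.0),
    (12 * DAY, 50.0),
    (19 * DAY, 54.0),
    (25 * DAY, 60.0),
]

result = mean_by_window(readings, 10 * DAY)
for start, value in result.items():
    print(f"window starting at day {start // DAY}: mean {value} GB")
```

<p>Five raw readings collapse into three averaged values, one per ten-day window, which is the shape of result the query above returns.</p>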

<h6 id="downsampling--data-retention">Downsampling / data retention&nbsp;<a class="hanchor" href="#downsampling--data-retention" aria-label="Anchor link for: Downsampling / data retention">🔗</a></h6>
<p>When working with large amounts of data, storing it becomes a concern. Over time, it can accumulate to huge sizes. With InfluxDB, you can <strong>downsample</strong> data into less precise but smaller metrics that you can store for longer periods of time. <strong>Data retention policies</strong>, together with continuous queries, enable you to do this.</p>
<p>For example, pretend you have sensors collecting data on the amount of RAM in a number of machines. You might collect metrics on the amount of memory in use by multiple users, the system, cached memory, and more. While it might make sense to hang on to that data for thirty days to watch what&rsquo;s happening, after thirty days, you might not need that level of precision. Instead, you might only want the ratio of total memory to memory in use. Using data retention policies, you can tell InfluxDB to hang on to the precise data for all the different usages for thirty days. After thirty days, you can average the data to be less precise, and you can hold on to that data for six months, forever, or however long you like. This strikes a balance between keeping historical data and reducing disk usage.</p>
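<p>The thirty-day example can be sketched in a few lines of Python. In InfluxDB itself this is configured on the server rather than written in client code, so treat the following purely as an illustration of the trade-off. The function name <code>downsample_old_points</code>, the sample values, and the one-day averaging window are assumptions made for this sketch.</p>

```python
from statistics import mean

HOUR = 60 * 60
DAY = 24 * HOUR


def downsample_old_points(points, now, keep_raw_seconds, window_seconds):
    """Keep recent points as-is; replace older ones with one mean per window."""
    cutoff = now - keep_raw_seconds
    recent = [(ts, value) for ts, value in points if ts >= cutoff]

    old_buckets = {}
    for ts, value in points:
        if ts < cutoff:
            # Align each old point to the start of its window.
            old_buckets.setdefault(ts - ts % window_seconds, []).append(value)

    downsampled = [(start, mean(values)) for start, values in sorted(old_buckets.items())]
    return downsampled, recent


# Invented samples: (timestamp in seconds, percent of memory in use).
samples = [
    (1 * DAY, 50.0),
    (1 * DAY + HOUR, 70.0),
    (2 * DAY, 60.0),
    (35 * DAY, 80.0),
    (39 * DAY, 90.0),
]

downsampled, recent = downsample_old_points(
    samples, now=40 * DAY, keep_raw_seconds=30 * DAY, window_seconds=DAY
)
print("older than 30 days, averaged per day:", downsampled)
print("last 30 days, full precision:", recent)
```

<p>The two readings from day one are merged into a single daily average, while everything from the last thirty days is left untouched.</p>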

<h4 id="telegraf"><a href="https://docs.influxdata.com/telegraf/">Telegraf</a>&nbsp;<a class="hanchor" href="#telegraf" aria-label="Anchor link for: Telegraf">🔗</a></h4>
<p>If InfluxDB is where all of your data is going, you need a way to collect and gather the data first. Telegraf is a metric collection daemon that gathers various metrics from system components, IoT sensors, and more. It&rsquo;s <a href="https://github.com/influxdata/telegraf">open source</a> and written completely in Go. Like InfluxDB, Telegraf is also written by the InfluxData team and is built to work with InfluxDB. It also includes support for different databases, such as MySQL / MariaDB, MongoDB, Redis, and more. You can read more about it on <a href="https://www.influxdata.com/time-series-platform/telegraf/">InfluxData&rsquo;s website</a>.</p>
<p>Telegraf is modular and heavily based on plugins. This means that Telegraf can be as lean and minimal or as full and complex as you need it to be. Out of the box, it supports over a hundred plugins for various input sources. This includes Apache, Ceph, Docker, IPTables, Kubernetes, NGINX, and Varnish, just to name a few. You can see all the plugins, including processing and output plugins, in their <a href="https://github.com/influxdata/telegraf#input-plugins">README</a>.</p>
<p>Even if you&rsquo;re not using InfluxDB as a data store, you may find Telegraf useful as a way to collect this data and information about your systems or sensors.</p>

<h4 id="kapacitor"><a href="https://docs.influxdata.com/kapacitor/">Kapacitor</a>&nbsp;<a class="hanchor" href="#kapacitor" aria-label="Anchor link for: Kapacitor">🔗</a></h4>
<p>Now we have a way to collect and store our data. But what about doing things with it? Kapacitor is the piece of the stack that lets you process and work with the data in a few different ways. It supports both stream and batch data. Stream data means you can actively work with and shape the data in real-time, even before it makes it to your data store. Batch data means you retroactively perform actions on samples, or batches, of the data.</p>
<p>One of the biggest pluses for Kapacitor is that it enables you to have real-time alerts for events happening in your environment. CPU usage overloading or temperatures too high? You can set up several different alert systems, including but not limited to email, triggering a command, Slack, HipChat, OpsGenie, and many more. You can see the full list in the <a href="https://docs.influxdata.com/kapacitor/v1.3/nodes/alert_node/">documentation</a>.</p>
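<p>The difference between stream and batch processing can be sketched in plain Python. Kapacitor itself is scripted in its own language, TICKscript, so nothing below is Kapacitor&rsquo;s API; the function names, the 90 percent threshold, and the CPU readings are invented for illustration.</p>

```python
from statistics import mean

CPU_ALERT_THRESHOLD = 90.0  # hypothetical threshold, in percent


def stream_alerts(points, threshold=CPU_ALERT_THRESHOLD):
    """Stream style: inspect each point as it arrives and alert right away."""
    for timestamp, value in points:
        if value > threshold:
            yield f"ALERT t={timestamp}: cpu usage {value}% exceeds {threshold}%"


def batch_alert(points, threshold=CPU_ALERT_THRESHOLD):
    """Batch style: look back over a stored batch and alert on its mean."""
    average = mean(value for _, value in points)
    if average > threshold:
        return f"ALERT: mean cpu usage {average}% exceeds {threshold}%"
    return None


# Invented readings: (timestamp, CPU usage in percent).
cpu_points = [(1, 72.0), (2, 95.0), (3, 88.0), (4, 97.0)]

alerts = list(stream_alerts(cpu_points))
summary = batch_alert(cpu_points)

for alert in alerts:
    print(alert)
print("batch result:", summary)
```

<p>With these readings the stream check fires on the two individual spikes, while the batch check stays quiet because the average over the whole batch is below the threshold. Which behavior you want depends on whether short spikes or sustained load matter more in your environment.</p>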
<p>Like the previous tools, Kapacitor is also <a href="https://github.com/influxdata/kapacitor">open source</a> and you can read more about the project in their <a href="https://github.com/influxdata/kapacitor/blob/master/README.md">README</a>.</p>

<h2 id="installing-the-tick-stack">Installing the TICK stack&nbsp;<a class="hanchor" href="#installing-the-tick-stack" aria-label="Anchor link for: Installing the TICK stack">🔗</a></h2>
<p>Packages are available for nearly every distribution. You can install these packages from the command line. Use the instructions for your distribution.</p>

<h4 id="fedora">Fedora&nbsp;<a class="hanchor" href="#fedora" aria-label="Anchor link for: Fedora">🔗</a></h4>
<pre tabindex="0"><code>sudo dnf install https://dl.influxdata.com/influxdb/releases/influxdb-1.3.1.x86_64.rpm \
https://dl.influxdata.com/telegraf/releases/telegraf-1.3.4-1.x86_64.rpm \
https://dl.influxdata.com/kapacitor/releases/kapacitor-1.3.1.x86_64.rpm
</code></pre>
<h4 id="centos-7--rhel-7">CentOS 7 / RHEL 7&nbsp;<a class="hanchor" href="#centos-7--rhel-7" aria-label="Anchor link for: CentOS 7 / RHEL 7">🔗</a></h4>
<pre tabindex="0"><code>sudo yum install https://dl.influxdata.com/influxdb/releases/influxdb-1.3.1.x86_64.rpm \
https://dl.influxdata.com/telegraf/releases/telegraf-1.3.4-1.x86_64.rpm \
https://dl.influxdata.com/kapacitor/releases/kapacitor-1.3.1.x86_64.rpm
</code></pre>
<h4 id="ubuntu--debian">Ubuntu / Debian&nbsp;<a class="hanchor" href="#ubuntu--debian" aria-label="Anchor link for: Ubuntu / Debian">🔗</a></h4>
<pre tabindex="0"><code>wget https://dl.influxdata.com/influxdb/releases/influxdb_1.3.1_amd64.deb \
https://dl.influxdata.com/telegraf/releases/telegraf_1.3.4-1_amd64.deb \
https://dl.influxdata.com/kapacitor/releases/kapacitor_1.3.1_amd64.deb
sudo dpkg -i influxdb_1.3.1_amd64.deb telegraf_1.3.4-1_amd64.deb kapacitor_1.3.1_amd64.deb
</code></pre>
<h4 id="other-distributions">Other distributions&nbsp;<a class="hanchor" href="#other-distributions" aria-label="Anchor link for: Other distributions">🔗</a></h4>
<p>For help with other distributions, see the <a href="https://portal.influxdata.com/downloads">Downloads</a> page.</p>

<h2 id="see-the-data-be-the-data">See the data, be the data&nbsp;<a class="hanchor" href="#see-the-data-be-the-data" aria-label="Anchor link for: See the data, be the data">🔗</a></h2>
<p>Now that you have the tools installed, you can experiment with them. There&rsquo;s plenty of upstream documentation on all three projects. You can find the docs here:</p>
<ul>
<li><a href="https://docs.influxdata.com/influxdb/">InfluxDB documentation</a></li>
<li><a href="https://docs.influxdata.com/telegraf/">Telegraf documentation</a></li>
<li><a href="https://docs.influxdata.com/kapacitor/">Kapacitor documentation</a></li>
</ul>
<p>Additionally, for more help, you can visit the <a href="https://community.influxdata.com/">InfluxData community forums</a>. Happy hacking!</p>]]></description></item></channel></rss>