<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Google-Bigquery</title><link>https://jwheel.org/tags/google-bigquery/</link><description>Homepage of Justin Wheeler, an Open Source contributor and Free Software advocate from Georgia, USA.</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><managingEditor>Justin Wheeler</managingEditor><lastBuildDate>Mon, 18 Dec 2017 00:00:00 +0000</lastBuildDate><atom:link href="https://jwheel.org/rss/tags/google-bigquery/index.xml" rel="self" type="application/rss+xml"/><item><title>Statistics proposal and self-hosting ListenBrainz</title><link>https://jwheel.org/blog/2017/12/statistics-hosting-listenbrainz/</link><pubDate>Mon, 18 Dec 2017 00:00:00 +0000</pubDate><guid>https://jwheel.org/blog/2017/12/statistics-hosting-listenbrainz/</guid><description><![CDATA[<p><em>This post is part of a series of posts where I contribute to the ListenBrainz project for my independent study at the Rochester Institute of Technology in the fall 2017 semester. For more posts, find them in <a href="/tags/rit-2171/">this tag</a>.</em></p>
<hr>
<p>This week is the last week of the fall 2017 semester at RIT. This semester, I spent time with the MetaBrainz community working on ListenBrainz for an independent study. This post explains what I was working on in the last month and reflects back on my <a href="/blog/2017/10/contributing-listenbrainz/">original objectives</a> for the independent study.</p>

<h2 id="running-my-own-listenbrainz">Running my own ListenBrainz&nbsp;<a class="hanchor" href="#running-my-own-listenbrainz" aria-label="Anchor link for: Running my own ListenBrainz">🔗</a></h2>
<p>The <a href="http://ritlug.com/">RIT Linux Users Group</a> hosts various virtual machines for our projects. I requested one to set up and host a &ldquo;production&rdquo; ListenBrainz site. The purpose of doing this was to…</p>
<ol>
<li>Test my changes in a &ldquo;production&rdquo; environment</li>
<li>Offer a service for the RIT Linux Users Group to poke around with</li>
</ol>
<p>I spent most of this time working with our system administrator to set up the machine and adjust hardware specs for ListenBrainz. Once we fixed storage space and memory issues, it was easy to set it up and get ListenBrainz running. My experience writing the <a href="https://listenbrainz.readthedocs.io/en/latest/dev/devel-env.html">development guide</a> made it easy to get set up and get working. On the first run, it worked!</p>
<p>Now, <a href="http://listen.ritlug.com/">listen.ritlug.com</a> is live.</p>

<h4 id="figuring-out-https">Figuring out HTTPS&nbsp;<a class="hanchor" href="#figuring-out-https" aria-label="Anchor link for: Figuring out HTTPS">🔗</a></h4>
<p>My next challenge for the site is to set up HTTPS. I tried using a <a href="https://www.nginx.com/resources/admin-guide/nginx-https-upstreams/">reverse proxy in nginx</a> to set up HTTPS, but I received <em>502 Bad Gateway</em> errors. I realized I spent too much time figuring this out on my own and decided to <a href="https://community.metabrainz.org/t/how-does-metabrainz-use-https-on-listenbrainz/347319">ask for help</a> in the MetaBrainz community forums.</p>

<h2 id="proposing-new-statistics">Proposing new statistics&nbsp;<a class="hanchor" href="#proposing-new-statistics" aria-label="Anchor link for: Proposing new statistics">🔗</a></h2>
<p>Halfway through the independent study, I realized I would fall short of my original objective of implementing basic statistics in ListenBrainz. To compromise, I wrote a <a href="https://docs.google.com/document/d/1kByAgC9kbuDHNbsEJDkYkTMJ-wAoouWj0qNyi2UPb2Y/edit?usp=sharing">proposal for new statistics</a> to start in the project. My proposal looked at other proprietary platforms that compete with ListenBrainz to see some of their statistics. I also came up with some of my own.</p>
<p>I <a href="https://community.metabrainz.org/t/feedback-needed-listenbrainz-statistics-proposal/347327">proposed this to the MetaBrainz community</a> on the community forums. I&rsquo;m awaiting feedback on my ideas. Once I get feedback, I plan to file new tickets for each statistic to track their implementation over time.</p>
<p>I don&rsquo;t expect statistics being at the forefront of ListenBrainz for some time. A lot of work is going towards other areas of the project. But later in 2018, I expect more focus on the user-facing side of the project.</p>

<h2 id="my-statistic-and-google-bigquery">My statistic and Google BigQuery&nbsp;<a class="hanchor" href="#my-statistic-and-google-bigquery" aria-label="Anchor link for: My statistic and Google BigQuery">🔗</a></h2>
<p>My biggest blocker over the last month was <a href="https://cloud.google.com/bigquery/">Google BigQuery</a>. I wrote a statistic to <a href="https://github.com/metabrainz/listenbrainz-server/pull/318/commits/c1c08ce7f8d207591daeb288087872616d5063a4">calculate play counts</a> over a time period, but was asked to test my statistic. To test my statistic, I needed real data to work with.</p>
<p>Originally, I tried using the <a href="https://github.com/tgwizard/sls">Simple Last.fm Scrobbler</a> to submit listens to the local IP address for my development environment, but I wasn&rsquo;t able to get the app to reach my ListenBrainz server. To get the data, I had to set up Google BigQuery credentials so I could make queries against data on the production site, <a href="https://listenbrainz.org/">listenbrainz.org</a>.</p>
<p>I tried working through the <a href="https://cloud.google.com/bigquery/docs/">Google BigQuery documentation</a>. There&rsquo;s a lot of documentation for using BigQuery as a developer, but it was confusing where to find the information I needed to set it up in my development environment. I tried creating a new project in the Google Cloud Platform, but I was confused because it prompted me to upload my own data instead of accessing data already in BigQuery.</p>
<p>Too late, I realized I spent too much time on my own and not asking for help. I <a href="https://github.com/metabrainz/listenbrainz-server/pull/318">submitted a pull request</a> with the statistic I made and <a href="https://community.metabrainz.org/t/how-to-set-up-google-bigquery-in-a-listenbrainz-development-environment/347307">asked for help</a> in the MetaBrainz community. I also offered to write documentation for setting this up once I learn how to do it.</p>

<h2 id="reflecting-back">Reflecting back&nbsp;<a class="hanchor" href="#reflecting-back" aria-label="Anchor link for: Reflecting back">🔗</a></h2>
<p>I looked back on my <a href="/blog/2017/10/contributing-listenbrainz/">original objectives</a> for the independent study, and I was satisfied and dissatisfied.</p>

<h4 id="not-enough-programming">Not enough programming&nbsp;<a class="hanchor" href="#not-enough-programming" aria-label="Anchor link for: Not enough programming">🔗</a></h4>
<p>I wanted this independent study to enhance my programming knowledge. I especially wanted to focus on Python because I wanted to become more familiar with the language. However, I actually didn&rsquo;t do much programming during the independent study, to my own fault.</p>
<p>My biggest challenge was I bit off more than I could chew. I wanted to write code, and made a big goal before I knew the code base of the project. Even now, I still am not completely comfortable with the code yet. It&rsquo;s a big project with a lot of things going on. I was able to understand the things I did work on, but there&rsquo;s still a lot.</p>
<p>I realized that next time, I need to spend more time evaluating the code base of a project before writing out my milestones. I wish I set more realistic, smaller milestones for myself. My milestone of implementing basic reports was lofty given my existing programming knowledge.</p>

<h4 id="successes">Successes&nbsp;<a class="hanchor" href="#successes" aria-label="Anchor link for: Successes">🔗</a></h4>
<p>One of my other objectives was to write documentation for the project. I felt I succeeded in this milestone, and actually found it enjoyable and interesting to do! I helped separate out documentation from the README into the dedicated <a href="https://listenbrainz.readthedocs.io/en/latest/">ReadTheDocs site</a>. I wrote the <a href="https://listenbrainz.readthedocs.io/en/latest/dev/devel-env.html">development environment guide</a> and helped fix some build issues with the docs site. I also plan to write more for some of the other pain points I found, like Google BigQuery.</p>
<p>My last milestone was to create a use case for a data visualization course at RIT. While I didn&rsquo;t implement my basic reports, I did create the proposal and make an effort to write new statistics. There&rsquo;s a lot of potential now to work with the data in Google BigQuery and do front-end work with tools like <a href="https://d3js.org/">D3.js</a> and <a href="https://plot.ly/javascript/">Plotly.js</a>. I believe there&rsquo;s significant potential to use ListenBrainz as a hands-on project for students to explore data visualization with real data. I hope to support my independent study professor, Prof. Roberts, with questions and logistics of using it as a tool for learning in the future.</p>

<h4 id="unexpected-success">Unexpected success&nbsp;<a class="hanchor" href="#unexpected-success" aria-label="Anchor link for: Unexpected success">🔗</a></h4>
<p>I also think I had an unplanned success too. I immersed myself in the community for ListenBrainz too. Over the last few months, I realized that many of my strengths are in community management and tooling. During my time in the community, I did the following:</p>
<ul>
<li><a href="https://github.com/metabrainz/listenbrainz-server/pull/290">Fixed SELinux labels in Docker</a></li>
<li><a href="https://github.com/metabrainz/listenbrainz-server/pull/288">Contributed a pull request template</a></li>
<li><a href="https://github.com/metabrainz/listenbrainz-server/pull/287">Drafted contributing guidelines</a></li>
<li><a href="https://github.com/metabrainz/listenbrainz-server/pull/294">Fixed a PostgreSQL bug</a></li>
<li><a href="https://github.com/metabrainz/listenbrainz-server/pulls?utf8=%E2%9C%93&amp;q=is%3Apr&#43;author%3Ajflory7&#43;">And more…</a></li>
</ul>

<h2 id="to-the-future">To the future!&nbsp;<a class="hanchor" href="#to-the-future" aria-label="Anchor link for: To the future!">🔗</a></h2>
<p>This ends my independent study with ListenBrainz, but it doesn&rsquo;t end my time contributing! I chose ListenBrainz because it&rsquo;s a project I&rsquo;m passionate about. An independent study allowed me to justify more time on it than a side project in my free time. I&rsquo;m happy to have that opportunity, but I don&rsquo;t want to end here!</p>
<p>I want to follow through on the statistics because I&rsquo;m passionate about understanding music listening trends. I think there&rsquo;s a lot of power for psychological research through music data. To this point, I filed a ticket to request <a href="https://tickets.metabrainz.org/browse/LB-243">tagging listens with &ldquo;emotion&rdquo; words</a> that are synced back to <a href="https://musicbrainz.org/doc/MusicBrainz_Database">MusicBrainz entities</a>.</p>
<p>I won&rsquo;t have as much time to work on the project without the course credit, but I hope to stay involved for the future. I love the project and I love the community. I&rsquo;m thankful for the opportunity to work on this project as an independent study, and learn some things along the way.</p>]]></description></item><item><title>ListenBrainz community gardening and user statistics</title><link>https://jwheel.org/blog/2017/11/listenbrainz-community-user-statistics/</link><pubDate>Mon, 13 Nov 2017 00:00:00 +0000</pubDate><guid>https://jwheel.org/blog/2017/11/listenbrainz-community-user-statistics/</guid><description><![CDATA[<p><em>This post is part of a series of posts where I contribute to the ListenBrainz project for my independent study at the Rochester Institute of Technology in the fall 2017 semester. For more posts, find them in <a href="/tags/rit-2171/">this tag</a>.</em></p>
<hr>
<p>My progress with ListenBrainz slowed, but I am resuming the pace of contributing and advancing on my independent study timeline. This past week, I finished out assigned tasks to discuss contributor-related documentation, like a Code of Conduct, contributor guidelines, and a pull request template. I began research on user statistics and found some already created. I wrote one of my own, but need to learn more about Google BigQuery to advance further.</p>

<h2 id="paving-the-contributor-pathway">Paving the contributor pathway&nbsp;<a class="hanchor" href="#paving-the-contributor-pathway" aria-label="Anchor link for: Paving the contributor pathway">🔗</a></h2>
<p>
<figure>
  <img src="/blog/2017/11/Screenshot-from-2017-11-13-02-05-12.png" alt="Making it easier for people to contribute user statistics to ListenBrainz" loading="lazy">
  <figcaption>Making it easier for people to contribute to ListenBrainz with helpful contibuting guidelines</figcaption>
</figure>
</p>
<p>Earlier, I identified weaknesses for the ListenBrainz contributor pathway and found ways we could improve the pathway. This started with the development environment documentation. Now, I helped draft first revisions of our <a href="https://github.com/metabrainz/listenbrainz-server/pull/287">contributor guidelines</a>, <a href="https://github.com/metabrainz/listenbrainz-server/pull/286">Code of Conduct reference</a>, and <a href="https://github.com/metabrainz/listenbrainz-server/pull/288">pull request templates</a>. Together, these three documents have two goals.</p>
<ol>
<li><strong>Make it easier</strong> to contribute to ListenBrainz</li>
<li>Have a better experience and <strong>have fun</strong> contributing!</li>
</ol>
<p>Adding these documents addresses these goals. Additionally, the <a href="https://github.com/metabrainz/listenbrainz-server/community">GitHub community profile</a> also highlights these deliverables as ways to meet these goals. After getting feedback and seeing what others think, we make more revisions later (with some trial runs).</p>

<h2 id="back-to-selinux-context-flags">Back to SELinux context flags&nbsp;<a class="hanchor" href="#back-to-selinux-context-flags" aria-label="Anchor link for: Back to SELinux context flags">🔗</a></h2>
<p>Recently, I set my desktop back up and installed Docker for the first time on this machine; however, the development environment still failed to start. When I ran the script, it would eventually error out because of a permission denial. The web server image for ListenBrainz was failing.</p>
<p>After debugging, I noticed that I missed the SELinux volume tags for the ListenBrainz web server images in my original pull request, <a href="https://github.com/metabrainz/listenbrainz-server/pull/257">#257</a>. When I created the pull request, I might have had cached data that let my laptop run the development environment without a problem. In either case, it was an easy fix and I knew what the issue was when it happened. Therefore, I submitted a new fix in <a href="https://github.com/metabrainz/listenbrainz-server/pull/290">#290</a>.</p>

<h2 id="writing-new-user-statistics">Writing new user statistics&nbsp;<a class="hanchor" href="#writing-new-user-statistics" aria-label="Anchor link for: Writing new user statistics">🔗</a></h2>
<p>The most interesting part of my independent study is working with the music data to build and generate interesting statistics. I finally began exploring the <a href="https://github.com/metabrainz/listenbrainz-server/tree/master/listenbrainz/stats">existing statistics</a> in ListenBrainz. The statistic queries use BigQuery standard SQL. BigQuery helps rapidly scan and scale data queries to help with performance (I have a lot to learn about BigQuery).</p>

<h4 id="two-types-of-statistics">Two types of statistics&nbsp;<a class="hanchor" href="#two-types-of-statistics" aria-label="Anchor link for: Two types of statistics">🔗</a></h4>
<p>Additionally, ListenBrainz generates <strong>two types</strong> of statistics:</p>
<ol>
<li>Site-wide statistics</li>
<li>User statistics</li>
</ol>
<p>Site-wide statistics are metrics non-specific to a single user. There is only <a href="https://github.com/metabrainz/listenbrainz-server/blob/master/listenbrainz/stats/sitewide.py">one site-wide query</a> now. It counts how many artists were ever submitted to this ListenBrainz instance and returns an integer. There&rsquo;s room for expansion in site-wide statistics.</p>
<p>On the other hand, user statistics are metrics specific to a single user. There&rsquo;s a <a href="https://github.com/metabrainz/listenbrainz-server/blob/master/listenbrainz/stats/user.py">fair number already</a>, like the top artists and songs in a time period and the number of artists you&rsquo;ve listened to. These are a little more complete and offer more expansion for doing cool front-end work with something like <a href="https://d3js.org/">D3.js</a>.</p>

<h4 id="writing-user-statistics">Writing user statistics&nbsp;<a class="hanchor" href="#writing-user-statistics" aria-label="Anchor link for: Writing user statistics">🔗</a></h4>
<p>Of course, I had to try writing my own. One helpful query I thought of was getting a count of the songs you listened to over a time period (e.g. &ldquo;you listened to 500 songs this week!&rdquo;). I haven&rsquo;t tested it yet, but I have this in a local branch and hope to test it with real data soon.</p>
<pre tabindex="0"><code>def get_play_count(musicbrainz_id, time_interval=None): 
 
 filter_clause = &#34;&#34; 
 if time_interval: 
     filter_clause = &#34;AND listened_at &gt;=
     TIMESTAMP_SUB(CURRENT_TIME(), 
     INTERVAL {})&#34;.format(time_interval) 
 
 query = &#34;&#34;&#34;SELECT COUNT(release_msid) as listen_count 
            FROM {dataset_id}.{table_id} 
            WHERE user_name = @musicbrainz_id 
            {time_filter_clause} 
            LIMIT {limit} 
         &#34;&#34;&#34;.format( 
                 dataset_id=config.BIGQUERY_DATASET_ID, 
                 table_id=config.BIGQUERY_TABLE_ID, 
                 time_filter_clause=filter_clause, 
                 limit=config.STATS_ENTITY_LIMIT, 
            ) 
 
 parameters = [ 
     { 
         &#39;type&#39;: &#39;STRING&#39;, 
         &#39;name&#39;: &#39;musicbrainz_id&#39;, 
         &#39;value&#39;: musicbrainz_id 
     } 
 ] 
 
 return stats.run_query(query, parameters)
</code></pre>
<h2 id="researching-google-bigquery">Researching Google BigQuery&nbsp;<a class="hanchor" href="#researching-google-bigquery" aria-label="Anchor link for: Researching Google BigQuery">🔗</a></h2>
<p>My next steps for the independent study are researching <a href="https://cloud.google.com/bigquery/docs/">Google BigQuery</a>. After going through the existing statistics and understanding how ListenBrainz generates them, an understanding of Google BigQuery is essential to writing effective queries. When I become more comfortable with the tooling and how it works, I want to map out a plan of statistics to generate and measure.</p>
<p>Until then, the hacking continues! As always, keep the FOSS flag high…</p>]]></description></item></channel></rss>