SQL Resources, or Show Me The Data!

As I am pretty sure was said, loudly and repeatedly, in the movie Jerry Maguire: “Show Me The Data!”
It is the Data Scientist’s Mantra...


Data is in our name, and you need data to do your work. We often have a single place to store and find all of it (a data warehouse). Of course, the better your tools for grouping, subsetting, ordering, and otherwise transforming that data, the more insights you can reveal with the rest of your skills.
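As a rough sketch of what grouping, subsetting, and ordering look like in practice, here is a small example that uses Python's built-in sqlite3 module as a stand-in for a data warehouse. The visits table, its columns, and its values are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (clinic TEXT, year INTEGER, patients INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("North", 2015, 120),
        ("North", 2016, 150),
        ("North", 2017, 80),
        ("South", 2015, 90),
        ("South", 2016, 60),
        ("South", 2017, 75),
        ("East", 2016, 200),
    ],
)

# Subset (WHERE), group (GROUP BY), transform (SUM), and order (ORDER BY)
# in a single query.
query = """
    SELECT clinic, SUM(patients) AS total_patients
    FROM visits
    WHERE year >= 2016
    GROUP BY clinic
    ORDER BY total_patients DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for clinic, total in rows:
    print(clinic, total)
```

The same query could be run from R or any other client; the point is that the database does the subsetting and aggregation before the data ever reaches your analysis code.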


To command all the power that the data warehouse offers and that your analytical work deserves, having a solid grip on SQL (Structured Query Language) is key. It will make your life easier, happier, and simply more wonderful.


So, here are some tools to aid in that quest.

Books

First of all, I would like to introduce you to O’Reilly books, mostly because of two titles I have found repeatedly useful:


  • SQL Pocket Guide: A Guide to SQL Usage – This slim volume helps with the details of syntax and everything else I can’t remember. It is less than 200 pages, with a good index. Current edition is the 3rd, and my copy is getting a little dog-eared.


  • SQL Cookbook – This provides a list of tasks you’re likely to need to accomplish and, for each, the ways to get it done. Examples of the task descriptions: Retrieving a Subset of Rows from a Table; Finding Rows That Satisfy Multiple Conditions; Concatenating Column Values; Transforming Nulls into Real Values; Searching for Patterns.


O’Reilly offers a good deal if you want both the paper and eBook format of a title. The books are very well edited and geared towards answering questions.
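To give a flavor of the kinds of tasks the SQL Cookbook covers, here is a minimal sketch of four of the task descriptions listed above, run against SQLite through Python's built-in sqlite3 module. The emp table and its rows are hypothetical, and the queries are illustrations only, not recipes quoted from the book.

```python
import sqlite3

# Hypothetical employee table, made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER, bonus INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("Ana", "Research", 3000, 500),
        ("Bao", "Sales", 1600, None),   # None is stored as SQL NULL
        ("Carla", "Sales", 2850, 300),
        ("Dev", "Research", 1100, None),
    ],
)

# Finding rows that satisfy multiple conditions: combine tests with AND.
multi = conn.execute(
    "SELECT name FROM emp WHERE dept = 'Sales' AND salary > 2000 ORDER BY name"
).fetchall()

# Concatenating column values: SQLite uses the || operator.
labels = conn.execute(
    "SELECT name || ' works in ' || dept FROM emp ORDER BY name"
).fetchall()

# Transforming nulls into real values: COALESCE returns its first non-NULL argument.
bonuses = conn.execute(
    "SELECT name, COALESCE(bonus, 0) FROM emp ORDER BY name"
).fetchall()

# Searching for patterns: LIKE with % as a wildcard (names ending in "a").
pattern = conn.execute(
    "SELECT name FROM emp WHERE name LIKE '%a' ORDER BY name"
).fetchall()

conn.close()

print(multi)
print(labels)
print(bonuses)
print(pattern)
```

Syntax varies between database systems (string concatenation in particular), which is exactly where a pocket reference earns its keep.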


Online tutorials

There are a number of online resources that offer training in a variety of formats (multimedia, written word, etc.), and the best also include support for actually working through problems online.
  • Khan Academy offers audio/video training with exercises at: https://www.khanacademy.org/computing/computer-programming/sql – These have the trademark depth and quality of all the Khan Academy offerings, along with exercises, grading, and progress tracking.


  • Tutorialspoint offers a written presentation along with diagrams and a coding ground to practice what you are learning: http://www.tutorialspoint.com/sql/index.htm
  • SQLZoo is more of a show-it, do-it tutorial arrangement: small bites of information with the chance to try out each thing presented in-line. http://sqlzoo.net/wiki/SQL_Tutorial
  • A SQL tutorial from a data scientist’s perspective, for data scientists: http://downloads.bensresearch.com/SQL.pdf

Ephemera

These are some resources I came across that talk about the intersections of R and SQL:
  • SQL and R are allies: https://www.simple-talk.com/sql/reporting-services/making-data-analytics-simpler-sql-server-and-r/
  • Reaching into R dataframes with SQL: http://www.burns-stat.com/translating-r-sql-basics/
  • There is a SQL playground online where you can try out creating databases and using a range of SQL manipulations at http://sqlfiddle.com/. It’s got a lot in common with R Fiddle (at http://www.r-fiddle.org/#/).
