All Terms and Tips

Glossary of Terms

boolean

The data type for logical truthiness or falsiness. Often arises from comparisons. Has the type bool.

E.g. True, False, 1 == 2, bool(0)

All references →

boolean array

An array that stores only boolean (True or False) values. Usually used for querying.

E.g. [True, False, False, True]

All references →

csv

Stands for comma-separated values.

A format that represents data with columns separated by commas and each row stored on its own line.

E.g.

Name  , Height, Weight
Adam  , 150   , 170
Justin, 160   , 180
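
As a rough sketch of how text in this format can be split into rows and columns, the example below reads the sample above with Python's standard-library csv module. The variable names are illustrative, and a table library would normally read the file for you.

```python
import csv
import io

# The sample data from above, stored as one string (padding spaces included).
csv_text = """Name  , Height, Weight
Adam  , 150   , 170
Justin, 160   , 180
"""

# csv.reader splits each line on commas; strip() removes the padding spaces.
reader = csv.reader(io.StringIO(csv_text))
rows = [[field.strip() for field in row] for row in reader]

header = rows[0]    # the column names
records = rows[1:]  # one list per data row

print(header)
print(records)
```

Note that every value comes back as a string; converting '150' to a number is a separate step.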

All references →

dataframe

Data represented in a tabular form.

Different frameworks have different names for tabular data. babypandas calls it a DataFrame and supports a wide range of operations on DataFrame objects, including methods like .assign() and .drop().

All references →

dot notation

Writing a ‘dot’ . after an object in order to access that object’s methods or attributes.

E.g. "a string".title(), my_array.shape

All references →

float

The data type for numbers with a decimal component. Can be written in scientific notation. Has the type float.

E.g. 4.00000, 1e2, float('1.60982')

All references →

index

The positional label of an item in a sequence.

The index of the first item in an array is zero, the index of the second item is one, and so on.
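
A minimal sketch of zero-based indexing, using a plain Python list as a stand-in for an array (the values are made up for illustration):

```python
# A plain Python list stands in for an array here; both use zero-based indices.
heights = [150, 160, 170]

first = heights[0]                # index 0 is the first item
second = heights[1]               # index 1 is the second item
last = heights[len(heights) - 1]  # the last index is one less than the length

print(first, second, last)
```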

All references →

integer

The data type for numbers without a decimal component. Has the type int.

E.g. 0, -1230589672983, int('4').

All references →

keyword argument

Arguments written in the form argument_name=argument_value are called keyword arguments.

E.g. by='acres', ascending=False
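
The same syntax works for any function that accepts named arguments. As a small sketch, here it is with Python's built-in sorted function, whose key and reverse parameters are passed by keyword (the word list is illustrative):

```python
words = ["pear", "fig", "banana"]

# key=len and reverse=True are keyword arguments to the built-in sorted function.
longest_first = sorted(words, key=len, reverse=True)

print(longest_first)
```

Because each argument is named, the call reads almost like a sentence: sort the words by length, in reverse order.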

All references →

method chaining

Applying two or more methods, one after the other, in one line of code is called method chaining.

Think of it as taking the output of one method and directly feeding it into the next method without storing it into an intermediate variable.

E.g.

fires_with_sqmiles.assign(square_miles=fires_with_sqmiles.get('sqmiles')).drop(columns='sqmiles')
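
To see the idea without a table library, the sketch below chains three string methods and then writes out the equivalent version with intermediate variables. The string and variable names are made up for illustration.

```python
raw_title = "  the cedar fire  "

# Chained: each method is called on the result of the previous one.
chained = raw_title.strip().title().replace(" ", "_")

# Equivalent version that stores every intermediate result.
stripped = raw_title.strip()
titled = stripped.title()
unchained = titled.replace(" ", "_")

print(chained)
print(chained == unchained)
```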

All references →

query

Creating a new table by selecting only certain rows from an existing table which satisfy some condition is called a query.

E.g. populations[populations.get('Population') > 1_000_000]
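
A query can be thought of as two steps: a comparison that produces one boolean per row, followed by keeping only the rows marked True. The plain-Python sketch below mimics those steps with lists; the town names and figures are invented, and a real table library performs both steps for you.

```python
# Hypothetical data: names and populations are invented for illustration.
names = ["Town A", "Town B", "Town C"]
populations = [2_500_000, 40_000, 1_200_000]

# Step 1: the comparison produces one boolean per row (a boolean array).
mask = [p > 1_000_000 for p in populations]

# Step 2: keep only the rows whose boolean is True.
selected = [name for name, keep in zip(names, mask) if keep]

print(mask)
print(selected)
```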

All references →

shape

The number of dimensions of an n-dimensional array, together with the number of values stored along each dimension. Similar to the shape of a matrix in linear algebra.

E.g.

  • 1-d array. Shape: (3,)

    [3, 4, 5]

  • 2-d array. Shape: (4, 3)

    [[3, 4, 5],
     [4, 5, 6],
     [5, 6, 7],
     [6, 7, 8]]
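
As a sketch of how a shape is counted, the code below uses nested Python lists in place of arrays (an assumption made so the example needs no extra libraries). For a 2-d array, the number of rows comes first, then the number of columns.

```python
# Nested lists standing in for arrays (an assumption for this sketch).
one_d = [3, 4, 5]
two_d = [[3, 4, 5],
         [4, 5, 6],
         [5, 6, 7],
         [6, 7, 8]]

shape_1d = (len(one_d),)                # one dimension holding 3 values
shape_2d = (len(two_d), len(two_d[0]))  # rows first, then columns

print(shape_1d)
print(shape_2d)
```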
    

All references →

string

The data type for text. Has the type str.

E.g. 'single', "double quotes", 'escaping isn\'t bad', str(1.0)

All references →

table index

The index of each row in a dataframe/table.

The table index can be treated as the “first column” of a dataframe, but most data manipulation frameworks treat it as a special data type to allow rapid indexing and querying. Indices are typically numeric and unique, but not necessarily. Another common type of index is a time series.

E.g.

**Index**, Name  , Height, Weight
**0**    , Adam  , 150   , 170
**1**    , Justin, 160   , 180

All references →

web scraping

Extracting data from websites, often by using code to parse the loaded HTML as a string and then convert certain snippets of that string into numbers.

All references →

General Tips

Tip

You can place underscores in large numbers to make them easier to read -- Python will ignore them. For instance, it is hard to read 10000000000, but somewhat easier to read 10_000_000_000.
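
A quick check of this, as a minimal sketch (the variable names are illustrative):

```python
plain = 10000000000
readable = 10_000_000_000

# Both literals denote the same integer; the underscores are only visual.
print(plain == readable)

# The "_" format option writes the separators back out when displaying a number.
print(format(plain, "_"))
```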

(The original entry can be found here.)

Tip

If you find this rule confusing, you can replace it with these two equivalent rules instead:

  1. When two numbers are combined, with one of them being a float, the result is a float.

  2. Dividing two numbers results in a float, even if both numbers are integers.
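
The two rules can be checked directly by asking Python for the type of each result. This is a small sketch with illustrative variable names:

```python
# Rule 1: combining an int with a float gives a float.
mixed_sum = 2 + 3.0
mixed_product = 2 * 3.0

# Rule 2: division with / gives a float, even when two ints divide evenly.
quotient = 4 / 2

# For contrast: adding two ints stays an int.
int_sum = 2 + 3

print(type(mixed_sum), type(mixed_product), type(quotient), type(int_sum))
```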

(The original entry can be found here.)

Tip

A big part of learning to program is experimenting with code to see what does (and doesn't) work. Luckily, Jupyter notebooks make this easy!

(The original entry can be found here.)

Tip

Giving names to variables makes your code easier to understand. It's often a good idea to break a long calculation up into intermediate steps and give names to the result of each part.

(The original entry can be found here.)

Tip

Avoid the embarrassment -- it's pronounced "num-pie" (not "num-pee").

(The original entry can be found here.)

Tip

You can break up long expressions by surrounding the whole expression with parentheses and inserting line breaks wherever makes sense. We'll often break right at a method call.
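
For instance, a chain of string methods could be laid out as follows. This is only a sketch: the string is made up, and the same layout applies to chains of table methods.

```python
county = "  san diego county  "

# The outer parentheses let the expression span several lines;
# each line break falls right before a method call.
words = (
    county
    .strip()
    .upper()
    .split()
)

print(words)
```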

(The original entry can be found here.)

Tip

If your method-chaining code isn't working as you'd expect, break apart the code and save intermediate variables. Print out the values of these variables to do some debugging.
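
One possible way to do this, sketched with string methods rather than table methods, is to print the type and value after every step. The input string is hypothetical.

```python
text = "  Cedar Fire 2003  "

# The chain being debugged: words = text.strip().lower().split()
step_1 = text.strip()
print(type(step_1).__name__, repr(step_1))  # still a str

step_2 = step_1.lower()
print(type(step_2).__name__, repr(step_2))  # still a str

words = step_2.split()
print(type(words).__name__, words)          # now a list, so str methods no longer apply
```

Seeing where the type changes usually shows which method in the chain was given something it did not expect.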

(The original entry can be found here.)

Tip

When method chaining, it is useful to keep in mind what type of object you are working with at each step. Any time you write ., you should be able to stop and say whether the piece of code to the left evaluates to a table, a series, a number, etc. For instance, calfire.groupby('year').max() evaluates to a table, so whatever follows should be a table method.

(The original entry can be found here.)

Tip

To create a table by hand, start by creating an empty DataFrame with bpd.DataFrame(), then use .assign to add columns to the table.

(The original entry can be found here.)

Tip

Using square brackets on a table can be read aloud as "where".

So the expression

populations[populations.get('Population') > 1_000_000]

is read as "the rows where the population is greater than 1 million".

(The original entry can be found here.)

Tip

When reading an error message, it's often most helpful to look at the two ends of the message -- and don't get too worried about the middle bits.

The very top line points to where the error occurred, and the very bottom lines explain why the error occurred.
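
As a small illustration of the bottom line, the sketch below triggers an error on purpose and captures the message as text using the standard-library traceback module. The average function is hypothetical, and outside a notebook the layout of the message differs somewhat from what Jupyter displays, but the last line still names the error type and the reason.

```python
import traceback


def average(values):
    # Raises ZeroDivisionError when values is empty.
    return sum(values) / len(values)


try:
    average([])
except ZeroDivisionError:
    message_lines = traceback.format_exc().strip().splitlines()

# The final line of the message states the error type and why it happened.
last_line = message_lines[-1]
print(last_line)
```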

(The original entry can be found here.)

Tip

Questions about your data set should also be looked up online! Effective data exploration revolves around quickly becoming a semi-expert in the domain of the data set.

(The original entry can be found here.)

Tip

The upper-bound-exclusionary nature of histogram bins and ranges can make it hard to tell whether you're accidentally cutting off data. Feel free to evaluate the range in its own cell, and make sure that the last value in the sequence is greater than the maximum value of the feature.
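
The check described here might look like the sketch below. It uses Python's built-in range, which also excludes its upper bound, and the feature values are hypothetical.

```python
# Hypothetical feature values; the maximum is 95.
values = [12, 47, 68, 95]

# range excludes its upper bound, so this sequence stops at 90.
too_short = list(range(0, 100, 10))

# Extending the upper bound makes the last value exceed the maximum.
long_enough = list(range(0, 110, 10))

print(too_short[-1], too_short[-1] > max(values))
print(long_enough[-1], long_enough[-1] > max(values))
```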

(The original entry can be found here.)

Jupyter Tips

Jupyter Tip

You can see all of Python's string methods by typing "some string". then hitting Tab.

(The original entry can be found here.)

Jupyter Tip

Any changes you make to the notebook will be automatically saved at regular intervals, but you can save them immediately by selecting "File -> Save and Checkpoint".

(The original entry can be found here.)

Jupyter Tip

There are plenty of keyboard shortcuts to use with Jupyter notebooks. One of the most useful is Ctrl-m a, which creates a new cell. Another useful shortcut is dd, which deletes the selected cell. Select "Help -> Keyboard shortcuts" to see all of them.

(The original entry can be found here.)

Jupyter Tip

Another way to see a function's help message in a Jupyter notebook is to write the function name, followed by ?. For instance, to see the documentation for the round function, write

round?

by itself in a code cell.

(The original entry can be found here.)

Jupyter Tip

To have Jupyter display the value of a variable that has just been assigned, write the variable's name as the last line of the cell. For example:

number_of_seconds_in_a_year = 60 * 60 * 24 * 365
number_of_seconds_in_a_year

(The original entry can be found here.)

Jupyter Tip

If you notice strange behavior while working with a Jupyter notebook, remember the number one rule of debugging: try turning it off and then back on. The equivalent of this with a Jupyter notebook is selecting "Kernel -> Restart and Run All" from the top menu.

(The original entry can be found here.)

Jupyter Tip

You can ask Jupyter for some information on all of the Series methods available by writing help(bpd.Series). The methods starting with two underscores (__) are called "dunder" methods, and implement special behavior. You're not meant to call them directly, so you can pretty much ignore them.
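
To list only the non-dunder names, one option is to filter the output of the built-in dir function. In this sketch, str stands in for bpd.Series so that the code runs without babypandas installed; the same filtering should apply to any type.

```python
# str stands in for bpd.Series here so the sketch runs without babypandas.
all_names = dir(str)

# Names that start with two underscores are the "dunder" methods.
dunder_names = [name for name in all_names if name.startswith("__")]
regular_names = [name for name in all_names if not name.startswith("__")]

print(len(dunder_names), "dunder names")
print(regular_names[:5])
```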

(The original entry can be found here.)

Jupyter Tip

If you'd like to know how efficient a particular piece of code is, you can use the %%timeit "magic function". This runs a cell over and over, printing the average time it takes to execute. Create a new cell with %%timeit and the code you'd like to time, like so:

%%timeit
calfire.groupby('year').max().get('acres').loc[1995]

This will print something like the following:

62.4 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

This tells us that the code took 62.4 milliseconds to run on average. Not bad!
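
Outside a notebook, a similar measurement can be made with Python's standard-library timeit module. The sketch below times a hypothetical statement; the statement and the number of runs are arbitrary choices, and the timings will vary from machine to machine.

```python
import timeit

# Hypothetical snippet to time; any statement given as a string would work.
statement = "sum(range(1000))"
runs = 1000

# Run the statement repeatedly and return the total elapsed seconds.
total_seconds = timeit.timeit(statement, number=runs)
average_seconds = total_seconds / runs

print(f"{average_seconds * 1e6:.2f} microseconds per run on average")
```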

(The original entry can be found here.)

Jupyter Tip

If you forgot parentheses and want to add them quickly, you can select the section of code you want to surround in parentheses and then type (. Jupyter will wrap your entire selection in a single pair of parentheses.

(The original entry can be found here.)

Jupyter Tip

Jupyter Tips contain helpful information about the Jupyter Notebook environment we do most of our work in. These tips are meant to make your workflow more convenient.

(The original entry can be found here.)