Welcome to Python
- Python scripts are plain text files.
- Use the Jupyter Notebook for editing and running Python.
- The Notebook has Command and Edit modes.
- Use the keyboard and mouse to select and edit cells.
- The Notebook will turn Markdown into pretty-printed documentation.
- Markdown does most of what HTML does.
Variables in Python
- Python is an interpreted programming language, and can be used interactively.
- Values are assigned to variables in Python using =.
- You can use print to output variable values.
- Use meaningful variable names.
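
As a minimal sketch of the points above (the variable names and values are made up for illustration), assignment and printing might look like this:

```python
# Assign values to variables with =
patient_name = "Ahmed"
patient_age = 42

# print accepts several values and separates them with spaces
print(patient_name, "is", patient_age, "years old")

# Variables can be used in calculations to create new variables
age_next_year = patient_age + 1
print("Age next year:", age_next_year)
```
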
Basic Types
- Every value has a type.
- Use the built-in function type to find the type of a value.
- Types control what operations can be done on values.
- Strings can be added and multiplied.
- Strings have a length (but numbers don’t).
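
The short example below illustrates these points with arbitrary values: type reports the type of a value, strings support + and *, and len works on strings but not on numbers.

```python
# type reports what kind of value something is
print(type(52))       # an int
print(type("gene"))   # a str

# Strings can be added (concatenated) and multiplied (repeated)
joined = "AT" + "GC"
repeated = "AT" * 3

# Strings have a length
length = len(joined)

# Numbers do not have a length, so len raises a TypeError
try:
    len(52)
except TypeError as error:
    message = str(error)
    print("len(52) failed:", message)
```
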
Built-in Functions and Help
- Use comments to add documentation to programs.
- A function may take zero or more arguments.
- Commonly-used built-in functions include max, min, and round.
- Functions may only work for certain (combinations of) arguments.
- Functions may have default values for some arguments.
- Use the built-in function help to get help for a function.
- Python reports a syntax error when it can’t understand the source of a program.
- Python reports a runtime error when something goes wrong while a program is executing.
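
The sketch below walks through several of these points with arbitrary values. The syntax error is triggered through the built-in compile function only so that the example can keep running; in a normal script Python would report it before executing anything.

```python
# This is a comment: Python ignores everything after the #
largest = max(1, 2, 3)
smallest = min(1, 2, 3)

# round has a default for its second argument (number of decimal places)
rounded_default = round(3.712)     # no decimals requested
rounded_one = round(3.712, 1)      # one decimal place

# Functions may only work for certain combinations of arguments
try:
    max(1, "a")
except TypeError as error:
    print("Cannot compare a number with a string:", error)

# A syntax error: the source cannot be understood (missing closing parenthesis)
try:
    compile("print('unclosed'", "<example>", "exec")
except SyntaxError as error:
    syntax_problem = type(error).__name__

# A runtime error: the source is valid but fails while executing
try:
    print(undefined_variable)
except NameError as error:
    runtime_problem = type(error).__name__

# In an interactive session, help(round) prints the documentation for round
```
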
String Manipulation
- Strings can be indexed and sliced.
- Strings cannot be directly altered.
- You can build complex strings based on other variables using f-strings and format.
- Python has a variety of useful built-in string functions.
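
A brief illustration of indexing, slicing, immutability, and formatting, using an arbitrary example string:

```python
element = "oxygen"

# Indexing starts at 0; a slice includes the start and excludes the stop
first = element[0]
middle = element[1:4]

# Strings cannot be altered in place
try:
    element[0] = "O"
except TypeError as error:
    print("Strings are immutable:", error)

# Build a new string instead
capitalised = "O" + element[1:]

# f-strings and str.format both insert variable values into text
atomic_number = 8
sentence = f"{capitalised} has atomic number {atomic_number}"
same_sentence = "{} has atomic number {}".format(capitalised, atomic_number)

# One of many built-in string methods
shouted = element.upper()
```
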
Using Objects
- Objects are entities with both data and methods.
- Methods belong to objects, so methods with the same name may work differently on different objects.
- You can create an object using a constructor.
- Objects need to be explicitly copied.
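
The following sketch uses built-in lists and strings as the example objects (an assumption made for brevity; the same ideas apply to other objects). It shows a constructor, a method name shared by two types, and the difference between assigning and copying.

```python
import copy

# Create an object using a constructor
original = list("abc")

# Assignment does not copy: both names refer to the same object
alias = original

# An explicit copy creates a separate object
duplicate = copy.copy(original)

original.append("d")
print(alias)       # changed along with original
print(duplicate)   # unchanged

# The same method name can behave differently on different objects
dash_count = "a-b".count("-")    # counts substrings in a string
ones = [1, 1, 2].count(1)        # counts matching items in a list
```
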
Lists
- [value1, value2, value3, ...] creates a list.
- Lists can contain any Python object, including lists (i.e., list of lists).
- Lists are indexed and sliced with square brackets (e.g., list[0] and list[2:9]), in the same way as strings and arrays.
- Lists are mutable (i.e., their values can be changed in place).
- Strings are immutable (i.e., the characters in them cannot be changed).
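
A small example with made-up values, showing list creation, nesting, indexing, slicing, and in-place changes:

```python
temperatures = [17.5, 19.0, 21.2, 18.4]

# Lists can contain other lists
nested = [temperatures, ["mon", "tue"]]

# Index and slice with square brackets
first = temperatures[0]
middle = temperatures[1:3]

# Lists are mutable: change a value in place
temperatures[0] = 16.9

# The nested list refers to the same inner list, so it sees the change
first_nested = nested[0][0]
```
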
For Loops
- A for loop executes commands once for each value in a collection.
- A for loop is made up of a collection, a loop variable, and a body.
- The first line of the for loop must end with a colon, and the body must be indented.
- Indentation is always meaningful in Python.
- Loop variables can be called anything, but giving the loop variable a meaningful name is strongly advised.
- The body of a loop can contain many statements.
- Use range to iterate over a sequence of numbers.
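
The loops below are a minimal sketch of these points; the names in the list are arbitrary strings chosen for illustration.

```python
# range(1, 6) produces 1, 2, 3, 4, 5 (the stop value is excluded)
total = 0
for number in range(1, 6):
    # The indented lines form the body and run once per value
    total = total + number
    print("Running total:", total)

# Loop over a list, using a meaningful loop variable name
name_lengths = []
for gene_name in ["BRCA1", "TP53", "EGFR"]:
    name_lengths.append(len(gene_name))
```
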
Libraries
- Most of the power of a programming language is in its libraries.
- A program must import a library module in order to use it.
- Use help to learn about the contents of a library module.
- Import specific items from a library to shorten programs.
- Create an alias for a library when importing it to shorten programs.
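
The three import styles can be sketched with modules from the standard library (math and statistics are chosen here only as convenient examples):

```python
# Import a whole module and refer to its contents with a dot
import math
circle_area = math.pi * 2 ** 2

# Import a specific item to shorten programs
from math import sqrt
root = sqrt(16)

# Create an alias for a library when importing it
import statistics as stats
average = stats.mean([1, 2, 3, 4])

# In an interactive session, help(math) lists the contents of the module
```
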
Reading tabular data
- Use the Pandas library to get basic statistics out of tabular data.
- Use index_col to specify that a column’s values should be used as row headings.
- Use DataFrame.info to find out more about a dataframe.
- The DataFrame.columns variable stores information about the dataframe’s columns.
- Use DataFrame.T to transpose a dataframe.
- Use DataFrame.describe to get summary statistics about data.
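
The sketch below assumes pandas is installed. To stay self-contained it reads a tiny, made-up CSV from an in-memory string rather than from a file; the column names and numbers are hypothetical.

```python
import io
import pandas as pd

# Hypothetical data standing in for a CSV file on disk
csv_text = io.StringIO(
    "country,pop_2002,pop_2007\n"
    "Aland,100,110\n"
    "Borduria,200,240\n"
)

# index_col makes the country column the row headings
data = pd.read_csv(csv_text, index_col="country")

data.info()              # column types, non-null counts, memory use
print(data.columns)      # the column labels
print(data.T)            # rows and columns swapped
summary = data.describe()
print(summary)           # count, mean, std, min, quartiles, max
```
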
Managing Python Environments
- A Conda environment is a directory that contains a specific collection of Conda packages that you have installed.
- You create (remove) a new environment using the conda create (conda remove) commands.
- You activate (deactivate) an environment using the conda activate (conda deactivate) commands.
- You install packages into environments using conda install; you install packages into an active environment using pip install.
- You should install each environment as a sub-directory inside its corresponding project directory.
- Use the conda env list command to list existing environments and their respective locations.
- Use the conda list command to list all of the packages installed in an environment.
Dictionaries
- Dictionaries associate a set of values with a number of keys.
- Keys are used to access the values of a dictionary.
- Dictionaries are mutable.
- Nested dictionaries are constructed to organise data in a hierarchical fashion.
- Some useful methods for working with dictionaries are .items() and .get().
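
A short illustration with made-up sample records; the keys and values are hypothetical.

```python
# Keys map to values
sample = {"id": "S1", "reads": 1200}

# Dictionaries are mutable: update a value and add a new key
sample["reads"] = 1500
sample["tissue"] = "liver"

# .get returns a fallback instead of raising an error for a missing key
batch = sample.get("batch", "unknown")

# Nested dictionaries organise data hierarchically
experiment = {
    "S1": {"tissue": "liver", "reads": 1500},
    "S2": {"tissue": "lung", "reads": 900},
}

# .items yields (key, value) pairs
tissues = []
for sample_id, details in experiment.items():
    tissues.append(details["tissue"])
    print(sample_id, details["reads"])
```
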
Conditionals
- Use if statements to control whether or not a block of code is executed.
- Conditionals are often used inside loops.
- Use else to execute a block of code when an if condition is not true.
- Use elif to specify additional tests.
- Conditions are tested once, in order.
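
The example below uses arbitrary thresholds and values to show if, elif, and else inside a loop. Because conditions are tested in order, only the first matching branch runs for each value.

```python
masses = [3.5, 9.2, 1.1]
labels = []

for mass in masses:
    if mass > 9.0:
        label = "heavy"
    elif mass > 3.0:
        # Reached only when the first condition was false
        label = "medium"
    else:
        # Runs when none of the conditions above were true
        label = "light"
    labels.append(label)

print(labels)
```
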
Pandas DataFrames
- Use DataFrame.iloc[..., ...] to select values by integer location.
- Use : on its own to mean all columns or all rows.
- Select multiple columns or rows using DataFrame.loc and a named slice.
- Result of slicing can be used in further operations.
- Use comparisons to select data based on value.
- Select values or NaN using a Boolean mask.
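
A minimal sketch of these selection styles, assuming pandas is installed; the dataframe, its labels, and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical data with labelled rows and columns
data = pd.DataFrame(
    {"y2002": [10.0, 20.0, 30.0], "y2007": [12.0, 18.0, 33.0]},
    index=["a", "b", "c"],
)

# Select by integer location: first row, first column
first_value = data.iloc[0, 0]

# : on its own means all rows (or all columns)
one_column = data.loc[:, "y2007"]

# A named slice with loc includes both end labels
subset = data.loc["a":"b", "y2002":"y2007"]

# The result of slicing can be used in further operations
subset_max = subset.max()

# A comparison produces a Boolean mask; masked-out cells become NaN
mask = data > 15
masked = data[mask]
print(masked)
```
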
Writing Functions
- Break programs down into functions to make them easier to understand.
- Define a function using def with a name, parameters, and a block of code.
- Defining a function does not run it.
- Arguments in a function call are matched to its defined parameters.
- Functions may return a result to their caller using return.
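
As an illustration, here is a small function (the name fahrenheit_to_celsius and its digits parameter are invented for this sketch) that is defined first and only runs when called:

```python
def fahrenheit_to_celsius(temp_f, digits=1):
    """Convert a temperature from Fahrenheit to Celsius."""
    temp_c = (temp_f - 32) * 5 / 9
    # return hands the result back to the caller
    return round(temp_c, digits)

# Defining the function above did not run it; calling it does.
# Arguments are matched to parameters by position...
boiling = fahrenheit_to_celsius(212)

# ...or by name, in which case the order does not matter
body = fahrenheit_to_celsius(digits=2, temp_f=98.6)

print(boiling, body)
```
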
Perform Statistical Tests with Scipy
- Scipy is a package with a variety of scientific computing functionality.
- Scipy.stats contains functionality for distributions and statistical tests.
Reshaping Data
Combining Data
- Concatenate dataframes to add additional rows.
- Merge/join dataframes to add additional columns.
- Change the on argument to choose what is matched between dataframes when joining.
- The different types of joins control how missing data is handled for the left and right dataframes.
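
The sketch below assumes pandas is installed and uses two small, made-up dataframes. It uses pd.concat for adding rows and pd.merge with the on and how arguments for adding columns.

```python
import pandas as pd

# Hypothetical tables sharing a "sample" column
reads = pd.DataFrame({"sample": ["s1", "s2", "s3"], "reads": [100, 200, 300]})
tissues = pd.DataFrame({"sample": ["s2", "s3", "s4"], "tissue": ["liver", "lung", "skin"]})

# Concatenate to add rows
more_rows = pd.concat([reads, reads], ignore_index=True)

# Merge to add columns; on chooses the column that is matched
inner = pd.merge(reads, tissues, on="sample", how="inner")   # only samples in both
outer = pd.merge(reads, tissues, on="sample", how="outer")   # all samples, NaN where missing
left = pd.merge(reads, tissues, on="sample", how="left")     # every row of reads is kept

print(outer)
```
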
Visualizing data with matplotlib and seaborn
- matplotlib is the most widely used scientific plotting library in Python.
- Plot data directly from a Pandas dataframe.
- Select and transform data, then plot it.
- Many styles of plot are available: see the Python Graph Gallery for more options.
- Seaborn extends matplotlib and provides useful defaults and integration with dataframes.
Perform machine learning with Scikit-learn
- Scikit-learn is a popular package for machine learning in Python.
- Scikit-learn has a variety of useful functionality for creating predictive models.
- A machine learning workflow involves preprocessing, model selection, training, and evaluation.
ID mapping using mygene
- mygene is a Python module which allows access to a gene annotation database.
- You can query mygene with multiple identifiers using querymany.