Welcome to Python


  • Python scripts are plain text files.
  • Use the Jupyter Notebook for editing and running Python.
  • The Notebook has Command and Edit modes.
  • Use the keyboard and mouse to select and edit cells.
  • The Notebook will turn Markdown into pretty-printed documentation.
  • Markdown does most of what HTML does.

Variables in Python


  • Python is an interpreted programming language, and can be used interactively.
  • Values are assigned to variables in Python using =.
  • You can use print to output variable values.
  • Use meaningful variable names.
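
As a minimal sketch of the points above, the snippet below assigns values with = and displays them with print. The variable names and values are made up for illustration.

```python
# Assign values to variables with meaningful names (illustrative values).
patient_age = 42
patient_name = "Ahmed"

# print displays values; several items can be passed at once.
print(patient_name, "is", patient_age, "years old")

# A variable can be used in a calculation and the result assigned to a new name.
age_in_three_years = patient_age + 3
print("Age in three years:", age_in_three_years)
```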

Basic Types


  • Every value has a type.
  • Use the built-in function type to find the type of a value.
  • Types control what operations can be done on values.
  • Strings can be added and multiplied.
  • Strings have a length (but numbers don’t).
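
The following short example, with made-up values, shows how type reports the type of a value and how the type decides which operations are allowed.

```python
# type reports the type of any value.
print(type(52))        # an integer
print(type("gene"))    # a string

# Strings can be added (concatenated) and multiplied (repeated).
full_name = "Ahmed" + " " + "Walsh"
separator = "=" * 10

# Strings have a length; asking for the length of a number raises TypeError.
name_length = len(full_name)
try:
    len(52)
    number_has_length = True
except TypeError:
    number_has_length = False

print(full_name, separator, name_length, number_has_length)
```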

Built-in Functions and Help


  • Use comments to add documentation to programs.
  • A function may take zero or more arguments.
  • Commonly-used built-in functions include max, min, and round.
  • Functions may only work for certain (combinations of) arguments.
  • Functions may have default values for some arguments.
  • Use the built-in function help to get help for a function.
  • Python reports a syntax error when it can’t understand the source of a program.
  • Python reports a runtime error when something goes wrong while a program is executing.
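
A small sketch of these points, using only built-in functions and illustrative values:

```python
# This line is a comment: Python ignores it, but readers do not.

# max and min accept several arguments of comparable types.
largest = max(1, 2, 3)
smallest = min("a", "A", "0")   # strings are compared character by character

# round has a default: without a second argument it rounds to an integer.
rounded_default = round(3.712)
rounded_one_place = round(3.712, 1)

# Mixing incompatible arguments fails while the program runs (a runtime error).
try:
    max(1, "a")
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(largest, smallest, rounded_default, rounded_one_place, mixed_ok)

# In an interactive session, help(round) prints the documentation for round.
```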

String Manipulation


  • Strings can be indexed and sliced.
  • Strings cannot be directly altered.
  • You can build complex strings based on other variables using f-strings and format.
  • Python has a variety of useful built-in string functions.
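
The example below sketches indexing, slicing, and string building on a made-up value. It is one possible illustration, not the lesson's own exercise.

```python
element = "helium"

first_letter = element[0]       # indexing starts at 0
first_three = element[0:3]      # slice: positions 0, 1 and 2
last_letter = element[-1]       # negative indices count from the end

# Strings cannot be changed in place: item assignment raises TypeError.
try:
    element[0] = "H"
except TypeError:
    pass

# Build a new string instead, here with a string method and an f-string.
capitalised = element.capitalize()
atomic_number = 2
sentence = f"{capitalised} has atomic number {atomic_number}"

# The format method produces the same result.
same_sentence = "{} has atomic number {}".format(capitalised, atomic_number)
print(sentence)
```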

Using Objects


  • Objects are entities with both data and methods.
  • Methods belong to the objects they are defined on, so methods with the same name may work differently on different objects.
  • You can create an object using a constructor.
  • Objects need to be explicitly copied.
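
A minimal sketch of these ideas using built-in types only. The choice of list and str as example objects is an assumption made for illustration.

```python
# Constructors create objects: list() and str() are built-in examples.
letters = list("abc")          # a list of three one-character strings
text = str(3.5)                # the string '3.5'

# The same method name can behave differently on different objects:
# both str and list have count, but they count different things.
a_in_text = "banana".count("a")            # occurrences of a substring
a_in_list = ["a", "b", "a"].count("a")     # occurrences of an element

# Assignment does not copy: both names refer to the same list.
alias = letters
alias.append("d")

# An explicit copy is independent of the original.
independent = letters.copy()
independent.append("e")

print(letters, independent)
```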

Lists


  • [value1, value2, value3, ...] creates a list.
  • Lists can contain any Python object, including lists (i.e., list of lists).
  • Lists are indexed and sliced with square brackets (e.g., list[0] and list[2:9]), in the same way as strings and arrays.
  • Lists are mutable (i.e., their values can be changed in place).
  • Strings are immutable (i.e., the characters in them cannot be changed).
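
The following sketch, with made-up numbers, shows list creation, indexing, slicing, nesting, and in-place modification.

```python
# Square brackets create a list (illustrative values).
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]

# Lists can hold any object, including other lists.
nested = [[1, 2], ["a", "b"]]
inner = nested[1][0]

# Indexing and slicing work as they do for strings.
first = pressures[0]
middle = pressures[1:3]     # items at positions 1 and 2

# Lists are mutable: an item can be replaced in place.
pressures[0] = 0.265

print(first, middle, inner, pressures)
```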

For Loops


  • A for loop executes commands once for each value in a collection.
  • A for loop is made up of a collection, a loop variable, and a body.
  • The first line of the for loop must end with a colon, and the body must be indented.
  • Indentation is always meaningful in Python.
  • Loop variables can be called anything, but giving them a meaningful name is strongly advised.
  • The body of a loop can contain many statements.
  • Use range to iterate over a sequence of numbers.
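
As an illustration of the structure described above, here is a small sketch with two loops, one over range and one over a list. The names and values are made up.

```python
# The first line ends with a colon; the indented lines form the body.
total = 0
for number in range(1, 4):        # range(1, 4) yields 1, 2, 3
    square = number ** 2          # the body can hold several statements
    total = total + square
    print(number, square)

print("Sum of squares:", total)

# A loop can also run over the items of a list.
initials = ""
for name in ["Ada", "Grace", "Linus"]:
    initials = initials + name[0]

print(initials)
```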

Libraries


  • Most of the power of a programming language is in its libraries.
  • A program must import a library module in order to use it.
  • Use help to learn about the contents of a library module.
  • Import specific items from a library to shorten programs.
  • Create an alias for a library when importing it to shorten programs.
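
The three import styles mentioned above can be sketched with modules from the standard library. The choice of math and statistics is only for illustration.

```python
# Import a whole module and refer to its contents with the module name.
import math

# Import specific items so they can be used without a prefix.
from math import cos, pi

# Import a module under a shorter alias.
import statistics as stats

area = math.pi * 2 ** 2            # area of a circle of radius 2
cos_pi = cos(pi)
mean_value = stats.mean([1, 2, 3, 4])

print(area, cos_pi, mean_value)

# In an interactive session, help(math) lists the contents of the module.
```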

Reading tabular data


  • Use the Pandas library to get basic statistics out of tabular data.
  • Use index_col to specify that a column’s values should be used as row headings.
  • Use DataFrame.info to find out more about a dataframe.
  • The DataFrame.columns variable stores information about the dataframe’s columns.
  • Use DataFrame.T to transpose a dataframe.
  • Use DataFrame.describe to get summary statistics about data.
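
A hedged sketch of these calls follows. It assumes pandas is installed, and it reads a tiny made-up table from an in-memory string rather than a real file, so the column names and numbers are purely illustrative.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a data file.
csv_text = "country,gdp_1952,gdp_1957\nA,1.0,2.0\nB,3.0,4.0\n"

# index_col makes the values of the 'country' column the row headings.
data = pd.read_csv(io.StringIO(csv_text), index_col="country")

data.info()                  # prints column types and non-null counts
print(data.columns)          # the column labels
print(data.T)                # transposed: rows and columns swapped

summary = data.describe()    # count, mean, std, min, quartiles, max
print(summary)
```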

Managing Python Environments


  • A Conda environment is a directory that contains a specific collection of Conda packages that you have installed.
  • You create (remove) a new environment using the conda create (conda remove) commands.
  • You activate (deactivate) an environment using the conda activate (conda deactivate) commands.
  • You install packages into environments using conda install; you install packages into an active environment using pip install.
  • You should install each environment as a sub-directory inside its corresponding project directory.
  • Use the conda env list command to list existing environments and their respective locations.
  • Use the conda list command to list all of the packages installed in an environment.

Dictionaries


  • Dictionaries associate a set of values with a number of keys.
  • Keys are used to access the values of a dictionary.
  • Dictionaries are mutable.
  • Nested dictionaries are constructed to organise data in a hierarchical fashion.
  • Useful methods for working with dictionaries include .items() and .get().
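
A short sketch of dictionary access, mutation, nesting, and the two methods named above. All names and values are invented for the example.

```python
# Keys on the left of each colon, values on the right (made-up data).
ages = {"alice": 30, "bob": 25}

alice_age = ages["alice"]          # look up a value by its key

# Dictionaries are mutable: entries can be added or changed in place.
ages["carol"] = 41
ages["bob"] = 26

# get returns a fallback instead of raising KeyError for a missing key.
missing = ages.get("dave", "unknown")

# items yields (key, value) pairs, which is convenient in a loop.
for name, age in ages.items():
    print(name, age)

# Nested dictionaries organise data hierarchically.
records = {"alice": {"age": 30, "city": "Leeds"}}
alice_city = records["alice"]["city"]
```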

Conditionals


  • Use if statements to control whether or not a block of code is executed.
  • Conditionals are often used inside loops.
  • Use else to execute a block of code when an if condition is not true.
  • Use elif to specify additional tests.
  • Conditions are tested once, in order.
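
The sketch below places a conditional inside a loop. The thresholds and values are arbitrary. Because conditions are tested in order, a value above 9.0 is labelled "heavy" and never reaches the second test, even though it would also pass it.

```python
masses = [3.54, 2.07, 9.22, 1.86, 1.71]   # made-up values

labels = []
for mass in masses:
    if mass > 9.0:
        labels.append("heavy")
    elif mass > 3.0:          # only tested when the first condition is false
        labels.append("medium")
    else:                     # runs when none of the conditions above is true
        labels.append("light")

print(labels)
```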

Pandas DataFrames


  • Use DataFrame.iloc[..., ...] to select values by integer location.
  • Use : on its own to mean all columns or all rows.
  • Select multiple columns or rows using DataFrame.loc and a named slice.
  • The result of slicing can be used in further operations.
  • Use comparisons to select data based on value.
  • Select values or NaN using a Boolean mask.
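
A minimal sketch of these selection styles, assuming pandas is installed. The dataframe is a made-up three-row table, so its labels and numbers carry no meaning.

```python
import pandas as pd

# Made-up values for illustration only.
data = pd.DataFrame(
    {"height": [1.2, 3.4, 5.6], "width": [10, 20, 30]},
    index=["a", "b", "c"],
)

first_value = data.iloc[0, 0]              # row 0, column 0 by integer position
row_b = data.loc["b", :]                   # ':' alone means all columns
rows_a_to_b = data.loc["a":"b", "width"]   # a named slice includes both ends
widest = rows_a_to_b.max()                 # a slice can feed further operations

mask = data > 3                            # Boolean dataframe of comparisons
masked = data[mask]                        # values where True, NaN elsewhere

print(masked)
```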

Writing Functions


  • Break programs down into functions to make them easier to understand.
  • Define a function using def with a name, parameters, and a block of code.
  • Defining a function does not run it.
  • Arguments in a function call are matched to its defined parameters.
  • Functions may return a result to their caller using return.
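
As one possible illustration, the function below is hypothetical (its name and parameters are not from the lesson). It shows def, parameters with a default value, and return.

```python
def fahrenheit_to_celsius(temperature_f, digits=1):
    """Convert a temperature from Fahrenheit to Celsius, rounded to `digits` places."""
    celsius = (temperature_f - 32) * 5 / 9
    return round(celsius, digits)


# Defining the function ran nothing; calling it does.
# Arguments are matched to parameters by position or by name.
boiling = fahrenheit_to_celsius(212)
body = fahrenheit_to_celsius(98.6, digits=2)

print(boiling, body)
```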

Perform Statistical Tests with Scipy


  • Scipy is a package with a variety of scientific computing functionality.
  • The scipy.stats module contains functionality for distributions and statistical tests.

Reshaping Data



Combining Data


  • Concatenate dataframes to add additional rows.
  • Merge/join data frames to add additional columns.
  • Change the on argument to choose what is matched between dataframes when joining.
  • The different types of joins control how missing data is handled for the left and right dataframes.
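
The sketch below, which assumes pandas is installed, uses two tiny made-up tables to show concatenation and three kinds of join. The column names are hypothetical.

```python
import pandas as pd

# Two made-up tables sharing a 'sample' column.
left = pd.DataFrame({"sample": ["s1", "s2", "s3"], "weight": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"sample": ["s2", "s3", "s4"], "height": [20, 30, 40]})

# Concatenating stacks rows on top of each other.
more_rows = pd.concat([left, left], ignore_index=True)

# Merging adds columns; 'on' names the column used to match rows.
inner = pd.merge(left, right, on="sample", how="inner")      # samples in both tables
outer = pd.merge(left, right, on="sample", how="outer")      # samples in either table
left_join = pd.merge(left, right, on="sample", how="left")   # every sample from left

# Rows without a match receive NaN in the columns from the other table.
print(left_join)
```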

Visualizing data with matplotlib and seaborn


  • matplotlib is the most widely used scientific plotting library in Python.
  • Plot data directly from a Pandas dataframe.
  • Select and transform data, then plot it.
  • Many styles of plot are available: see the Python Graph Gallery for more options.
  • Seaborn extends matplotlib and provides useful defaults and integration with dataframes.

Perform machine learning with Scikit-learn


  • Scikit-learn is a popular package for machine learning in Python.
  • Scikit-learn has a variety of useful functionality for creating predictive models.
  • A machine learning workflow involves preprocessing, model selection, training, and evaluation.

ID mapping using mygene


  • mygene is a Python module which allows access to a gene annotation database.
  • You can query mygene with multiple identifiers using querymany.