Rearranged my pinned mastodon hashtags and feeling more focused 😎

#ProgressToday

Learning from #EffectivePandas and #PythonForDataAnalysis.

Recipe for permuting (randomly reordering) the rows of a DataFrame or Series:
import numpy as np
n = len(df)
new_order = np.random.permutation(n)
df.iloc[new_order]
df.take(new_order)
To permute the columns of a DataFrame, pass "axis='columns'" to .take().
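Here's a minimal runnable sketch of that recipe on made-up data (I'm using the newer np.random.default_rng API here, but the idea is the same):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rng = np.random.default_rng(seed=0)

# Random ordering of the row positions 0..n-1
new_order = rng.permutation(len(df))

# Rows in random order (df.iloc[new_order] does the same)
shuffled = df.take(new_order)

# Same trick on the columns
shuffled_cols = df.take(rng.permutation(df.shape[1]), axis="columns")
```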

Method for selecting a random subset of the rows of a DataFrame or Series:
df.sample(n=, frac=)
To allow for replacement, add "replace=True" to .sample().
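A quick sketch of .sample() on a toy DataFrame (column names made up; random_state pins the result for reproducibility):

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": list("vwxyz")})

# Draw 3 rows without replacement
subset = df.sample(n=3, random_state=0)

# Draw 60% of the rows (here, 3 of 5)
frac_subset = df.sample(frac=0.6, random_state=0)

# With replacement, the same row can appear more than once,
# so you can draw more rows than the frame has
boot = df.sample(n=10, replace=True, random_state=0)
```

Note that n= and frac= are mutually exclusive; pass one or the other.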

#LearnPython #ProgressToday

Learning from #EffectivePandas and #PythonForDataAnalysis.

The preferred way to index and filter a Series or a DataFrame is i) with .loc[], indexing on index labels, or ii) with .iloc[], indexing on integer index positions. Their call signatures are nearly identical:

.loc[rows]
.loc[:, cols]
.loc[rows, cols]

Their strength comes from the increased clarity about what we intend to index on and what we intend to select, thereby helping us not be the problem 😂
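A small sketch of the label-vs-position distinction on made-up data:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# .loc selects by index label
by_label = s.loc["b"]

# .iloc selects by integer position
by_position = s.iloc[1]

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])

# rows and cols together: .loc[rows, cols]
rows_and_cols = df.loc[["a", "c"], ["y"]]
```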

#LearnPython #ProgressToday

Continued my way through #EffectivePandas and #PythonForDataAnalysis.

Element-wise transformation of a Series' values or an Index's labels can be done by feeding a dictionary (for selected elements) or a function (for all elements) into the method

.map(dict or func)
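A sketch of both flavors on toy data. One caveat worth remembering: with a dict, elements not found in the dict come back as NaN.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "bird"])

# Dict: only matching elements are mapped; "bird" becomes NaN
mapped = s.map({"cat": "feline", "dog": "canine"})

# Function: applied to every element
upper = s.map(str.upper)
```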

Binning of a Series or column can be done with i) the data values or ii) the data quantiles, via the top-level pandas functions:

pd.cut(data, bins or nbins, right=, labels=, precision=)
pd.qcut(data, q=quantiles or nquantiles)
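A sketch of both on a made-up ages Series (bin edges and labels are my own):

```python
import pandas as pd

ages = pd.Series([5, 17, 25, 40, 63, 80])

# Value-based bins: explicit edges, right-closed by default
groups = pd.cut(ages, bins=[0, 18, 65, 100],
                labels=["minor", "adult", "senior"])

# Quantile-based bins: q=2 splits at the median into equal-sized halves
halves = pd.qcut(ages, q=2, labels=["lower", "upper"])
```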

#LearnPython #ProgressToday

#ProgressToday Here are two methods that change the values of a Series or column:

.replace(to_replace=, value=, regex=)
.clip(lower=, upper=)

The former is more general, while the latter is for numerical data, squeezing outliers or extreme values into a given range.

Disambiguation: when a Series or column has string data, .replace() changes whole strings, whereas .str.replace() changes sub-strings.
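A sketch of all three on toy data, including the whole-string vs sub-string distinction:

```python
import pandas as pd

s = pd.Series(["cat", "catalog", "dog"])

# .replace() matches whole values: only the exact string "cat" changes
whole = s.replace("cat", "feline")

# .str.replace() matches sub-strings: "catalog" becomes "felinealog" too
sub = s.str.replace("cat", "feline")

# .clip() squeezes numeric values into [lower, upper]
nums = pd.Series([-5, 0, 50, 500])
clipped = nums.clip(lower=0, upper=100)
```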

#LearnPython

#ProgressToday Finished the sections in #EffectivePandas and #PythonForDataAnalysis on converting the data types of a Series or column. Top methods:

.astype(dtype, copy=, errors=)
.convert_dtypes()
pd.to_datetime()
pd.CategoricalDtype(categories=, ordered=)

The 1st one converts to NumPy-backed dtypes, while the 2nd one converts to pandas extension types that support pd.NA.

Before converting data types, be sure to take care of codes for missing data or errors.
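A sketch of the four conversions on made-up data (values and category names are my own):

```python
import pandas as pd

s = pd.Series(["1", "2", "3"])

# NumPy-backed conversion
ints = s.astype("int64")

# pandas extension type that supports pd.NA (Int64, capital I)
nullable = ints.convert_dtypes()

# Strings to datetime64
dates = pd.to_datetime(pd.Series(["2023-01-01", "2023-06-15"]))

# Ordered categorical, so comparisons and sorting respect the order
sizes = pd.Series(["small", "large", "small"]).astype(
    pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
)
```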

#LearnPython

#ProgressToday Finished going through sections in #EffectivePandas and #PythonForDataAnalysis related to duplicated data and cleaning. It's good that both of these important methods apply to all three objects - Series, DataFrame, and Index:

.duplicated(subset=, keep=)
.drop_duplicates(subset=, keep=)

One difference is that the kwarg 'subset=' applies to DataFrame objects only, since only DataFrames have multiple columns to choose from.
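A sketch on a toy DataFrame (by default keep='first' marks later repeats as duplicates):

```python
import pandas as pd

df = pd.DataFrame({"name": ["ann", "bob", "ann"],
                   "city": ["NY", "LA", "NY"]})

# Boolean mask marking repeats; the first occurrence is not flagged
mask = df.duplicated()

# Drop repeats, judging only by the 'name' column
deduped = df.drop_duplicates(subset=["name"], keep="first")
```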

#LearnPython

#ProgressToday Finished going over sections in #EffectivePandas and #PythonForDataAnalysis related to handling missing data. Here are useful methods on this topic:

.isna()
.notna()
.dropna(how=, thresh=, axis=)
.fillna(value=, method=, limit=, axis=)
.interpolate(method=, limit=, axis=)
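A sketch of the main ones on a toy Series with gaps:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

missing = s.isna()        # True where NaN
dropped = s.dropna()      # remove the NaNs
filled = s.fillna(0)      # fill with a constant
interp = s.interpolate()  # linear interpolation between known points
```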

#LearnPython

#ProgressToday Spent some time tidying up my recent code. Cleaned up some code snippets from my latest notebooks and put them in separate #Python files so that they can be run in one go. Also added one-liner docstrings to help my future self understand.

These habits should help boost my productivity over time. We'll see!

#ProgressToday: Finished reading chapter 3 "Pythonic Syntax and Common Pitfalls" in #MasteringPython. The common pitfalls didn't surprise me much this time. Learned about the walrus operator and the match statement (Python's take on a switch statement). Installed #pycodestyle, tried it on a couple of my Python files, and fixed the issues to get a clean pass.

Going forward, I'll do these:

I. Use pycodestyle to scan my Python files as I work on them;

II. Look for opportunities to use the walrus operator and the match statement.

#Python