An Introduction to R and Python For Data Analysis: A Side-By-Side Approach by Brown Taylor R

Author: Brown, Taylor R.
Language: eng
Format: epub
Publisher: CRC Press
Published: 2023-04-10T00:00:00+00:00


9.5 Saving Data in Python

9.5.1 Writing Out Tabular Plain Text Data in Python

You can write out tabular data with a variety of DataFrame methods that are named to_*() [12]. pd.DataFrame.to_csv() [13] has a lot in common with write.csv() in R. Below we write out d to a file called oring_out2.csv.
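The following is a minimal sketch of that call, not the book's original listing. The data frame d here is a small hypothetical stand-in with made-up column names and values, and index=False is an assumption on my part (it keeps pandas from writing the row labels as an extra column). The file goes into a temporary directory so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the data frame d used in the text.
d = pd.DataFrame({
    "flight": [1, 2, 3],
    "temp": [66, 70, 69],
    "num_distress": [0, 1, 0],
})

with tempfile.TemporaryDirectory() as out_dir:
    out_path = os.path.join(out_dir, "oring_out2.csv")

    # index=False (an assumption) omits the row labels from the file.
    d.to_csv(out_path, index=False)

    # Read the file back to confirm that the table survived the round trip.
    d_check = pd.read_csv(out_path)

print(d_check.equals(d))
```

Leaving out index=False would add an unnamed first column holding the row labels, which then comes back as an ordinary column when the file is read again.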

Here is how the first few rows of that file look in a text editor.
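One way to inspect the raw text without opening an editor is sketched below. It relies on the fact that to_csv() returns the CSV text as a string when no path is given. The data frame is a hypothetical stand-in, so the rows printed are illustrative and are not the contents of the real file.

```python
import pandas as pd

# Hypothetical stand-in for the data frame d used in the text.
d = pd.DataFrame({
    "flight": [1, 2, 3],
    "temp": [66, 70, 69],
    "num_distress": [0, 1, 0],
})

# With no path argument, to_csv() returns the text instead of writing a file.
csv_text = d.to_csv(index=False)

# Keep the header line plus the first two data rows.
first_lines = csv_text.splitlines()[:3]
for line in first_lines:
    print(line)
# flight,temp,num_distress
# 1,66,0
# 2,70,1
```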

9.5.2 Serialization in Python

Serialization functionality is readily available in Python, just like it is in R. In Python, the pickle [14] library is probably the most commonly used. (cPickle was its faster C counterpart in Python 2; in Python 3 that implementation is part of pickle itself.) Serializing an object with this library is known as pickling it.
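To make the idea of pickling concrete, here is a small sketch that uses the standard-library pickle module directly. The dictionary record is a made-up example object; pickle.dumps() turns it into bytes and pickle.loads() rebuilds an equal copy.

```python
import pickle

# A made-up object to serialize; any picklable Python object would do.
record = {"flight": 1, "temp": 66, "tags": ["cold", "morning"]}

# Serialize ("pickle") the object into a bytes string.
payload = pickle.dumps(record)

# Deserialize ("unpickle") the bytes back into a new, equal object.
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == record)      # True
```

Only unpickle data that comes from a source you trust, because loading a pickle can execute arbitrary code.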

Pandas has a .to_pickle() [15] wrapper method attached to every DataFrame. Once the pickled object is saved, the file can be read back into Python with pd.read_pickle() [16]. These functions are extremely convenient, because they call all the required pickle code and hide a decent amount of complexity.

Here is an example of writing out d and then reading the pickled object back in. In Python 3, the file suffix for pickled objects is usually .pickle, but there are many other choices.
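A sketch of that round trip might look like the following. It is not the book's original listing: d is a hypothetical stand-in with invented values, and the file is written to a temporary directory instead of a data/ folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the data frame d used in the text.
d = pd.DataFrame({
    "flight": [1, 2, 3],
    "temp": [66.0, 70.0, 69.0],
    "num_distress": [0, 1, 0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "oring.pickle")

    # Pickle the data frame to disk.
    d.to_pickle(pickle_path)

    # Read the pickled object back into a new data frame.
    d_is_back = pd.read_pickle(pickle_path)

# Values, column names, and dtypes all survive the round trip.
print(d_is_back.equals(d))
```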

__________________

[12] https://pandas.pydata.org/pandas-docs/stable/reference/io.html#input-output
[13] https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv
[14] https://docs.python.org/3/library/pickle.html
[15] https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_pickle.html
[16] https://pandas.pydata.org/docs/reference/api/pandas.read_pickle.html#pandas.read_pickle

Unfortunately, "oring.pickle" is much larger (1,676 bytes) than the original text file "o-ring-erosion-only.data" (322 bytes). This is for two reasons. First, the original data set is small, so the overhead of pickling this object is relatively pronounced. Second, we are not taking advantage of any compression. If you use something like d_is_back.to_pickle("data/oring.zip"), pandas infers the compression method from the file suffix, and the file will become smaller.
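The effect of compression can be checked with a sketch like the one below. The data frame d_big is hypothetical and deliberately larger and more repetitive than the O-ring data, so that the difference in file size is easy to see; the sizes it prints are not the byte counts quoted above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical, deliberately repetitive data (10,000 rows).
d_big = pd.DataFrame({
    "temp": list(range(50, 90)) * 250,
    "num_distress": [0, 0, 1, 0] * 2500,
})

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "oring.pickle")
    zip_path = os.path.join(tmp_dir, "oring.zip")

    # No recognized compression suffix, so the pickle is written as is.
    d_big.to_pickle(plain_path)

    # The .zip suffix makes pandas compress the pickle before writing it.
    d_big.to_pickle(zip_path)

    plain_size = os.path.getsize(plain_path)
    zip_size = os.path.getsize(zip_path)

    # read_pickle() infers the compression from the suffix as well.
    d_from_zip = pd.read_pickle(zip_path)

print(plain_size, zip_size)
print(d_from_zip.equals(d_big))
```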

In Python, unlike in R, it is more difficult to serialize all of the objects you currently have in memory. It is possible, but it will likely require the use of a third-party library.
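Short of reaching for a third-party library, one rough workaround is to gather the objects you care about into a dictionary by hand and pickle that dictionary. The sketch below uses only the standard library, and every name in it is made up for illustration. It is not a true whole-session save, because you have to list the objects yourself.

```python
import pickle

# A few made-up objects standing in for "everything in memory".
temps = [66, 70, 69]
settings = {"units": "F", "rounded": True}
label = "o-ring demo"

# Collect the objects by name; anything left out of this dict is not saved.
workspace = {"temps": temps, "settings": settings, "label": label}

# Serialize the whole collection at once, then restore it.
blob = pickle.dumps(workspace)
restored = pickle.loads(blob)

print(sorted(restored))  # ['label', 'settings', 'temps']
```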

Speaking of third-party code, there are many libraries that provide alternative serialization solutions in both R and Python. I do not discuss any in this text. However, I will mention that some of them may provide combinations of the following: an increase in read and write speed, a decrease in required memory, improved security [17], improved human readability, and interoperability between multiple programming languages. If any of these sound potentially beneficial, I encourage you to conduct further research.


